Re: Sample of symbols useful in Classics (was: Apple's Unicode)

From: Ronald S. Wood (wood@cs.dal.ca)
Date: Wed Aug 14 1996 - 08:59:57 EDT


Otto,

Thank you very much for your reply.

On Wed, 14 Aug 1996, Otto Stolz wrote:
> On Mon Aug 12, 17:07, Ronald S. Wood <wood@cs.dal.ca> has asked for the
> ISO 10646 / Unicode codings for some symbols used in classics.
>
> Unfortunately, the Unicode.COM listserver has omitted the MIME headers, so
> I had to guess them, and insert them manually, in order to view the GIF
> attachment.

Yes, the server managed to mangle the MIME. Hopefully, the new server
will correct this. On the other hand, I was warned not to send
(long) attachments by the list keeper...

> I think that the symbols used in Classics were not considered when
> Unicode was defined; still, Unicode / ISO 10646 comprises characters
> suitable for some of them.
>
> Metrical symbols
> ----------------
>
> short              02D8 BREVE (sub SPACING MODIFIER LETTERS)
> long               00AF MACRON (sub LATIN-1 SUPPLEMENT)
>                 or 02C9 MODIFIER LETTER MACRON (sub SPACING MODIFIER LETTERS)
>                 or 203E OVERLINE (sub GENERAL PUNCTUATION)
> period end         2016 DOUBLE VERTICAL LINE (sub GENERAL PUNCTUATION)
>

Yes, these are the easy cases. But should there not be a functional
separation of metrical signs and modifier letters?

>
> The following symbols could also be encoded by multiple characters:
> period end         2 × 007C VERTICAL LINE (sub BASIC LATIN)
> strophe end        3 × 007C VERTICAL LINE (sub BASIC LATIN)

Maybe, but it might be useful to have them as single characters to
facilitate the processing of metrical information, just as IPA symbols
can presumably be used in processing (e.g. for speech synthesis).
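To make the processing point concrete: with one code point per sign, a scansion string can be read one character at a time, but multi-character signs need a tokenizer that knows every sequence, and the same sign may arrive in two encodings (2016 versus 2 × 007C). The following is a minimal Python sketch, assuming a hypothetical sign inventory taken from the encodings quoted above; greedy longest match is one plausible approach, not an established standard.

```python
# Hypothetical inventory of metrical signs, using the encodings quoted above.
SIGNS = {
    "\u02d8": "short",        # BREVE
    "\u00af": "long",         # MACRON
    "\u2016": "period end",   # DOUBLE VERTICAL LINE
    "||": "period end",       # 2 x VERTICAL LINE
    "|||": "strophe end",     # 3 x VERTICAL LINE
}
MAX_LEN = max(len(key) for key in SIGNS)


def tokenize(scansion: str) -> list[str]:
    """Split a scansion string into sign labels by greedy longest match."""
    labels = []
    i = 0
    while i < len(scansion):
        # Try the longest candidate first so "|||" is not read as "||" + "|".
        for width in range(min(MAX_LEN, len(scansion) - i), 0, -1):
            chunk = scansion[i:i + width]
            if chunk in SIGNS:
                labels.append(SIGNS[chunk])
                i += width
                break
        else:
            raise ValueError(f"unknown sign {scansion[i]!r} at position {i}")
    return labels


print(tokenize("\u00af\u02d8\u02d8\u00af\u2016"))
print(tokenize("\u00af\u02d8\u02d8|||"))
```

With a dedicated code point per sign, the inner loop would collapse to a single dictionary lookup per character.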

> usually long       02D8 BREVE (sub SPACING MODIFIER LETTERS)
>                  + 0332 COMBINING LOW LINE (sub COMBINING DIACRITICAL MARKS)
> usually short      02D8 BREVE (sub SPACING MODIFIER LETTERS)
>                  + 0305 COMBINING OVERLINE (sub COMBINING DIACRITICAL MARKS)
> contracted biceps  2025 TWO DOT LEADER (sub GENERAL PUNCTUATION)
>                  + 0332 COMBINING LOW LINE (sub COMBINING DIACRITICAL MARKS)
>
> However, the vertical alignment of the metrical symbols (except the end
> symbols) may not be as expected.

Yes, combining spacing and non-spacing characters to compose characters
is possible, but my impression is that Unicode does have a lot of
characters that are composites of several characters yet have been
given a single code point. My other objections are as above.
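The contrast between composed sequences and precomposed characters can be checked with Python's standard `unicodedata` module. The sketch below assumes a current Python 3 and its bundled Unicode database. It lists Otto Stolz's proposed sequences by code point and name, then tests whether canonical composition (NFC) folds any of them into a single code point. The dictionary labels are descriptive only.

```python
import unicodedata

# Multi-character encodings proposed above for metrical signs.
# The keys are descriptive labels, not Unicode character names.
METRICAL_SEQUENCES = {
    "period end": "\u007c\u007c",          # 2 x VERTICAL LINE
    "strophe end": "\u007c\u007c\u007c",   # 3 x VERTICAL LINE
    "usually long": "\u02d8\u0332",        # BREVE + COMBINING LOW LINE
    "usually short": "\u02d8\u0305",       # BREVE + COMBINING OVERLINE
    "contracted biceps": "\u2025\u0332",   # TWO DOT LEADER + COMBINING LOW LINE
}


def describe(seq: str) -> str:
    """Spell out a sequence as code points plus character names."""
    return " + ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq)


def has_precomposed_form(seq: str) -> bool:
    """True if canonical composition (NFC) folds seq into fewer code points."""
    return len(unicodedata.normalize("NFC", seq)) < len(seq)


for label, seq in METRICAL_SEQUENCES.items():
    print(f"{label:18} {describe(seq)}  precomposed: {has_precomposed_form(seq)}")

# Contrast: a + COMBINING MACRON composes to U+0101 LATIN SMALL LETTER A WITH MACRON.
print(has_precomposed_form("a\u0304"))  # True
```

NFC leaves every metrical sequence unchanged, so none of them has a single-code-point equivalent, whereas the Latin letter with macron does.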

> I have not found suitable encodings for
> resolvable long
> contractible biceps

They, too, could perhaps be constructed from other characters. These were
composed with four characters in the font I was using (2x [spacing
char + non-spacing char]).

>
> Signa textui inserta
> --------------------
>
> Are these widely agreed-upon characters at all? I guess they are chosen ad
> hoc, just to be discernible within the limited context of a single edition.
> If so, any set of discernible characters would fit in.

No, these characters are used only in this edition, as far as I know, but
it is very common to use these _types_ of symbols in Bible editions to
indicate variants (there are a lot of differing textual variants for the
original Greek New Testament). I included them because the average
computer user, say in the US, is not likely to use too many unusual
characters outside of Latin 1, except perhaps for Bible studies, which
are very common. There is a relatively large number of (small) companies
that provide electronic versions of the Bible. They would presumably want
to have this _kind_ of character so that they can provide their
product in Unicode form. Otherwise, they will stick with some proprietary
font and encode in 8 bits.

The proprietary nature and proliferation of idiosyncratic fonts is
supposed to be solved by Unicode, no?

> Nevertheless, some of the symbols used by the Deutsche Bibelgesellschaft
> can be coded in Unicode / ISO 10646, viz.:
> alpha         03B1 GREEK SMALL LETTER ALPHA (sub BASIC GREEK)
>            or 237A APL FUNCTIONAL SYMBOL ALPHA (sub MISCELLANEOUS TECHNICAL)

I only included lower case alpha to indicate that the following
characters were superscript. 0x03B1 is the only appropriate encoding for the
Greek language.

> circle        25CB WHITE CIRCLE (sub GEOMETRIC SHAPES)
>            or 25E6 WHITE BULLET (sub GEOMETRIC SHAPES)
>            or 25EF LARGE CIRCLE (sub GEOMETRIC SHAPES)
> box           25A1 WHITE SQUARE (sub GEOMETRIC SHAPES)
>            or 25AB WHITE SMALL SQUARE (sub GEOMETRIC SHAPES)
>            or BALLOT BOX (sub MISCELLANEOUS SYMBOLS)
> backslash     005C REVERSE SOLIDUS (sub BASIC LATIN)
>            or 2216 SET MINUS (sub MATHEMATICAL OPERATORS)

I think I saw some of these in Michael Everson's fonts. They are,
however, not superscript, even if they are good fits. Since they are
intended for mathematics, I wonder if they should be recommended for
those uses alone, just as APL alpha is for APL programming and Greek
alpha is for languages using that alphabet.

> From the picture supplied, I could not quite recognize the other symbols;
> perhaps:
> T-like        22A4 DOWN TACK (sub MATHEMATICAL OPERATORS)
> Gamma-like    2308 LEFT CEILING (sub MISCELLANEOUS TECHNICAL)
> (-like        2320 TOP HALF INTEGRAL (sub MISCELLANEOUS TECHNICAL)
> )-like        -
> S-like        222B INTEGRAL (sub MATHEMATICAL OPERATORS)
>            or 0283 LATIN SMALL LETTER ESH (sub IPA EXTENSIONS)
> S-mirrorlike  2240 WREATH PRODUCT (sub MATHEMATICAL OPERATORS)
>            or 0285 LATIN SMALL LETTER SQUAT REVERSED ESH (sub IPA EXTENSIONS)

All of these characters are used editorially to indicate where variants
begin and end. They are all superscript. Resemblances are not reliable,
since each actual font may render them differently. But again, most of
the equivalents listed above are intended for use in different domains.
Mixing up IPA characters and editorial marks would be a bad idea: the two
uses are likely to conflict in the humanities, where that would make
phonetic information ambiguous.
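The objection to borrowing IPA letters can be illustrated through character properties: software treats a code point according to its Unicode category, not its visual shape. The following is a minimal Python sketch using only the standard library. The `words` tokenizer is a deliberately naive, hypothetical example, and the integral appears only for contrast, not as a recommended alternative (it is a borrowing from another domain as well).

```python
import re
import unicodedata

# Two of the stand-ins suggested above for the S-like variant sign.
ESH = "\u0283"       # LATIN SMALL LETTER ESH, an IPA letter
INTEGRAL = "\u222b"  # INTEGRAL, a mathematical operator

LOGOS = "\u03bb\u03bf\u03b3\u03bf\u03c2"  # Greek "logos", unaccented


def words(text: str) -> list[str]:
    """Naive tokenizer: runs of letters, digits and underscores."""
    return re.findall(r"\w+", text)


for mark in (ESH, INTEGRAL):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: "
          f"category={unicodedata.category(mark)}, "
          f"words={words(mark + LOGOS)}")

# Case mapping also treats the esh as a letter: upper() turns the
# editorial mark into U+01A9 LATIN CAPITAL LETTER ESH.
print(ESH.upper() == "\u01a9")  # True
```

Because the esh is a lowercase letter (category Ll), the tokenizer glues the editorial mark onto the Greek word and case conversion alters it. The integral (category Sm) stays separate.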

> In contrast to my expectations, ISO 10646 / Unicode apparently does *not*
> comprise a COMBINING MIDDLE DOT (though there are letters such as 013F
> LATIN CAPITAL LETTER L WITH MIDDLE DOT). The only combining character
> resembling a middle dot apparently is 05BC HEBREW POINT DAGESH OR MAPIQ;
> however, I would not dare to use it with non-Hebrew characters. Perhaps
> you could use the non-combining character 00B7 MIDDLE DOT to code the
> dotted Bibelgesellschaft signs.

No, Hebrew characters repurposed for editorial marks would be a bad idea,
especially when a Hebrew Bible edition may need editorial marks. (That,
however, gets into bi-directional problems.)
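The observations about the middle dot can be checked against character properties with the standard `unicodedata` module. The following is a minimal Python sketch. Its results reflect the Unicode version bundled with the interpreter, which is much newer than the 1996 standard under discussion.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"    # MIDDLE DOT, a spacing Latin-1 character
DAGESH = "\u05bc"        # HEBREW POINT DAGESH OR MAPIQ, a combining mark
L_MIDDLE_DOT = "\u013f"  # LATIN CAPITAL LETTER L WITH MIDDLE DOT

# Canonical combining class 0 means "not a combining mark".
print(unicodedata.combining(MIDDLE_DOT))  # 0
print(unicodedata.combining(DAGESH))      # 21, a Hebrew-specific class

# The precomposed L is only a compatibility character, and its
# decomposition uses the spacing MIDDLE DOT rather than a combining mark.
print(unicodedata.decomposition(L_MIDDLE_DOT))  # <compat> 004C 00B7
print(unicodedata.normalize("NFKD", L_MIDDLE_DOT) == "L" + MIDDLE_DOT)  # True
```

The last two lines lend some support to the 00B7 fallback: Unicode's own L-with-middle-dot decomposes, for compatibility purposes, into a letter followed by the spacing 00B7.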

> Best wishes,
> Otto Stolz
>

Thank you again.

I just wanted to raise the question of characters of use to Classics and
the Humanities in general. The technical, mathematical and business
professions have (understandably) had their symbols included from the very
beginning, and these are much more likely to be used than the specialized
symbols I have pointed out. I would like to see some of these
characters available so that I can be assured that the most important
characters are available to me for text archives and software.

Others are presumably interested, as well. The Summer Institute of
Linguistics (http://www.sil.org, I believe) has done a lot of work on
computer tools for the humanities, and would be an ideal source of
information on what characters are most important in these fields.

I am also concerned that the need for idiosyncratic fonts be reduced (if
not eliminated). For ancient Greek, I know of maybe 5-6 fonts with very
different sets of characters. Linguist Software's SymbolGreek uses spacing
and non-spacing characters to leave room for the Nestle-Aland editorial
symbols (LS distributes Bible e-texts). GreekKeys fonts take the Unicode
approach by encoding all characters/diacritical combinations, leaving no
space for even the most useful editorial symbols. And other fonts may
include editorial symbols, but put them in a different order, etc...

I would like to see a well-defined set of characters partly because I am
dealing with the problem of converting the texts of the Thesaurus Linguae
Graecae (a comprehensive CD-ROM database of Ancient Greek texts) into
displayable encodings. I am using Unicode as the intermediate
representation. Many of the symbols are obscure, but some have general
use. For the time being, I will use the private use area, but I would
hope that I could, at some time, exchange a Unicode file with a scholar
and know that she could read it.

I suspect that 128 code points would suffice for the most common symbols, but,
as I mentioned, I have not done a comprehensive survey.
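The interim plan of placing the TLG symbols in the private use area amounts to a private mapping table that sender and reader must share. The following is a minimal sketch. It assumes the BMP Private Use Area (U+E000 to U+F8FF) and the rough figure of 128 slots given above. The symbol names and the `allocate_pua` helper are hypothetical.

```python
import unicodedata

PUA_FIRST = 0xE000  # first code point of the BMP Private Use Area
PUA_LAST = 0xF8FF   # last code point of the BMP Private Use Area
BLOCK_SIZE = 128    # rough estimate for the most common symbols


def allocate_pua(symbols: list[str], start: int = PUA_FIRST,
                 size: int = BLOCK_SIZE) -> dict[str, str]:
    """Assign each symbol name a private-use character, in list order."""
    if len(set(symbols)) != len(symbols):
        raise ValueError("duplicate symbol names")
    if len(symbols) > size:
        raise ValueError(f"{len(symbols)} symbols exceed the {size}-slot block")
    if start < PUA_FIRST or start + size - 1 > PUA_LAST:
        raise ValueError("block does not fit inside the Private Use Area")
    return {name: chr(start + i) for i, name in enumerate(symbols)}


# Hypothetical symbol names, for illustration only.
table = allocate_pua(["resolvable long", "contractible biceps",
                      "variant begin", "variant end"])
reverse = {ch: name for name, ch in table.items()}  # for decoding

for name, ch in table.items():
    print(f"U+{ord(ch):04X} (category {unicodedata.category(ch)}): {name}")
```

Private-use characters carry no meaning outside such an agreement (their category is simply Co), so a file encoded this way is readable only by someone who also has the table. Standard code points would remove that interchange problem.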

Sorry for the length!

-Ronald S. Wood
 Halifax, NS, Canada



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT