Re: Sample of symbols useful in Classics (was: Apple's Unicode)

From: Ronald S. Wood (wood@cs.dal.ca)
Date: Wed Aug 14 1996 - 23:51:49 EDT


On Wed, 14 Aug 1996 Edward Cherlin wrote:

> Ronald Wood wrote:
> >On Wed, 14 Aug 1996, Otto Stolz wrote:
> >> On Mon Aug 12, 17:07, Ronald S. Wood <wood@cs.dal.ca> has asked for the
> >> ISO 10646 / Unicode codings for some symbols used in classics.
> [snip]
> >> I think that the symbols used in Classics have not been considered when
> >> Unicode has been defined, yet Unicode /ISO 10646 comprises characters
> >> suitable for some of them.
> [snip]
>
> My $.02 worth: These are a reasonably well-defined set of characters
> essential to a fairly large set of users. Let some of them come up with a
> proposal, argue the merits of all the niggling details, and give them a
> page.
>

You're right: I'm trying to gauge whether anyone on this list is
interested in these issues. I've raised the matter with the TLG and TLL
people, but they are not very up to speed on this, nor do they appear to
have the time.

> >> Nevertheless, some of the symbols used by the Deutsche Bibelgesellschaft
> >> can be coded in Unicode / ISO 10646, viz.:
> >> alfa 03B1 GREEK SMALL LETTER ALPHA (sub BASIC GREEK)
> >> or 237A APL FUNCTIONAL SYMBOL ALPHA (sub MISCELLANEOUS TECHNICAL)
>
> No, no, please, no. Please nobody use APL characters for something other
> than APL. They have weird spacing (monospaced) and semantics (written
> left-to-right, parsed right-to-left, with very specific meanings). We can
> argue whether this particular DB alpha is a Greek alpha, or yet another
> special alpha (like the math and APL versions) but it is NOT APL.
>
> >I only included lower case alpha to indicate that the following
> >characters were superscript. 0x03B1 is the only appropriate encoding for the
> >Greek language.
> [various cases and suggestions snipped]
>
> In the past, Unicode has been somewhat cramped, and the tendency has been
> to unify characters where possible without infringing on other standards.
> Now with UTF-16 we can afford to separate character sets for clearly
> different uses.

Yes, well, as Rick Gowan just argued, the principle of unification might
recommend itself. But I also argued that it may be useful to have
separate, but similar symbols to avoid ambiguity, especially where they
may have some functional significance; e.g. the speech or prosodic
generation example I gave. Unification appears to be weighed against
functional goals, so that APL and Greek alphas are separate. (One may be
used in a sentence, the other may be interpreted in APL unambiguously.)
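
A minimal sketch of this point, assuming Python 3 and its standard
unicodedata module (neither is mentioned in this thread). It only shows
that the two alphas cited above are distinct code points with distinct
names; which one a given text should use remains the editorial question
under discussion.

```python
import unicodedata

# The two code points cited earlier in the thread.
GREEK_ALPHA = "\u03b1"
APL_ALPHA = "\u237a"

def describe(ch):
    """Return (code point as U+XXXX, official character name)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch))

for ch in (GREEK_ALPHA, APL_ALPHA):
    print(describe(ch))

# The two never compare equal, so a program can tell prose from APL.
print(GREEK_ALPHA == APL_ALPHA)
```

Because the characters differ at the code point level, software can treat
them differently without consulting the font.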

> >I am also concerned that the need for idiosyncratic fonts be reduced (if
> >not eliminated). For ancient Greek, I know of maybe 5-6 fonts with very
> >different sets of characters. Linguist Software's SymbolGreek uses spacing
> >and non-spacing characters to leave room for the Nestle-Aland editorial
> >symbols (LS distributes Bible e-texts). GreekKeys fonts take the Unicode
> >approach by encoding all characters/diacritical combinations, leaving no
> >space for even the most useful editorial symbols. And other fonts may
> >include editorial symbols, but put them in a different order, etc...
>
> Actually, Unicode is going to increase the number of idiosyncratic fonts,
> but to a considerable extent will allow them to work together properly. We
> must remember that a font is not an input method, and that a keyboard
> layout or set of symbol menus can include a variety of characters from
> different fonts (as in math input editors). Thus you could set up your
> system to use GreekKeys letters and SymbolGreek editorial marks, because
> they would be identified by Unicode code points, not font positions. You
> could type letters and diacritical marks as separate elements or as
> precomposed combinations, no matter which approach your font takes. The Mac
> keyboard and the Windows International keyboard both allow typing of most
> diacritics separately, for display and printing using precomposed forms.

The examples I gave of Greek fonts showed different mappings for similar
characters, and presumably this would be avoided. Also I could expect
more characters to be consistently available in a Unicode scenario. The
idiosyncrasy you mean is presumably the code planes that are actually
supported in a given font, which may be heavy on Asian languages or lean
in the direction of European languages, but few technical characters.
Still, things would be in their expected places, and input methods could
be implemented once and reliably map to the correct plane.
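
A rough sketch of that idea, assuming Python 3. The two font layouts
below are HYPOTHETICAL and invented for illustration; they are not the
real tables of GreekKeys, SymbolGreek, or any other font. One layout
stores an accented alpha in a single slot, the other uses a letter plus a
non-spacing accent. Once both are mapped to Unicode and normalized, the
difference in font positions disappears.

```python
import unicodedata

# HYPOTHETICAL font layouts (font position -> Unicode character).
# These slot numbers are made up; they do not describe any real font.
FONT_PRECOMPOSED = {0x61: "\u03b1", 0x8C: "\u03ac"}  # 0x8C: alpha + acute in one slot
FONT_NONSPACING = {0x61: "\u03b1", 0x2F: "\u0301"}   # 0x2F: non-spacing acute

def to_unicode(data, table):
    """Map each font position to its character, then normalize to NFC."""
    text = "".join(table[b] for b in data)
    return unicodedata.normalize("NFC", text)

a = to_unicode(bytes([0x8C]), FONT_PRECOMPOSED)
b = to_unicode(bytes([0x61, 0x2F]), FONT_NONSPACING)

# Both routes arrive at the same single code point.
print(a == b, f"U+{ord(a):04X}")
```

The design choice here is to normalize at the boundary, so that
everything downstream (search, sorting, exchange) sees one form
regardless of which font the text came from.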

> >I would like to see a well-defined set of characters partly because I am
> >dealing with the problem of converting the texts of the Thesaurus Linguae
> >Graecae (a comprehensive CD-ROM database of Ancient Greek texts) into
> >displayable encodings. I am using Unicode as the intermediate
> >representation. Many of the symbols are obscure, but some have general
> >use. For the time being, I will use the private use area, but I would
> >hope that I could, at some time, exchange a Unicode file with a scholar
> >and know that she could read it.
>
> Does this mean that there will be a CD-ROM of TLG in Unicode? And is there
> any chance of the price coming down to the textbook level, say $40-50,
> instead of the $300 they have been charging, so that all beginning Classics
> students and amateurs could automatically buy one? I've had my eye on it
> ever since it was announced, but nobody I know around here has a copy at
> that price.

No. I have asked Theodore Brunner and they have no plans whatsoever for a
Unicode version. Nor are they much interested in structural encoding in
SGML/TEI. I don't think they are going to budge on their pricing policy.
They are sticking with 3-year (I think) licences. Individual licences are
more around $500. Still too much for me and most students. I wish they
would go for volume, rather than high licensing prices. On the other
hand, there are not a lot of classicists these days. It could be that
copyright agreements make it difficult for them, since easily available
electronic texts would cut into the sales of the book versions. (I'd like
to have both: who wants to read Plato on a screen, except for brief
stretches?)

I am using Unicode only to map between TLG beta code and the myriad
fonts. It's frustrating to see wildly varying character availability in
each font.
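
A very small sketch of the intermediate step, assuming Python 3. It
covers only a handful of Beta Code letters and diacritics and ignores
capitals (the `*` prefix) entirely. The private-use assignments are
HYPOTHETICAL: the token numbers `#900` and `#901` are placeholders, and
nothing is claimed about what those tokens denote in the TLG tables.

```python
import re
import unicodedata

# Partial Beta Code tables, enough for a demonstration only.
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "L": "λ", "N": "ν",
    "O": "ο", "P": "π", "Q": "θ", "R": "ρ", "S": "σ", "W": "ω",
}
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
}
# HYPOTHETICAL private-use assignments for symbols lacking a code point.
PRIVATE_USE = {"#900": "\ue000", "#901": "\ue001"}

# A token is either '#' followed by digits, or any single character.
TOKEN = re.compile(r"#\d+|.", re.DOTALL)

def beta_to_unicode(beta):
    """Convert a (partial) Beta Code string to NFC-normalized Unicode."""
    out = []
    for tok in TOKEN.findall(beta):
        if tok in LETTERS:
            out.append(LETTERS[tok])
        elif tok in DIACRITICS:
            out.append(DIACRITICS[tok])
        elif tok in PRIVATE_USE:
            out.append(PRIVATE_USE[tok])
        else:
            out.append(tok)  # spaces and anything this sketch does not handle
    text = "".join(out)
    # Use final sigma when no letter follows.
    text = re.sub(r"σ(?!\w)", "ς", text)
    return unicodedata.normalize("NFC", text)

print(beta_to_unicode("LO/GOS"))
```

From the Unicode string, a second table per font could then map each
code point to that font's position, and report the characters the font
lacks instead of silently dropping them.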

>
> >I suspect that 128 codepoints would suffice for the most common symbols, but,
> >as I mentioned, I have not done a comprehensive survey.
> >
> >Sorry for the length!
>
> Don't apologize. The correct length for a technical discussion is long
> enough to state all of the problems, proposed solutions, and pros and cons
> clearly (see my sig).
>
> >-Ronald S. Wood
> > Halifax, NS, Canada
>
>
>
> Edward Cherlin Helping Newbies to become "knowbies" Point Top 5%
> Vice President http://www.newbie.net/Mentors/Cherlin of Web sites
> NewbieNet, Inc. Everything should be made as simple as possible,
> cherlin@newbie.net __but no simpler__. Albert Einstein

Finally, to all appearances, a classicist that does software. I'm just
dipping my feet at the moment, trying to decide whether to really leap
into programming (you know, no training in CS, self-taught, and all)...

Ciao!

-Ronald S. Wood.


