Re: mixed-script writing systems

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 18 2002 - 15:49:53 EST

Next message: Kenneth Whistler: "Re: The result of the Plane 14 tag characters review"

Previous message: John Hudson: "Re: Proposing two new Arabic ligature characters"
Maybe in reply to: Peter_Constable@sil.org: "mixed-script writing systems"
Next in thread: Dean Snyder: "Re: mixed-script writing systems"
Reply: Dean Snyder: "Re: mixed-script writing systems"

Andrew West wrote:

> On Mon, 18 Nov 2002 02:34:18 -0800 (PST), Kenneth Whistler wrote:
>
> > In point of fact,
> > people for centuries have been borrowing back and forth between
> > Latin, Greek, and Cyrillic in particular, so that in some respects
> > LGC is a kind of metascript and should be treated as such.
> >
>
> Latin, Greek, Cyrillic and Runic even (cf. Latin letters Thorn and Wynn).

Point taken. And don't forget Old Italic, which is now encoded as well.

>
> Gothic is a good example of a mixed-script writing system,

Not really -- a good example, that is.

> composed of a mixture
> of Latin, Greek and Runic letters. There is a "Gothicness" about the graphic
> forms of the glyphs of the Gothic alphabet, but IMHO this variation from
> "standard" (but what is "standard" in 4th century terms ?) Latin, Greek and
> Runic letters should be dealt with at the font level.

It isn't particularly helpful to go there, since Gothic doesn't fit all
that well as merely a font variant of Latin or Greek or Runic. Certainly
it *could* have been done that way, but for this particular case, the
committees were convinced that simply encoding Gothic as a distinct
script was more practical.

As it stands now, the Gothic Bible can be correctly and unambiguously
represented in Unicode, using the Gothic script as defined. Not to
have encoded the Gothic script would have left us still arguing about
which letters from which script to use and how Gothic fonts should
be encoded.
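
As a rough illustration of what "using the Gothic script as defined"
means in practice, here is a minimal Python sketch. It assumes only the
standard unicodedata module; the helper name gothic_names is made up
for this example and is not part of any standard or library. It lists
the code points U+10330..U+1034A, which carry their own GOTHIC LETTER
names rather than reusing Latin, Greek or Runic code points:

```python
import unicodedata

# Range of the Gothic letters in the Unicode Gothic block.
GOTHIC_FIRST, GOTHIC_LAST = 0x10330, 0x1034A


def gothic_names():
    """Map each Gothic code point to its Unicode character name."""
    return {
        cp: unicodedata.name(chr(cp))
        for cp in range(GOTHIC_FIRST, GOTHIC_LAST + 1)
    }


# Print one "U+XXXXX NAME" line per Gothic letter.
for cp, name in gothic_names().items():
    print(f"U+{cp:05X} {name}")
```

A text stored with these code points is identified as Gothic by the
characters themselves, independent of which font is used to display it.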

> Nevertheless, Gothic has
> been encoded in Unicode, and this may provide an unwelcome precedent for
> encoding other mixed-script writing systems.

What you are getting at is the complicated problem of sorting out all
the historical connections between various related alphabets and trying
to sift them into categories which make sense as scripts and categories
which are simply font variants within a script. For modern scripts this
is less of a problem, since we have modern practice and typography to
rely on to help make the distinctions. For *historic* scripts, on the
other hand, it is murkier.

Old Italic is a good case in point. It *could* have been treated as
another archaic outlier of Greek. The problem with that is that it
would have added a few more archaic letters which never show up in
modern Greek fonts, and it would have required distinct archaic fonts
in order to represent Old Italic text reliably. Old Italic texts
don't get rendered with a modern Greek font -- that would look
ridiculous. Because of this usage pattern, it made sense to the
committees to coalesce the various southern Old Italic alphabets
(Oscan, Umbrian, Messapian, etc.) into a "script" which would incorporate
all the required letters for those alphabets, as *opposed* to Latin
or to Greek per se. It is likely that a similar decision will be
taken in the future to account for the Alpine alphabets of northern
Italy, which are intermediate between Italic and Runic alphabets.

What it comes down to is the fact that for historic scripts in
particular, there are no defined criteria that would enable us
to simply *discover* the right answer regarding the identity of
scripts. To a certain extent, the encoding committees need to
make arbitrary partitions of historic alphabets through time
and space, reflecting scholarly praxis as far as feasible, and
then live with the results. At least this procedure makes it
*possible* to represent the texts reliably, once the scripts
and their variants have been standardized.

>
> What about the now-defunct Zhuang alphabet (used between 1955 and 1981 in PRC)
> that was composed of a cumbersome mixture of Latin, Cyrillic and IPA letters ?
> Should the letters of this alphabet be encoded separately in "Zhuang" block,

Check the standard:

U+0185 LATIN SMALL LETTER TONE SIX
U+019C LATIN CAPITAL LETTER TURNED M
U+01A8 LATIN SMALL LETTER TONE TWO
etc.

This issue was decided already in 1989.
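
For anyone who wants to check this without the code charts to hand, a
small Python sketch along these lines should do. It assumes the
standard unicodedata module; the names ZHUANG_EXAMPLES and describe are
just labels invented for this example:

```python
import unicodedata

# The three code points cited above, used in the old Zhuang orthography.
ZHUANG_EXAMPLES = [0x0185, 0x019C, 0x01A8]


def describe(cp):
    """Return 'U+XXXX NAME' for a code point, using its Unicode name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


for cp in ZHUANG_EXAMPLES:
    print(describe(cp))
```

Each name comes back with a LATIN prefix, which is the point: these
letters were unified into the Latin script rather than given a block
of their own.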

> or
> is it simply the fact that the borrowed letters do not exhibit any distinctive
> "Zhuangness" in their graphic form that precludes their being encoded separately
> in the same way that Gothic is ? (Or is it perhaps a Eurocentric bias in Unicode
> ?)

It is getting rather tiresome to have "Eurocentric bias" brandished
as a disparagement of an encoding standard, 87% of whose content consists
of Han or Hangul characters, and whose maintaining committees are busy
finalizing the addition of Limbu, Tai Le, Osmanya, Ugaritic Cuneiform,
and Linear B. The UTC met just last week, and voted to start the process
of adding the Kharoshthi script. Yeah, definitely a Eurocentric bias
detectable there in that collection of additions.

--Ken

>
> Andrew
>


This archive was generated by hypermail 2.1.5 : Mon Nov 18 2002 - 16:32:32 EST