Re: mixed-script writing systems

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 18 2002 - 15:49:53 EST


    Andrew West wrote:

    > On Mon, 18 Nov 2002 02:34:18 -0800 (PST), Kenneth Whistler wrote:
    >
    > > In point of fact,
    > > people for centuries have been borrowing back and forth between
    > > Latin, Greek, and Cyrillic in particular, so that in some respects
    > > LGC is a kind of metascript and should be treated as such.
    > >
    >
    > Latin, Greek, Cyrillic and Runic even (cf. Latin letters Thorn and Wynn).

    Point taken. And don't forget Old Italic, which is now encoded as well.

    >
    > Gothic is a good example of a mixed-script writing system,

    Not really -- a good example, that is.

    > composed of a mixture
    > of Latin, Greek and Runic letters. There is a "Gothicness" about the graphic
    > forms of the glyphs of the Gothic alphabet, but IMHO this variation from
    > "standard" (but what is "standard" in 4th century terms ?) Latin, Greek and
    > Runic letters should be dealt with at the font level.

    It isn't particularly helpful to go there, since Gothic doesn't fit all
    that well as merely a font variant of Latin or Greek or Runic. Certainly
    it *could* be done that way, but for this particular case, the
    committees were convinced that simply laying out Gothic as a distinct
    script was more practical.

    As it stands now, the Gothic bible can be correctly and unambiguously
    represented in Unicode, using the Gothic script as defined. Not to
    have encoded the Gothic script would have left us still arguing about
    which letters from which script to use and how Gothic fonts should
    be encoded.

    > Nevertheless, Gothic has
    > been encoded in Unicode, and this may provide an unwelcome precedent for
    > encoding other mixed-script writing systems.

    What you are getting at is the complicated problem of sorting out all
    the historical connections between various related alphabets and trying
    to sift them into categories which make sense as scripts and categories
    which are simply font variants within a script. For modern scripts this
    is less of a problem, since we have modern practice and typography to
    rely on to help make the distinctions. For *historic* scripts, on the
    other hand, it is murkier.

    Old Italic is a good case in point. It *could* have been treated as
    another archaic outlier of Greek. The problem with that is that it
    would have added a few more archaic letters which never show up in
    modern Greek fonts, and it would still have required distinct archaic
    fonts in order to represent Old Italic text reliably. Old Italic texts
    don't get rendered with a modern Greek font -- it would look
    ridiculous. Because of this usage pattern, it made sense to the
    committees to coalesce the various southern Old Italic alphabets
    (Oscan, Umbrian, Messapian, etc.) into a "script" which would incorporate
    all the required letters for those alphabets, as *opposed* to Latin
    or to Greek per se. It is likely that a similar decision will be
    taken in the future to account for the Alpine alphabets of northern
    Italy, which are intermediate between Italic and Runic alphabets.

    What it comes down to is the fact that for historic scripts in
    particular, there are no defined criteria that would enable us
    to simply *discover* the right answer regarding the identity of
    scripts. To a certain extent, the encoding committees need to
    make arbitrary partitions of historic alphabets through time
    and space, reflecting scholarly praxis as far as feasible, and
    then live with the results. At least this procedure makes it
    *possible* to represent the texts reliably, once the scripts
    and their variants have been standardized.

    >
    > What about the now-defunct Zhuang alphabet (used between 1955 and 1981 in PRC)
    > that was composed of a cumbersome mixture of Latin, Cyrillic and IPA letters ?
    > Should the letters of this alphabet be encoded separately in "Zhuang" block,

    Check the standard:

    U+0185 LATIN SMALL LETTER TONE SIX
    U+019C LATIN CAPITAL LETTER TURNED M
    U+01A8 LATIN SMALL LETTER TONE TWO
    etc.

    This issue was decided already in 1989.
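
    A minimal sketch of how one might check those names today, assuming a
    Python interpreter whose bundled unicodedata module covers these code
    points. The helper name character_names and the list ZHUANG_CODE_POINTS
    are illustrative, not part of any standard API:

```python
import unicodedata

# Code points cited above: Zhuang letters encoded as Latin letters.
ZHUANG_CODE_POINTS = [0x0185, 0x019C, 0x01A8]


def character_names(code_points):
    """Map each code point to its formal Unicode character name."""
    return {cp: unicodedata.name(chr(cp)) for cp in code_points}


# Print each code point in the usual U+XXXX notation with its name.
for cp, name in character_names(ZHUANG_CODE_POINTS).items():
    print(f"U+{cp:04X} {name}")
```

    Every name returned starts with LATIN, which is the point: these
    letters live in the Latin blocks and not in a separate "Zhuang" block.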

    > or
    > is it simply the fact that the borrowed letters do not exhibit any distinctive
    > "Zhuangness" in their graphic form that precludes their being encoded separately
    > in the same way that Gothic is ? (Or is it perhaps a Eurocentric bias in Unicode
    > ?)

    It is getting rather tiresome to have "Eurocentric bias" brandished
    as a disparagement of an encoding standard, 87% of whose content consists
    of Han or Hangul characters, and whose maintaining committees are busy
    finalizing the addition of Limbu, Tai Le, Osmanya, Ugaritic Cuneiform,
    and Linear B. The UTC met just last week, and voted to start the process
    of adding the Kharoshthi script. Yeah, definitely a Eurocentric bias
    detectable there in that collection of additions.

    --Ken

    >
    > Andrew
    >


