mixed-script writing systems

From: Peter_Constable@sil.org
Date: Fri Nov 15 2002 - 12:17:57 EST

    One of the Unicode design principles is unification: "unify across
    languages, but not across scripts". As a result, the "A" used in all
    Latin-based writing systems is the same character, but that character is
    different from the "A" used in Cyrillic- or Greek-based writing systems.
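
    As a small illustration of that principle, the sketch below prints the
    code points and Unicode names of the capital "A" look-alikes in the three
    scripts. It uses Python's standard unicodedata module; the variable names
    are only illustrative and not part of any proposal.

```python
import unicodedata

# Capital "A" look-alikes: Latin, Greek and Cyrillic.
lookalikes = ["\u0041", "\u0391", "\u0410"]

# Each one is a separate character with its own name and code point.
names = [unicodedata.name(ch) for ch in lookalikes]

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X}  {name}")
```

    The three glyphs may look identical in many fonts, but they are unified
    only within each script, never across scripts.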

    There are a very small number of cases of truly eclectic writing systems:
    the IPA transcription system uses mostly Latin characters, but also uses a
    few Greek characters, and Japanese writing mixes three scripts (complete
    repertoires of two scripts, and a large portion of a third script). (One
    might debate whether we should describe Japanese writing in terms of a
    single writing system involving three scripts, or simultaneous use of three
    writing systems. I have been inclined toward the former, but that's another
    topic.) Of course, digits and punctuation get shared, but the norm is that
    a writing system for a given language is based on a single script, and IPA
    and Japanese are clearly exceptions.

    That intro may well spawn a number of sub-threads, but I'm interested in
    only one question. It has to do with an Asian language, Wakhi
    (http://www.ethnologue.com/show_language.asp?code=WBL). This is spoken in
    Afghanistan, China, Pakistan and Tajikistan (reportedly, similar
    populations in each country). I don't know if the same writing system is
    used in all countries, but there is at least one writing system, which is
    Latin-based. (There appears also to be a distinct Cyrillic-based writing
    system in use.)

    What is unusual about this Latin-based writing system is that its creators
    (I don't know who they were) were a little bit eclectic: whereas most of
    the characters are from the Latin script, it also uses three Greek
    characters and one Cyrillic character: gamma, delta, theta, and Cyrillic
    yeru (U+042B, U+044B). I've attached a GIF of a sample page from a
    publication that shows all four of these characters (though not in both
    upper and lower case; note that the gamma is also used with a combining
    caron to create another grapheme).
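
    To make the gamma-with-caron grapheme concrete, here is a minimal sketch,
    assuming the Greek small gamma (U+03B3) is the base letter. It checks that
    NFC normalization leaves the sequence as two code points, since Unicode
    has no precomposed gamma with caron.

```python
import unicodedata

# Greek small gamma followed by COMBINING CARON (U+030C).
gamma_caron = "\u03b3\u030c"

# NFC composes a sequence only when a precomposed character exists.
composed = unicodedata.normalize("NFC", gamma_caron)

for ch in composed:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

    So the grapheme has to be stored as a base letter plus a combining mark,
    whichever gamma is chosen as the base.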

    (The gamma is designed like the Greek gamma, U+0393 / U+03B3, and not the
    Latin gamma, U+0194 / U+0263. Also, it uses an ezh, which could possibly be
    represented as the Cyrillic characters "Abkhasian Dze / dze" U+04E0 /
    U+04E1, but given that the vast majority of characters are Latin, it makes
    more sense to consider these to be the Latin characters Ezh / ezh, U+01B7 /
    U+0292.)
    U+0292.)
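
    For reference, the code points mentioned above can be listed with their
    formal Unicode names. This is only a lookup sketch using Python's standard
    unicodedata module; the list and dictionary names are illustrative.

```python
import unicodedata

# Code points cited in this message, in upper/lower case pairs.
code_points = [
    0x0393, 0x03B3,  # Greek gamma
    0x0194, 0x0263,  # Latin gamma
    0x04E0, 0x04E1,  # Cyrillic Abkhasian dze
    0x01B7, 0x0292,  # Latin ezh
    0x042B, 0x044B,  # Cyrillic yeru
]

# Map each code point to its formal character name.
names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")
```

    The first word of each name shows the script the character is currently
    assigned to, which is the distinction at issue here.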

    So, the question is this: Should we say that this writing system is
    completely Latin (keeping the norm that orthographic writing systems use a
    single script) and apply the principle of unification -- across languages
    but not across scripts -- to imply that we need to encode new characters,
    Latin delta, Latin theta and Latin yeru? Or, do we say that this writing
    system is only *mostly* Latin-based, and that it mixes in a few characters
    from other scripts?

    I have an idea what I think is the better thing to do, but I'm curious to
    see if it matches others' opinions.

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485
    E-mail: <peter_constable@sil.org>

    (See attached file: Luqo Injil_38.gif)



