RE: Precomposed Tibetan

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Wed Dec 18 2002 - 09:54:45 EST

    On Wed, 18 Dec 2002 06:00:42 -0800 (PST), "Kent Karlsson" wrote:

    > Are you saying that the reading (as in pronouncing) order for the letters does
    > not actually match the storage order (which I supposed was to be "logical"
    > order).
    > Similarly, are you saying that for collation order (dictionaries and the like),
    > that the storage order for characters in a string of Tibetan does not directly
    > match the "significance" that should be given to characters; but that some
    > kind of rearrangement (as for Thai, Lao, and Khmer [ROBAT]) is needed? If so,
    > which rearrangement?

    The Unicode storage order follows the writing order of the word, but does not
    necessarily follow the collation order used in dictionaries.

    The reason is that a Tibetan stack is composed of a stem consonant (30 native
    consonants plus 5 reversed letters for transliterating Sanskrit) with:
    - zero or one superjoined consonant: RA, LA or SA (these are silent in modern
    Lhasa dialect, but may affect the tone of the syllable)
    - zero or more (very rarely more than two) subjoined consonants (only WA,
    YA, RA, LA in normal Tibetan)
    - zero or one vowel lengthener sign (the a-chung)
    - zero or one other Sanskrit sign such as the Anusvara
    - zero or more vowel signs (more than one vowel sign is only found in shorthand
    contractions of polysyllabic words)

    In addition, in native Tibetan words zero or one of the five letters GA, DA, BA,
    MA or 'A may prefix the stack. In modern Lhasa dialect these letters are silent,
    although they may affect the pronunciation of the preceding syllable (adding a
    nasal or plosive sound to an open syllable for example).

    The stack may also be suffixed by the letters GA, NGA, DA, NA, BA, MA, 'A, RA,
    LA, SA or the compound suffixes -GS, -NGS, -BS, -MS (and -RD and -ND in literary
    Tibetan).

    Thus, for example, the word BrdaRten (two syllables) may be analysed as:
    B = a prefix
    R = a superjoined letter
    D = the stem letter
    A = an implicit vowel (no vowel sign for A)
    R = a superjoined letter
    T = the stem letter
    E = a vowel
    N = a suffix
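
    The analysis above can be written down as a small data structure. This is
    only a minimal sketch in Python: the class name Syllable, its field names and
    the writing_order helper are hypothetical, and the a-chung and other Sanskrit
    signs are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Letter sets taken from the description above (names are illustrative).
PREFIXES = {"GA", "DA", "BA", "MA", "'A"}
SUPERJOINED = {"RA", "LA", "SA"}
SUFFIXES = {"GA", "NGA", "DA", "NA", "BA", "MA", "'A", "RA", "LA", "SA",
            "GS", "NGS", "BS", "MS", "RD", "ND"}


@dataclass(frozen=True)
class Syllable:
    """One analysed Tibetan syllable (hypothetical model, not a standard API)."""
    stem: str
    prefix: Optional[str] = None
    superjoined: Optional[str] = None
    subjoined: Tuple[str, ...] = ()
    vowels: Tuple[str, ...] = ()
    suffix: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the letter sets listed in the text.
        if self.prefix is not None and self.prefix not in PREFIXES:
            raise ValueError(f"not a prefix letter: {self.prefix}")
        if self.superjoined is not None and self.superjoined not in SUPERJOINED:
            raise ValueError(f"not a superjoined letter: {self.superjoined}")
        if self.suffix is not None and self.suffix not in SUFFIXES:
            raise ValueError(f"not a suffix: {self.suffix}")

    def writing_order(self) -> List[str]:
        """Components in the order they are written (and stored)."""
        parts = [self.prefix, self.superjoined, self.stem,
                 *self.subjoined, *self.vowels, self.suffix]
        return [p for p in parts if p]


brda = Syllable(stem="DA", prefix="BA", superjoined="RA")
rten = Syllable(stem="TA", superjoined="RA", vowels=("E",), suffix="NA")

print(brda.writing_order())  # ['BA', 'RA', 'DA']
print(rten.writing_order())  # ['RA', 'TA', 'E', 'NA']
```

    Keeping the stem as its own field makes it available directly, whatever
    position it ends up in when the syllable is written out.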

    The collation order is based on the stem consonant of the word. Thus in a
    dictionary the word Brda would be listed under D, and the word Rten would be
    listed under T.
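
    To illustrate the difference between stem order and spelling order, here is a
    minimal sketch in Python. It assumes each word has already been analysed and
    is paired with its stem letter by hand; the list NATIVE_ORDER is the
    traditional order of the 30 native consonants, and the third word (sgra, stem
    GA) is added purely as an extra illustration. Real dictionary ordering needs
    further rules for words sharing a stem, which are not modelled here.

```python
# Traditional alphabetical order of the 30 native Tibetan consonants.
NATIVE_ORDER = [
    "KA", "KHA", "GA", "NGA", "CA", "CHA", "JA", "NYA",
    "TA", "THA", "DA", "NA", "PA", "PHA", "BA", "MA",
    "TSA", "TSHA", "DZA", "WA", "ZHA", "ZA", "'A", "YA",
    "RA", "LA", "SHA", "SA", "HA", "A",
]
RANK = {name: i for i, name in enumerate(NATIVE_ORDER)}

# (transliteration, stem letter) pairs; the stems are supplied by hand.
words = [("brda", "DA"), ("rten", "TA"), ("sgra", "GA")]


def stem_key(entry):
    """Sort key that looks only at the stem consonant."""
    _, stem = entry
    return RANK[stem]


by_stem = [w for w, _ in sorted(words, key=stem_key)]
by_spelling = [w for w, _ in sorted(words)]

print(by_stem)      # ['sgra', 'rten', 'brda']
print(by_spelling)  # ['brda', 'rten', 'sgra']
```

    The two orders differ because the first written letter of each word (a
    prefix or superjoined letter) is not the letter the word is filed under.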

    I think that the intuitive way to have encoded Tibetan would have been to have
    35 stem consonants, with three superjoined letters (RA, LA, SA) and 35 subjoined
    letters.

    The way that Tibetan is actually encoded is to have 35 full-size consonants
    (plus 6 ligatures and a fixed-form RA) [U+0F40 - U+0F6A] and 44 subjoined
    consonants (including ligatures and fixed forms) [U+0F90 - U+0FBC]. The rule is
    that the first consonant in the stack, which will be either a stem consonant or
    a superjoined RA, LA or SA, is encoded with the full-size consonant [U+0F40 -
    U+0F6A], and all other consonants in the stack are encoded with the
    corresponding subjoined form [U+0F90 - U+0FBC]. This means that the stem
    consonant in a Tibetan word may be encoded as either a full-size or a subjoined
    consonant; and conversely, a full-size consonant [U+0F40 - U+0F6A] is not
    necessarily the stem consonant.
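
    The encoding rule can be sketched in a few lines of Python. The function name
    encode_stack is hypothetical; the character names passed to
    unicodedata.lookup are the standard Unicode names for the full-size and
    subjoined letters. The sketch covers consonants only and does no validation
    of which stacks are legal.

```python
import unicodedata


def encode_stack(consonants):
    """Encode one stack from its consonants in written (top-to-bottom) order.

    The first consonant uses the full-size letter; every later consonant
    uses the corresponding subjoined letter.
    """
    if not consonants:
        raise ValueError("a stack needs at least one consonant")
    first, *rest = consonants
    chars = [unicodedata.lookup(f"TIBETAN LETTER {first}")]
    chars += [unicodedata.lookup(f"TIBETAN SUBJOINED LETTER {name}")
              for name in rest]
    return "".join(chars)


def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]


# Brda: prefix BA, then the stack RA (superjoined) over DA (stem).
brda = unicodedata.lookup("TIBETAN LETTER BA") + encode_stack(["RA", "DA"])

# Rten: stack RA over TA (stem), vowel sign E, suffix NA.
rten = (encode_stack(["RA", "TA"])
        + unicodedata.lookup("TIBETAN VOWEL SIGN E")
        + unicodedata.lookup("TIBETAN LETTER NA"))

print(code_points(brda))  # ['U+0F56', 'U+0F62', 'U+0FA1']
print(code_points(rten))  # ['U+0F62', 'U+0F9F', 'U+0F7A', 'U+0F53']
```

    In both words the stem (DA, TA) comes out as a subjoined character, while
    the full-size characters are the prefix, the superjoined RA and the suffix.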

    In practice this character encoding model works fine, but I can understand why
    it does not immediately appeal to Tibetan speakers.

    Andrew


