Re: Logical Storage Order For Complex Vowels in Tai Tham

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jan 30 2011 - 17:48:16 CST

    On Fri, 28 Jan 2011 10:23:17 -0600
    Ed <ed.trager@gmail.com> wrote:

    > Hi, Everyone,
    >
    > In ISO/IEC JTC1/SC2/WG2 document N3121, "Proposal for encoding the
    > Lanna script in the BMP of the UCS", the table of examples on pages
    > 2-3 of section 5 "Dependent vowel signs" appears to imply (but note
    > that the text does not *explicitly* state) that the decompositions
    > shown are in fact the logical storage order.
    >
    > For most of the examples shown, the logical order makes sense. But
    > for combinations containing U+1A6C OA BELOW, it appears that an
    > arbitrary choice has been made regarding the logical storage position
    > of U+1A6C.
    >
    > In the examples in N3121, U+1A6C OA BELOW appears after U+1A6E VOWEL E
    > (which makes sense to me) but (for example) before U+1A65 VOWEL I
    > --and the latter does not make sense to me.

    The combining *vowels* in a syllable have been written in accordance
    with the rule for Thai-script character stacks, namely (pre-vocalic)
    consonants, and then vowels and tone-marks from bottom to top. For
    example, <U+0E4D THAI CHARACTER NIKHAHIT> follows <U+0E38 THAI CHARACTER
    SARA U> when writing Pali.

    If Unicode hadn't balked at the idea of decomposing characters of
    non-zero combining class (e.g. U+0D4B MALAYALAM VOWEL SIGN OO, which
    consequently wound up as class 0), then we might have assigned U+1A6C OA
    BELOW class 220 and U+1A65 VOWEL I class 230. In accordance with this
    principle, I assume that multiple vowels below would be ordered from
    top to bottom as with European scripts, but I haven't found any
    examples that would make this an issue.
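
    As a rough illustration of the point about combining classes, here is a
    minimal Python sketch using only the standard unicodedata module. The
    base letter U+1A36 NA and the variable names are chosen purely for
    illustration, and the classes reported depend on the Unicode version
    your Python ships with. It shows that the two Tai Tham vowel signs are
    both class 0, so normalization preserves whichever order was stored,
    whereas a genuine 220/230 pair (Latin dot below and dot above) is
    forced into below-then-above order.

```python
import unicodedata

# Canonical combining classes as recorded in the Unicode Character Database.
for cp in (0x0E38, 0x0E4D, 0x1A65, 0x1A6C):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

# Both Tai Tham vowel signs are class 0, so normalization never swaps them:
# the two orders remain distinct strings.
base = "\u1A36"  # TAI THAM LETTER NA, used here only as an example base
below_first = base + "\u1A6C" + "\u1A65"
above_first = base + "\u1A65" + "\u1A6C"
tai_tham_distinct = (
    unicodedata.normalize("NFD", below_first)
    != unicodedata.normalize("NFD", above_first)
)

# Contrast: U+0323 (class 220) and U+0307 (class 230) end up in the same
# canonical order whichever way round they were typed.
latin_same = (
    unicodedata.normalize("NFD", "a\u0307\u0323")
    == unicodedata.normalize("NFD", "a\u0323\u0307")
)

print("Tai Tham orders stay distinct:", tai_tham_distinct)
print("Latin 220/230 orders unify:", latin_same)
```

    Because nothing in normalization enforces an order here, the order has
    to be fixed by convention instead.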

    A surprise from the Thai point of view is the treatment of the -ua and
    -ia vowels - these are <U+1A60 SAKOT, U+1A45 WA, U+1A6B VOWEL O> and
    <U+1A60 SAKOT, U+1A3F LOW YA, U+1A6E VOWEL E>. The order put forward
    was based on native intuition as reported by Martin Hosken.

    > As shown in the attached image, I would have expected that
    > U+1A65 VOWEL I appear *BEFORE* U+1A6C OA BELOW . My expectation
    > follows from the order in which I write the marks: That is, I write
    > Tai Tham on paper from left to right, and from top to bottom.

    Only Indo-Chinese Indic scripts have permission to follow
    the handwriting order, publishing in Tai Tham isn't strictly legal
    in Thailand, and Lao use was dismissed with contempt.

    More seriously, alternations between vertical and horizontal stacking
    of marks above indicate that if left-to-right ordering means anything,
    their order is from bottom to top rather than top to bottom. In
    particular, the normal order goes pure vowel, mai kang, tone mark.

    (There may be constraints on vertical stacking. Printed Tai Khuen is
    restricted to three rows - above, base consonant line, and below.
    Multiple characters above or below may invade the territory of the
    following consonant, with some bizarre consequences. Much Northern
    Thai also only allows one row below - the 'Northern Dictionary of
    Palm-Leaf Manuscripts' is a good example, and the more deeply
    descending sequences are often deliberately avoided. For example, some
    books' drills in final consonants imply that vowels below are not
    followed by subscript final consonants.)

    > So I
    > write vowel marks appearing *ABOVE* base consonants before I write
    > vowel marks *BELOW*.
    >
    > U+1A6C OA BELOW is the most common vowel sign that can result in this
    > kind of confusion. However it may not be the only one. There are a
    > number of diphthong and triphthong vowels which occur in the various Tai
    > languages and these are of course written using various combinations
    > of 2 or more Tai Tham vowel signs.
    >
    > It appears that N3121 was not the "final" version document used when
    > Tai Tham was approved for encoding; but I am not clear what the
    > subsequent document(s) were?
    >
    > In any case, the examples provided in N3121 seem to me insufficient
    > and, as already noted, nowhere does it explicitly state in N3121 that
    > the decompositions represent the backing store order.

    I trust you intend to be able to render words such as ᨻᩦ᩠᩵ᨶᩬ᩶ᨦ <U+1A3B
    LOW PA, U+1A66 SIGN II, U+1A75 TONE-1, U+1A60 SAKOT, U+1A36 NA, U+1A6C
    OA BELOW, U+1A76 TONE-2, U+1A26 LETTER NGA> piinɔɔŋ, which in a font
    without overhang support or proper vertical kerning/ligaturing one might
    attempt to write as <U+1A3B, U+1A6C, U+1A60, U+1A36, U+1A66, U+1A75,
    U+1A62, U+1A76>.
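
    For anyone wanting to inspect the logical-order spelling above, the
    following is a small sketch (Python standard library only; the helper
    name describe is my own invention) that lists each code point with its
    character name and canonical combining class. It also checks what
    canonical reordering does to the sequence: with the classes in the
    Unicode data I am aware of, SAKOT is class 9 and TONE-1 is class 230,
    so normalization would be expected to move SAKOT in front of the tone
    mark. Treat that as something to verify against your own Unicode
    version rather than as a statement about what N3121 intended.

```python
import unicodedata

# Logical-order spelling quoted in the message above.
word = [0x1A3B, 0x1A66, 0x1A75, 0x1A60, 0x1A36, 0x1A6C, 0x1A76, 0x1A26]
text = "".join(chr(cp) for cp in word)


def describe(s):
    """Return (code point, name, combining class) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in s
    ]


for code, name, ccc in describe(text):
    print(code, name, f"ccc={ccc}")

# Canonical reordering sorts adjacent non-zero classes in ascending order.
normalized = unicodedata.normalize("NFD", text)
normalized_cps = [ord(c) for c in normalized]
print("changed by normalization:", normalized != text)
print(" ".join(f"U+{cp:04X}" for cp in normalized_cps))
```

    A renderer therefore needs to cope with both the stored order and its
    normalized form.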

    > Perhaps there is a need for a separate document to clarify what the
    > backing store order should be for diphthong and triphthong vowels, inter
    > alia, for Tai languages/dialects using Tai Tham script?

    Don't forget the issue of consecutive syllables sharing the initial
    consonants. Not every style writes <U+1A7B TAI THAM SIGN MAI SAM> to
    indicate the duplication, so in principle you could get vowels in
    either order. Now, I presume the careful one-akshara spelling of
    ᨡᩮᩢ᩶᩻ᩬᩣ᩠ᨦ khaokhɔɔŋ 'possessions' would be <U+1A21 HIGH KHA, U+1A6E
    SIGN E, U+1A62 MAI SAT, U+1A76 TONE-2, U+1A7B MAI SAM, U+1A6C OA BELOW,
    U+1A63 SIGN AA, U+1A60 SAKOT, U+1A26 NGA>. Now, if one omits the mai
    sam, as even the Maefahluang dictionary does, does the spelling simply
    change by omitting U+1A7B, or does it rearrange to <U+1A21 HIGH KHA,
    U+1A6E SIGN E, U+1A6C OA BELOW, U+1A62 MAI SAT, U+1A76 TONE-2, U+1A63
    SIGN AA, U+1A60 SAKOT, U+1A26 NGA>? A similar case with the vowel
    below coming first is given by the contraction ᨧᩩ᩵ᩢᨦᨾᩦ <U+1A27 HIGH
    CA, U+1A69 SIGN U, U+1A75 TONE-1, U+1A62 MAI SAT, U+1A26 NGA, U+1A3E
    MA, U+1A66 SIGN II> of ᨧᩩ᩵ᩢᨦᨾᩦ <U+1A27, U+1A69, U+1A75, U+1A26, U+1A27,
    U+1A62, U+1A60, U+1A20 HIGH KA, U+1A3E, U+1A66> cuŋ cak mii.
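
    To make the khaokhɔɔŋ question concrete, here is a minimal Python
    sketch (standard library only; the variable and function names are
    illustrative assumptions, not anything from the proposal). It builds
    the two candidate spellings without mai sam, one by simply dropping
    U+1A7B and one rearranged, and checks whether canonical normalization
    would treat them as the same string. It does not: they contain the
    same code points but remain distinct, so searching and collation would
    see two different words unless a convention picks one.

```python
import unicodedata

MAI_SAM = 0x1A7B

# Careful one-akshara spelling, with mai sam, as given above.
with_mai_sam = [0x1A21, 0x1A6E, 0x1A62, 0x1A76, MAI_SAM,
                0x1A6C, 0x1A63, 0x1A60, 0x1A26]

# Candidate 1: simply omit U+1A7B.
dropped = [cp for cp in with_mai_sam if cp != MAI_SAM]

# Candidate 2: the rearranged order, with OA BELOW moved ahead of MAI SAT.
rearranged = [0x1A21, 0x1A6E, 0x1A6C, 0x1A62, 0x1A76,
              0x1A63, 0x1A60, 0x1A26]


def nfd(code_points):
    """Normalize a list of code points to NFD and return the string."""
    return unicodedata.normalize("NFD", "".join(map(chr, code_points)))


same_code_points = sorted(dropped) == sorted(rearranged)
canonically_equivalent = nfd(dropped) == nfd(rearranged)

print("same code points:", same_code_points)
print("canonically equivalent:", canonically_equivalent)
```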

    Richard.


