Re: Logical Storage Order For Complex Vowels in Tai Tham

From: Ed
Date: Sun Jan 30 2011 - 22:44:12 CST

    On Sun, Jan 30, 2011 at 6:48 PM, Richard Wordingham wrote:
    > On Fri, 28 Jan 2011 10:23:17 -0600
    > Ed wrote:
    >> Hi, Everyone,
    >> In ISO/IEC JTC1/SC2/WG2 document N3121, "Proposal for encoding the
    >> Lanna script in the BMP of the UCS", the table of examples on pages
    >> 2-3 of section 5 "Dependent vowel signs" appears to imply (but note
    >> that the text does not *explicitly* state) that the decompositions
    >> shown are in fact the logical storage order.
    >> For most of the examples shown, the logical order makes sense.  But
    >> for combinations containing U+1A6C OA BELOW, it appears that an
    >> arbitrary choice has been made regarding the logical storage position
    >> of U+1A6C.
    >> In the examples in N3121, U+1A6C OA BELOW appears after U+1A6E VOWEL E
    >> (which makes sense to me) but (for example) before U+1A65 VOWEL I
    >> --and the latter does not make sense to me.
    > The combining *vowels* in a syllable have been written in accordance
    > with the rule for Thai-script character stacks, namely (pre-vocalic)
    > consonants, and then vowels and tone-marks from bottom to top.  For
    > example, <U+0E4D THAI CHARACTER NIKHAHIT> follows <U+0E38 THAI CHARACTER
    > SARA U> when writing Pali.

    OK ...
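
To make the two candidate orderings concrete, here is a minimal Python sketch of the bottom-to-top rule described above, next to the top-to-bottom handwriting order. The `VERTICAL_RANK` table and the `stack_order` helper are hypothetical names introduced only for illustration; the table covers just the four marks mentioned in this thread and is not a full positional classification of either script.

```python
# Hypothetical vertical ranks, for illustration only:
# 0 = mark written below the base, 1 = mark written above it.
VERTICAL_RANK = {
    "\u0E38": 0,  # THAI CHARACTER SARA U (below)
    "\u0E4D": 1,  # THAI CHARACTER NIKHAHIT (above)
    "\u1A6C": 0,  # TAI THAM VOWEL SIGN OA BELOW (below)
    "\u1A65": 1,  # TAI THAM VOWEL SIGN I (above)
}


def stack_order(base, marks, bottom_to_top=True):
    """Return the base followed by its marks, sorted by vertical rank.

    bottom_to_top=True follows the Thai-style rule quoted above;
    bottom_to_top=False follows the top-to-bottom handwriting order.
    """
    ordered = sorted(
        marks,
        key=lambda mark: VERTICAL_RANK[mark],
        reverse=not bottom_to_top,
    )
    return base + "".join(ordered)


# Thai: SARA U (below) precedes NIKHAHIT (above).
thai = stack_order("\u0E01", ["\u0E4D", "\u0E38"])

# Tai Tham, bottom to top: OA BELOW precedes VOWEL I.
tham_bottom_up = stack_order("\u1A20", ["\u1A65", "\u1A6C"])

# Tai Tham, top to bottom: VOWEL I precedes OA BELOW.
tham_top_down = stack_order("\u1A20", ["\u1A65", "\u1A6C"], bottom_to_top=False)
```

The base letters (U+0E01 and U+1A20) are arbitrary; only the relative order of the marks matters here.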

    > If Unicode hadn't balked at the idea of decomposing characters of
    > non-zero combining class (e.g. U+0D4B MALAYALAM VOWEL SIGN OO, which
    > consequently wound up as class 0), then we might have assigned U+1A6C OA
    > BELOW class 220 and U+1A65 VOWEL I class 230.  In accordance with this
    > principle, I assume that multiple vowels below would be ordered from
    > top to bottom as with European scripts, but I haven't found any
    > examples that would make this an issue.
    > A surprise from the Thai point of view is the treatment of the -ua and
    > -ia vowels - these are <U+1A60 SAKOT, U+1A45 WA, U+1A6B VOWEL O>

    ... I can live with that one

    > and
    > <U+1A60 SAKOT, U+1A3F LOW YA, U+1A6E VOWEL E>.

    ... but this one makes no sense at all to me.

    > The order put forward
    > was based on native intuition as reported by Martin Hosken.

    Whose native intuition?
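
On the combining-class point: a small sketch, assuming Python's standard `unicodedata` module (whose data depends on the Unicode version bundled with the interpreter), that checks the classes actually assigned. Because both vowel signs are class 0, normalization will not reorder them, so the two orderings stay distinct strings. The base letter U+1A20 is an arbitrary choice for the example.

```python
import unicodedata

OA_BELOW = "\u1A6C"  # TAI THAM VOWEL SIGN OA BELOW
VOWEL_I = "\u1A65"   # TAI THAM VOWEL SIGN I
HIGH_KA = "\u1A20"   # arbitrary base consonant for the example


def ccc(ch):
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


# Combining classes as assigned in the Unicode Character Database.
classes = {f"U+{ord(c):04X}": ccc(c) for c in (OA_BELOW, VOWEL_I)}

# The two orderings under discussion.
n3121_order = HIGH_KA + OA_BELOW + VOWEL_I
handwriting_order = HIGH_KA + VOWEL_I + OA_BELOW

# With both marks at class 0, canonical reordering leaves each string alone.
same_after_nfc = (
    unicodedata.normalize("NFC", n3121_order)
    == unicodedata.normalize("NFC", handwriting_order)
)

# The Malayalam precedent: U+0D4B has a canonical decomposition and class 0.
malayalam_oo_decomposition = unicodedata.decomposition("\u0D4B")
```

In other words, whichever order is chosen has to be fixed by convention, since normalization cannot reconcile the two.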

    >> As shown in the attached image, I would have expected that
    >> U+1A65 VOWEL I appear *BEFORE* U+1A6C OA BELOW .  My expectation
    >> follows from the order in which I write the marks: That is, I write
    >> Tai Tham on paper from left to right, and from top to bottom.
    > Only Indo-Chinese Indic scripts have permission to follow
    > the handwriting order, publishing in Tai Tham isn't strictly legal
    > in Thailand, and Lao use was dismissed with contempt.
    > More seriously, alternations between vertical and horizontal stacking
    > of marks above indicate that if left-to-right ordering means anything,
    > their order is from bottom to top rather than top to bottom.  In
    > particular, the normal order goes pure vowel, mai kang, tone mark.
    > (There may be constraints on vertical stacking.  Printed Tai Khuen is
    > restricted to three rows - above, base consonant line, and below.

    Tai Khuen seems to have developed typographical conventions that solve
    some of the problems ...

    > Multiple characters above or below may invade the territory of the
    > following consonant, with some bizarre consequences.

    ... whereas I think Lanna Thai perhaps does not yet have a strongly-developed
    typographical tradition, so certain aspects of the writing system
    (like deep vertical stacking) which are not difficult to do when
    writing on paper turn out to be quite troublesome for typography.

    > Much Northern
    > Thai also only allows one row below - the 'Northern Dictionary of
    > Palm-Leaf Manuscripts' is a good example, and the more deeply
    > descending sequences are often deliberately avoided.  For example, some
    > books' drills in final consonants imply that vowels below are not
    > followed by subscript final consonants.)
    >> So I
    >> write vowel marks appearing *ABOVE* base consonants before I write
    >> vowel marks *BELOW*.
    >> U+1A6C OA BELOW is the most common vowel sign that can result in this
    >> kind of confusion.  However it may not be the only one.  There are a
    >> number of diphthong and triphthong vowels which occur in the various Tai
    >> languages and these are of course written using various combinations
    >> of 2 or more Tai Tham vowel signs.
    >> It appears that N3121 was not the "final" version document used when
    >> Tai Tham was approved for encoding; but I am not clear what the
    >> subsequent document(s) were?
    >> In any case, the examples provided in N3121 seem to me insufficient
    >> and, as already noted, nowhere does it explicitly state in N3121 that
    >> the decompositions represent the backing store order.
    > I trust you intend to be able to render words such as ᨻᩦ᩠᩵ᨶᩬ᩶ᨦ <U+1A3B
    > LOW PA, U+1A66 SIGN II, U+1A75 TONE-1, U+1A60 SAKOT, U+1A36 NA, U+1A6C
    > OA BELOW, U+1A76 TONE-2, U+1A26 LETTER NGA> piinɔɔŋ,

    Yes, I have seen this one and many other examples that have made me aware of
    the difficulties that we confront trying to construct a working Tai Tham font.
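
One practical wrinkle with this example, sketched in Python under the assumption that the standard `unicodedata` module is available: U+1A60 SAKOT has combining class 9 and U+1A75 TONE-1 has class 230, so normalization moves the sakot in front of the tone mark. A font therefore has to cope with the normalized order as well as the order in which the sequence is listed above. `code_points` is a hypothetical helper for display only.

```python
import unicodedata

# The sequence exactly as listed in the quoted message:
# LOW PA, SIGN II, TONE-1, SAKOT, NA, OA BELOW, TONE-2, NGA
as_listed = "\u1A3B\u1A66\u1A75\u1A60\u1A36\u1A6C\u1A76\u1A26"

# Canonical ordering sorts adjacent non-starters by combining class,
# so SAKOT (class 9) moves ahead of TONE-1 (class 230).
normalized = unicodedata.normalize("NFC", as_listed)


def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


listed_labels = code_points(as_listed)
normalized_labels = code_points(normalized)
```

Comparing `listed_labels` with `normalized_labels` shows that only the TONE-1/SAKOT pair changes places; the vowel signs, being class 0, stay where they were typed.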

    > which in a font
    > without overhang support or proper vertical kerning/ligaturing one might
    > attempt to write as <U+1A3B, U+1A6C, U+1A60, U+1A36, U+1A66, U+1A75,
    > U+1A62, U+1A76>.
    >> Perhaps there is a need for a separate document to clarify what the
    >> backing store order should be for diphthong and triphthong vowels, inter
    >> alia, for Tai languages/dialects using Tai Tham script?
    > Don't forget the issue of consecutive syllables sharing the initial
    > consonants.  Not every style writes <U+1A7B TAI THAM SIGN MAI SAM> to
    > indicate the duplication, so in principle you could get vowels in
    > either order.  Now, I presume the careful one-akshara spelling of
    > ᨡᩮᩢ᩶᩻ᩬᩣ᩠ᨦ khaokhɔɔŋ 'possessions' would be <U+1A21 HIGH KHA, U+1A6E
    > SIGN E, U+1A62 MAI SAT, U+1A76 TONE-2, U+1A7B MAI SAM, U+1A6C OA BELOW,
    > U+1A63 SIGN AA, U+1A60 SAKOT, U+1A26 NGA>.  Now, if one omits the mai
    > sam, as even the Maefahluang dictionary does, does the spelling simply
    > change by omitting U+1A7B, or does it rearrange to <U+1A21 HIGH KHA,
    > U+1A6E SIGN E, U+1A6C OA BELOW, U+1A62 MAI SAT, U+1A76 TONE-2, U+1A63
    > SIGN AA, U+1A60 SAKOT, U+1A26 NGA>?  A similar case with the vowel
    > below coming first is given by the contraction ᨧᩩ᩵ᩢᨦᨾᩦ <U+1A27 HIGH
    > CA, U+1A69 SIGN U, U+1A75 TONE-1, U+1A62 MAI SAT, U+1A26 NGA, U+1A3E
    > MA, U+1A66 SIGN II> of ᨧᩩ᩵ᩢᨦᨾᩦ  <U+1A27, U+1A69, U+1A75, U+1A26, U+1A27,
    > U+1A62, U+1A60, U+1A20 HIGH KA, U+1A3E, U+1A66> cuŋ cak mii.
    > Richard.
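
The two candidate spellings of khaokhɔɔŋ without mai sam can be compared mechanically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the variable names and the `describe` helper are hypothetical and exist only for this illustration. It shows that the two spellings use the same characters but are not canonically equivalent, so the choice between them has to be made by convention rather than left to normalization.

```python
import unicodedata
from collections import Counter

# Careful one-akshara spelling, with U+1A7B MAI SAM, as given above.
with_mai_sam = "\u1A21\u1A6E\u1A62\u1A76\u1A7B\u1A6C\u1A63\u1A60\u1A26"

# Candidate 1: simply omit MAI SAM and leave everything else in place.
omit_only = with_mai_sam.replace("\u1A7B", "")

# Candidate 2: the rearranged spelling, with OA BELOW moved forward.
rearranged = "\u1A21\u1A6E\u1A6C\u1A62\u1A76\u1A63\u1A60\u1A26"


def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# Same inventory of characters in both candidates...
same_characters = Counter(omit_only) == Counter(rearranged)

# ...but normalization does not map one onto the other.
canonically_equivalent = (
    unicodedata.normalize("NFD", omit_only)
    == unicodedata.normalize("NFD", rearranged)
)

for line in describe(rearranged):
    print(line)
```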
