RE: Why is U+17C1 of General category Mc while U+0E40 and U+0EC0 are of category Lo?

From: Kent Karlsson
Date: Wed Mar 31 2004 - 06:55:47 EST

    > Thai (and Lao, whose encoding closely parallels that of Thai) are
    > encoded in Unicode on unique principles: by a straight left-to-right
    > typewriter-style encoding. This was done for compatibility with the
    > pervasive Thai 8-bit standard. It also means that for collation
    > what are historically left-side vowels must be moved after
    > the following consonant.

    For more on collation of Thai, Lao, and Khmer, see the proposed update
    to the ISO/IEC 14651 CTT (and the UTS #10 DUCET), and a tailoring for
    the CTT, in the two documents:

    (Note that the "swapping" part of the tailoring for Thai/Lao is dealt
    with by other means (in the prehandling) in the Unicode Collation
    Algorithm.)
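
    The swapping mentioned above can be sketched roughly as follows. This
    is only an illustration, not the actual prehandling of either
    standard: the function names are made up here, the prefix vowels are
    assumed to be Thai U+0E40..U+0E44 and Lao U+0EC0..U+0EC4, and the
    consonant test is a crude code point range check.

```python
# Rough sketch: move a Thai/Lao prefix vowel after the consonant that
# follows it, so that a sort key starts with the consonant.

# Assumption: these are the left-side (prefix) vowels of Thai and Lao.
PREFIX_VOWELS = (
    {chr(cp) for cp in range(0x0E40, 0x0E45)}    # Thai SARA E .. SARA AI MAIMALAI
    | {chr(cp) for cp in range(0x0EC0, 0x0EC5)}  # Lao VOWEL SIGN E .. VOWEL SIGN AI
)


def is_consonant(ch):
    """Crude range check (an assumption, not a property lookup)."""
    cp = ord(ch)
    return 0x0E01 <= cp <= 0x0E2E or 0x0E81 <= cp <= 0x0EAE


def swap_prefix_vowels(text):
    """Return text with each prefix vowel swapped past the next consonant."""
    chars = list(text)
    i = 0
    while i + 1 < len(chars):
        if chars[i] in PREFIX_VOWELS and is_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)
```

    The result would then be fed to the ordinary key-building step; the
    stored text itself stays in its typewriter order.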

    > Note that the Thai characters are not labeled LETTER or VOWEL SIGN or
    > what have you, but simply CHARACTER.

    Yes, but that has no particular consequence. Note that the vowel signs
    are treated as vowel signs in the documents referenced above, regardless
    of whether they are called "LETTER", "VOWEL SIGN", or "CHARACTER" (and,
    actually, regardless of their general category, as it happens). There is
    also the complication that some of the consonant characters are
    logically used as vowel (parts), but the modern convention is to ignore
    that in the collation rules, and always treat them as consonants in

                    /kent k
