Major Defect in Combining Classes of Tibetan Vowels

From: Christopher John Fynn (cfynn@gmx.net)
Date: Sat Jun 21 2003 - 21:06:40 EDT

    In Unicode's UnicodeData.txt
    (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt),
    0F7E has a Canonical Combining Class Value (CCCV) of 0;
    0F71 a CCCV of 129;
    0F72 0F7A 0F7B 0F7C 0F7D and 0F80 a CCCV of 130;
    0F74 a CCCV of 132;
    and 0F82 and 0F83 have a CCCV of 230.
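
    The values listed above can be checked against whatever copy of
    the Unicode Character Database ships with a given runtime. The
    following is only a minimal sketch using Python's standard
    unicodedata module; the names CITED and combining_classes are
    illustrative and not part of any standard.

```python
import unicodedata

# Code points cited in the message, in the order they are mentioned.
CITED = [0x0F7E, 0x0F71, 0x0F72, 0x0F7A, 0x0F7B, 0x0F7C, 0x0F7D,
         0x0F80, 0x0F74, 0x0F82, 0x0F83]


def combining_classes(code_points):
    """Map each code point to its Canonical Combining Class value."""
    return {cp: unicodedata.combining(chr(cp)) for cp in code_points}


classes = combining_classes(CITED)
for cp, ccc in classes.items():
    # Print the code point, its character name, and its combining class.
    print(f"{cp:04X}  {unicodedata.name(chr(cp))}: {ccc}")
```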

    By normal Tibetan & Dzongkha spelling, writing, and input rules,
    Tibetan script stacks should be entered and written in this
    order: one headline consonant (0F40->0F6A), any subjoined
    consonant(s) (0F90->0FBC), achung (0F71), shabkyu (0F74), any
    above-headline vowel(s) (0F72 0F7A 0F7B 0F7C 0F7D and 0F80), and
    any ngaro (0F7E, 0F82 and 0F83).

    So, following normal Tibetan & Dzongkha input and spelling rules,
    the relative ordering of these characters should be:
    A. 0F71
    B. 0F74
    C. 0F72 0F7A 0F7B 0F7C 0F7D and 0F80
    D. 0F7E, 0F82 and 0F83
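
    The ordering A-D above can be expressed as a rank table. The
    sketch below is one possible way for a process to put a run of
    these marks back into spelling order for its own internal use
    (for example, before building a sort key or a font lookup key).
    The names SPELLING_RANK and to_spelling_order are hypothetical,
    and the result is a private processing order, not a Unicode
    normalization form: moving a mark across 0F7E (class 0) gives a
    string that is not canonically equivalent to the input.

```python
# Rank of each mark in the spelling order A-D listed above.
SPELLING_GROUPS = [
    [0x0F71],                                          # A. achung
    [0x0F74],                                          # B. shabkyu
    [0x0F72, 0x0F7A, 0x0F7B, 0x0F7C, 0x0F7D, 0x0F80],  # C. above-headline vowels
    [0x0F7E, 0x0F82, 0x0F83],                          # D. ngaro
]
SPELLING_RANK = {
    chr(cp): rank
    for rank, group in enumerate(SPELLING_GROUPS)
    for cp in group
}


def to_spelling_order(text):
    """Stable-sort each consecutive run of the listed marks by rank.

    Characters outside the table (consonants, other signs) are left
    where they are and end the current run.
    """
    out, run = [], []
    for ch in text:
        if ch in SPELLING_RANK:
            run.append(ch)
        else:
            out.extend(sorted(run, key=SPELLING_RANK.get))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=SPELLING_RANK.get))
    return "".join(out)
```

    Because sorted() is stable, marks of equal rank keep the order in
    which they were found.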

    Because "canonical decomposition" or "normalisation" sorts
    adjacent combining characters by ascending combining class,
    these marks can end up in an order that contradicts the spelling
    order above (for example, the above-headline vowels, class 130,
    are placed before shabkyu, class 132). This causes difficulties
    with culturally correct collation (where 0F7E, 0F82 and 0F83
    should have an equal value) and, especially, it forces lookups
    in smart fonts to be far more complex and inefficient than they
    should have to be.
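
    The reordering can be observed directly. The following is a
    small sketch, assuming Python's standard unicodedata module; the
    variable names and the helper hexes are illustrative only. It
    types a stack in spelling order and then applies NFD.

```python
import unicodedata

KA = "\u0F40"        # TIBETAN LETTER KA (headline consonant)
ACHUNG = "\u0F71"    # combining class 129
SHABKYU = "\u0F74"   # combining class 132
VOWEL_I = "\u0F72"   # above-headline vowel, combining class 130
SIGN_0F82 = "\u0F82" # combining class 230


def hexes(text):
    """Render a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Typed in spelling order: consonant, achung, shabkyu, above-headline vowel.
typed = KA + ACHUNG + SHABKYU + VOWEL_I
normalized = unicodedata.normalize("NFD", typed)

print(hexes(typed))       # 0F40 0F71 0F74 0F72
print(hexes(normalized))  # 0F40 0F71 0F72 0F74

# A class-230 sign typed before a class-130 vowel is moved after it.
moved = unicodedata.normalize("NFD", KA + SIGN_0F82 + VOWEL_I)
print(hexes(moved))       # 0F40 0F72 0F82
```

    After normalisation the shabkyu (0F74) follows the above-headline
    vowel (0F72), the reverse of the order in which it was typed.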

    (In Tibetan script fonts, 0F71 and 0F74 are often ligated with
    the preceding consonant (plus any subjoined consonants) into a
    single glyph, whereas above-headline vowels are almost always
    treated as non-spacing combining marks.)

    Currently there seems to be no easy or standardized workaround
    for these problems, and the standard seems to say that the
    relative values of assigned Canonical Combining Class Values
    cannot be changed.

    Any suggestions as to how to create a standardized workaround
    for these incorrect values?

    - Chris


