Re: PRC asking for 956 precomposed Tibetan characters

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Jan 07 2003 - 09:29:34 EST


    On Tue, 07 Jan 2003 06:16:43 -0800 (PST), "Robert R. Chilton" wrote:

    > I understand your interest in preserving the semantic or lexical
    > distinction between an instance of a contracted series of single vowels
    > and a true usage of the double vowel. However, the procedure of
    > normalization is designed to collapse all the variant encodings for a
    > particular presentation form into a single, "normalized" encoding.
    > ...
    > Canonical combining classes are defined for combining characters (such
    > as macron and dot-under, or the vowel signs of Tibetan) in order to
    > support normalization of identical presentation forms to a single
    > encoding. So in the cases you cite, of "graphically identical but
    > semantically different" instances, consistency in searching, sorting,
    > etc. requires that all "graphically identical" presentation forms be
    > normalized to a single normalized encoding.
    >

    O.K. Your explanation of normalisation makes sense, and I'll change the encoding
    of double and triple E and O vowel signs accordingly on my web pages. The only
    query I still have is why a triple E vowel sign should be normalised to <U+0F7B,
    U+0F7A> rather than <U+0F7A, U+0F7B>. What determines that the former sequence
    is preferred over the latter?
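One way to look into the ordering question is to ask a normaliser directly. The sketch below is only an illustration, assuming Python's standard `unicodedata` module and whichever Unicode Character Database version ships with the interpreter; the helper names `describe` and `normalized_orders` are made up for this example, and U+0F40 is used purely as a convenient base letter. It reports the canonical combining class of each vowel sign and what each of the two orderings looks like after normalisation. Canonical reordering only sorts adjacent combining marks whose classes are non-zero and differ, so marks with equal classes keep the order in which they were typed.

```python
import unicodedata

E = "\u0F7A"   # TIBETAN VOWEL SIGN E
EE = "\u0F7B"  # TIBETAN VOWEL SIGN EE
KA = "\u0F40"  # TIBETAN LETTER KA, used here only as a base letter (assumption)


def describe(seq):
    """Pair each code point in seq with its canonical combining class."""
    return [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in seq]


def normalized_orders(base, marks, form="NFD"):
    """Return the code points of base + marks after normalisation to `form`."""
    text = unicodedata.normalize(form, base + marks)
    return [f"U+{ord(c):04X}" for c in text]


if __name__ == "__main__":
    # Compare the two candidate encodings of a triple E vowel sign.
    for marks in (EE + E, E + EE):
        print(describe(marks), "->", normalized_orders(KA, marks))
```

If both signs report the same combining class, the two printed sequences will differ, which would suggest that the preference for one order comes from an encoding convention rather than from the normalisation algorithm itself.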

    Andrew


