Re: PRC asking for 956 precomposed Tibetan characters

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu Jan 02 2003 - 07:54:59 EST

    On Thu, 02 Jan 2003 04:43:23 -0800 (PST), "Chris Fynn" wrote:

    >
    > ----- Original Message -----
    > From: "Robert R. Chilton" <acip@well.com>
    > To: <tibex@unicode.org>
    > Sent: Sunday, December 29, 2002 9:34 AM
    > Subject: [tibex] Re: PRC asking for 956 precomposed characters
    >
    >
    > > I had heard some rumors about this proposal over the past year and I was
    > > interested to finally see n2558. Sadly, this proposal is flawed on many
    > > counts. It seems that this proposal is motivated solely by
    > > typographical considerations without concern for broader character data
    > > processing needs. Although this character set might be fine for
    > > computer-based typesetting of the modern Tibetan materials now being
    > > printed in the People's Republic of China, it is somewhat lacking as a
    > > basis for interchange and processing of Tibetan-script data.
    >
    > > Most notably this proposal represents the repertoire of a particular
    > > sub-language (modern Tibetan as used in the PRC) rather than a script.
    > > There are many examples of Tibetan-script words in classical Tibetan
    > > works, as well as in Dzongkha and other Tibetan-script languages of
    > > South Asia, that cannot be represented by this character set.
    >

    ...

    Whilst I agree in general with Robert's point-by-point refutation of document
    n2558, I still think that the Chinese proposal is being misrepresented when he
    states that it only "represents the repertoire of a particular sub-language
    (modern Tibetan as used in the PRC)". It is true that the PRC proposal is
    biased towards PRC Tibetan orthography: it includes glyphs for representing the
    non-Tibetan sounds FA, FI, FU, FE and FO as used in the PRC, but not the glyphs
    that are created by adding the TSA -PHRU mark [U+0F39] to the consonants PHA
    and BA, which are used outside the PRC for representing the sounds FA etc. and
    VA etc. Even so, it seems to me that the glyph repertoire covers not only
    Modern Tibetan, but also glyphs that are normally only found in early Tibetan
    texts (note for example the large number of Reversed I glyphs), as well as the
    vast majority of commonly encountered Sanskrit-usage stacks. Admittedly the
    proposal does not cover all conceivable consonant-vowel stacks, but I still
    maintain that it has fairly comprehensive coverage of the glyphs that are
    likely to be encountered in the vast majority of Tibetan texts, both secular
    and religious, ancient and modern.
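
    To make the non-PRC spellings of FA and VA concrete: under the existing
    encoding they are simply PHA or BA followed by the TSA -PHRU mark. A minimal
    Python sketch (the names FA, VA and describe are illustrative only):

```python
import unicodedata

# Sequences described in the text: PHA or BA followed by TSA -PHRU (U+0F39).
FA = "\u0F55\u0F39"  # PHA + TSA -PHRU
VA = "\u0F56\u0F39"  # BA + TSA -PHRU

def describe(seq):
    """Return 'U+XXXX NAME' for each character in the sequence."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]

for label, seq in (("FA", FA), ("VA", VA)):
    print(label, describe(seq))
# FA ['U+0F55 TIBETAN LETTER PHA', 'U+0F39 TIBETAN MARK TSA -PHRU']
# VA ['U+0F56 TIBETAN LETTER BA', 'U+0F39 TIBETAN MARK TSA -PHRU']
```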

    Nevertheless, whether or not the Chinese proposal includes certain
    transliteration letters, obscure Sanskrit-usage stacks or special letters used
    for writing Dzongkha is largely irrelevant. (As far as I know Dzongkha is just
    a dialect of Tibetan, or a separate language for political reasons, and written
    Dzongkha is much the same as written Tibetan ... no doubt someone will correct
    me on this.) The proposal could easily be expanded to include the non-PRC usage
    letters, or a separate "Extended Brdarten" block could be proposed. The key
    point is that the existing Tibetan encoding model works just fine for all
    varieties of Tibetan, and there is simply no need for precomposed Tibetan
    characters.
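
    As an illustration of the existing model, a stack is written as a base letter
    followed by any subjoined letters and then a vowel sign. The sketch below is
    only an example of that principle; the helper name tibetan_stack and the
    choice of stack (SA with subjoined GA and RA plus vowel U) are my own and are
    not taken from n2558.

```python
import unicodedata

def tibetan_stack(base, subjoined=(), vowel=None):
    """Build a stack as base letter + subjoined letters + optional vowel sign.

    Arguments are the name suffixes used in the Unicode character names,
    e.g. "SA" for TIBETAN LETTER SA.
    """
    chars = [unicodedata.lookup(f"TIBETAN LETTER {base}")]
    chars += [unicodedata.lookup(f"TIBETAN SUBJOINED LETTER {name}")
              for name in subjoined]
    if vowel is not None:
        chars.append(unicodedata.lookup(f"TIBETAN VOWEL SIGN {vowel}"))
    return "".join(chars)

sgru = tibetan_stack("SA", subjoined=("GA", "RA"), vowel="U")
print(" ".join(f"U+{ord(ch):04X}" for ch in sgru))
# U+0F66 U+0F92 U+0FB2 U+0F74
```

    Any stack that can be spelled this way needs no code point of its own, which
    is the point at issue.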

    I've posted my analysis of document n2558, together with a table mapping the
    proposed glyphs to existing Unicode sequences, at
    http://uk.geocities.com/babelstone1357/Tibetan/brdarten.html

    These are my main observations:

    1. The proposal includes a single, apparently arbitrary, example of a consonant
    plus triple E vowel (Glyph 107) that is found only in Tibetan shorthand
    abbreviations, but many other consonant plus multiple vowel sign shorthand
    abbreviations that are frequently encountered in prayer flags and elsewhere are
    not covered by this proposal. (See
    http://uk.geocities.com/babelstone1357/Tibetan/shorthand.html for some
    illustrated examples of shorthand abbreviations.)

    2. The proposal includes two examples of letters (KA and KHA) with a superfixed
    TIBETAN SIGN LCE TSA CAN [U+0F88] (Glyphs 029 and 100). This sign is most
    commonly used in Kalachakra literature, and there are presumably other instances
    of its usage combined with different letters that are not covered by this
    proposal. I'm not entirely sure how these glyphs should be encoded using the
    existing Unicode character encoding model - I assume that the sign LCE TSA CAN
    [U+0F88] should be encoded immediately following the base consonant with which
    it is associated (i.e. <U+0F40, U+0F88> for Glyph 029 and <U+0F41, U+0F88> for
    Glyph 100). Please correct me if I'm wrong.

    3. The proposal includes two examples of letters (PA and PHA) with a superfixed
    TIBETAN MARK PALUTA [U+0F85] (Glyphs 435 and 486). Presumably there are
    other instances of its usage combined with different letters that are not
    covered by this proposal. Again I'm not entirely sure how these glyphs should be
    encoded using the existing Unicode character encoding model - I assume that the
    paluta [U+0F85] should be encoded immediately following the base consonant with
    which it is associated (i.e. <U+0F54, U+0F85> for Glyph 435 and <U+0F55, U+0F85>
    for Glyph 486). Please correct me if I'm wrong.

    4. Glyph 687 [Tibetan BrdaRten Character ZHA], Glyph 698 [Tibetan BrdaRten
    Character ZA] and Glyph 713 [Tibetan BrdaRten Character AHA] in the proposal are
    respectively the letters ZHA [U+0F5E], ZA [U+0F5F] and -A [U+0F60] with a dot
    slightly right of centre over the top of the letter. I do not recognise this
    dot-like mark, and the names given in Document N2558 do not explain what it
    signifies. Can anyone enlighten me?
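
    For reference, the sequences suggested in observations 2 and 3 can be written
    out as below. This is only a sketch: the base-then-sign ordering is my
    assumption as stated above, not confirmed practice, and the dictionary and
    helper names are illustrative.

```python
import unicodedata

# N2558 glyph numbers mapped to the sequences assumed in observations 2 and 3.
# The ordering (base consonant first, then the sign) is an assumption.
ASSUMED_SEQUENCES = {
    "029": "\u0F40\u0F88",  # KA  + LCE TSA CAN
    "100": "\u0F41\u0F88",  # KHA + LCE TSA CAN
    "435": "\u0F54\u0F85",  # PA  + PALUTA
    "486": "\u0F55\u0F85",  # PHA + PALUTA
}

def code_points(seq):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)

for glyph, seq in ASSUMED_SEQUENCES.items():
    names = ", ".join(unicodedata.name(ch) for ch in seq)
    print(f"Glyph {glyph}: <{code_points(seq)}> {names}")
```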

    Andrew


