Re: PRC asking for 956 precomposed Tibetan characters

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Jan 07 2003 - 05:49:17 EST

    I've just realised that Robert's postings to the Unicode list are not getting
    through, and so I'm forwarding the original message which I only excerpted in my
    reply yesterday.

    ------- Start of forwarded message -------

    From: "Robert R. Chilton" <acip@well.com>
    Date: Sat, 04 Jan 2003 00:13:45 -0500
    Cc: unicode@unicode.org, cfynn@gmx.net, tibex@unicode.org
    Subject: Re: PRC asking for 956 precomposed Tibetan characters
    To: "Andrew C. West" <andrewcwest@alumni.princeton.edu>

    Andrew C. West wrote:
    >
    > ...
    >
    > Nevertheless, whether the Chinese proposal fails to include certain
    > transliteration letters or obscure Sanskrit-usage stacks or special letters
    > used for writing Dzongkha (although as far as I know Dzongkha is just a
    > dialect of Tibetan - or a separate language for political reasons - and
    > written Dzongkha is much the same as written Tibetan ... no doubt someone
    > will correct me on this)
    > is largely irrelevant. The proposal could easily be expanded to include the
    > non-PRC usage letters, or a separate "Extended Brdarten" block could be
    > proposed. The key point is that the existing Tibetan encoding model works just
    > fine for all varieties of Tibetan, and there is simply no need for precomposed
    > Tibetan characters.

    I agree that the main objection to n2558 is that it is simply
    unnecessary; the existing Tibetan encoding model is not only sufficient
    but enables a far greater range of Tibetan-script orthography than the
    character set proposed in n2558.

    Moreover, for the authors of n2558 to argue that a non-combining model
    of Tibetan is necessary for compatibility with "traditional education,
    publication and electronic desktop publishing systems" is to entirely
    discount the use of other complex scripts --such as the Indic scripts
    which employ a combining model-- in such "systems". Clearly, the
    direction of such a rationale runs entirely opposite to the basic
    principles of Unicode/ISO-10646.

    > I've posted my analysis of document n2558, together with a table mapping the
    > proposed glyphs to existing Unicode sequences, at
    > <http://uk.geocities.com/babelstone1357/Tibetan/brdarten.html>

    Although I have not yet had time to check through Andrew's table mapping
    the proposed glyphs in n2558 to existing Unicode sequences, I can
    respond to his observations, below.

    > These are my main observations :
    >
    > 1. The proposal includes a single, apparently arbitrary, example of a consonant
    > plus triple E vowel (Glyph 107) that is found only in Tibetan shorthand
    > abbreviations, but many other consonant plus multiple vowel sign shorthand
    > abbreviations that are frequently encountered in prayer flags and elsewhere are
    > not covered by this proposal. (See
    > <http://uk.geocities.com/babelstone1357/Tibetan/shorthand.html> for some
    > illustrated examples of shorthand abbreviations.)

    Such cases of triple (or quadruple) vowels E or O are best normalized to
    double vowel plus single (or double) vowel to aid in collation and other
    character data processing functions. Thus, Glyph 107 is best encoded as
    (or normalized to) <U+0F41, U+0FB1, U+0F7B, U+0F7A>.
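
    The folding described above can be sketched in a few lines of Python.
    This is only an illustration under stated assumptions: the helper name
    fold_vowel_runs is hypothetical, and it handles nothing beyond runs of
    the single vowel signs E (U+0F7A) and O (U+0F7C), which it rewrites as
    the double signs EE (U+0F7B) or OO (U+0F7D) followed by any leftover
    single sign.

```python
import re

VOWEL_E = "\u0f7a"   # TIBETAN VOWEL SIGN E
VOWEL_EE = "\u0f7b"  # TIBETAN VOWEL SIGN EE (double E)
VOWEL_O = "\u0f7c"   # TIBETAN VOWEL SIGN O
VOWEL_OO = "\u0f7d"  # TIBETAN VOWEL SIGN OO (double O)


def _fold(run_length, single, double):
    # Three singles become double + single; four become double + double.
    return double * (run_length // 2) + single * (run_length % 2)


def fold_vowel_runs(text):
    """Hypothetical helper: rewrite runs of two or more E or O vowel signs
    as double-vowel signs followed by at most one single vowel sign."""
    text = re.sub(VOWEL_E + "{2,}",
                  lambda m: _fold(len(m.group(0)), VOWEL_E, VOWEL_EE), text)
    text = re.sub(VOWEL_O + "{2,}",
                  lambda m: _fold(len(m.group(0)), VOWEL_O, VOWEL_OO), text)
    return text


# Glyph 107 as recommended in the message: KHA, subjoined YA, EE, E.
GLYPH_107 = "\u0f41\u0fb1\u0f7b\u0f7a"

# The same stack typed naively with three single E vowel signs.
naive_triple_e = "\u0f41\u0fb1" + VOWEL_E * 3
print(fold_vowel_runs(naive_triple_e) == GLYPH_107)
```

    Folding before comparison means that data entered with three single
    vowel signs and data entered as double plus single collate alike.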

    > 2. The proposal includes two examples of letters (KA and KHA) with a superfixed
    > TIBETAN SIGN LCE TSA CAN [U+0F88] (Glyphs 029 and 100). This sign is most
    > commonly used in Kalachakra literature, and there are presumably other
    > instances of its usage combined with different letters that are not covered
    > by this proposal. I'm not entirely sure how these glyphs should be encoded
    > using the existing Unicode character encoding model - I assume that the sign
    > LCE TSA CAN [U+0F88] should be encoded immediately following the base
    > consonant with which it is associated (i.e. <U+0F40, U+0F88> for Glyph 029
    > and <U+0F41, U+0F88> for Glyph 100). Please correct me if I'm wrong.
    >
    > 3. The proposal includes two examples of letters (PA and PHA) with a superfixed
    > TIBETAN MARK PALUTA [U+0F85] (Glyphs 435 and Glyph 486). Presumably there are
    > other instances of its usage combined with different letters that are not
    > covered by this proposal. Again I'm not entirely sure how these glyphs should
    > be encoded using the existing Unicode character encoding model - I assume
    > that the paluta [U+0F85] should be encoded immediately following the base
    > consonant with which it is associated (i.e. <U+0F54, U+0F85> for Glyph 435
    > and <U+0F55, U+0F85> for Glyph 486). Please correct me if I'm wrong.

    Assuming that there have been no changes in the combining classes of
    these characters since Unicode 3.0, the 2 characters <U+0F88> and
    <U+0F89> are spacing, non-combining characters. Therefore, the only
    possible encoding that will place the "base consonant" under these signs
    (i.e., will result in these signs being "superfixed" to the letters KA,
    KHA, PA, PHA, et al.) is for these characters to appear in the data
    stream just prior to the "base consonant", such base consonant being
    encoded in subjoined position. [It is not really correct to say that
    "The Unicode Standard does not explicitly specify the coding sequence
    for letters that are combined with any of the transliteration characters
    U+0F88 through U+0F8B" since the combining class of the characters is
    determinative.]
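
    The character properties in question can be inspected with Python's
    standard unicodedata module. This is a rough sketch, not an
    authoritative check: it reports the properties of whichever Unicode
    version the local Python build ships, which may differ from Unicode
    3.0, and the helper name properties is hypothetical.

```python
import unicodedata


def properties(code_point):
    """Hypothetical helper: return the name, general category and
    canonical combining class of one code point."""
    ch = chr(code_point)
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
    }


# The two signs discussed above, plus two subjoined consonants for contrast.
for cp in (0x0F88, 0x0F89, 0x0F90, 0x0FA4):
    info = properties(cp)
    print(f"U+{cp:04X} {info['name']}: "
          f"category={info['category']} ccc={info['ccc']}")
```

    A general category that is not a mark (M*) for U+0F88 and U+0F89,
    against a mark category for the subjoined letters, is what makes the
    sign-then-subjoined-consonant order the workable one.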

    Thus, to encode Glyphs 029 and 100 use <U+0F88, U+0F90> and <U+0F88,
    U+0F91>, respectively. Likewise, to encode Glyphs 435 and 486 use
    <U+0F89, U+0FA4> and <U+0F89, U+0FA5>, respectively. Note that these
    latter two glyphs are *NOT* a case of superfixed TIBETAN MARK PALUTA but
    rather a case of superfixed TIBETAN SIGN MCHU CAN. The PALUTA has a
    different function (of transliterating the Sanskrit apostrophe in
    Tibetan script) and is not found in superfixed position. [Note also
    that a naive reader might mistake the TIBETAN SIGN MCHU CAN for a
    superfixed NYA, just as one might confuse the NYA and the PALUTA.]
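
    To make the four sequences above concrete, here is a small sketch that
    spells each one out by character name, again assuming Python's
    standard unicodedata module. The dictionary and the helper spell_out
    are illustrative names only; the glyph numbers are those of n2558.

```python
import unicodedata

# Sequences as given above: the sign first, then the consonant in
# subjoined form.
SUPERFIXED_SIGN_SEQUENCES = {
    29: "\u0f88\u0f90",   # LCE TSA CAN + subjoined KA
    100: "\u0f88\u0f91",  # LCE TSA CAN + subjoined KHA
    435: "\u0f89\u0fa4",  # MCHU CAN + subjoined PA
    486: "\u0f89\u0fa5",  # MCHU CAN + subjoined PHA
}


def spell_out(sequence):
    """Hypothetical helper: list each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


for glyph, sequence in SUPERFIXED_SIGN_SEQUENCES.items():
    print(f"Glyph {glyph:03d}: " + ", ".join(spell_out(sequence)))
```

    Printing the names also guards against the MCHU CAN / PALUTA
    confusion noted above, since U+0F85 never appears in the output.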

    > 4. Glyph 687 [Tibetan BrdaRten Character ZHA], Glyph 698 [Tibetan BrdaRten
    > Character ZA] and Glyph 713 [Tibetan BrdaRten Character AHA] in the proposal
    > are respectively the letters ZHA [U+0F5E], ZA [U+0F5F] and -A [U+0F60] with a
    > dot slightly right of centre over the top of the letter. I do not recognise
    > this dot-like mark, and the names given in Document N2558 do not explain what
    > it signifies. Can anyone enlighten me ?

    Though I confess that I am not familiar with these orthographies, the
    glyphs cited are cases of TIBETAN MARK TSA -PHRU [U+0F39] being affixed
    to letters ZHA, ZA, and -A, respectively. They would be encoded as
    <U+0F5E, U+0F39>, <U+0F5F, U+0F39> and <U+0F60, U+0F39>.
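
    As a quick sanity check on these three sequences, the sketch below
    (standard unicodedata module; is_nfc_stable is a hypothetical helper
    name) confirms that the mark is a combining character that follows
    its base letter, and that NFC normalization leaves each
    base-plus-mark pair as two code points rather than mapping it to some
    precomposed character.

```python
import unicodedata

TSA_PHRU = "\u0f39"  # TIBETAN MARK TSA -PHRU

# Base letter followed by the mark, as given above.
TSA_PHRU_SEQUENCES = {
    687: "\u0f5e" + TSA_PHRU,  # ZHA + TSA -PHRU
    698: "\u0f5f" + TSA_PHRU,  # ZA + TSA -PHRU
    713: "\u0f60" + TSA_PHRU,  # -A + TSA -PHRU
}


def is_nfc_stable(sequence):
    """Hypothetical helper: True if NFC normalization leaves the
    sequence unchanged."""
    return unicodedata.normalize("NFC", sequence) == sequence


print(unicodedata.name(TSA_PHRU), unicodedata.combining(TSA_PHRU))
for glyph, sequence in TSA_PHRU_SEQUENCES.items():
    code_points = ", ".join(f"U+{ord(ch):04X}" for ch in sequence)
    print(f"Glyph {glyph}: <{code_points}> NFC-stable={is_nfc_stable(sequence)}")
```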

    I hope this is useful.

    New Year's greetings to all,

    Robert Chilton
    Technical Director
    The Asian Classics Input Project

    ------- End of forwarded message -------


