Re: PRC asking for 956 precomposed Tibetan characters

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Jan 07 2003 - 05:49:17 EST

Next message: Andrew C. West: "Re: PRC asking for 956 precomposed Tibetan characters"

Previous message: Andrew C. West: "Fwd: Re: PRC asking for 956 precomposed Tibetan characters"
Maybe in reply to: Andrew C. West: "Re: PRC asking for 956 precomposed Tibetan characters"
Next in thread: Andrew C. West: "Re: PRC asking for 956 precomposed Tibetan characters"

I've just realised that Robert's postings to the Unicode list are not getting
through, and so I'm forwarding the original message which I only excerpted in my
reply yesterday.

------- Start of forwarded message -------

From: "Robert R. Chilton" <acip@well.com>
Date: Sat, 04 Jan 2003 00:13:45 -0500
Cc: unicode@unicode.org, cfynn@gmx.net, tibex@unicode.org
Subject: Re: PRC asking for 956 precomposed Tibetan characters
To: "Andrew C. West" <andrewcwest@alumni.princeton.edu>

Andrew C. West wrote:
>
> ...
>
> Nevertheless, whether the Chinese proposal fails to include certain
> transliteration letters or obscure Sanskrit-usage stacks or special letters used
> for writing Dzongkha (although as far as I know Dzongkha is just a dialect of
> Tibetan - or a separate language for political reasons - and written Dzongkha is
> much the same as written Tibetan ... no doubt someone will correct me on this)
> is largely irrelevant. The proposal could easily be expanded to include the
> non-PRC usage letters, or a separate "Extended Brdarten" block could be
> proposed. The key point is that the existing Tibetan encoding model works just
> fine for all varieties of Tibetan, and there is simply no need for precomposed
> Tibetan characters.

I agree that the main objection to n2558 is that it is simply
unnecessary; the existing Tibetan encoding model is not only sufficient
but enables a far greater range of Tibetan-script orthography than the
character set proposed in n2558.

Moreover, for the authors of n2558 to argue that a non-combining model
of Tibetan is necessary for compatibility with "traditional education,
publication and electronic desktop publishing systems" is to entirely
discount the use of other complex scripts --such as the Indic scripts
which employ a combining model-- in such "systems". Clearly, the
direction of such a rationale runs entirely opposite to the basic
principles of Unicode/ISO-10646.

> I've posted my analysis of document n2558, together with a table mapping the
> proposed glyphs to existing Unicode sequences, at
> <http://uk.geocities.com/babelstone1357/Tibetan/brdarten.html>

Although I have not yet had time to check through Andrew's table mapping
the proposed glyphs in n2558 to existing Unicode sequences, I can
respond to his observations, below.

> These are my main observations :
>
> 1. The proposal includes a single, apparently arbitrary, example of a consonant
> plus triple E vowel (Glyph 107) that is found only in Tibetan shorthand
> abbreviations, but many other consonant plus multiple vowel sign shorthand
> abbreviations that are frequently encountered in prayer flags and elsewhere are
> not covered by this proposal. (See
> <http://uk.geocities.com/babelstone1357/Tibetan/shorthand.html> for some
> illustrated examples of shorthand abbreviations.)

Such cases of triple (or quadruple) vowels E or O are best normalized to
double vowel plus single (or double) vowel to aid in collation and other
character data processing functions. Thus, Glyph 107 is best encoded as
(or normalized to) <U+0F41, U+0FB1, U+0F7B, U+0F7A>.
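
As a rough illustration of the folding described above, here is a minimal
Python sketch. It assumes the shorthand was typed as a run of single vowel
signs; the function name is hypothetical, and this is an application-level
convention, not one of the Unicode normalization forms.

```python
import re

E, EE = "\u0F7A", "\u0F7B"  # TIBETAN VOWEL SIGN E / EE
O, OO = "\u0F7C", "\u0F7D"  # TIBETAN VOWEL SIGN O / OO


def fold_repeated_vowels(text: str) -> str:
    """Rewrite each run of single E (or O) signs as double signs,
    followed by one single sign when the run length is odd."""
    def fold(single: str, double: str):
        def repl(match: re.Match) -> str:
            n = len(match.group())
            return double * (n // 2) + single * (n % 2)
        return repl

    text = re.sub(f"{E}+", fold(E, EE), text)
    return re.sub(f"{O}+", fold(O, OO), text)


# Glyph 107 typed as KHA + subjoined YA + three single E signs (assumed input)
glyph_107 = fold_repeated_vowels("\u0F41\u0FB1\u0F7A\u0F7A\u0F7A")
print([f"U+{ord(c):04X}" for c in glyph_107])
# ['U+0F41', 'U+0FB1', 'U+0F7B', 'U+0F7A']
```

Folding every such run to the same shape means that two texts typed
differently still compare and sort identically.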

> 2. The proposal includes two examples of letters (KA and KHA) with a superfixed
> TIBETAN SIGN LCE TSA CAN [U+0F88] (Glyphs 029 and 100). This sign is most
> commonly used in Kalachakra literature, and there are presumably other instances
> of its usage combined with different letters that are not covered by this
> proposal. I'm not entirely sure how these glyphs should be encoded using the
> existing Unicode character encoding model - I assume that the sign LCE TSA CAN
> [U+0F88] should be encoded immediately following the base consonant with which
> it is associated (i.e. <U+0F40, U+0F88> for Glyph 029 and <U+0F41, U+0F88> for
> Glyph 100). Please correct me if I'm wrong.
>
> 3. The proposal includes two examples of letters (PA and PHA) with a superfixed
> TIBETAN MARK PALUTA [U+0F85] (Glyphs 435 and Glyph 486). Presumably there are
> other instances of its usage combined with different letters that are not
> covered by this proposal. Again I'm not entirely sure how these glyphs should be
> encoded using the existing Unicode character encoding model - I assume that the
> paluta [U+0F85] should be encoded immediately following the base consonant with
> which it is associated (i.e. <U+0F54, U+0F85> for Glyph 435 and <U+0F55, U+0F85>
> for Glyph 486). Please correct me if I'm wrong.

Assuming that there have been no changes in the combining classes of
these characters since Unicode 3.0, the 2 characters <U+0F88> and
<U+0F89> are spacing, non-combining characters. Therefore, the only
possible encoding that will place the "base consonant" under these signs
(i.e., will result in these signs being "superfixed" to the letters KA,
KHA, PA, PHA, et al.) is for these characters to appear in the data
stream just prior to the "base consonant", such base consonant being
encoded in subjoined position. [It is not really correct to say that
"The Unicode Standard does not explicitly specify the coding sequence
for letters that are combined with any of the transliteration characters
U+0F88 through U+0F8B" since the combining class of the characters is
determinative.]

Thus, to encode Glyphs 029 and 100 use <U+0F88, U+0F90> and <U+0F88,
U+0F91>, respectively. Likewise, to encode Glyphs 435 and 486 use
<U+0F89, U+0FA4> and <U+0F89, U+0FA5>, respectively. Note that these
latter two glyphs are *NOT* a case of superfixed TIBETAN MARK PALUTA but
rather a case of superfixed TIBETAN SIGN MCHU CAN. The PALUTA has a
different function (of transliterating the Sanskrit apostrophe in
Tibetan script) and is not found in superfixed position. [Note also
that a naive reader might mistake the TIBETAN SIGN MCHU CAN for a
superfixed NYA, just as one might confuse the NYA and the PALUTA.]
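
For anyone who wants to inspect these four sequences, the following Python
sketch lists each code point with its name and canonical combining class as
reported by the interpreter's bundled Unicode data (which may be newer than
Unicode 3.0). The dictionary and the helper name are illustrative only.

```python
import unicodedata

# Sequences as given above, keyed by n2558 glyph number.
SEQUENCES = {
    "029": "\u0F88\u0F90",  # LCE TSA CAN + subjoined KA
    "100": "\u0F88\u0F91",  # LCE TSA CAN + subjoined KHA
    "435": "\u0F89\u0FA4",  # MCHU CAN + subjoined PA
    "486": "\u0F89\u0FA5",  # MCHU CAN + subjoined PHA
}


def describe(seq: str):
    """Return (code point, character name, combining class) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in seq
    ]


for glyph, seq in SEQUENCES.items():
    print(glyph, describe(seq))
```

In each pair the sign comes first and has combining class 0, and the
consonant follows in its subjoined form.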

> 4. Glyph 687 [Tibetan BrdaRten Character ZHA], Glyph 698 [Tibetan BrdaRten
> Character ZA] and Glyph 713 [Tibetan BrdaRten Character AHA] in the proposal are
> respectively the letters ZHA [U+0F5E], ZA [U+0F5F] and -A [U+0F60] with a dot
> slightly right of centre over the top of the letter. I do not recognise this
> dot-like mark, and the names given in Document N2558 do not explain what it
> signifies. Can anyone enlighten me ?

Though I confess that I am not familiar with these orthographies, the
glyphs cited are cases of TIBETAN MARK TSA -PHRU [U+0F39] being affixed
to letters ZHA, ZA, and -A, respectively. They would be encoded as
<U+0F5E, U+0F39>, <U+0F5F, U+0F39> and <U+0F60, U+0F39>.
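
A short Python sketch of these three sequences, assuming only the standard
`unicodedata` module; the variable names are illustrative. It also checks that
NFC leaves each pair unchanged, since there is no precomposed character for
it to compose them into.

```python
import unicodedata

TSA_PHRU = "\u0F39"  # TIBETAN MARK TSA -PHRU, a combining mark

# Base letters keyed by n2558 glyph number: ZHA, ZA, -A
BASES = {"687": "\u0F5E", "698": "\u0F5F", "713": "\u0F60"}

# The mark follows the base letter it attaches to.
sequences = {glyph: base + TSA_PHRU for glyph, base in BASES.items()}

for glyph, seq in sequences.items():
    # NFC has nothing to compose these pairs into, so they stay as two code points.
    assert unicodedata.normalize("NFC", seq) == seq
    print(glyph, [unicodedata.name(c) for c in seq])
```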

I hope this is useful.

New Year's greetings to all,

Robert Chilton
Technical Director
The Asian Classics Input Project

------- End of forwarded message -------

This archive was generated by hypermail 2.1.5 : Tue Jan 07 2003 - 06:41:08 EST