Re: PRC asking for 956 precomposed Tibetan characters

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Wed Jan 08 2003 - 04:47:37 EST

Next message: Michael Everson: "Re: Unicode Standards for Indic Scripts"

Previous message: Manoj Jain: "Unicode Standards for Indic Scripts"
Maybe in reply to: Andrew C. West: "Re: PRC asking for 956 precomposed Tibetan characters"
Next in thread: Andrew C. West: "Re: PRC asking for 956 precomposed Tibetan characters"

------- Start of forwarded message -------

From: "Robert R. Chilton" <acip@well.com>
Date: Wed, 08 Jan 2003 00:16:35 -0500
Cc: unicode@unicode.org, tibex@unicode.org
Subject: Re: PRC asking for 956 precomposed Tibetan characters
To: "Andrew C. West" <andrewcwest@alumni.princeton.edu>

Andrew C. West wrote:
>
> On Tue, 07 Jan 2003 06:16:43 -0800 (PST), "Robert R. Chilton" wrote:
>
> > I understand your interest in preserving the semantic or lexical
> > distinction between an instance of a contracted series of single vowels
> > and a true usage of the double vowel. However, the procedure of
> > normalization is designed to collapse all the variant encodings for a
> > particular presentation form into a single, "normalized" encoding.
> > ...
> > Canonical combining classes are defined for combining characters (such
> > as macron and dot-under, or the vowel signs of Tibetan) in order to
> > support normalization of identical presentation forms to a single
> > encoding. So in the cases you cite, of "graphically identical but
> > semantically different" instances, consistency in searching, sorting,
> > etc. requires that all "graphically identical" presentation forms be
> > normalized to a single normalized encoding.
> >
>
> O.K. Your explanation of normalisation makes sense, and I'll change the
> encoding of double and triple E and O vowel signs accordingly on my web
> pages. The only query I still have is why a triple E vowel sign should be
> normalised to <U+0F7B, U+0F7A> rather than <U+0F7A, U+0F7B>? What determines
> that the former sequence is better than the latter sequence?
>
> Andrew

In the normalization process, instances of a sequence of either two E
vowels or two O vowels may be normalized to the double E vowel or the
double O vowel. Thus, in a case of three E or three O vowels in sequence,
the first two would be normalized to the double vowel, with the single
vowel trailing.
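
As an illustration only, the left-to-right collapsing described above could
be sketched in Python as follows. The function name fold_double_vowels is
hypothetical, and this folding is an application-level step assumed for the
example; it is not something the standard normalization forms perform.

```python
# Tibetan vowel signs discussed in this thread.
E, EE = "\u0f7a", "\u0f7b"   # single E, double E
O, OO = "\u0f7c", "\u0f7d"   # single O, double O

def fold_double_vowels(text: str) -> str:
    """Hypothetical folding step: scanning left to right, replace each
    pair of identical single vowel signs with the double vowel sign."""
    # str.replace works left to right on non-overlapping matches, so
    # three E signs become <EE, E>: the first two collapse, one trails.
    text = text.replace(E + E, EE)
    text = text.replace(O + O, OO)
    return text

KA = "\u0f40"  # TIBETAN LETTER KA, used here only as a base letter
triple_e = KA + E + E + E
folded = fold_double_vowels(triple_e)
print([f"U+{ord(ch):04X}" for ch in folded])
# prints ['U+0F40', 'U+0F7B', 'U+0F7A']
```

With this scanning order a triple E always comes out as <U+0F7B, U+0F7A>,
which is the ordering asked about in the quoted message.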

Unfortunately, the single and double vowel characters are assigned the
same canonical combining class of 130, and canonical reordering only
rearranges combining marks whose classes differ. A further processing
step is therefore required so that a sequence such as <U+0F7A, U+0F7B>
is normalized to <U+0F7B, U+0F7A>. So here again is a case where it would
be desirable to alter some of the canonical combining classes that have
been assigned to characters in the Tibetan block.
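
The point about equal combining classes can be checked with Python's
standard unicodedata module. This is a small sketch, not part of any
proposal; the helper name combining_classes is made up for the example.

```python
import unicodedata

E, EE = "\u0f7a", "\u0f7b"   # TIBETAN VOWEL SIGN E, TIBETAN VOWEL SIGN EE
KA = "\u0f40"                # TIBETAN LETTER KA, used only as a base letter

def combining_classes(s: str) -> list[int]:
    """Return the canonical combining class of each character in s."""
    return [unicodedata.combining(ch) for ch in s]

single_then_double = KA + E + EE
double_then_single = KA + EE + E

# The base letter has class 0; both vowel signs share class 130.
print(combining_classes(single_then_double))   # prints [0, 130, 130]

# Because the two marks have the same class, canonical reordering leaves
# their relative order alone, so the two spellings stay distinct after
# standard normalization.
nfc_a = unicodedata.normalize("NFC", single_then_double)
nfc_b = unicodedata.normalize("NFC", double_then_single)
print(nfc_a == nfc_b)                          # prints False
```

Any step that maps one ordering onto the other therefore has to be applied
on top of the standard normalization forms.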

If it is indeed not possible to assign new canonical combining classes
to U+0F7B and U+0F7D, then it may be preferable to specify (or otherwise
obtain by processing) a canonical or compatibility decomposition for
these two characters as <U+0F7A, U+0F7A> and <U+0F7C, U+0F7C>,
respectively, and to deprecate the use of these double-vowel characters.
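
To show what the "obtain by processing" alternative might look like, here
is a minimal Python sketch. The function name expand_double_vowels is
hypothetical, and the mapping is the decomposition suggested above applied
by the application itself; it is not a decomposition defined in the Unicode
Character Database.

```python
# Tibetan vowel signs discussed in this thread.
E, EE = "\u0f7a", "\u0f7b"   # single E, double E
O, OO = "\u0f7c", "\u0f7d"   # single O, double O

def expand_double_vowels(text: str) -> str:
    """Hypothetical processing step: rewrite each double vowel sign as
    two single vowel signs, as in the decomposition suggested above."""
    return text.replace(EE, E + E).replace(OO, O + O)

KA = "\u0f40"  # TIBETAN LETTER KA, used here only as a base letter

# Both orderings of a "triple E" expand to the same three single signs,
# so the ordering question no longer arises.
a = expand_double_vowels(KA + E + EE)
b = expand_double_vowels(KA + EE + E)
print(a == b == KA + E + E + E)   # prints True
```

Under this approach every graphically identical stack of E or O signs ends
up with one encoding, at the cost of no longer using the double-vowel
characters in normalized text.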

Kind regards,
Robert

------- End of forwarded message -------

