From: Andrew C. West (email@example.com)
Date: Tue Jan 07 2003 - 09:29:34 EST
On Tue, 07 Jan 2003 06:16:43 -0800 (PST), "Robert R. Chilton" wrote:
> I understand your interest in preserving the semantic or lexical
> distinction between an instance of a contracted series of single vowels
> and a true usage of the double vowel. However, the procedure of
> normalization is designed to collapse all the variant encodings for a
> particular presentation form into a single, "normalized" encoding.
> Canonical combining classes are defined for combining characters (such
> as macron and dot-under, or the vowel signs of Tibetan) in order to
> support normalization of identical presentation forms to a single
> encoding. So in the cases you cite, of "graphically identical but
> semantically different" instances, consistency in searching, sorting,
> etc. requires that all "graphically identical" presentation forms be
> normalized to a single normalized encoding.
O.K. Your explanation of normalisation makes sense, and I'll change the encoding
of double and triple E and O vowel signs accordingly on my web pages. The only
query I still have is why a triple E vowel sign should be normalised to <U+0F7B,
U+0F7A> rather than <U+0F7A, U+0F7B>? What determines that the former sequence
is better than the latter sequence?
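
For anyone who wants to check the ordering behaviour described above, here is a minimal sketch, assuming Python and its standard unicodedata module. It lists the canonical combining class of each character and tests whether two orderings of the same marks normalise to the same string. The helper names and the base letter U+0F40 are illustrative choices of mine, not anything from the thread, and the result for the Tibetan pair depends on the Unicode data shipped with the local Python build, so no expected output is given for it.

```python
import unicodedata

def describe(seq):
    """Return (code point, canonical combining class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in seq]

def same_after_normalization(a, b, form="NFD"):
    """True if both strings normalise to an identical code point sequence."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Latin case from the quoted message: macron (U+0304) and dot below (U+0323).
macron_first = "a\u0304\u0323"
dot_first = "a\u0323\u0304"
latin_equivalent = same_after_normalization(macron_first, dot_first)

# Tibetan case from the question: the two orderings of U+0F7B and U+0F7A.
# The base letter U+0F40 is only an assumed example consonant.
ee_then_e = "\u0F40\u0F7B\u0F7A"
e_then_ee = "\u0F40\u0F7A\u0F7B"
tibetan_equivalent = same_after_normalization(ee_then_e, e_then_ee)

print(describe(macron_first), latin_equivalent)
print(describe(ee_then_e), describe(e_then_ee), tibetan_equivalent)
```

Canonical reordering only swaps adjacent marks whose combining classes differ and are non-zero. If the two vowel signs turn out to share a class, normalisation leaves both orderings as typed, and the choice between them would be a matter of convention rather than something the algorithm decides.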