Telugu U+0C48, UAX#15 decomposition/canonical recomposition, and identifier-start exclusions

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 22 2003 - 21:43:26 EST


I note that the following character is the only one in the 4.0.1 UCD that is
a combining character of class 0, is nevertheless canonically decomposable,
and is not excluded from recomposition:

<0C48;TELUGU VOWEL SIGN AI;Mn;0;NSM;0C46 0C56;;;;N;;;;;>

Note that its Bidi class is, correctly, NSM (non-spacing mark). Its full
canonical decomposition also consists of two non-spacing marks:

<0C46;TELUGU VOWEL SIGN E;Mn;0;NSM;;;;;N;;;;;>
<0C56;TELUGU AI LENGTH MARK;Mn;91;NSM;;;;;N;;;;;>
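
The properties quoted above can be checked with a short script. This is only
a sketch using Python's standard unicodedata module, which reflects whatever
Unicode version ships with the interpreter rather than 4.0.1; the helper name
describe() is made up for this illustration.

```python
import unicodedata

AI = "\u0c48"           # TELUGU VOWEL SIGN AI
E = "\u0c46"            # TELUGU VOWEL SIGN E
LENGTH_MARK = "\u0c56"  # TELUGU AI LENGTH MARK


def describe(ch):
    """Return (general category, combining class, bidi class, decomposition).

    Hypothetical helper for this example, not part of any library.
    """
    return (
        unicodedata.category(ch),
        unicodedata.combining(ch),
        unicodedata.bidirectional(ch),
        unicodedata.decomposition(ch),
    )


for ch in (AI, E, LENGTH_MARK):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))

# NFD splits U+0C48 into its two marks; NFC puts them back together,
# because U+0C48 is not in the composition exclusion list.
nfd_of_ai = unicodedata.normalize("NFD", AI)
nfc_of_pair = unicodedata.normalize("NFC", E + LENGTH_MARK)
print([f"U+{ord(c):04X}" for c in nfd_of_ai])
print([f"U+{ord(c):04X}" for c in nfc_of_pair])
```

The round trip NFD then NFC returning the original single code point is what
"not excluded from recomposition" means in practice.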

Probably <0C48;TELUGU VOWEL SIGN AI;Mn;0;...> should have been excluded
from composition, but this is now impossible because of the stability
guarantees for the normalization forms.
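
One way to look for other characters with the same combination of properties
is to scan the whole code space. The sketch below assumes that "combining
character of class 0" means general category Mn with canonical combining
class 0, and it tests recomposition by an NFD/NFC round trip instead of
reading CompositionExclusions.txt. The function name is hypothetical, and the
result depends on the Unicode version bundled with the interpreter
(unicodedata.unidata_version), so it may not match the 4.0.1 data.

```python
import sys
import unicodedata


def recomposable_class0_marks():
    """Yield Mn characters of combining class 0 that have a canonical
    decomposition and are restored by NFC after NFD (assumed criteria)."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != "Mn" or unicodedata.combining(ch) != 0:
            continue
        decomp = unicodedata.decomposition(ch)
        # Skip characters with no decomposition, and compatibility
        # decompositions, which are written with a leading "<tag>".
        if not decomp or decomp.startswith("<"):
            continue
        nfd = unicodedata.normalize("NFD", ch)
        if unicodedata.normalize("NFC", nfd) == ch:
            yield ch


found = list(recomposable_class0_marks())
print("Unicode data version:", unicodedata.unidata_version)
for ch in found:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```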

Also, I can't figure out why Annex 7 of UAX#15 (normalization) does not
list these two canonical-starter vowel signs as <identifier_extend> instead
of <identifier_start>, along with the four other combining-like letters
indicated there, such as <0EB3;LAO VOWEL SIGN AM;Lo;0;...>.

I must have missed something about the gc=Mn Telugu vowel signs (and the
four gc=Mc ones: U, UU, vocalic R and RR), and why they are given general
category Mn rather than Mc as in other Indic scripts. (The compatibility
interactions with ISCII seem quite strange for Telugu.)
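
For reference, the Mn/Mc split among the Telugu vowel signs can be listed as
follows. Again this is just a sketch against the interpreter's bundled Unicode
data, not 4.0.1; the range U+0C3E..U+0C4C and the function name are choices
made for this example.

```python
import unicodedata


def telugu_vowel_sign_categories():
    """Map each assigned character in U+0C3E..U+0C4C to its general category."""
    result = {}
    for cp in range(0x0C3E, 0x0C4D):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is None:
            # Unassigned code point inside the range; nothing to report.
            continue
        result[name] = unicodedata.category(ch)
    return result


cats = telugu_vowel_sign_categories()
for name, gc in cats.items():
    print(f"{gc}  {name}")
```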

Well, I can use it the way it is defined, but I fear some problems with
the unique behavior of this recomposable character.



This archive was generated by hypermail 2.1.5 : Sat Nov 22 2003 - 22:28:50 EST