From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 22 2003 - 21:43:26 EST
I note that the following character is the only one in the 4.0.1 UCD that
is a combining character of class 0 and yet is canonically decomposable
without being excluded from recomposition:
<0C48;TELUGU VOWEL SIGN AI;Mn;0;NSM;0C46 0C56;;;;N;;;;;>
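The claim can be checked mechanically. Below is a minimal Python sketch using the standard `unicodedata` module. It reads "combining character" as gc=Mn (spacing Mc vowel signs such as U+0BCA TAMIL VOWEL SIGN O also have class 0 and recompose), and it treats "not excluded from recomposition" as "NFC leaves the character unchanged". The function name is hypothetical. Note that `unicodedata` reflects the UCD version bundled with the running Python (see `unicodedata.unidata_version`), which is much newer than 4.0.1, so characters added since then may also appear in the output.

```python
import sys
import unicodedata


def recomposable_mn_starters():
    """Return code points that have gc=Mn and ccc=0, have a canonical
    decomposition, and are left unchanged by NFC (i.e. recompose)."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != "Mn" or unicodedata.combining(ch) != 0:
            continue
        decomp = unicodedata.decomposition(ch)
        # Skip characters with no mapping, or with only a compatibility
        # mapping (those start with a tag such as "<compat>").
        if not decomp or decomp.startswith("<"):
            continue
        # A composition-excluded character would be decomposed by NFC;
        # a primary composite such as U+0C48 comes back unchanged.
        if unicodedata.normalize("NFC", ch) == ch:
            found.append(cp)
    return found


if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    for cp in recomposable_mn_starters():
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)}")
```

U+0F73 TIBETAN VOWEL SIGN II shows why the NFC test matters: it is also gc=Mn, class 0 and canonically decomposable (to U+0F71 U+0F72), but it is excluded from composition, so NFC decomposes it and the filter drops it.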
Note that the Bidi class of U+0C48 is also, correctly, NSM (non-spacing
mark). Its full canonical decomposition also consists of two non-spacing marks:
<0C46;TELUGU VOWEL SIGN E;Mn;0;NSM;;;;;N;;;;;>
<0C56;TELUGU AI LENGTH MARK;Mn;91;NSM;;;;;N;;;;;>
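The properties quoted above can be read back with Python's `unicodedata` module. This short sketch (the helper `describe` is hypothetical) prints them for the three characters and confirms that NFD maps U+0C48 to the two marks. The values come from the UCD bundled with the interpreter rather than 4.0.1; decomposition mappings and combining classes are covered by the stability policy, so at least those fields should match.

```python
import unicodedata


def describe(cp):
    """Return the UnicodeData fields discussed above for one code point."""
    ch = chr(cp)
    return {
        "name": unicodedata.name(ch),
        "gc": unicodedata.category(ch),
        "ccc": unicodedata.combining(ch),
        "bidi": unicodedata.bidirectional(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


for cp in (0x0C48, 0x0C46, 0x0C56):
    print(f"U+{cp:04X}", describe(cp))

# NFD applies the canonical decomposition: U+0C48 -> U+0C46 U+0C56.
nfd = unicodedata.normalize("NFD", "\u0C48")
print(" ".join(f"U+{ord(c):04X}" for c in nfd))
```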
Probably <0C48;TELUGU VOWEL SIGN AI;Mn;0;...> should have been excluded
from composition, but that is now impossible because of the stability
policy for normalized forms.
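Because U+0C48 is not excluded, NFC rewrites every sequence of U+0C46 followed by U+0C56 into U+0C48, and stability means that mapping cannot change. A minimal Python sketch, with the Telugu consonant KA (U+0C15) chosen purely as an example: since U+0C46 has class 0, it is itself a starter, so the length mark composes with it rather than with the consonant.

```python
import unicodedata

KA = "\u0C15"         # TELUGU LETTER KA (example consonant)
E = "\u0C46"          # TELUGU VOWEL SIGN E: gc=Mn but ccc=0, so a starter
AI_LENGTH = "\u0C56"  # TELUGU AI LENGTH MARK: ccc=91
AI = "\u0C48"         # TELUGU VOWEL SIGN AI: canonical composite of E + AI_LENGTH

decomposed = KA + E + AI_LENGTH
composed = unicodedata.normalize("NFC", decomposed)

# NFC recombines the two marks into U+0C48; NFD splits it again.
print(" ".join(f"U+{ord(c):04X}" for c in composed))  # U+0C15 U+0C48
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```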
Also, I can't figure out why Annex 7 of UAX #15 (Normalization) does not
list these two canonical-starter vowel signs (U+0C46 and U+0C48) as
<identifier_extend> instead of <identifier_start>, alongside the four
combining-like letters it does single out, such as
<0EB3;LAO VOWEL SIGN AM;Lo;0;...>.
I must have missed something about the gc=Mn Telugu vowel signs (and
Telugu's four gc=Mc ones: U, UU, vocalic R and vocalic RR), and about why
they are given general category Mn rather than the Mc used in other Indic
scripts. (The compatibility interactions with ISCII seem quite strange for
Telugu.)
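To see the split in question, this sketch groups the Telugu dependent vowel signs (U+0C3E..U+0C4C) by general category using the interpreter's UCD. The function name and its range parameters are illustrative, and the same function can be pointed at another Indic block for comparison. General categories are not covered by the stability policy, so a current UCD could in principle differ from 4.0.1 here.

```python
import unicodedata


def vowel_signs_by_category(first, last):
    """Group assigned 'VOWEL SIGN' characters in [first, last] by general category."""
    groups = {}
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), None)
        if name is None or "VOWEL SIGN" not in name:
            continue  # unassigned code point, or not a vowel sign
        groups.setdefault(unicodedata.category(chr(cp)), []).append(name)
    return groups


telugu = vowel_signs_by_category(0x0C3E, 0x0C4C)
for gc, names in sorted(telugu.items()):
    print(gc, names)
```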
Well, I can use it as it is defined, but I fear some problems with the
unique property of this recomposable character.
This archive was generated by hypermail 2.1.5 : Sat Nov 22 2003 - 22:28:50 EST