From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Wed Nov 24 2004 - 13:19:03 CST
On Wednesday, November 24th, 2004 16:26Z Tim Greenwood wrote:
> All of the spacing combining marks (general category Mc) except
> musical symbols have a canonical combining class of 0.
> Why is this?
As for the Indic vowel signs, I assume they have class 0 so that canonical
reordering does not move them around in weird ways, particularly when
multi-piece vowels are involved.
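
To make this concrete, here is a minimal Python sketch using the standard
unicodedata module. The sample characters are chosen purely for illustration
and are not taken from the question, and the values reported depend on the
Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

# Illustrative samples (an assumption of this sketch, not data from the thread).
samples = {
    "DEVANAGARI VOWEL SIGN AA": "\u093e",
    "BENGALI VOWEL SIGN O": "\u09cb",
    "MUSICAL SYMBOL COMBINING STEM": "\U0001d165",
}
info = {name: describe(ch) for name, ch in samples.items()}
for name, (cat, ccc) in info.items():
    print(f"{name}: category={cat} ccc={ccc}")

# A two-piece vowel: BENGALI VOWEL SIGN O decomposes canonically into
# U+09C7 followed by U+09BE. Both pieces have class 0, so canonical
# reordering leaves them in exactly this order.
pieces = unicodedata.normalize("NFD", "\u09cb")
print([f"U+{ord(c):04X}" for c in pieces])
```

The two Indic vowel signs report category Mc with class 0, while the musical
stem reports Mc with a nonzero class.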
> The Canonical
> Combining Class Values in UCD.html has entries and values for left
> attached and right attached - but no characters have these values.
They (the Indic vowel signs) did have classes greater than 0 before
version 2.1.8 (1998). I believe UCD.html still reflects that earlier state.
For example, the accompanying README tells us:
Note that as of the 2.1.8 update of the Unicode Character Database,
the decompositions in the UnicodeData.txt file can be used to recursively
derive the full decomposition in canonical order, without the need
to separately apply canonical reordering. However, canonical reordering
of combining character sequences must still be applied in decomposition
when normalizing source text which contains any combining marks.
I assume this has to do with the work on TR15, which you might consult
(http://www.unicode.org/reports/tr15/tr15-10.html) for enlightenment.
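
The canonical reordering step that the README note says must still be applied
can be sketched in Python with the standard unicodedata module. The input
string is a made-up example; it only shows that marks with nonzero classes are
sorted by class during decomposition.

```python
import unicodedata

# LATIN SMALL LETTER A + COMBINING ACUTE ACCENT (class 230)
# + COMBINING DOT BELOW (class 220), deliberately out of canonical order.
text = "a\u0301\u0323"
classes_before = [unicodedata.combining(c) for c in text]

# NFD decomposes and then applies canonical reordering: within a run of
# marks with nonzero classes, the marks are sorted by class, so the dot
# below (220) moves in front of the acute accent (230).
nfd = unicodedata.normalize("NFD", text)
classes_after = [unicodedata.combining(c) for c in nfd]

print(classes_before)  # [0, 230, 220]
print(classes_after)   # [0, 220, 230]
```

A character of class 0 is never moved by this step, which is why giving the
spacing vowel signs class 0 keeps them where the text put them.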
Antoine