From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jul 14 2010 - 13:43:32 CDT
Arno Schmitt noted:
> The marks in the Arabic block are not well organized;
This is a well-known fact. It results from the legacy Arabic
encodings that were brought into Unicode, followed by twenty years
of incremental encoding of additional marks as evidence was
brought to bear and proposals for encoding were made.
> they belong to eleven mark classes, eight for marks above the
> base character, three for marks below.
> In Unicode logic marks within a class with a lower number should
> be closer to the base character than those within a class with a
> higher number.
This is not true. The fact that it is not true essentially moots
most of the argumentation that follows. For Arabic and Hebrew
in particular, the history of canonical combining class assignments for
marks has been complicated, but all of the "fixed position"
combining class assignments were originally made in full knowledge
that they were not (and could not be) used to untangle the mutual
stacking and placement rules for harakat or other kinds of marks.
The positioning of vowels and other marks for Arabic and Hebrew
was assumed to be "fixed" by the layout rules of the script,
which had to be implemented by rendering engines. The canonical combining
classes themselves certainly do not force particular placement
of marks with respect to each other.
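
As a rough illustration of that point, the sketch below looks up the
canonical combining class of a handful of Arabic marks with Python's
standard unicodedata module. The selection of marks and the helper
name ccc_table are assumptions made for this example only.

```python
import unicodedata

# Illustrative selection of Arabic marks (an assumption for this sketch,
# not an exhaustive list): fatha, kasra, shadda, sukun, hamza above,
# hamza below.
MARKS = ["\u064E", "\u0650", "\u0651", "\u0652", "\u0654", "\u0655"]

def ccc_table(chars):
    """Return (code point, name, canonical combining class) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in chars
    ]

for row in ccc_table(MARKS):
    print(*row)
```

In this selection FATHA (above the base) has class 30, KASRA (below)
has 32, and SHADDA (above) has 33, so the fixed-position values
interleave marks above and below. They read as distinct labels rather
than as a scale of distance from the base character.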
> Should we try to remedy this?
It *cannot* be remedied.
> Is there any software that uses the mark classes directly?
Yes. Their *only* significant function is in the Canonical Ordering
portion of the Unicode Normalization Algorithm. And because of
the stability guarantees for normalization, no canonical combining class
assignment for any character can be changed, once assigned:
http://www.unicode.org/policies/stability_policy.html#Normalization
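
A minimal sketch of what Canonical Ordering does with these classes,
again using Python's standard unicodedata module. The helper name
canonical_order and the choice of BEH with SHADDA and FATHA are
assumptions made for illustration.

```python
import unicodedata

def canonical_order(text):
    """Apply canonical ordering by normalizing to NFD (stdlib sketch)."""
    return unicodedata.normalize("NFD", text)

BEH, FATHA, SHADDA = "\u0628", "\u064E", "\u0651"

# SHADDA (ccc 33) entered before FATHA (ccc 30).
typed = BEH + SHADDA + FATHA

# Adjacent marks with different non-zero classes are sorted by
# ascending class value.
ordered = canonical_order(typed)
print([f"U+{ord(c):04X}" for c in ordered])

# Both input orders normalize to the same sequence, so the two
# spellings are canonically equivalent.
print(canonical_order(BEH + FATHA + SHADDA) == ordered)
```

Normalization puts FATHA ahead of SHADDA because 30 is less than 33,
even though rendering engines typically draw the fatha above the
shadda. The reordering exists to give canonically equivalent
sequences one spelling, not to describe placement.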
> Let's look at the position of all the marks irrespective of
> their current Unicode mark class.
[ Snipping the following interesting discussion about actual placement
of marks in Arabic, which people may wish to comment on separately. ]
--Ken