Re: Arab Ma[r]ks

From: Kenneth Whistler (
Date: Wed Jul 14 2010 - 13:43:32 CDT


    Arno Schmitt noted:

    > The marks in the Arabic block are not well organized;

    A well-known fact that has resulted from the prior legacy for
    Arabic encoding brought into Unicode, followed by twenty years
    of incremental encoding of additional marks, as evidence has
    been brought to bear and proposals for encoding have been made.

    > they belong to eleven mark classes, eight for marks above the
    > base character, three for marks below.
    > In Unicode logic marks within a class with a lower number should
    > be closer to the base character than those within a class with a
    > higher number.

    This is not true. The fact that it is not true essentially moots
    most of the argumentation that follows. For Arabic and Hebrew
    in particular, the history of canonical combining class assignments for
    marks has been complicated, but all of the "fixed position"
    combining class assignments were originally made in full knowledge
    that they were not (and could not be) used to untangle the mutual
    stacking and placement rules for harakat or other kinds of marks.
    The positioning of vowels and other marks for Arabic and Hebrew
    was assumed to be "fixed" by the layout rules of the script,
    which had to be implemented by rendering engines. The canonical combining
    classes themselves certainly do not force particular placement
    of marks with respect to each other.
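
    A minimal sketch of this point, assuming Python's standard
    unicodedata module (the MARKS list and the ccc dictionary are
    illustrative names, not anything from the standard). It only
    prints the canonical combining class of the common harakat:

```python
import unicodedata

# Common Arabic harakat, listed in code point order.
MARKS = [
    "\u064B",  # FATHATAN (above)
    "\u064C",  # DAMMATAN (above)
    "\u064D",  # KASRATAN (below)
    "\u064E",  # FATHA    (above)
    "\u064F",  # DAMMA    (above)
    "\u0650",  # KASRA    (below)
    "\u0651",  # SHADDA   (above)
    "\u0652",  # SUKUN    (above)
    "\u0670",  # SUPERSCRIPT ALEF (above)
]

# Map each mark to its canonical combining class (ccc).
ccc = {ch: unicodedata.combining(ch) for ch in MARKS}

for ch in MARKS:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<32} ccc={ccc[ch]}")
```

    The values run from 27 to 35 in code point order. KASRA (32, a
    mark below) falls numerically between FATHA (30) and SHADDA (33),
    both marks above, so the numbers are labels for canonical
    ordering and say nothing about distance from the base letter.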

    > Should we try to remedy this?

    It *cannot* be remedied.

    > Is there any software that uses the mark classes directly?

    Yes. Their *only* significant function is in the Canonical Ordering
    portion of the Unicode Normalization Algorithm. And because of
    the stability guarantees for normalization, no canonical combining class
    assignment for any character can be changed, once assigned.
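
    The effect of Canonical Ordering can be sketched as follows,
    again assuming Python's unicodedata module. The helper
    canonical_reorder is a hypothetical name for illustration; it
    covers only the reordering step (a stable sort of each run of
    marks with non-zero combining class) and does no decomposition
    or composition:

```python
import unicodedata


def canonical_reorder(text: str) -> str:
    """Stable-sort each run of non-starters (ccc != 0) by combining class."""
    out = []
    run = []  # pending marks since the last starter
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run; flush it in sorted order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    # Flush any marks left at the end of the string.
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


typed = "\u0628\u0651\u064E"  # BEH, SHADDA (ccc 33), FATHA (ccc 30)
reordered = canonical_reorder(typed)

print([f"U+{ord(c):04X}" for c in typed])
print([f"U+{ord(c):04X}" for c in reordered])
print(reordered == unicodedata.normalize("NFD", typed))
```

    For this input the library's own normalization gives the same
    result: FATHA is moved ahead of SHADDA purely because 30 < 33,
    whatever order the marks were typed in. Changing either class
    value would change the normalized form of existing text, which
    is what the stability guarantees rule out.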

    > Let's look at the position of all the marks irrespective of
    > their current Unicode mark class.

    [ Snipping the following interesting discussion about actual placement
    of marks in Arabic, which people may wish to comment on separately. ]


    This archive was generated by hypermail 2.1.5 : Wed Jul 14 2010 - 13:48:28 CDT