Merging combining classes, was: New contribution N2676

From: Peter Kirk
Date: Sat Oct 25 2003 - 06:00:39 CST

On 25/10/2003 04:11, Philippe Verdy wrote:

>From: "Peter Kirk"
>>Have combining classes actually been defined for these characters?
>>This is of course exactly the same problem as with Hebrew vowel points
>>and accents, except that this time it applies to real living languages.
>>Perhaps it is time to do something about these combining classes which
>>conflict with the standard.
>Do you mean officially documenting the correct (and strict) use of CGJ as
>the only way to bypass the default order required by the combining classes
>in normalized forms? It would be a good idea to document officially which
>use of CGJ is superfluous and should be avoided in NF forms, and which use
>is required.
This isn't what I meant, but I agree that some such definition would be
a good idea.

What I had in mind was a probably hopeless plea for the wrongly assigned
combining classes to be corrected. After all, the current assignments
manifestly breach the standard, which intends marks to have different
combining classes only when they do not interact typographically. These
marks have different classes and yet do interact.
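
To make the problem concrete, here is a minimal Python sketch using the
standard unicodedata module. It takes lamed with patah followed by
hiriq (two vowels under one consonant, as in the Biblical spelling of
Yerushalayim) and shows that normalisation swaps the vowels because
their combining classes differ, unless a CGJ (U+034F, class 0) is
inserted between them. The helper names are mine, chosen only for
illustration.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0
LAMED = "\u05DC"  # HEBREW LETTER LAMED, class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, class 14


def classes(text):
    """Return the canonical combining class of each character."""
    return [unicodedata.combining(ch) for ch in text]


def code_points(text):
    """Return the characters as U+XXXX strings, for readable output."""
    return [f"U+{ord(ch):04X}" for ch in text]


typed = LAMED + PATAH + HIRIQ           # the order the writer intends
with_cgj = LAMED + PATAH + CGJ + HIRIQ  # the CGJ workaround

normalized = unicodedata.normalize("NFC", typed)
normalized_cgj = unicodedata.normalize("NFC", with_cgj)

print(classes(typed))               # [0, 17, 14]
print(code_points(normalized))      # ['U+05DC', 'U+05B4', 'U+05B7']
print(code_points(normalized_cgj))  # ['U+05DC', 'U+05B7', 'U+034F', 'U+05B4']
```

The intended vowel order survives normalisation only in the CGJ
version, which is the hack I would like to make unnecessary.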

I wonder if it would in fact be possible to merge certain adjacent
combining classes, starting from some future version N of the standard.
That would not affect the normalisation of existing text; text
normalised before version N would remain normalised in version N and
later, although not vice versa. I know that this would break the letter
of the current stability policy, but is this kind of backward
compatibility actually necessary? The change could be sold to others as
required for the internal consistency of Unicode.
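
The stability argument can be checked with a small sketch. The code
below is only an illustration under my own assumptions: it
reimplements the canonical ordering step (a stable sort of each run of
non-zero-class marks) so that the class function can be swapped, and
it models the merge as a hypothetical mapping of classes 10 to 25 (the
Hebrew point classes) onto the single value 10. Neither function name
nor the merged value comes from the standard.

```python
import unicodedata


def current_ccc(ch):
    """Combining class as currently assigned."""
    return unicodedata.combining(ch)


def merged_ccc(ch):
    """Hypothetical merge: classes 10..25 collapse into one class."""
    c = unicodedata.combining(ch)
    return 10 if 10 <= c <= 25 else c


def canonical_order(text, ccc):
    """Stable-sort each run of non-starters by the class function ccc."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
typed = LAMED + PATAH + HIRIQ

old_normalized = canonical_order(typed, current_ccc)

# Text already ordered under the current classes is untouched by the merge.
print(canonical_order(old_normalized, merged_ccc) == old_normalized)  # True

# Not vice versa: text ordered under the merged classes may still be
# reordered by the current ones.
print(canonical_order(typed, merged_ccc) == typed)   # True
print(canonical_order(typed, current_ccc) == typed)  # False
```

Because the merge maps classes in a non-decreasing way, any sequence
that is in order under the current classes is still in order under the
merged ones, which is why old normalised text stays normalised.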

If this were possible, the Hebrew and Arabic problem could be partly
solved, in a non-optimal way but one which is less messy than the
current situation. The idea would be for all Hebrew marks (i.e. all
combining marks in 05B0-05C2) to be merged into one combining class, and
similarly all Arabic harakat etc. including the new Arabic tone signs.
This would make significant the relative orderings of multiple vowels
(and meteg), and avoid the need for CGJ hacks. It would also allow the
logical order of shadda, dagesh and sin and shin dots to be the
canonical one, with significant advantages for collation etc. as well as
for rendering.
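
As an illustration of that last point, the sketch below shows what the
current classes do to sequences typed with the consonant-modifying
mark first. I am assuming here that "logical order" means dagesh, shin
dot or shadda directly after the consonant, followed by the vowel. In
each case normalisation moves the vowel in front of the modifier.

```python
import unicodedata

BET, SHIN = "\u05D1", "\u05E9"      # Hebrew letters
DAGESH = "\u05BC"                   # class 21
SHIN_DOT = "\u05C1"                 # class 24
PATAH, QAMATS = "\u05B7", "\u05B8"  # classes 17 and 18
BEH = "\u0628"                      # Arabic letter
SHADDA, FATHA = "\u0651", "\u064E"  # classes 33 and 30

# Each sample: consonant, modifying mark, then vowel.
samples = {
    "dagesh": BET + DAGESH + PATAH,
    "shin dot": SHIN + SHIN_DOT + QAMATS,
    "shadda": BEH + SHADDA + FATHA,
}

results = {}
for label, text in samples.items():
    results[label] = unicodedata.normalize("NFD", text)
    before = [unicodedata.combining(ch) for ch in text]
    after = [unicodedata.combining(ch) for ch in results[label]]
    print(label, before, "->", after)

# dagesh [0, 21, 17] -> [0, 17, 21]
# shin dot [0, 24, 18] -> [0, 18, 24]
# shadda [0, 33, 30] -> [0, 30, 33]
```

With the marks of each script merged into a single class, all three
sequences would be left as typed.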

Peter Kirk
