Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Oct 27 2003 - 11:09:51 CST


On 27/10/2003 08:45, Mark Davis wrote:

>>Thank you for the interesting thoughts. As I understand your suggestion,
>>and bearing in mind that dagesh (and the rare rafe) are also consonant
>>modifiers, you are effectively suggesting an order (already normalised):
>>
>>consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ
>>vowel2 accent2
>>
>>with each element being optional, and CGJ being omitted when it is at
>>the beginning or the end of the string of combining marks, or doubled.
>>
>>This would, I think, work, and at least come close to being rendered
>>correctly with current fonts modified to ignore CGJ (which actually they
>>should do anyway as CGJ is default ignorable). The down side is the
>>
>>
>
>There are two very different cases that appear to be conflated by the above
>example.
>
>
The issue is not just one of rendering. See below.

>1. Current engines incorrectly rendering canonically equivalent text.
>
>If a rendering engine renders X Y Z correctly, but doesn't render a
>canonically-equivalent X Z Y correctly, then there is a problem in the engine.
>[Note: this would be for sequences X Y Z that would actually occur in practice.]
>
>Using CGJ for this would simply be a mechanism to get by current deficiencies in
>the engines.
>
>
No, it is more than this. It is also a mechanism to ensure that the
string X Y Z is collated as the string X Y Z, and that this string is
matched by a search for X Y, which is rather difficult if the canonical
order is actually X Z Y, and the Z can be three or more characters which
are moved into the middle of the string X Y.
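
The search problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bet, meteg and patah stand in for X, Y and Z, the helper name `contains` is hypothetical, and a plain code point substring test stands in for a real search routine.

```python
import unicodedata

BET = "\u05D1"    # HEBREW LETTER BET, a starter (combining class 0); plays X
METEG = "\u05BD"  # HEBREW POINT METEG (combining class 22); plays Y
PATAH = "\u05B7"  # HEBREW POINT PATAH (combining class 17); plays Z
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER (combining class 0)

def contains(text, pattern):
    """Hypothetical helper: naive substring search on NFD-normalised strings."""
    nfd_text = unicodedata.normalize("NFD", text)
    nfd_pattern = unicodedata.normalize("NFD", pattern)
    return nfd_pattern in nfd_text

intended = BET + METEG + PATAH                   # X Y Z as entered
stored = unicodedata.normalize("NFD", intended)  # canonical order is X Z Y

# Searching for X Y misses, because Z has moved between X and Y.
found_plain = contains(intended, BET + METEG)

# With CGJ after Y, the marks are not reordered and X Y is still a prefix.
found_cgj = contains(BET + METEG + CGJ + PATAH, BET + METEG)
```

Here `found_plain` is false and `found_cgj` is true. A real search would have to look through the reordered marks to find the match, and that is the difficulty.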

Note the following required collation order:

X Z1
X Z2
...
X Zn
X Y1 Z1
X Y1 Z2
...
X Y1 Zn
X Y2 Z1
X Y2 Z2
...
X Y2 Zn

This collation is simple when the strings are ordered like this. But
consider the problem of generating this collation when the strings are
canonically reordered as follows, and Z is an arbitrary combination of 12
different marks (I think the collation algorithm can do this only if
every possible Y Z combination is listed as a collation contraction):

X Z1
X Z2
...
X Zn
X Z1 Y1
X Z2 Y1
...
X Zn Y1
X Z1 Y2
X Z2 Y2
...
X Zn Y2
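
To make the difference concrete, here is a toy Python sketch. It is not the Unicode Collation Algorithm: plain code point comparison stands in for collation weights, and the choice of patah and qamats for Z1 and Z2 and of dagesh and meteg for Y1 and Y2 is an assumption made only so that the code point order happens to agree with the required order above.

```python
import unicodedata

X = "\u05D1"   # HEBREW LETTER BET, stands in for X
Z1 = "\u05B7"  # HEBREW POINT PATAH (combining class 17)
Z2 = "\u05B8"  # HEBREW POINT QAMATS (combining class 18)
Y1 = "\u05BC"  # HEBREW POINT DAGESH (combining class 21)
Y2 = "\u05BD"  # HEBREW POINT METEG (combining class 22)

# The required order, written with the marks in the intended X Y Z order.
required = [X + Z1, X + Z2,
            X + Y1 + Z1, X + Y1 + Z2,
            X + Y2 + Z1, X + Y2 + Z2]

def nfd(s):
    """Canonical decomposition, which reorders marks by combining class."""
    return unicodedata.normalize("NFD", s)

# Comparing the strings as entered reproduces the required order.
by_intended = sorted(reversed(required))

# Comparing their canonically reordered forms (X Z Y) does not: the
# strings are now grouped by Z first and by Y second.
by_normalized = sorted(reversed(required), key=nfd)
```

With these stand-ins, `by_intended` equals `required`, while `by_normalized` comes out as X Z1, X Y1 Z1, X Y2 Z1, X Z2, X Y1 Z2, X Y2 Z2. A simple left-to-right comparison therefore cannot produce the required order from the reordered strings.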

>2. Unicode not making a distinction between X Y Z and X Z Y.
>
>Where there are cases where canonically-equivalent X Y Z and X Z Y should be
>rendered differently, then CGJ could be used to preserve the distinction, as per
>the UTC decision:
>
>[96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining
>grapheme joiner has the effect of preventing the canonical re-ordering of
>combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]
>
>[96-A72] Action Item for Ken Whistler: Draft language for consensus 96-C20 (on
>the effect of combining grapheme joiner to prevent canonical re-ordering of
>combining marks during normalization) for inclusion into Unicode 4.0.1 and
>create a FAQ describing this effect as well. [L2/03-235, L2/03-236, L2/03-234]
>
>
Agreed. Has any text been drafted?
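
The effect described in 96-C20 can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the sample sequence (shin, dagesh, shin dot, meteg, qamats, etnahta) is an assumption chosen to follow the order suggested at the top of this message, and the helper names are hypothetical.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN (combining class 0)
DAGESH = "\u05BC"    # HEBREW POINT DAGESH (combining class 21)
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT (combining class 24)
METEG = "\u05BD"     # HEBREW POINT METEG (combining class 22)
QAMATS = "\u05B8"    # HEBREW POINT QAMATS (combining class 18)
ETNAHTA = "\u0591"   # HEBREW ACCENT ETNAHTA (combining class 220)
CGJ = "\u034F"       # COMBINING GRAPHEME JOINER (combining class 0)

def classes(s):
    """Hypothetical helper: canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in s]

def is_stable(s):
    """Hypothetical helper: true if NFD leaves the string unchanged."""
    return unicodedata.normalize("NFD", s) == s

# consonant dagesh shin-dot CGJ right-meteg CGJ vowel accent
with_cgj = SHIN + DAGESH + SHIN_DOT + CGJ + METEG + CGJ + QAMATS + ETNAHTA
without_cgj = with_cgj.replace(CGJ, "")

# CGJ has class 0, so each run of marks between CGJs is sorted on its own.
cgj_classes = classes(with_cgj)

# Without CGJ, the whole run is sorted by combining class.
reordered = unicodedata.normalize("NFD", without_cgj)
```

With CGJ the sequence survives normalization unchanged (`is_stable(with_cgj)` is true). Without it, the marks come out as qamats, dagesh, meteg, shin dot, etnahta.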

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

