Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk
Date: Tue Oct 28 2003 - 04:47:54 CST

On 27/10/2003 18:06, Philippe Verdy wrote:

>From: "Peter Kirk"
>>Thanks for the clarification. In principle we might be able to go a
>>little further: we could define both <c, CCO> and <CCO, c> as
>>canonically equivalent to c for all c in combining class zero. This
>>would have to be some kind of decomposition exception so that c is never
>>decomposed by adding CCO before or after it. This would not remove CCO
>>between two combining characters, so, if 0<c1<c2, <c1, c2> and <c1, CCO,
>>c2> would remain not canonically equivalent while logically equivalent.
>>In practice this would be a small price to pay as it is relevant only in
>>the almost unique case of two vowels on one consonant which actually
>>happen to be in canonical order.
>Why that?
>As CCO is not defined in any past versions, the stability pact does
>not say that we must forbid its _removal_ when computing NFC or NFD
>or NFKC or NFKD forms. It just says that we must _not insert_ it in a
>source string <c1, c2> where c1 and c2 are already assigned.
>So we are fine: we can define a canonical equivalence between
><c1, CCO, c2> and <c1, c2> where the latter is simultaneously in
>NFC, NFD, NFKC and NFKD forms, for all (c1, c2) pair such that
>CC(c1)<=CC(c2) or CC(c2)=0.
>We cannot define it within the UCD, though, only algorithmically, as for
>Hangul syllables/jamos...
My point here was that we might be able to do this within the existing
normalisation algorithm, or with only a minor change to add decomposition
exclusions. I am not sure that I want to push for a major change to
normalisation to support three-character canonical equivalences, and I
predict that we would find it hard to get one through the UTC for
such a marginal case. My simple two-character compositions, <c, CCO> => c
and <CCO, c> => c where cc(c)=0, are adequate for removing the vast
majority of superfluous CCOs. The only real issue is to ensure that
these compositions are not reversed on decomposition.
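
To make the difference between the two rules concrete, here is a rough
sketch in Python. It is only a string filter, not a normalisation
implementation, and it rests on assumptions: CCO has no code point, so a
private-use character (U+E000) stands in for it, and the function names
are invented for this illustration. Combining classes come from Python's
unicodedata module.

```python
import unicodedata

# ASSUMPTION: CCO was only a proposal and has no code point; a private-use
# character stands in for it so that the sketch can run.
CCO = "\uE000"


def cc(ch: str) -> int:
    """Canonical combining class of ch, as recorded in the UCD."""
    return unicodedata.combining(ch)


def is_base(ch) -> bool:
    """True for a real character (not None, not CCO itself) with cc == 0."""
    return ch is not None and ch != CCO and cc(ch) == 0


def neighbours(text: str, i: int):
    """Characters immediately before and after index i (None at the edges)."""
    before = text[i - 1] if i > 0 else None
    after = text[i + 1] if i + 1 < len(text) else None
    return before, after


def strip_two_char(text: str) -> str:
    """Two-character rule: <c, CCO> => c and <CCO, c> => c where cc(c) = 0."""
    out = []
    for i, ch in enumerate(text):
        before, after = neighbours(text, i)
        if ch == CCO and (is_base(before) or is_base(after)):
            continue  # CCO next to a class-zero character is superfluous
        out.append(ch)
    return "".join(out)


def strip_three_char(text: str) -> str:
    """Three-character rule from the quoted message:
    <c1, CCO, c2> => <c1, c2> when cc(c1) <= cc(c2) or cc(c2) = 0."""
    out = []
    for i, ch in enumerate(text):
        before, after = neighbours(text, i)
        if (ch == CCO and before is not None and after is not None
                and CCO not in (before, after)
                and (cc(before) <= cc(after) or cc(after) == 0)):
            continue  # the marks are already in canonical order without CCO
        out.append(ch)
    return "".join(out)


# Hebrew bet (cc 0), hiriq (cc 14) and patah (cc 17)
BET, HIRIQ, PATAH = "\u05D1", "\u05B4", "\u05B7"

# CCO next to the base consonant is removed by the two-character rule ...
cleaned = strip_two_char(BET + CCO + HIRIQ)
# ... but CCO between two vowels already in canonical order survives it,
leftover = strip_two_char(BET + HIRIQ + CCO + PATAH)
# and only the three-character rule would remove it.
merged = strip_three_char(BET + HIRIQ + CCO + PATAH)
```

The leftover case is the "two vowels on one consonant which actually
happen to be in canonical order" mentioned above. Both rules keep CCO in
<patah, CCO, hiriq>, where it is actually doing its job.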

Peter Kirk (personal) (work)
