Re: Merging combining classes, was: New contribution N2676

From: Philippe Verdy (
Date: Mon Oct 27 2003 - 20:06:52 CST

From: "Peter Kirk" <>

> Thanks for the clarification. In principle we might be able to go a
> little further: we could define both <c, CCO> and <CCO, c> as
> canonically equivalent to c for all c in combining class zero. This
> would have to be some kind of decomposition exception so that c is never
> decomposed by adding CCO before or after it. This would not remove CCO
> between two combining characters, so, if 0 < CC(c1) < CC(c2), <c1, c2> and
> <c1, CCO, c2> would still not be canonically equivalent, although they
> are logically equivalent.
> In practice this would be a small price to pay as it is relevant only in
> the almost unique case of two vowels on one consonant which actually
> happen to be in canonical order.

Why that?

Since CCO is not defined in any past version, the stability policy does
not say that we must forbid its _removal_ when computing the NFC, NFD,
NFKC or NFKD forms. It only says that we must _not insert_ it into a
source string <c1, c2> where c1 and c2 are already assigned.

So we are fine: we can define a canonical equivalence between
<c1, CCO, c2> and <c1, c2>, where the latter is simultaneously in
NFC, NFD, NFKC and NFKD forms, for every pair (c1, c2) such that
CC(c1)<=CC(c2) or CC(c2)=0.

However, we cannot define this equivalence within the UCD itself; it
would have to be defined algorithmically, as is done for Hangul
syllables and jamos.
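
To illustrate, here is a rough Python sketch of such an algorithmic
rule. It is only an illustration under stated assumptions, not a proposed
normative definition: CCO is not an encoded character, so the private-use
code point U+E000 stands in for it, and the function name
strip_redundant_cco is made up for this example. Combining classes come
from Python's standard unicodedata.combining().

```python
import unicodedata

# ASSUMPTION: CCO is not an encoded character; the private-use code
# point U+E000 is used here purely as a hypothetical stand-in.
CCO = "\uE000"


def cc(ch: str) -> int:
    """Canonical combining class of one character (0 for starters)."""
    return unicodedata.combining(ch)


def strip_redundant_cco(text: str) -> str:
    """Drop CCO from <c1, CCO, c2> when CC(c1) <= CC(c2) or CC(c2) == 0.

    A CCO at either end of the string, or next to another CCO, is left
    alone: the rule sketched here only covers a CCO between two
    ordinary characters.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == CCO and 0 < i < len(text) - 1:
            c1, c2 = text[i - 1], text[i + 1]
            if CCO not in (c1, c2) and (cc(c1) <= cc(c2) or cc(c2) == 0):
                # <c1, c2> is already in canonical order, so removing
                # CCO cannot cause c1 and c2 to be reordered.
                continue
        out.append(ch)
    return "".join(out)
```

With U+0323 (combining class 220) and U+0301 (combining class 230), the
CCO in <a, U+0323, CCO, U+0301> is dropped, while the one in
<a, U+0301, CCO, U+0323> is kept, because removing it there would let
canonical reordering swap the two marks.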
