Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Oct 28 2003 - 04:47:54 CST


On 27/10/2003 18:06, Philippe Verdy wrote:

>From: "Peter Kirk" <peterkirk@qaya.org>
>
>>Thanks for the clarification. In principle we might be able to go a
>>little further: we could define both <c, CCO> and <CCO, c> as
>>canonically equivalent to c for all c in combining class zero. This
>>would have to be some kind of decomposition exception so that c is never
>>decomposed by adding CCO before or after it. This would not remove CCO
>>between two combining characters, so, if 0<c1<c2, <c1, c2> and <c1, CCO,
>>c2> would remain canonically distinct although logically equivalent.
>>In practice this would be a small price to pay as it is relevant only in
>>the almost unique case of two vowels on one consonant which actually
>>happen to be in canonical order.
>>
>
>Why that?
>
>As CCO is not defined in any past versions, the stability pact does
>not say that we must forbid its _removal_ when computing NFC or NFD
>or NFKC or NFKD forms. It just says that we must _not insert_ it in a
>source string <c1, c2> where c1 and c2 are already assigned.
>
>So we are fine: we can define a canonical equivalence between
><c1, CCO, c2> and <c1, c2> where the latter is simultaneously in
>NFC, NFD, NFKC and NFKD forms, for every (c1, c2) pair such that
>CC(c1)<=CC(c2) or CC(c2)=0.
>
>However, we cannot define it within the UCD, only algorithmically, as
>for Hangul syllables/jamos...
>
My point here was that we might be able to do this within the existing
normalisation algorithm, or with a minor change to add decomposition
exclusions. I am not sure that I want to push a major change to
normalisation to support three-character canonical equivalence, and I
predict that we would find it hard to get it through the UTC for
such a marginal case. My simple two-character compositions <c, CCO> => c
and <CCO, c> => c, where cc(c)=0, are adequate for removing the vast
majority of superfluous CCOs. The only real issue is to ensure that
these compositions are not reversed on decomposition.
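
As a rough sketch of the effect of the two-character rule above (not a
proposed implementation): CCO is not an assigned character, so the code
below assumes a private-use stand-in, U+E000, and the function names are
hypothetical. It drops every CCO that sits next to a character of
combining class zero and leaves a CCO between two combining marks alone.

```python
import unicodedata

# Assumption: CCO is only a proposed character, so a private-use code
# point stands in for it here.
CCO = "\uE000"


def is_class_zero(ch):
    """True for a non-CCO character whose canonical combining class is 0."""
    return ch != CCO and unicodedata.combining(ch) == 0


def strip_redundant_cco(text):
    """Apply <c, CCO> => c and <CCO, c> => c, where cc(c) = 0, repeatedly."""
    out = []
    for i, ch in enumerate(text):
        if ch == CCO:
            # <c, CCO> => c: the last character kept so far is class zero.
            if out and is_class_zero(out[-1]):
                continue
            # <CCO, c> => c: the next non-CCO character is class zero.
            nxt = next((c for c in text[i + 1:] if c != CCO), None)
            if nxt is not None and is_class_zero(nxt):
                continue
        out.append(ch)
    return "".join(out)
```

With a Hebrew consonant and two vowel points, strip_redundant_cco
removes a CCO directly after the consonant but keeps one placed between
the two vowels, which is the marginal three-character case discussed
above.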

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

