Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk
Date: Mon Oct 27 2003 - 17:50:58 CST

On 27/10/2003 10:31, Philippe Verdy wrote:

> ...
>The bad thing is that there's no way to say that a superfluous
>CGJ character can be "safely" removed if CC(char1) <= CC(char2),
>so that it will preserve the semantic of the encoded text even
>though such filtered text would not be canonically equivalent.

Philippe, you have some interesting ideas here and in your previous posting.

I wonder if it would be possible to define a character with combining
class zero which is automatically removed during normalisation when it
is superfluous, in the sense that you define here. Of course that means
a change to the normalisation algorithm, but one which does not cause
backward compatibility issues.

I guess what is more likely to be acceptable, as it doesn't require but
only suggests a change to the algorithm, is a character which can
optionally be removed, when superfluous, as a matter of canonical or
compatibility equivalence. If we call this character CCO, we can define
that a sequence <c1, CCO, c2> is canonically or compatibly equivalent to
<c1, c2> if cc(c1) <= cc(c2), or if either cc(c1) or cc(c2) is 0. I am
deliberately not using CGJ here, as this behaviour might destabilise the
normalisation of existing text that uses CGJ. But there would be no
stability impact if this were a new character.
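
As a rough illustration of the removal rule above, here is a minimal Python
sketch. CCO does not exist, so the private-use code point U+E000 is only an
assumed stand-in, and the function names are hypothetical. Combining classes
come from the standard unicodedata module. The rule says nothing about a CCO
at the very start or end of the text, so this sketch simply keeps it there.

```python
import unicodedata

# Hypothetical stand-in for the proposed CCO character (assumption: a
# private-use code point, which has combining class 0 like the proposal).
CCO = "\uE000"

def is_superfluous(c1, c2):
    """True if a CCO between c1 and c2 may be dropped under the proposed rule."""
    cc1 = unicodedata.combining(c1)
    cc2 = unicodedata.combining(c2)
    # cc(c1) <= cc(c2), or either class is zero: canonical reordering would
    # never swap c1 and c2, so the CCO is not blocking anything.
    return cc1 <= cc2 or cc1 == 0 or cc2 == 0

def strip_superfluous_cco(text, cco=CCO):
    """Remove every CCO that is superfluous between its two neighbours."""
    out = []
    for i, ch in enumerate(text):
        # Only a CCO with a character on both sides is a removal candidate.
        if ch == cco and out and i + 1 < len(text):
            if is_superfluous(out[-1], text[i + 1]):
                continue  # drop the superfluous CCO
        out.append(ch)
    return "".join(out)
```

With U+0323 (class 220) followed by CCO and U+0301 (class 230), the CCO is
dropped. With the two marks in the opposite order it is kept, since removing
it would let normalisation reorder them.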

The advantage of doing this is that a text could be generated with lots
of CCOs which could then be removed automatically if they are superfluous.

I am half feeling that there must be some objections to this, but it's
too late at night here to put my finger on them, so I will send this out
and see what response it generates.

Peter Kirk
