Re: Merging combining classes, was: New contribution N2676

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Oct 27 2003 - 12:31:37 CST


From: "Peter Kirk" <peterkirk@qaya.org>

> I don't see any difference between your proposed generic CCO and CGJ. As
> you say, the same function may be needed in several scripts, including
> perhaps IPA which uses complex diacritic stacking. So why not simply use
> CGJ?

Why not, indeed. But the current CGJ usage rules are still loose.
I just note the recent consensus adopted for Unicode 4.0.1 that CGJ
can be used to prevent canonical reordering of logically encoded text.
This is new, as Mark Davis states:

> [96-C20] Consensus: Add text to Unicode 4.0.1 which points out
> that combining grapheme joiner has the effect of preventing the
> canonical re-ordering of combining marks during normalization.
> [L2/03-235, L2/03-236, L2/03-234]
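
To illustrate the effect described in that consensus, here is a minimal
sketch in Python, using only the standard unicodedata module. The choice
of marks (acute accent, class 230, and dot below, class 220) and the
helper name nfd are mine, not from the minutes, and the combining
classes are those of whatever Unicode version ships with the Python in use.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, combining class 0


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


plain = "a" + ACUTE + DOT_BELOW         # marks stored in "logical" order
joined = "a" + ACUTE + CGJ + DOT_BELOW  # same marks with CGJ between them

# Without CGJ, normalization sorts the marks by combining class,
# so the dot below (220) moves in front of the acute (230).
reordered = nfd(plain)

# With CGJ (class 0) between them, the two marks keep their order.
preserved = nfd(joined)

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in preserved])
```

The first list shows the two marks swapped; the second shows the
sequence unchanged, CGJ included.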

I had not noticed this in the recently published minutes from the
last Unicode meeting. When will this note be published officially?
Just after the meeting next week, if it approves the exact terms
of the proposed note?

Will this note be published along with strong warnings about its
usage? In my opinion, the usage rule should say that the sequence
    <char1, CGJ, char2>
should occur in text only if:
- both char1 and char2 are combining characters with a non-zero
  combining class (CC):
    CC(char1) > 0, CC(char2) > 0,
- and char1 has the higher combining class:
    CC(char1) > CC(char2)
that is, only where canonical reordering would otherwise swap the
two marks.
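
The rule proposed above could be checked mechanically. The following
Python sketch is only my illustration of it, not anything specified by
Unicode: the function names cc and unjustified_cgj_positions are
hypothetical, and I assume that a CGJ at the very start or end of the
text counts as unjustified, since it has no neighbour on one side.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def cc(ch):
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


def unjustified_cgj_positions(text):
    """Return the indices of CGJ characters that break the proposed rule.

    A CGJ is considered justified only in <char1, CGJ, char2> where
    CC(char1) > 0, CC(char2) > 0 and CC(char1) > CC(char2).
    Assumption: a missing neighbour is treated as class 0.
    """
    positions = []
    for i, ch in enumerate(text):
        if ch != CGJ:
            continue
        before = cc(text[i - 1]) if i > 0 else 0
        after = cc(text[i + 1]) if i + 1 < len(text) else 0
        if not (before > 0 and after > 0 and before > after):
            positions.append(i)
    return positions


# Acute (230) then dot below (220): the CGJ prevents a reordering.
print(unjustified_cgj_positions("a\u0301\u034f\u0323"))
# Dot below (220) then acute (230): nothing to prevent.
print(unjustified_cgj_positions("a\u0323\u034f\u0301"))
```

Such a check can only flag the CGJ; whether it may then be dropped
is a separate question.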

The bad thing is that there is no way to say that a superfluous
CGJ character can be "safely" removed when CC(char1) <= CC(char2):
removing it would preserve the semantics of the encoded text, but
the filtered text would not be canonically equivalent to the original.
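
A small Python sketch of that problem, again with marks of my own
choosing (dot below, class 220, followed by acute, class 230) and a
hypothetical helper canonically_equivalent that simply compares NFD
forms:

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, combining class 0


def canonically_equivalent(s, t):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)


def marks(text):
    """The characters of text with a non-zero combining class, in order."""
    return [c for c in text if unicodedata.combining(c) > 0]


# Here CC(char1) <= CC(char2), so the CGJ blocks nothing.
with_cgj = "a" + DOT_BELOW + CGJ + ACUTE
filtered = with_cgj.replace(CGJ, "")

# The marks come out in the same order after normalization either way...
same_mark_order = marks(unicodedata.normalize("NFD", with_cgj)) == marks(
    unicodedata.normalize("NFD", filtered)
)
# ...yet the two strings do not normalize to the same sequence.
still_equivalent = canonically_equivalent(with_cgj, filtered)

print(same_mark_order, still_equivalent)
```

So a process that must preserve canonical equivalence cannot drop
the CGJ, even though its removal changes nothing in the order of
the marks.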


