RE: CGJ - Combining Class Override

From: Jony Rosenne (rosennej@qsm.co.il)
Date: Sat Oct 25 2003 - 21:09:39 CST


Sorry, Philippe, I had meant a separate character for a "right Meteg", not a
separate control character. Does this mean we agree?

Jony

> -----Original Message-----
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
> Sent: Saturday, October 25, 2003 5:58 PM
> To: Jony Rosenne
> Cc: unicode@unicode.org
> Subject: Re: CGJ - Combining Class Override
>
>
> From: "Jony Rosenne" <rosennej@qsm.co.il>
>
> > For the record, I repeat that I am not convinced that the CGJ is an
> > appropriate solution for the problems associated with the right Meteg.
> > I tend to think we need a separate character.
>
> Yes, it's possible to devise another character explicitly to
> override very precisely the ordering of combining classes.
> But this still does not change the problem, as all the
> existing NF* forms in existing documents using any past or
> present version of Unicode MUST remain in NF* form after any
> further additions.
>
> If one votes for a separate control character, it should come
> with precise rules describing how such an override can/must be
> used, so that we won't break existing implementations. This
> character will necessarily have combining class 0, but will
> still have a preceding context. Strict conformance for the
> new NF* forms must still obey the precise ordering rules,
> and this character, whatever its form, shall not be used
> where it is not needed, i.e. when the existing
> NF* forms still produce the correct logical order (that's why
> its use should then be restricted to a list of known
> combining characters that may need this override).
>
> Call it <CCO> "Combining Class Override"? This does not
> change the problem: this character should be used only
> between pairs of combining characters, such that the encoded sequence:
> {c1, CCO, c2}
> shall conform to the rules:
> (1) CC(c1) > CC(c2) > 0,
> (2) c1 is known (listed by Unicode?) to require this override
> to keep the logical ordering needed for correct text semantics.
>
> The second requirement is needed to prevent abuse of this
> character. But it is not enforceable if CGJ is kept for this function.
>
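As an illustration of rules (1) and (2), here is a minimal Python sketch. It assumes CGJ (U+034F, combining class 0) stands in for the proposed CCO, uses Hebrew meteg and patah as the pair {c1, c2}, and invents a whitelist (OVERRIDE_ALLOWED) for rule (2), since no such list exists in Unicode. It also shows why the override matters: without a class-0 character in between, normalization reorders the two marks.

```python
import unicodedata

ALEF = "\u05D0"   # HEBREW LETTER ALEF, used here as the base letter
METEG = "\u05BD"  # HEBREW POINT METEG, canonical combining class 22
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, class 0 (stand-in for CCO)

# Hypothetical whitelist for rule (2); the message only suggests that
# Unicode might publish such a list.
OVERRIDE_ALLOWED = {METEG}


def cc(ch):
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


def override_is_valid(c1, c2, allowed=OVERRIDE_ALLOWED):
    """Check rules (1) and (2) for a sequence {c1, CCO, c2}."""
    return cc(c1) > cc(c2) > 0 and c1 in allowed


def nfd(text):
    return unicodedata.normalize("NFD", text)


plain = ALEF + METEG + PATAH         # "right meteg" order, no override
joined = ALEF + METEG + CGJ + PATAH  # same order, protected by a class-0 character

print(nfd(plain) == ALEF + PATAH + METEG)  # marks are reordered by class
print(nfd(joined) == joined)               # order survives normalization
print(override_is_valid(METEG, PATAH))     # satisfies both rules
print(override_is_valid(PATAH, METEG))     # fails rule (1): override not needed
```

The last call shows the "not needed" case: when CC(c1) < CC(c2), the normalized form already preserves the typed order, so inserting the override would be an abuse under the proposed rules.
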
> The CCO character should then be made "ignorable" for
> collation or text breaks, so that collation keys will become:
> [ CK(c1), CK(c2) ] for {c1, CCO, c2}
> [ CK(c2), CK(c1) ] for {c2, c1} and {c1, c2} if normalized
>
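The collation behaviour described above can be sketched as follows. This is a toy model, not the Unicode Collation Algorithm: CK(c) is simply the code point, CGJ (U+034F) again stands in for the proposed CCO, and toy_collation_key is a name made up for this example. The text is normalized first, then the ignorable override character is dropped.

```python
import unicodedata

ALEF = "\u05D0"   # base letter
METEG = "\u05BD"  # c1, combining class 22
PATAH = "\u05B7"  # c2, combining class 17
CCO = "\u034F"    # CGJ used as a stand-in for the proposed CCO


def toy_collation_key(text):
    """Toy key: normalize, drop the ignorable CCO, use code points as CK()."""
    normalized = unicodedata.normalize("NFD", text)
    return [ord(ch) for ch in normalized if ch != CCO]


with_override = toy_collation_key(ALEF + METEG + CCO + PATAH)
typed_c1_c2 = toy_collation_key(ALEF + METEG + PATAH)
typed_c2_c1 = toy_collation_key(ALEF + PATAH + METEG)

# {c1, CCO, c2} keeps the order [CK(c1), CK(c2)] after the base letter
print(with_override == [ord(ALEF), ord(METEG), ord(PATAH)])
# {c1, c2} and {c2, c1} both normalize to [CK(c2), CK(c1)]
print(typed_c1_c2 == typed_c2_c1 == [ord(ALEF), ord(PATAH), ord(METEG)])
```
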
> Legacy applications will detect a separate combining sequence
> starting at CCO, but newer applications will still know that
> both sequences are describing a single grapheme cluster.
>
> This knowledge should not be necessary except in grapheme
> renderers, or in some input methods that will allow users to
> enter:
> (1) keys <c2><c1> producing the normalized text {c2, c1}
> as before;
> (2) keys <c1><c2> producing the normalized text {c1, CCO, c2}
> instead of {c2, c1} as before;
> (3) optionally support a keystroke or selection system to swap
> combining characters.
>
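Input-method cases (1) and (2) could be sketched like this. The function name compose_marks and the whitelist OVERRIDE_ALLOWED are hypothetical, and CGJ (U+034F) is again assumed as the stand-in for CCO. The override is inserted only where a whitelisted mark is followed by a mark of lower, nonzero combining class, so the typed order survives normalization.

```python
import unicodedata

ALEF = "\u05D0"   # base letter
METEG = "\u05BD"  # c1, combining class 22
PATAH = "\u05B7"  # c2, combining class 17
CCO = "\u034F"    # CGJ used as a stand-in for the proposed CCO

# Hypothetical list of marks allowed to take the override (rule 2).
OVERRIDE_ALLOWED = {METEG}


def compose_marks(base, typed_marks):
    """Build normalized text from a base letter and marks in typed order."""
    out = [base]
    prev = None
    for mark in typed_marks:
        needs_override = (
            prev is not None
            and prev in OVERRIDE_ALLOWED
            and unicodedata.combining(prev) > unicodedata.combining(mark) > 0
        )
        if needs_override:
            out.append(CCO)  # protect the typed order from reordering
        out.append(mark)
        prev = mark
    # Any remaining out-of-order marks are reordered as before.
    return unicodedata.normalize("NFC", "".join(out))


case1 = compose_marks(ALEF, [PATAH, METEG])  # keys <c2><c1>
case2 = compose_marks(ALEF, [METEG, PATAH])  # keys <c1><c2>

print(case1 == ALEF + PATAH + METEG)
print(case2 == ALEF + METEG + CCO + PATAH)
```

Case (3), swapping two combining characters, would amount to calling the same function again with the marks in the opposite order.
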
> If this is too complex, the only way to manage the situation
> is to duplicate the existing combining characters that cause this
> problem, and I think this may turn out even worse, as this
> duplication may need to be combinatorial and require a lot of
> new codepoint assignments.
>
>
>


