Re: ZWJ, ZWNJ, CGJ and combination

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 10 2003 - 08:29:26 EST


    From: "Peter Kirk" <peterkirk@qaya.org>
    > This does not affect my argument. A combining character sequence, as
    > defined, does not perfectly fit your definition "an unordered set of
    > sequences of characters having the same combining class." But it is
    > preserved under canonical normalisation. Well, perhaps that depends what
    > you mean by "preserved". If you mean that its code point representation
    > is unchanged, that is not true of your starter sequences either. If it
    > means that its semantics are unchanged, it is true by definition of any
    > string of Unicode characters that its semantics are unchanged by
    > canonical normalisation, or indeed by any transformation into a
    > canonically equivalent form.

    I did not say the opposite (that normalization could change semantics).
    But normalization does not work at the combining character sequence
    level; it works at the starter sequence level, and it bases character
    identity on this model, which exempts sequences of non-defective starter
    sequences (such as non-defective combining sequences) from any semantic
    constraint. Normalization effectively ignores combining sequences as
    units: to it, they are only a part of the text.
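The starter-sequence view can be made concrete with a minimal Python sketch. It assumes that a "starter sequence" means a character with canonical combining class 0 plus the non-starters that follow it, which is how the term is used in this thread; the helper name `starter_sequences` is hypothetical. Combining classes and normalization come from Python's standard `unicodedata` module.

```python
import unicodedata

def starter_sequences(s: str) -> list[str]:
    """Hypothetical helper: split s into runs that each begin with a starter
    (ccc == 0). Non-starters at the very start of s (a defective run) form
    their own run."""
    runs: list[str] = []
    for ch in s:
        if unicodedata.combining(ch) == 0 or not runs:
            runs.append(ch)   # a starter (or a leading non-starter) opens a new run
        else:
            runs[-1] += ch    # a non-starter extends the current run
    return runs

text = "a\u0301\u034f\u0323"  # a, COMBINING ACUTE ACCENT, CGJ, COMBINING DOT BELOW
for run in starter_sequences(text):
    print([f"U+{ord(c):04X}" for c in run])
# ['U+0061', 'U+0301']
# ['U+034F', 'U+0323']

# NFC composes a + acute inside the first run; the dot below stays behind CGJ.
print(unicodedata.normalize("NFC", text) == "\u00e1\u034f\u0323")  # True
```

Treating each run fully independently is only an approximation: Hangul jamo such as U+1100 U+1161 are both starters, yet NFC composes them into U+AC00.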

    > >I still maintain that there's no terminology to designate what I call a
    > >starter sequence.
    > Agreed. But does it matter? It matters only if this is a meaningful unit
    > within Unicode. In my understanding, a sequence of combining characters
    > all of class >0 is meaningful because this is what canonical reordering
    > operates on. But such a sequence does not necessarily form a unit with
    > the preceding character.
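The reordering step described in the quoted paragraph can be sketched in Python as follows. The function name `canonical_reorder` is hypothetical, and it covers only the canonical ordering step: full NFD first applies canonical decomposition, so the sketch is compared against `unicodedata.normalize("NFD", ...)` only on inputs that contain nothing to decompose.

```python
import unicodedata

def canonical_reorder(s: str) -> str:
    """Hypothetical helper applying only the canonical ordering step:
    stable-sort every maximal run of non-starters (ccc > 0) by combining
    class. Starters (ccc == 0) never move, so they act as barriers."""
    out: list[str] = []
    run: list[str] = []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # Flush the pending run; sorted() is stable, so equal classes keep their order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

samples = [
    "a\u0301\u0323",        # acute (230) before dot below (220): swapped
    "a\u0301\u0300",        # acute and grave share class 230: order kept
    "a\u0301\u034f\u0323",  # CGJ (ccc 0) separates the two marks: nothing moves
]
for s in samples:
    # These samples contain no decomposable characters, so NFD reduces to reordering.
    assert canonical_reorder(s) == unicodedata.normalize("NFD", s)
    print([f"U+{ord(c):04X}" for c in canonical_reorder(s)])
# ['U+0061', 'U+0323', 'U+0301']
# ['U+0061', 'U+0301', 'U+0300']
# ['U+0061', 'U+0301', 'U+034F', 'U+0323']
```

The run of non-starters is what gets sorted, and nothing in the function ties that run to the character before it, which matches the point that such a run need not form a unit with the preceding starter.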

    It does matter: the missing term causes ambiguity in the terminology used
    when speaking about normalization constraints under the stability policy,
    and the starter sequence unit constrains the way combining sequences can
    be safely encoded (when they need multiple starters, such as a base
    character and CGJ).
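The case of a combining sequence containing several starters can be checked with another small Python sketch. It uses a simplified reading of the Unicode definition of a combining character sequence (a base followed by combining marks of general category M, or by ZWNJ/ZWJ; controls and other edge cases are ignored), and the helper names are hypothetical.

```python
import unicodedata

ZWNJ, ZWJ = "\u200c", "\u200d"

def combining_sequences(s: str) -> list[str]:
    """Hypothetical, simplified segmentation: a character extends the current
    sequence if it is a combining mark (category M*) or ZWNJ/ZWJ; anything
    else starts a new sequence."""
    seqs: list[str] = []
    for ch in s:
        extends = unicodedata.category(ch).startswith("M") or ch in (ZWNJ, ZWJ)
        if extends and seqs:
            seqs[-1] += ch
        else:
            seqs.append(ch)
    return seqs

def starters_per_sequence(s: str) -> list[int]:
    """Count the starters (ccc == 0) inside each combining character sequence."""
    return [sum(1 for ch in seq if unicodedata.combining(ch) == 0)
            for seq in combining_sequences(s)]

print(starters_per_sequence("a\u0301\u0323"))             # [1]
print(starters_per_sequence("a\u0301\u034f\u0323"))       # [2]: CGJ is a mark and a starter
print(starters_per_sequence("\u0915\u094d\u200d\u0937"))  # [2, 1]: KA+VIRAMA+ZWJ, then SSA
```

A count above 1 marks exactly the situation described above: one combining character sequence spanning several starter sequences, each of which normalization handles on its own.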



    This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 08:58:34 EST