Re: ZWJ, ZWNJ, CGJ and combination

From: Philippe Verdy (
Date: Mon Nov 10 2003 - 01:45:23 EST

    From: "Peter Kirk" <>

    > On 09/11/2003 14:55, Philippe Verdy wrote:
    > > ...
    > >
    > >And canonical normalization _guarantees_ to preserve *only* "starter
    > >sequences" (defective or not), but not necessarily "combining character
    > >sequences" (defective or not), and that's where care must be taken when
    > >encoding text...
    > >
    > Surely not. A combining character sequence consists of an optional base
    > character followed by one or more combining characters. Canonical
    > normalisation preserves the sequence of combining characters only,
    > although it may reorder this sequence. It also preserves without
    > reordering the juxtaposition of this sequence to the optional base
    > character. Therefore the combining character sequence is preserved.
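
    To make the reordering claim concrete, here is a minimal Python sketch
    using the standard unicodedata module. The sample string is an assumption
    chosen for illustration, not taken from the thread: a base letter followed
    by two combining marks given in non-canonical order.

```python
import unicodedata

# Illustrative input (not from the thread):
# U+0061 LATIN SMALL LETTER A, U+0301 COMBINING ACUTE ACCENT (ccc=230),
# U+0323 COMBINING DOT BELOW (ccc=220).
s = "a\u0301\u0323"

# Canonical decomposition sorts adjacent non-zero-class marks by their
# canonical combining class, so the dot below moves ahead of the acute.
nfd = unicodedata.normalize("NFD", s)

print([f"U+{ord(c):04X}" for c in nfd])
# → ['U+0061', 'U+0323', 'U+0301']

# The base still comes first and the same marks follow it, only reordered.
base_kept = nfd[0] == s[0]
same_marks = sorted(nfd[1:]) == sorted(s[1:])
```

    Both marks have a non-zero combining class here, so this case does not
    distinguish the two readings discussed in this thread.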

    That's where we differ:
    A combining character sequence differs from what I call a starter sequence
    (1) in that it can contain more than one class 0 character (starter),
    namely any class 0 combining characters (gc=Mn), and
    (2) in that it cannot contain some class 0 characters (such as PUA
    characters with no agreed use, controls, and line/paragraph separators,
    which are each treated individually rather than as part of a combining
    character sequence).

    The second difference is less critical for us (its only effect is to
    create occurrences of defective combining character sequences in the middle
    of the text), but the first one is critical here...
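
    The first difference can be sketched in Python as follows. This is only an
    illustration under stated assumptions: the function names
    starter_sequences and combining_character_sequences are hypothetical, the
    second one uses a simplified rule (any non-mark character opens a new
    sequence, and ZWJ/ZWNJ are not treated specially), and the sample string
    with U+034F COMBINING GRAPHEME JOINER (gc=Mn, class 0) is my own choice.

```python
import unicodedata

def starter_sequences(text):
    """Hypothetical helper: start a new run at every class 0 character."""
    runs = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not runs:
            runs.append(ch)
        else:
            runs[-1] += ch
    return runs

def combining_character_sequences(text):
    """Simplified helper: start a new run at every non-mark character."""
    runs = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and runs:
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER: gc=Mn, combining class 0

# a, COMBINING ACUTE ACCENT (230), CGJ (0), COMBINING DOT BELOW (220)
text = "a\u0301" + CGJ + "\u0323"

starters = starter_sequences(text)              # two runs, split before CGJ
ccs = combining_character_sequences(text)       # one run, CGJ is a mark

# Canonical reordering never moves a mark across a class 0 character,
# so the dot below stays after the CGJ and the string is unchanged by NFD.
unchanged = unicodedata.normalize("NFD", text) == text
```

    Under these assumptions the string is a single combining character
    sequence but two starter sequences, and normalization reorders marks only
    within each of the latter.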

    I still maintain that there's no terminology to designate what I call a
    starter sequence.

    This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 02:37:43 EST