From: Philippe Verdy (email@example.com)
Date: Mon Nov 10 2003 - 01:45:23 EST
From: "Peter Kirk" <firstname.lastname@example.org>
> On 09/11/2003 14:55, Philippe Verdy wrote:
> > ...
> >And canonical normalization _guarantees_ to preserve *only* "starter
> >sequences" (defective or not), but not necessarily "combining character
> >sequences" (defective or not), and that's where care must be taken when
> >encoding text...
> Surely not. A combining character sequence consists of an optional base
> character followed by one or more combining characters. Canonical
> normalisation preserves the sequence of combining characters only,
> although it may reorder this sequence. It also preserves without
> reordering the juxtaposition of this sequence to the optional base
> character. Therefore the combining character sequence is preserved.
That's where we differ:
The combining character sequence differs from what I define as a starter
sequence:
(1) by the fact that it can contain more than one class 0 character (starter),
namely all class 0 combining characters (gc=Mn), and
(2) by the fact that a combining character sequence cannot contain some
class 0 characters (like unagreed PUAs, controls and line/paragraph
separators, which are treated individually, but not as a combining character
sequence).
The second difference is less critical for us (its only effect is to
create occurrences of defective combining character sequences in the middle
of the text), but the first one is critical here...
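
To make point (1) concrete, here is a minimal Python sketch using the standard unicodedata module. It assumes U+034F COMBINING GRAPHEME JOINER as an example of a class 0 combining character (gc=Mn, ccc=0); the helper starter_sequences() is hypothetical and only follows the informal notion used in this message, it is not a term or algorithm from the standard.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc=220
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, gc=Mn, ccc=0 (assumed example)

def starter_sequences(s):
    """Split s before every class 0 character (hypothetical helper).

    Each run is one class 0 character followed by the non-zero-class
    characters after it; a leading run without a class 0 character is
    kept as a defective run.
    """
    runs = []
    for c in s:
        if unicodedata.combining(c) == 0 or not runs:
            runs.append(c)
        else:
            runs[-1] += c
    return runs

# One combining character sequence in both cases: base + combining marks.
plain = "a" + ACUTE + DOT_BELOW
blocked = "a" + ACUTE + CGJ + DOT_BELOW

# Canonical ordering sorts the marks by combining class inside a run...
nfd_plain = unicodedata.normalize("NFD", plain)
# ...but never moves a mark across a class 0 character such as CGJ.
nfd_blocked = unicodedata.normalize("NFD", blocked)

print([f"U+{ord(c):04X}" for c in nfd_plain])
print([f"U+{ord(c):04X}" for c in nfd_blocked])
print(starter_sequences(blocked))
```

In the second string the single combining character sequence splits into two runs at the CGJ, and reordering happens only inside each run, which is the distinction argued above.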
I still maintain that there's no terminology to designate what I call a