From: Peter Kirk (email@example.com)
Date: Sat Apr 24 2004 - 11:24:25 EDT
On 24/04/2004 06:30, Peter Constable wrote:
>If data is always encoded in canonical order, then having a VS within
>the combining mark sequence wouldn't create any normalization problems,
>that's true. But you well know that people do not want their Hebrew data
>in canonical order. Even if they did, it couldn't be guaranteed.
Yes, canonical ordering cannot be guaranteed. But ordering rules can be
specified, and departures from them treated as spelling errors. I can't
help thinking that it would have been much simpler for everybody if
Unicode had simply done that rather than permitting canonical
reordering; but that is obviously a battle already lost.
>There's a problem not only in cases of the form B M1 M2 VS, but also in
>cases of the form B M1 VS M2. Of course, the issues are different. The
>first may normalize to B M2 M1 VS; the second perhaps *ought* to
>normalize to B M2 M1 VS, but that won't happen.
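The two cases in the quoted paragraph can be checked with a small sketch using Python's standard unicodedata module. The characters are chosen only for illustration and are an assumption, not taken from the thread: lamed as B, dagesh (combining class 21) as M1, hiriq (class 14) as M2, and VARIATION SELECTOR-1 as the VS. No such variation sequence is actually defined; the point is only how canonical reordering treats a class-0 character in the middle of the marks.

```python
import unicodedata

# Illustrative choices only; no Hebrew variation sequence is assumed to exist.
B = "\u05DC"    # HEBREW LETTER LAMED, the base character
M1 = "\u05BC"   # HEBREW POINT DAGESH OR MAPIQ, combining class 21
M2 = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14
VS = "\uFE00"   # VARIATION SELECTOR-1, combining class 0


def nfd(s):
    """Canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", s)


def show(s):
    """Render a string as a list of code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


case1 = B + M1 + M2 + VS   # B M1 M2 VS
case2 = B + M1 + VS + M2   # B M1 VS M2

# In case1 the two marks are adjacent, so the lower class moves first.
print(show(case1), "->", show(nfd(case1)))
# In case2 the VS (class 0) sits between the marks and blocks reordering.
print(show(case2), "->", show(nfd(case2)))
```

In the first case the VS ends up after M1 rather than after M2, so it attaches to a different mark; in the second case nothing moves at all.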
Well, perhaps the best thing here is to specify that the mark to which
the VS applies should always come first after the base character and be
followed immediately by the VS, irrespective of the normal canonical
order. At least that would be unambiguous, and stable under
normalisation (since the only relevant precomposed characters are
composition exclusions). Other orderings should simply be considered
spelling errors.
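A rough sketch of how such a rule might be checked, again in Python with the standard unicodedata module. The function names follows_proposed_order and prefix_is_stable are hypothetical, VARIATION SELECTOR-1 merely stands in for whatever selector might be used, and the example cluster (shin, shin dot, dagesh, hiriq) is an assumption chosen because shin plus shin dot has a precomposed form, U+FB2A, that is excluded from composition.

```python
import unicodedata

VS = "\uFE00"  # VARIATION SELECTOR-1, a stand-in for the selector in question


def follows_proposed_order(cluster):
    """True if the cluster has no VS, or is base, mark, VS, then other marks."""
    if VS not in cluster:
        return True
    return (
        len(cluster) >= 3
        and unicodedata.combining(cluster[0]) == 0   # base character
        and unicodedata.combining(cluster[1]) != 0   # the mark the VS applies to
        and cluster[2] == VS                         # VS immediately after it
        and VS not in cluster[3:]                    # and nowhere else
    )


def prefix_is_stable(cluster):
    """True if base + mark + VS survives NFC and NFD unchanged as a prefix."""
    return all(
        unicodedata.normalize(form, cluster).startswith(cluster[:3])
        for form in ("NFC", "NFD")
    )


SHIN, SHIN_DOT, DAGESH, HIRIQ = "\u05E9", "\u05C1", "\u05BC", "\u05B4"

good = SHIN + SHIN_DOT + VS + DAGESH + HIRIQ   # mark + VS first, then the rest
bad = SHIN + DAGESH + HIRIQ + VS               # VS at the end of the marks

print(follows_proposed_order(good), prefix_is_stable(good))
print(follows_proposed_order(bad))

# The precomposed shin with shin dot is a composition exclusion, so NFC
# decomposes it and never recomposes it.
print(unicodedata.normalize("NFC", "\uFB2A") == SHIN + SHIN_DOT)
```

The marks after the VS may still be reordered among themselves by normalisation, but the base, the first mark and the VS stay together, which is the property that matters here.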
-- Peter Kirk firstname.lastname@example.org (personal) email@example.com (work) http://www.qaya.org/