Re: Unicode Normalisaton Optimisation Experiments

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Sep 26 2003 - 07:37:47 EDT

Next message: Peter Kirk: "Re: Fun with proof by analogy, was Re: Mojibake on my Web pages"

Previous message: Marco Cimarosti: "RE: Internal Representation of Unicode"
In reply to: jon@spin.ie: "Re: Unicode Normalisaton Optimisation Experiments"
Next in thread: John Cowan: "Re: Unicode Normalisaton Optimisation Experiments"
Reply: John Cowan: "Re: Unicode Normalisaton Optimisation Experiments"

On 26/09/2003 10:52, jon@spin.ie wrote:

> ...
>
>If there is a problem with this then it goes deeper than just NFC, but to the rules of how combining characters can or cannot be reordered, and the meaning that the resulting strings have. If there is a problem with that then the problem lies with those rules, rather than NFC which uses them.
>
>
>
Actually, in my opinion based on experience with problem combinations in
Hebrew and Arabic, the problem is not so much with the reordering rules
as with the way that some canonical decompositions and combining classes
have been inappropriately defined, and with the stability policy which
decrees that even the most obvious mistakes cannot be corrected.

The problem is that the definitions in the Unicode Standard conflict
with the stability policy. For example, from p.83 of TUS 4.0:

> D46 Combining class: A numeric value given to each combining Unicode
> character that determines with which other combining characters it
> typographically interacts.
> • See Section 4.3, Combining Classes—Normative, for information about
> the combining classes for Unicode characters.
> Characters have the same class if they interact typographically, and
> different classes if they do not.

This is simply untrue and so needs to be changed. I have well-documented
examples from Hebrew and from Arabic of combining characters which do
not have the same combining class but do interact typographically.
(That is, unless D46 is read as a counter-intuitive definition of
"typographically interacts".) The obvious way of correcting this error,
adjusting the combining classes, is ruled out by the stability policy.
So the text of the standard, which can be changed in a new version,
needs to be changed to read something like:

Characters have the same class if according to the best information
available in 2001 (?) they were thought to interact typographically, and
different classes if they were thought not to.

Or else the standard should simply state that combining classes are
assigned arbitrarily, as also needs to happen with Unicode character
names, which similarly contain uncorrectable errors.
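
For readers who want to see the effect of the combining classes
discussed above, here is a minimal sketch using Python's standard
unicodedata module. The choice of characters (LAMED followed by PATAH
and HIRIQ) and the helper name combining_classes are illustrative
assumptions, not examples taken from this message. Both points are
drawn below the same base letter, yet they carry different combining
classes, so canonical reordering swaps them.

```python
import unicodedata

# Illustrative characters (an assumption, not taken from the message).
LAMED = "\u05DC"  # HEBREW LETTER LAMED, a base letter (class 0)
PATAH = "\u05B7"  # HEBREW POINT PATAH, written below the base
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, also written below the base


def combining_classes(text):
    """Return (code point, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


# The points in the order a writer might enter them.
typed = LAMED + PATAH + HIRIQ

# Normalization sorts adjacent combining marks by ascending combining
# class, so the lower-valued HIRIQ moves ahead of PATAH.
normalized = unicodedata.normalize("NFC", typed)

print(combining_classes(typed))
print(combining_classes(normalized))
print(typed == normalized)
```

If the two points had been given the same class, normalization would
have left their relative order alone.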

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Fri Sep 26 2003 - 08:23:21 EDT