Re: Unicode Normalisation Optimisation Experiments

From: jon@spin.ie
Date: Thu Sep 25 2003 - 07:32:32 EDT

Next message: Peter Kirk: "Re: Fun with proof by analogy, was Re: Mojibake on my Web pages"

Previous message: jon@spin.ie: "Re: need help understanding diacritical encoding"
Maybe in reply to: Jon Hanna: "Unicode Normalisation Optimisation Experiments"
Next in thread: Peter Kirk: "Re: Unicode Normalisation Optimisation Experiments"
Reply: Peter Kirk: "Re: Unicode Normalisation Optimisation Experiments"

> Is this actually correct? For example, if I have in my data the string
> <U+0104, U+05B0> (which I know is garbage, but that is irrelevant), that
> will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a
> higher combining class (202) than U+05B0 (10). What does this become in
> NFC? Is the reordering reversed and the combination reapplied?

First an attempt is made to compose U+0041 with U+05B0. No precomposed character exists for that pair, so the attempt fails. Next an attempt is made to compose U+0041 with U+0328. U+05B0 does not block this, because its combining class (10) is lower than that of U+0328 (202), so the pair composes to U+0104. U+0041 is replaced with U+0104 and U+0328 is removed, resulting in <U+0104, U+05B0>.

The reordering isn't reversed per se; rather, each combining character in turn is given its "opportunity" to combine with the base character, starting with the first.
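
For anyone who wants to check this, here is a minimal sketch using Python's standard unicodedata module. The choice of tool is mine and not something from the thread, and the results depend on the version of the Unicode data bundled with the interpreter. The helper name show is just for illustration.

```python
import unicodedata

s = "\u0104\u05B0"  # A WITH OGONEK followed by HEBREW POINT SHEVA

# Canonical combining classes, which drive the reordering step.
ccc_ogonek = unicodedata.combining("\u0328")
ccc_sheva = unicodedata.combining("\u05B0")

# NFD decomposes and reorders; NFC additionally recomposes.
nfd = unicodedata.normalize("NFD", s)
nfc = unicodedata.normalize("NFC", s)

def show(text):
    """Format a string as a list of U+XXXX code points (illustrative helper)."""
    return "<" + ", ".join(f"U+{ord(c):04X}" for c in text) + ">"

print("ccc(U+0328) =", ccc_ogonek, " ccc(U+05B0) =", ccc_sheva)
print("NFD:", show(nfd))
print("NFC:", show(nfc))
```

With current Unicode data this should report the reordered form <U+0041, U+05B0, U+0328> for NFD and the original <U+0104, U+05B0> for NFC.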

> This is not only a theoretical issue as the same applies to some real
> combinations. There was discussion only last week on the bidi list of a
> form which might be encoded <U+064A, U+0652, U+0654> but which would be
> messed up if composed into <U+0626, U+0652>.

Yes, NFC would perform that composition. Are you sure it would be a problem? Applying the bidi rules gives an equivalent result for both forms.

<U+064A, U+0652, U+0654>
bidi: AL, NSM, NSM
applying rule W1 from UAX #9:
AL, NSM, NSM -> AL, AL, NSM -> AL, AL, AL

<U+0626, U+0652>
bidi: AL, NSM
applying rule W1:
AL, NSM -> AL, AL
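
The same comparison can be sketched in Python, again assuming the standard unicodedata module. The function resolve_w1 and its sor parameter are hypothetical names of my own, and the function covers only the core of rule W1 (each NSM takes the type of the preceding character); it is not a full implementation of the bidi algorithm.

```python
import unicodedata

def resolve_w1(classes, sor="R"):
    """Apply only rule W1: each NSM takes the type of the previous
    character, or of sor (start of run) if it comes first.
    Hypothetical helper; the rest of UAX #9 is not implemented."""
    resolved = []
    prev = sor
    for cls in classes:
        if cls == "NSM":
            cls = prev
        resolved.append(cls)
        prev = cls
    return resolved

decomposed = "\u064A\u0652\u0654"  # YEH, SUKUN, HAMZA ABOVE
composed = unicodedata.normalize("NFC", decomposed)

for text in (decomposed, composed):
    # Look up the bidi class of each character, then resolve the NSMs.
    classes = [unicodedata.bidirectional(c) for c in text]
    print(classes, "->", resolve_w1(classes))
```

Both sequences should resolve to runs consisting solely of AL, which is why the bidi rules alone don't seem to distinguish the two encodings.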

Or is the issue with something else, and it merely came up on the bidi list?


This archive was generated by hypermail 2.1.5 : Thu Sep 25 2003 - 08:22:31 EDT