Re: BOM as WJ?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 19 2003 - 19:26:03 EST

Next message: Markus Scherer: "Re: Ternary search trees for Unicode dictionaries"

Previous message: Frank Yung-Fong Tang: "Re: creating a test font w/ CJKV Extension B characters."
In reply to: Philippe Verdy: "Re: BOM as WJ?"
Next in thread: Peter Kirk: "Re: BOM as WJ?"
Reply: Peter Kirk: "Re: BOM as WJ?"

From: "Philippe Verdy" <verdy_p@wanadoo.fr>
> So, <NBSP,CC> must not be treated as if it was:
> <WJ,SP,WJ,CC>
> but really rather as:
> <WJ,SP,CC,WJ>
> Note here the inversion.

The inversion here behaves as if WJ were a combining character with combining
class 256 (i.e. a class higher than the combining class of every real "Mn"
combining character) and canonical reordering had then been applied to the
sequence. In reality WJ (U+2060) has combining class 0, so 256 is only a
pseudo class.
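A minimal Python sketch of this pseudo reordering, using only the standard unicodedata module. It assumes that WJ is U+2060 and that WJ is the only character given the pseudo class 256; the helper names pseudo_ccc and pseudo_reorder are hypothetical, and this is not a Unicode normalization form. It applies the usual canonical ordering step (stably sorting each run of non-zero-class characters by class) with WJ treated as a class-256 mark:

```python
import unicodedata

WJ = "\u2060"  # WORD JOINER
PSEUDO_CLASS_WJ = 256  # hypothetical: above any real canonical combining class


def pseudo_ccc(ch: str) -> int:
    """Canonical combining class of ch, except WJ gets the pseudo class 256."""
    if ch == WJ:
        return PSEUDO_CLASS_WJ
    return unicodedata.combining(ch)


def pseudo_reorder(s: str) -> str:
    """Stably sort each run of non-zero-class characters by pseudo class.

    Class-0 characters (starters such as SP) stay in place and end a run.
    """
    out: list[str] = []
    run: list[str] = []
    for ch in s:
        if pseudo_ccc(ch) == 0:
            out.extend(sorted(run, key=pseudo_ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=pseudo_ccc))
    return "".join(out)


if __name__ == "__main__":
    CC = "\u0301"  # COMBINING ACUTE ACCENT, class 230
    for seq in (WJ + " " + WJ + CC, WJ + " " + CC + WJ):
        print([f"U+{ord(c):04X}" for c in pseudo_reorder(seq)])
```

With CC = U+0301 COMBINING ACUTE ACCENT (class 230), both <WJ,SP,WJ,CC> and <WJ,SP,CC,WJ> come out as <WJ,SP,CC,WJ>, because the acute (230) sorts before WJ (256) while SP, a class-0 starter, is never moved.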

Of course this is not a standard normalization form, but using this pseudo
combining class may help renderers treat the last two coded sequences in my
quote above, <WJ,SP,WJ,CC> and <WJ,SP,CC,WJ>, equivalently.
This also works when there are multiple diacritics (denoted CC1 and CC2
below):
    <NBSP,CC1,CC2>
is then treated as if it were:
    <WJ,SP,WJ,CC1,CC2>
and the pseudo-normalization then gives:
    <WJ,SP,CC1,CC2,WJ>
or:
    <WJ,SP,CC2,CC1,WJ>
(depending on how canonical reordering orders CC1 and CC2, i.e. on their
relative combining classes)
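The multi-diacritic case can be sketched the same way, again only as a hypothetical illustration rather than a standard algorithm. NBSP is first expanded to <WJ,SP,WJ> as in the quote above, then adjacent characters A, B are swapped whenever class(A) > class(B) > 0, which is the pairwise form of canonical reordering, with WJ given the pseudo class 256. The names expand_nbsp, pseudo_ccc and pseudo_normalize are assumptions, and the sample marks are U+0301 COMBINING ACUTE ACCENT (class 230) and U+0323 COMBINING DOT BELOW (class 220):

```python
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE
WJ = "\u2060"    # WORD JOINER
SP = " "         # SPACE


def pseudo_ccc(ch: str) -> int:
    """Real canonical combining class, but 256 (hypothetical) for WJ."""
    return 256 if ch == WJ else unicodedata.combining(ch)


def expand_nbsp(s: str) -> str:
    """Rewrite each NBSP as <WJ,SP,WJ>, as in the quoted proposal."""
    return s.replace(NBSP, WJ + SP + WJ)


def pseudo_normalize(s: str) -> str:
    """Expand NBSP, then apply pairwise canonical reordering with pseudo_ccc.

    Adjacent characters A, B are swapped while class(A) > class(B) > 0,
    so class-0 characters (starters such as SP) never move.
    """
    chars = list(expand_nbsp(s))
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            a, b = pseudo_ccc(chars[i]), pseudo_ccc(chars[i + 1])
            if a > b > 0:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)


if __name__ == "__main__":
    ACUTE, DOT_BELOW = "\u0301", "\u0323"  # classes 230 and 220
    for seq in (NBSP + ACUTE + DOT_BELOW, NBSP + DOT_BELOW + ACUTE):
        print([f"U+{ord(c):04X}" for c in pseudo_normalize(seq)])
```

Either input order of the two marks yields <WJ,SP,U+0323,U+0301,WJ>: the dot below (220) sorts before the acute (230), and WJ (256) always ends up after both diacritics.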

This archive was generated by hypermail 2.1.5 : Wed Nov 19 2003 - 20:10:10 EST