Re: Variation selectors and vowel marks

From: Ernest Cline (ernestcline@mindspring.com)
Date: Sun Apr 25 2004 - 00:01:05 EDT

Next message: Philippe Verdy: "Re: A locale@unicode.org mailing list ? (was: Standardize TimeZone ID)"

Previous message: Asmus Freytag: "Re: Standardize TimeZone ID"
Maybe in reply to: jcowan@reutershealth.com: "Variation selectors and vowel marks"
Next in thread: Peter Kirk: "Re: Variation selectors and vowel marks"
Reply: Peter Kirk: "Re: Variation selectors and vowel marks"

From: Asmus Freytag <asmusf@ix.netcom.com>
>
> At 05:33 PM 4/24/2004, Ernest Cline wrote:
> >There are problems. Suppose we define a new variation selector that
> >will stay with the preceding mark under normalization.
> >
> >Now consider what happens when an implementation conforming to
> >a version of the Unicode Standard that does not know about the new
> >character normalizes the sequence BC CM180 CM160 NVS
> > BC = Base Character
> > CM# = Combining Mark of ccc #
> > NVS = New Variation Selector.
> >
> >As far as it knows, the new variation selector is an undefined character
> >with a ccc of 0, so when normalizing this it will reorder it as:
> >BC CM160 CM180 NVS
> >Now let this "normalized" string be passed on to an
> >implementation that knows about this NVS. There were two
> >schemes I proposed for implementing this NVS. Both have problems,
> >as I will point out below.
>
> No implementation supporting version X can normalize all data
> containing characters from a later release. If that was a requirement
> we could never add combining characters. What is required is that
> all later implementations normalize any data from version X
> the same way a version X implementation would have done and to
> not change already normalized data. I think it's strictly speaking
> the latter aspect that's guaranteed.

The best that a version can do with an unknown character is to treat it
as a non-decomposable character with a ccc of 0. Given the way
normalization works, at worst this preserves the character, so that an
implementation that does know it can normalize it correctly.
My point here was that adding a category of characters tightly
bound to the preceding character, without using the existing
combining class mechanism, would cause unavoidable problems for
normalization. It is therefore impossible to add variation selectors
for combining marks unless each such variation selector has the
same canonical combining class as the mark it follows. Any proposal
for such variation selectors would then have to add a variation
selector for each canonical combining class, which would increase
the cost of implementing it.
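A minimal Python sketch of the reordering step may make the argument above concrete. It is a toy model, not a real normalizer: it ignores decomposition and composition, it uses the hypothetical names and ccc values from the quoted example (BC, CM180, CM160, NVS), and the selectors VS160 and VS180 are invented here to stand for selectors that share a mark's combining class.

```python
# Toy model of canonical ordering (the reordering step of normalization).
# BC, CM180, CM160 and NVS are the hypothetical characters from the quoted
# example; VS160 and VS180 are invented selectors sharing a mark's class.

def canonical_order(seq, table):
    """Stable-sort each maximal run of non-zero-ccc characters by ccc.

    Characters missing from `table` get ccc 0, which is how an older
    implementation treats a code point it does not know about.
    """
    def ccc(ch):
        return table.get(ch, 0)

    out, run = [], []
    for ch in seq:
        if ccc(ch) == 0:
            # A starter (ccc 0) ends the run; flush the run in ccc order.
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return out

# 1. Old implementation: NVS is unknown, so it is treated as ccc 0.
OLD = {"BC": 0, "CM180": 180, "CM160": 160}
print(canonical_order(["BC", "CM180", "CM160", "NVS"], OLD))
# ['BC', 'CM160', 'CM180', 'NVS']  (NVS now follows CM180, not CM160)

# 2. Selectors that share the combining class of the mark they modify.
SAME = {"BC": 0, "CM180": 180, "CM160": 160, "VS160": 160, "VS180": 180}
print(canonical_order(["BC", "CM180", "CM160", "VS160"], SAME))
# ['BC', 'CM160', 'VS160', 'CM180']  (VS160 stays right after CM160)
print(canonical_order(["BC", "CM180", "VS180", "CM160"], SAME))
# ['BC', 'CM160', 'CM180', 'VS180']  (VS180 stays right after CM180)
```

Because the sort within a run is stable, a selector with the same class as its mark keeps its position relative to that mark, while a ccc 0 selector stays put as the marks in front of it are reordered.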

It might make sense to relax the restriction on allowable
variation sequences to include combining marks of class 0,
and maybe even to provide variation selectors for the two
big classes of combining characters, 220 (below) and 230 (above),
given that those two classes are far and away the largest non-zero
classes at present and are likely to remain so.
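The same effect can be observed with real characters using Python's standard unicodedata module. The sequence below is not a registered variation sequence; it is only an illustration, assuming a selector was meant to qualify the mark written just before it, of what normalization does to an existing variation selector (ccc 0) placed after marks from classes 220 and 230.

```python
import unicodedata

# U+0301 COMBINING ACUTE ACCENT (class 230, above),
# U+0316 COMBINING GRAVE ACCENT BELOW (class 220, below),
# U+FE00 VARIATION SELECTOR-1 (class 0, like every existing selector).
for ch in "\u0301\u0316\ufe00":
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")

# Illustrative intent: VS1 qualifies U+0316, the mark just before it.
text = "a\u0301\u0316\ufe00"
nfd = unicodedata.normalize("NFD", text)
print([f"U+{ord(c):04X}" for c in nfd])
# ['U+0061', 'U+0316', 'U+0301', 'U+FE00']  (VS1 now follows the acute)
```

Since U+FE00 has ccc 0, the two marks in front of it are reordered by class while the selector stays where it is, so it ends up attached to a different mark. A selector carrying class 220 itself would instead move along with U+0316.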

This archive was generated by hypermail 2.1.5 : Sun Apr 25 2004 - 00:36:58 EDT