Re: Variation selectors and vowel marks

From: Ernest Cline (ernestcline@mindspring.com)
Date: Sun Apr 25 2004 - 00:01:05 EDT

     From: Asmus Freytag <asmusf@ix.netcom.com>
    >
    > At 05:33 PM 4/24/2004, Ernest Cline wrote:
    > >There are problems. Suppose, we define a new variation selector that
    > >will stay with the preceding mark under normalization.
    > >
    > >Now consider what happens when an implementation conforming to
    > >a version of Unicode that does not know about the new character
    > >normalizes the sequence BC CM180 CM160 NVS
    > > BC = Base Character
    > > CM# = Combining Mark of ccc #
    > > NVS = New Variation Selector.
    > >
    > >As far as it knows, the new variation selector is an undefined character
    > >with a ccc of 0, so when normalizing this it will reorder it as:
    > >BC CM160 CM180 NVS
    > >Now let's have this "normalized" string be passed on to an
    > >implementation which knows about this NVS. There were two
    > >schemes I proposed for implementing this NVS. Both have problems,
    > >as I will point out below.
    >
    > No implementation supporting version X can normalize all data
    > containing characters from a later release. If that were a requirement
    > we could never add combining characters. What is required is that
    > all later implementations normalize any data from version X
    > the same way a version X implementation would have done and to
    > not change already normalized data. I think it's strictly speaking
    > the latter aspect that's guaranteed.

    The best that a version can do with an unknown character is to treat
    it as a non-decomposable character with a ccc of 0. With the way
    that normalization works, the worst case is that the character is
    preserved, so that an implementation that does know the character
    can normalize it correctly. My point was that adding a category of
    characters that is tightly bound to the preceding character, without
    using the existing combining class mechanism, would cause
    unavoidable problems for normalization. It is therefore impossible
    to add a variation selector for combining marks unless the selector
    has the same canonical combining class as the mark it follows. Any
    proposal for such variation selectors would then have to add one
    selector for each canonical combining class, which increases the
    cost of implementing it.
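
    The reordering argument can be checked with a small model of the
    canonical ordering step. This is only a sketch: the ccc values 160
    and 180 and the names BC, CM160, CM180 and NVS are the hypothetical
    ones from the quoted example, NVS160 is a made-up selector that
    shares the ccc of its mark, and canonical_reorder is an
    illustrative helper, not a library function.

```python
# Toy model of canonical reordering: each item is (name, ccc).
# All names and ccc values are hypothetical, not real characters.

def canonical_reorder(seq):
    """Stable-sort every run of non-zero-ccc marks by ccc.

    Items with ccc 0 never move and end the current run.
    """
    out, run = [], []
    for item in seq:
        if item[1] == 0:
            out.extend(sorted(run, key=lambda m: m[1]))  # sorted() is stable
            run = []
            out.append(item)
        else:
            run.append(item)
    out.extend(sorted(run, key=lambda m: m[1]))
    return out


def names(seq):
    return " ".join(name for name, _ in seq)


# An older implementation sees the unknown NVS as ccc 0.
old_view = [("BC", 0), ("CM180", 180), ("CM160", 160), ("NVS", 0)]
# Hypothetical alternative: a selector that shares the ccc of its mark.
same_ccc = [("BC", 0), ("CM180", 180), ("CM160", 160), ("NVS160", 160)]

print(names(canonical_reorder(old_view)))   # BC CM160 CM180 NVS
print(names(canonical_reorder(same_ccc)))   # BC CM160 NVS160 CM180
```

    In the first case the selector ends up after CM180 instead of
    CM160. In the second it stays with CM160, because a stable sort
    keeps marks of equal ccc in their original relative order.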

    It might make sense to relax the restriction on allowable
    variation sequences to include combining marks of class 0,
    and maybe even to provide variation selectors for the two
    big classes of combining characters, 220 and 230, given
    that those two classes are far and away the largest non-0
    classes at present and are likely to remain so.
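
    The relative sizes of the classes can be checked against whatever
    version of the Unicode Character Database a given Python build
    ships, using the standard unicodedata module. This is a rough
    sketch: the counts reflect that database version rather than the
    data as of this message, and class_sizes is an illustrative helper.

```python
import sys
import unicodedata
from collections import Counter


def class_sizes():
    """Count code points per non-zero canonical combining class."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        ccc = unicodedata.combining(chr(cp))
        if ccc != 0:
            counts[ccc] += 1
    return counts


sizes = class_sizes()
print("UCD version:", unicodedata.unidata_version)
# Show the five largest non-zero classes as (ccc, count) pairs.
for ccc, count in sizes.most_common(5):
    print(ccc, count)
```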


