Re: Variation selectors and vowel marks

From: Ernest Cline (ernestcline@mindspring.com)
Date: Sat Apr 24 2004 - 20:33:49 EDT

    > [Original Message]
    > From: Peter Kirk <peterkirk@qaya.org>
    >
    > On 24/04/2004 15:16, Ernest Cline wrote:
    > >
    > >In order to get Variation Selectors even able to be applied to
    > >other combining marks one would need to change the way
    > >Variation Selectors work, and doing that is what would
    > >complicate things too much.
    >
    > I agree that a change is necessary. I disagree that it would
    > complicate things too much.
    >
    > >There are tons of problems once one adds in other combining marks
    > >being applied to the character as well, because then under normalization,
    > >unless the mark you were applying the variation selector to is of
    > >combining class 0, you can't assure that the variation selector will
    > >stay with the mark. Having the existing Variation Selectors behave
    > >in that way would break the normalization stability guarantee, ...
    >
    > This is untrue. Normalisation stability does not apply when the text is
    > changed, and inserting a variation selector is a change to the text. I
    > have never suggested changing the combining class or other normalisation
    > properties of existing VSs. The way to ensure that a VS stays with the
    > mark it applies to is to ensure that in the part of the combining
    > character sequence before the VS all combining characters are already in
    > canonical order. Well, I can see that there are potential problems where
    > there are canonical decompositions (which are not composition
    > exclusions), but that does not apply to the cases I am interested in.
    >
    > >... so that
    > >can't be done, so you would need to introduce new Variation
    > >Selectors that would behave in this novel fashion.
    > >
    > >In order to do so, under the existing combining class framework you
    > >would need to add variation selectors with the same combining class
    > >as the mark it works with. An alternative would be to add yet another
    > >property for these new Variation Selectors so as to have it go outside
    > >the existing canonical combining class rules when it comes to
    > >canonical ordering. Either way, it won't work properly with existing
    > >implementations, involves a lot more work than adding another
    > >vowel mark, and will not solve the problem of legacy data using the
    > >vowel mark for both the main version and its variant. ...
    > >
    >
    > The former, VSs with various combining classes, would work perfectly
    > well with existing implementations as soon as they have been updated
    > with character data for these new characters. Adding a new mark has no
    > advantage over this, as it also cannot be used until the character data
    > is updated, and the disadvantage that (once the character data has been
    > updated) the VS, being default ignorable, is simply ignored when a font
    > which does not support it is used, whereas the new mark is supported
    > only when it is included in a font. There will always be a legacy data
    > problem, but the VS mechanism was defined precisely to minimise this
    > problem, and as such it has the potential of minimising it for combining
    > characters just as it does for base characters.

    There are problems. Suppose we define a new variation selector that
    will stay with the preceding mark under normalization.

    Now consider what happens when an implementation conforming to
    a version of Unicode that does not know about the new character
    normalizes the sequence BC CM180 CM160 NVS, where:
      BC  = Base Character
      CM# = Combining Mark of ccc #
      NVS = New Variation Selector

    As far as it knows, the new variation selector is an unassigned character
    with a ccc of 0, so when normalizing it will reorder the sequence as:
    BC CM160 CM180 NVS
    Now let's have this "normalized" string be passed on to an
    implementation which knows about this NVS. There were two
    schemes I proposed for implementing this NVS. Both have problems,
    as I will point out below.
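
    To make the reordering step concrete, here is a minimal Python sketch
    of canonical reordering over symbolic tokens. The token names, the
    ccc table, and the helper canonical_reorder are illustrative
    assumptions, not real Unicode data or any library's API; the only
    point is that a character missing from the table is treated as ccc 0.

```python
from typing import Dict, List


def canonical_reorder(seq: List[str], ccc: Dict[str, int]) -> List[str]:
    """Stable-sort each run of non-zero-ccc marks by ccc.

    Any token with ccc 0 (including tokens absent from the table)
    ends the current run and is never moved.
    """
    out: List[str] = []
    run: List[str] = []
    for ch in seq:
        if ccc.get(ch, 0) == 0:
            out.extend(sorted(run, key=lambda c: ccc[c]))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return out


# Hypothetical table for an implementation that predates the NVS:
# "NVS" is deliberately absent, so it falls back to ccc 0.
UNAWARE_CCC = {"BC": 0, "CM180": 180, "CM160": 160}

original = ["BC", "CM180", "CM160", "NVS"]
reordered = canonical_reorder(original, UNAWARE_CCC)
print(reordered)
```

    Under these assumptions the two marks swap places while the NVS
    stays at the end, so it is no longer adjacent to the CM160.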

    One involves giving it the novel characteristic of ignoring
    the canonical combining classes and always sticking with the
    preceding character. Under this scheme the NVS will stick with the
    CM180 rather than the CM160 it originally followed, which means
    that the character sequence the implementation
    receives will not be the one originally intended. This problem
    is too severe to be ignored. This scheme would have made
    sense if it had been available from the start of Unicode,
    but to add it now would cause too many problems with the
    interoperability of data.
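
    A small Python sketch of why this scheme misfires, assuming (as an
    illustration only) that "sticking with the character" means the NVS
    modifies whatever token immediately precedes it. The helper
    nvs_target and the token names are hypothetical.

```python
from typing import List, Optional


def nvs_target(seq: List[str], nvs: str = "NVS") -> Optional[str]:
    """Return the token immediately before the NVS, i.e. the mark the
    NVS would modify if it always sticks to its predecessor."""
    i = seq.index(nvs)
    return seq[i - 1] if i > 0 else None


# What the author of the text wrote.
intended = ["BC", "CM180", "CM160", "NVS"]
# What arrives after an unaware normalizer has sorted the marks by ccc
# while leaving the (to it, ccc 0) NVS in place.
received = ["BC", "CM160", "CM180", "NVS"]

print(nvs_target(intended))
print(nvs_target(received))
```

    The aware implementation has no way to tell from the received
    sequence alone which mark the NVS was meant for.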

    The other, and the one which you preferred anyway, involves
    using different variation selectors for each combining class.
    At least with this solution, when an implementation that did know
    about the character encountered data "normalized" by an
    unaware implementation, it would be able to renormalize it.
    However, given the nature of Hebrew vowel points, where each existing
    point has a canonical combining class all to itself,
    employing variation selectors with non-zero ccc's will
    require just as many new characters. If you want them
    in an appropriate block and to have the Default Ignorable
    property in unaware implementations, they will have to be placed
    in the SSP (the Supplementary Special-purpose Plane, Plane 14).
    Even then they will only be ignored
    by implementations conforming to Unicode 3.2 or later.
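
    The renormalization claim can be sketched in Python as follows. The
    token NVS160 (a variation selector given the same ccc as the mark it
    modifies), the ccc table, and canonical_reorder are illustrative
    assumptions, not proposed characters or a real API. The last few
    lines use the standard unicodedata module to look up the combining
    classes of U+05B0..U+05B9 in whatever Unicode version the Python
    runtime ships.

```python
import unicodedata
from typing import Dict, List


def canonical_reorder(seq: List[str], ccc: Dict[str, int]) -> List[str]:
    """Stable-sort each run of non-zero-ccc marks by ccc; tokens with
    ccc 0 (or absent from the table) act as barriers."""
    out: List[str] = []
    run: List[str] = []
    for ch in seq:
        if ccc.get(ch, 0) == 0:
            out.extend(sorted(run, key=lambda c: ccc[c]))  # stable sort
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return out


# Hypothetical table for an implementation that knows the new selector.
AWARE_CCC = {"BC": 0, "CM180": 180, "CM160": 160, "NVS160": 160}

# Data as mangled by an unaware normalizer (selector left at the end).
received = ["BC", "CM160", "CM180", "NVS160"]
renormalized = canonical_reorder(received, AWARE_CCC)
print(renormalized)

# Combining classes of the Hebrew points U+05B0..U+05B9.
hebrew_ccc = {
    f"U+{cp:04X}": unicodedata.combining(chr(cp))
    for cp in range(0x05B0, 0x05BA)
}
print(hebrew_ccc)
```

    In this toy setup the selector moves back next to the CM160 because
    the stable sort keeps equal-ccc tokens in their relative order, and
    the lookup shows a different class for each of the ten points checked.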

    Adding Variation Selectors with non-zero canonical
    combining classes is possible, but I fail to see how the benefits
    of adding new Variation Selectors on the SSP would outweigh
    the benefits of defining new vowel marks in the Hebrew
    block. It's not as if the Hebrew block lacks the
    space for additional vowel points, and frankly,
    anything on Plane 0 is likelier to be implemented sooner
    and on a wider set of platforms.


