Re: Variation selectors and vowel marks

From: Ernest Cline (ernestcline@mindspring.com)
Date: Sat Apr 24 2004 - 20:33:49 EDT

Next message: Michael Everson: "Re: Variation selectors and vowel marks"

Previous message: Mark Davis: "Re: Standardize TimeZone ID"
Maybe in reply to: jcowan@reutershealth.com: "Variation selectors and vowel marks"
Next in thread: Michael Everson: "Re: Variation selectors and vowel marks"
Reply: Michael Everson: "Re: Variation selectors and vowel marks"
Reply: Asmus Freytag: "Re: Variation selectors and vowel marks"

> [Original Message]
> From: Peter Kirk <peterkirk@qaya.org>
>
> On 24/04/2004 15:16, Ernest Cline wrote:
> >
> >In order to get Variation Selectors even able to be applied to
> >other combining marks one would need to change the way
> >Variation Selectors work, and doing that is what would
> >complicate things too much.
>
> I agree that a change is necessary. I disagree that it would
> complicate things too much.
>
> >There are tons of problems once one adds in other combining marks
> >being applied to the character as well, because then under normalization,
> >unless the mark you were applying the variation selector to is of
> >combining class 0, you can't assure that the variation selector will
> >stay with the mark. Having the existing Variation Selectors behave
> >in that way would break the normalization stability guarantee, ...
>
> This is untrue. Normalisation stability does not apply when the text is
> changed, and inserting a variation selector is a change to the text. I
> have never suggested changing the combining class or other normalisation
> properties of existing VSs. The way to ensure that a VS stays with the
> mark it applies to is to ensure that in the part of the combining
> character sequence before the VS all combining characters are already in
> canonical order. Well, I can see that there are potential problems where
> there are canonical decompositions (which are not composition
> exclusions), but that does not apply to the cases I am interested in.
>
> >... so that
> >can't be done, so you would need to introduce new Variation
> >Selectors that would behave in this novel fashion.
> >
> >In order to do so, under the existing combining class framework you
> >would need to add variation selectors with the same combining class
> >as the mark it works with. An alternative would be to add yet another
> >property for these new Variation Selectors so as to have it go outside
> >the existing canonical combining class rules when it comes to
> >canonical ordering.. Either way, it won't work properly with existing
> >implementations, involves a lot more work than adding another
> >vowel mark, and will not solve the problem of legacy data using the
> >vowel mark for both the main version and its variant. ...
> >
>
> The former, VSs with various combining classes, would work perfectly
> well with existing implementations as soon as they have been updated
> with character data for these new characters. Adding a new mark has no
> advantage over this, as it also cannot be used until the character data
> is updated, and the disadvantage that (once the character data has been
> updated) the VS, being default ignorable, is simply ignored when a font
> which does not support it is used, whereas the new mark is supported
> only when it is included in a font. There will always be a legacy data
> problem, but the VS mechanism was defined precisely to minimise this
> problem, and as such it has the potential of minimising it for combining
> characters just as it does for base characters.

There are problems. Suppose we define a new variation selector that
will stay with the preceding mark under normalization.

Now consider what happens when an implementation conforming to
a version of Unicode that does not know about the new character
normalizes the sequence BC CM180 CM160 NVS, where:
BC = Base Character
CM# = Combining Mark of ccc #
NVS = New Variation Selector

As far as it knows, the new variation selector is an unassigned character
with a ccc of 0, so when normalizing it will reorder the sequence as:
BC CM160 CM180 NVS
Now let's have this "normalized" string be passed on to an
implementation which knows about this NVS. There were two
schemes I proposed for implementing this NVS. Both have problems,
as I will point out below.
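
To make the reordering concrete, here is a rough sketch of canonical
reordering over a toy model. The (name, ccc) pairs and the function name
canonical_reorder are assumptions made up for illustration; the ccc
values are the ones from the example sequence, not real Unicode data.

```python
# Hypothetical model: each character is a (name, ccc) pair taken from the
# example sequence above. None of these are real Unicode characters.

def canonical_reorder(seq):
    """Stable-sort each run of non-zero-ccc marks by ccc; a ccc of 0 ends a run."""
    out, run = [], []
    for ch in seq:
        if ch[1] == 0:
            out.extend(sorted(run, key=lambda c: c[1]))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: c[1]))
    return out

# An implementation that predates the new selector treats it as an
# unassigned code point, i.e. ccc 0.
unaware_view = [("BC", 0), ("CM180", 180), ("CM160", 160), ("NVS", 0)]
result = [name for name, _ in canonical_reorder(unaware_view)]
print(" ".join(result))  # BC CM160 CM180 NVS
```

The selector does not move, but the mark in front of it changes, which
is the source of the trouble in both schemes.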

One involves giving it the novel characteristic of ignoring
the canonical combining classes and always sticking with the
mark it follows. Under this scheme the NVS will stick with the
CM180 rather than the CM160 it was originally applied to, which
means that the character sequence the implementation receives
will not be the one originally intended. This problem
is too severe to be ignored. This scheme would have made
sense if it had been available from the start of Unicode,
but to add it now would cause too many problems with the
interoperability of data.
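
A small sketch of why the first scheme loses information, assuming a
hypothetical helper selector_target that models "the selector modifies
whatever mark immediately precedes it". The names are illustrative only.

```python
def selector_target(names, selector="NVS"):
    """Under the 'sticky' scheme, the selector modifies the mark just before it."""
    i = names.index(selector)
    return names[i - 1]

intended = ["BC", "CM180", "CM160", "NVS"]  # what the author typed
received = ["BC", "CM160", "CM180", "NVS"]  # after an unaware normalizer ran

before = selector_target(intended)
after = selector_target(received)
print(before, "->", after)  # CM160 -> CM180
```

Nothing in the received sequence records that the selector once
belonged to CM160, so an aware implementation cannot undo the damage.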

The other, and the one which you preferred anyway, involves
using different variation selectors for each combining class.
At least with this solution, when an implementation that did know
about the character encountered the data "normalized" by an
unaware implementation, it would be able to renormalize it.
Given the nature of Hebrew vowel points, where each existing
point has a canonical combining class of its own,
employing variation selectors with non-zero ccc's will
require just as many new characters. If you want them
in an appropriate block, and want them to have the Default
Ignorable property in unaware implementations, that will
require placing them in the SSP. Even then they will only be
ignored by implementations conforming to Unicode 3.2 or later.
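
A sketch of the renormalization step under the second scheme, again
over a toy (name, ccc) model. NVS160 is a hypothetical selector that an
aware implementation assigns ccc 160; canonical_reorder is an
illustrative helper, not a library function. The last few lines use
Python's unicodedata module to list the combining classes of the Hebrew
points U+05B0..U+05B9, which each have a distinct value.

```python
import unicodedata

def canonical_reorder(seq):
    """Stable-sort each run of non-zero-ccc marks by ccc; a ccc of 0 ends a run."""
    out, run = [], []
    for ch in seq:
        if ch[1] == 0:
            out.extend(sorted(run, key=lambda c: c[1]))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: c[1]))
    return out

# Data as left behind by an unaware normalizer, now read by an aware
# implementation that knows the (hypothetical) selector has ccc 160.
received = [("BC", 0), ("CM160", 160), ("CM180", 180), ("NVS160", 160)]
renormalized = [n for n, _ in canonical_reorder(received)]

# The originally intended sequence normalizes to the same result.
intended = [("BC", 0), ("CM180", 180), ("CM160", 160), ("NVS160", 160)]
from_intended = [n for n, _ in canonical_reorder(intended)]
print(" ".join(renormalized))  # BC CM160 NVS160 CM180

# Combining classes of Hebrew points U+05B0..U+05B9 (real Unicode data).
hebrew_ccc = {f"U+{cp:04X}": unicodedata.combining(chr(cp))
              for cp in range(0x05B0, 0x05BA)}
print(hebrew_ccc)
```

Because the stable sort places the selector directly after the mark
sharing its class, the pairing survives, at the cost of one selector
per combining class.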

Adding Variation Selectors with non-zero canonical
combining classes is possible, but I fail to see the benefits
from adding new Variation Selectors on the SSP outweighing
the benefits of defining new vowel marks in the Hebrew
block. It's not as if the Hebrew block does not have the
space to add additional vowel points, and frankly,
anything on Plane 0 is likelier to be implemented sooner
and on a wider set of platforms.


This archive was generated by hypermail 2.1.5 : Sat Apr 24 2004 - 21:01:03 EDT