Re: Variation selectors and vowel marks

From: Peter Kirk
Date: Sat Apr 24 2004 - 19:00:18 EDT


    On 24/04/2004 15:16, Ernest Cline wrote:

    >>[Original Message]
    >>From: Peter Kirk
    >>On 24/04/2004 11:22, Ernest Cline wrote:
    >>>As someone who has put a lot of thought into variation selectors, let me
    >>>point out something. In the case of B M1 M2 VS what would the variation
    >>>selector be indicating as varied if such a thing were to be allowed? ...

    I have re-read section 15.6 of the standard. It is absolutely clear that
    a VS applies only to the immediately preceding character, and not to a
    complete combining sequence:

    > A variation sequence, which always consists of a base character
    > followed by the variation selector,...

    There is no suggestion that more than a single character may precede the VS.

    >>>...Since variation selectors are combining marks, then just like any other
    >>>combining marks they should be viewed as being applied to the entire
    >>>combining sequence up to that point, and hence should be viewed as
    >>>indicating a variant of B M1 M2, and not of just the preceding mark. ...

    Whether or not this applies to other combining marks, it explicitly does
    not apply to VSs. Well, it is of course also explicit that any sequence
    of a combining mark followed by a VS is not sanctioned for standard use.

    >>>... Any other treatment complicates things too much.

    Some other treatment is clearly what the UTC had in mind.

    >>I always assumed that VS's are intended to apply to just the immediately
    >>preceding character, and not to a whole combining character sequence. In
    >>my opinion, "Any other treatment complicates things too much." But
    >>perhaps there are others who can tell us what the UTC intended for this.
    >Which is why, as things currently stand, the standard allows variation
    >sequences to involve base characters only. To quote from Section 15.6:
    >"The base character in a variation sequence is never a combining
    >character or a decomposable character. The variation selectors
    >themselves are combining marks of combining class 0 ..."
    >In order to get Variation Selectors even able to be applied to
    >other combining marks one would need to change the way
    >Variation Selectors work, and doing that is what would complicate
    >things too much.

    I agree that a change is necessary. I disagree that it would complicate
    things too much.

    >>>Thus in the case of the vowel marks, one could add a series of variation
    >>>sequences with one for each base character that the variant vowel
    >>>mark would be used with. If this causes too many other problems, ...
    >>It would indeed if someone considers that every such combining sequence
    >>has to be enumerated and defined individually. But if one simply says
    >>that every combining sequence containing e.g. the sequence <QAMATS, VS1>
    >>is legal and represents use of the variant qamats glyph, then there is
    >>no problem.
    >There are tons of problems once one adds in other combining marks
    >being applied to the character as well, because then under normalization,
    >unless the mark you were applying the variation selector to is of
    >combining class 0, you can't assure that the variation selector will
    >stay with the mark. Having the existing Variation Selectors behave
    >in that way would break the normalization stability guarantee, ...

    This is untrue. Normalisation stability does not apply when the text is
    changed, and inserting a variation selector is a change to the text. I
    have never suggested changing the combining class or other normalisation
    properties of existing VSs. The way to ensure that a VS stays with the
    mark it applies to is to ensure that in the part of the combining
    character sequence before the VS all combining characters are already in
    canonical order. Well, I can see that there are potential problems where
    there are canonical decompositions (which are not composition
    exclusions), but that does not apply to the cases I am interested in.
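
    The ordering point above can be checked with a short sketch. It assumes
    Python's standard unicodedata module and uses <QAMATS, VS1> purely as an
    illustration; that sequence is not a defined variation sequence, and the
    helper name vs_follows is made up for this example. Because VS1 has
    combining class 0, marks cannot be reordered across it, so it stays next
    to the qamats only if the marks before it are already in canonical order.

```python
import unicodedata

BET = "\u05D1"     # HEBREW LETTER BET (base character, combining class 0)
QAMATS = "\u05B8"  # HEBREW POINT QAMATS (combining class 18)
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ (combining class 21)
VS1 = "\uFE00"     # VARIATION SELECTOR-1 (combining class 0)


def vs_follows(text, mark, vs=VS1):
    """Hypothetical helper: True if `vs` immediately follows `mark` in `text`."""
    return mark + vs in text


# Marks before the VS are already in canonical order (18 precedes the VS,
# 21 comes after it), so normalization leaves the sequence alone.
ordered = BET + QAMATS + VS1 + DAGESH
ordered_nfd = unicodedata.normalize("NFD", ordered)

# Marks before the VS are NOT in canonical order (21 before 18), so
# normalization swaps them and the VS ends up after the dagesh instead.
unordered = BET + DAGESH + QAMATS + VS1
unordered_nfd = unicodedata.normalize("NFD", unordered)

for label, text in (("ordered", ordered_nfd), ("unordered", unordered_nfd)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, vs_follows(text, QAMATS), names)
```

    In the first case the VS still follows the qamats after normalization; in
    the second it does not, which is why the text before the VS needs to be
    in canonical order when the VS is inserted.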

    >... so that
    >can't be done, so you would need to introduce new Variation
    >Selectors that would behave in this novel fashion.
    >In order to do so, under the existing combining class framework you
    >would need to add variation selectors with the same combining class
    >as the mark it works with. An alternative would be to add yet another
    >property for these new Variation Selectors so as to have it go outside
    >the existing canonical combining class rules when it comes to
    >canonical ordering. Either way, it won't work properly with existing
    >implementations, involves a lot more work than adding another
    >vowel mark, and will not solve the problem of legacy data using the
    >vowel mark for both the main version and its variant. ...

    The former, VSs with various combining classes, would work perfectly
    well with existing implementations as soon as they have been updated
    with character data for these new characters. Adding a new mark has no
    advantage over this, as it also cannot be used until the character data
    is updated, and the disadvantage that (once the character data has been
    updated) the VS, being default ignorable, is simply ignored when a font
    which does not support it is used, whereas the new mark is supported
    only when it is included in a font. There will always be a legacy data
    problem, but the VS mechanism was defined precisely to minimise this
    problem, and as such it has the potential of minimising it for combining
    characters just as it does for base characters.
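
    As a rough sketch of the first alternative (a VS with the same combining
    class as its mark), the following simulates canonical reordering with an
    overridden class table. Everything about the selector here is an
    assumption: no such character exists, the private-use code point U+E000
    merely stands in for it, and canonical_reorder is a simplified reordering
    step only (no decomposition or composition), not a full normalizer.

```python
import unicodedata

BET = "\u05D1"     # HEBREW LETTER BET
QAMATS = "\u05B8"  # HEBREW POINT QAMATS (combining class 18)
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ (combining class 21)

# Hypothetical: a private-use code point standing in for a variation
# selector that would share the combining class of qamats.
HYPOTHETICAL_VS = "\uE000"
CLASS_OVERRIDES = {HYPOTHETICAL_VS: 18}


def ccc(ch):
    """Combining class, with the hypothetical override applied."""
    return CLASS_OVERRIDES.get(ch, unicodedata.combining(ch))


def canonical_reorder(text):
    """Stable-sort each run of non-zero-class characters by combining class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


# Dagesh typed first: reordering moves <qamats, VS> together ahead of it.
reordered = canonical_reorder(BET + DAGESH + QAMATS + HYPOTHETICAL_VS)
print([f"U+{ord(ch):04X}" for ch in reordered])
```

    Under these assumptions the selector travels with the qamats, because a
    stable sort never separates two adjacent characters of equal class.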

    >... I just don't
    >see the benefits justifying the costs. If there were a number of use
    >cases for doing this, it might justify the effort required, but for only
    >a couple of vowel marks, I can't see it.

    Well, it is more than a couple, and anyway I don't see the costs as
    being high. On the Hebrew list I listed yesterday six candidates for
    definition as variation sequences, each of one Hebrew combining mark
    plus a variation selector. Five of these sequences have the potential of
    solving an issue for which a proposal either has been made or is being
    considered, and for which the alternative would probably be to define a
    new character. (The sixth had apparently been rejected as too marginal:
    it probably doesn't merit a separate character but might be worth
    defining as a variation sequence.) So potentially we save five new
    characters by using either an already defined VS or a special one
    defined for Hebrew. I have just thought of a seventh possible sequence,
    although in this case the alternate glyph is already encoded as an
    alphabetic presentation form (U+FB1E). There is also the possibility of
    using VSs to indicate alternative pointing schemes. These are all in
    Hebrew. There may well be similar examples in other scripts - in fact I
    vaguely remember seeing that some texts (German black letter, I think)
    distinguish umlaut from diaeresis, and this is something which could be
    handled by a combining character VS (although here there are problems
    with normalisation composition). So this is potentially a large field!
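
    The composition problem mentioned for umlaut versus diaeresis can be seen
    in a small sketch, again assuming Python's unicodedata module. The
    sequence <COMBINING DIAERESIS, VS1> is only an illustration and is not a
    defined variation sequence.

```python
import unicodedata

U = "u"
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (combining class 230)
VS1 = "\uFE00"        # VARIATION SELECTOR-1 (combining class 0)

decomposed = U + DIAERESIS + VS1

# NFC composes u + diaeresis into the precomposed letter U+00FC, so the
# VS now follows a base letter rather than the mark it was meant to vary.
composed = unicodedata.normalize("NFC", decomposed)

# Decomposing again restores the original order of code points.
round_trip = unicodedata.normalize("NFD", composed)

print([f"U+{ord(ch):04X}" for ch in composed])
print(round_trip == decomposed)
```

    In composed form the combining mark is no longer present as a separate
    character, so a rule of the form "VS applies to the preceding mark" would
    have to be defined in terms of the decomposed text.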

    Peter Kirk
