Re: Variation selectors and vowel marks

From: Peter Kirk
Date: Sun Apr 25 2004 - 16:03:29 EDT


    On 24/04/2004 21:01, Ernest Cline wrote:

    >... My point here was that adding a category of characters
    >that was tightly bound to the preceding character without using the
    >existing combining class mechanism would cause problems
    >for normalization that could not be avoided, and as such, it is
    >impossible to add variation selectors for combining marks
    >unless the variation selector for a combining mark is of the
    >same canonical combining class. That would cause any
    >proposal for such variation selectors to have to add variation
    >selectors for each canonical combining class, and thus
    >increase the cost of implementing such a proposal.

    Let us remember that problems arise with class 0 VSs only if preceded by
    more than one combining mark. So it would be possible to specify that
    VSs may be preceded by no more than one combining mark. Therefore, a
    base character with two combining marks, one of which has a variant
    glyph, must be encoded B CM2 VS CM1 - irrespective of the canonical
    order. This is stable under normalisation as the VS is class 0. Even if
    without the VS the canonical order is B CM1 CM2 (i.e. cc(CM2)>cc(CM1)),
    the sequence B CM1 CM2 VS and its unnormalised canonical equivalent B
    CM2 CM1 VS can be defined as illegal (just as at present any sequence of
    CM VS is illegal), and the sequence with the variant glyph would have to
    be B CM2 VS CM1. This avoids any problems with normalisation by defining
    the sequences which can be reordered as illegal. (There is a small
    problem when variants of BOTH combining marks are required, as B CM1 VS1
    CM2 VS2 and B CM2 VS2 CM1 VS1 are equivalent but not canonically
    equivalent. This could happen in Hebrew e.g. if a VS is used for dagesh
    hazaq as well as qamats qatan, but should be rare enough to be a
    marginal problem.)
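
    As a rough sketch of the behaviour described above (not part of any
    proposal), the reordering can be checked with Python's standard
    unicodedata module. The code points are illustrative stand-ins: bet as
    B, qamats as CM1, dagesh as CM2, and the existing VS1 and VS2 as the
    selectors. No variation sequences are actually defined for these
    points, and vs_placement_ok is a hypothetical helper encoding one
    reading of the "no more than one combining mark" rule.

```python
import unicodedata

# Illustrative stand-ins only: no variation sequences are defined for these points.
B = "\u05D1"    # HEBREW LETTER BET, combining class 0
CM1 = "\u05B8"  # HEBREW POINT QAMATS, combining class 18
CM2 = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ, combining class 21
VS1 = "\uFE00"  # VARIATION SELECTOR-1, combining class 0
VS2 = "\uFE01"  # VARIATION SELECTOR-2, combining class 0


def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def vs_placement_ok(s):
    """Hypothetical rule: a VS may follow at most one non-zero-class mark."""
    run = 0  # length of the current run of non-zero-class combining marks
    for ch in s:
        if 0xFE00 <= ord(ch) <= 0xFE0F:        # VS1..VS16
            if run > 1:
                return False
            run = 0                            # the VS itself is class 0
        elif unicodedata.combining(ch) != 0:
            run += 1
        else:
            run = 0
    return True


# A trailing VS changes which mark it follows when the marks are reordered.
trailing_vs_reordered = nfd(B + CM2 + CM1 + VS1) == B + CM1 + CM2 + VS1

# A VS placed directly after its mark blocks reordering, so the text is stable.
inner_vs_stable = nfd(B + CM2 + VS1 + CM1) == B + CM2 + VS1 + CM1

# With variants of both marks, the two orders do not normalise to the same string.
both_variants_differ = nfd(B + CM1 + VS1 + CM2 + VS2) != nfd(B + CM2 + VS2 + CM1 + VS1)

print(trailing_vs_reordered, inner_vs_stable, both_variants_differ)
print(vs_placement_ok(B + CM1 + CM2 + VS1), vs_placement_ok(B + CM2 + VS1 + CM1))
```

    The first check shows the problem case, the second the suggested
    encoding B CM2 VS CM1, and the third the marginal case where both
    marks need variants.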

    >It might make sense to relax the restriction on allowable
    >variation sequences to include combining marks of class 0,
    >and maybe even to provide variation selectors for the two
    >big classes of combining characters, 220 and 230, given
    >that those two classes are far and away the largest non-0
    >classes at present and are likely to remain so.

    In principle this makes sense. In practice it fails to solve the
    specific problem with Hebrew, because most of the combining marks which
    have variants are not in classes 220 or 230.
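
    For reference, the combining classes of the main Hebrew points can be
    listed with Python's unicodedata module. This is only a quick check;
    the selection of code points (U+05B0 to U+05B9, U+05BB and U+05BC) is
    my own choice for illustration.

```python
import unicodedata

# Main Hebrew vowel points plus dagesh (selection chosen for illustration).
HEBREW_POINTS = [chr(cp) for cp in range(0x05B0, 0x05BA)] + ["\u05BB", "\u05BC"]


def combining_classes(chars):
    """Map each character to its canonical combining class."""
    return {ch: unicodedata.combining(ch) for ch in chars}


classes = combining_classes(HEBREW_POINTS)
for ch, ccc in classes.items():
    print(f"U+{ord(ch):04X}  ccc={ccc:3d}  {unicodedata.name(ch)}")

# True if none of these points falls in the two large classes 220 and 230.
none_in_big_classes = all(ccc not in (220, 230) for ccc in classes.values())
print(none_in_big_classes)
```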

    Earlier, Ernest wrote:

    >Adding Variation Selectors with non-zero canonical
    >combining classes is possible, but I fail to see the benefits
    >from adding new Variation Selectors on the SSP outweighing
    >the benefits of defining new vowel marks in the Hebrew

    The benefits of using variation selectors rather than new code points in
    this case are exactly the same as those for variation selectors for base
    characters, as expressed in TUS section 15.3:

    > Occasionally the need arises in text processing to restrict or change
    > the set of glyphs that are to be used to represent a character. ... In
    > special circumstances, such a variation from the normal range of
    > appearance needs to be expressed side-by-side in the same document in
    > plain text contexts, where it is impossible or inconvenient to
    > exchange formatted text. ... The variation selectors are used when
    > characters have essentially the same semantic.

    > Variation selectors provide a mechanism for specifying a restriction
    > on the set of glyphs that are used to represent a particular
    > character. They also provide a mechanism for specifying variants ...
    > that have essentially the same semantic but substantially different
    > ranges of glyphs.

    I accept that there is some continuing debate (for which the Hebrew list
    is the proper place) over whether the particular variant characters I
    have in mind do "have essentially the same semantic". But in principle
    these conditions may be true of combining characters just as much as of
    base characters. And so the reasons for which VSs are defined for base
    characters are just as valid for combining characters.

    As for the new variation selectors being in the SSP, is this actually
    necessary, or could they be in the Hebrew block, space permitting? After
    all, if we are talking about VSs with the fixed combining classes of
    Hebrew points, they are useful only with Hebrew script.

    Peter Kirk
