[hebrew] Re: variation selectors for combining characters (was: Hebrew composition model, with cantillation marks)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Nov 04 2003 - 10:14:27 EST


    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    > All that can be done is to create a new variation selector for combining
    > characters. It could be created:
    > - either within a new generic set of variation selectors for combining
    > characters (noted CVSn here) to produce sequences like <HEBREW POINT
    > METEG><CVSn>;
    > - or as Hebrew specific variation selectors for Hebrew combining
    > characters (noted HVSn here); this would produce sequences like <HEBREW
    > POINT METEG><HEBREW HVSn> which should be treated as <HEBREW POINT METEG>
    > by renderers or collators that do not implement this variation selector.
    >
    > In either case, such types of variation selector sequences needed to
    > override the rendered position of the previous combining character should
    > be allowed only for registered sequences, like with other base characters
    > with known variants.

    I also forgot the problem caused by the normalization of combining sequences
    that include such variation selector sequences. Such a sequence would
    need to be stable under normalization and should be treated identically in
    every canonically equivalent ordering.

    Suppose that other diacritics are coded before METEG:
        <BASE(cc=0)><diacritic(cc=x)><METEG(cc=y)><HVS (cc=0)>
    - if x > y, then normalization will reorder it to:
        <BASE(cc=0)><METEG(cc=y)><diacritic(cc=x)><HVS (cc=0)>
    so the HVS now follows the other diacritic and would select a glyph variant
    of that diacritic instead of the METEG.
    - if x < y, then normalization will keep the order, but the second sequence
    (with METEG first) is still canonically equivalent to the first one, so the
    HVS is no better anchored in that case.
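
    The x > y case can be reproduced with existing characters. The following
    is a minimal sketch in Python, assuming the standard unicodedata module.
    VARIATION SELECTOR-1 (U+FE00, cc=0) is used purely as a stand-in for the
    hypothetical HVS, and ETNAHTA (U+0591, cc=220) plays the role of the other
    diacritic; METEG (U+05BD) has cc=22.

```python
import unicodedata

ALEF = "\u05d0"     # HEBREW LETTER ALEF, base character (cc=0)
ETNAHTA = "\u0591"  # HEBREW ACCENT ETNAHTA, the "other diacritic" (cc=220)
METEG = "\u05bd"    # HEBREW POINT METEG (cc=22)
VS1 = "\ufe00"      # VARIATION SELECTOR-1 (cc=0), stand-in for the hypothetical HVS


def names(text):
    """Return the Unicode character names of a string, in order."""
    return [unicodedata.name(ch) for ch in text]


# <BASE><diacritic(cc=x)><METEG(cc=y)><HVS(cc=0)> with x=220 > y=22
typed = ALEF + ETNAHTA + METEG + VS1
normalized = unicodedata.normalize("NFD", typed)

print([unicodedata.combining(ch) for ch in typed])
print(names(typed))
print(names(normalized))
```

    After normalization the selector follows ETNAHTA rather than METEG, which
    is the detachment described above.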

    So what would be needed is a set of variation selectors for each possible
    combining class value, so that a CVSn character can remain stable and
    attached to the right combining character in all canonically equivalent
    strings.

    For METEG, this means encoding the new variation selector with the SAME
    combining class as the METEG to which it applies, so that canonical
    reordering never moves another mark between the two.
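
    To illustrate why a matching combining class would help, here is a small
    sketch of the canonical reordering step, modelled as a stable sort of each
    run of marks with nonzero class. The names CVS and DIACRITIC and the class
    values chosen for them are illustrative assumptions, not encoded
    characters.

```python
def canonical_reorder(seq):
    """Stable-sort each run of marks with nonzero combining class.

    seq is a list of (name, combining_class) pairs; entries with class 0
    act as barriers and never move.
    """
    out, run = [], []
    for item in seq:
        if item[1] == 0:
            out.extend(sorted(run, key=lambda m: m[1]))  # sorted() is stable
            run = []
            out.append(item)
        else:
            run.append(item)
    out.extend(sorted(run, key=lambda m: m[1]))
    return out


BASE = ("BASE", 0)
METEG = ("METEG", 22)
CVS = ("CVS", 22)  # hypothetical selector given the same class as METEG
HVS = ("HVS", 0)   # hypothetical selector with class 0, for comparison


def attached(seq, selector):
    """True if the named selector immediately follows METEG."""
    names = [name for name, _ in seq]
    return names.index(selector) == names.index("METEG") + 1


for x in (18, 220):  # one class below METEG's, one above
    dia = ("DIACRITIC", x)
    a = canonical_reorder([BASE, dia, METEG, CVS])
    b = canonical_reorder([BASE, METEG, CVS, dia])
    print(x, a == b, attached(a, "CVS"))

# With a class-0 selector and x > y, the selector ends up after the diacritic.
c = canonical_reorder([BASE, ("DIACRITIC", 220), METEG, HVS])
print(attached(c, "HVS"))
```

    This only models the reordering step. It says nothing about how renderers
    or collators would interpret such a selector.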

    Of course we have plenty of space in special plane 14 to allocate them. But
    this decision is architectural. It could also be used as an easy way to
    extend other scripts, for example to represent variations of Latin accents,
    like the presentation of the cedilla/comma-above, or the rounded/angular
    form of the circumflex, or the 9-shaped/stroke-shaped appearance of the
    acute accent, if this ever carries a distinctive meaning in a multilingual
    environment.

    The other question is how many selectors will be needed: we have 256
    selectors for base characters; will we need 256 selectors for each possible
    combining class except class 0? Assuming an 8-bit class value, that is up to
    255 x 256 = 65,280 code points, which would nearly fill a complete plane of
    65,536.

    If no such choice is made, I don't see the point of encoding a variation
    selector for Hebrew, and in fact it may be much simpler to encode a new
    <HEBREW POINT MEDIAL METEG> combining character (maybe with a compatibility
    decomposition to <HEBREW POINT METEG> if this helps produce at least an
    approximate rendering on legacy renderers).


