Re: Variation Selectors

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Mar 30 2006 - 16:38:09 CST

    On 3/30/2006 1:58 AM, Andrew West wrote:
    > On 29/03/06, Kent Karlsson <kent.karlsson14@comhem.se> wrote:
    >
    >>> The sort of case I am thinking of is that in which a letter L may have
    >>> two contextual forms, L1 and L2 which are selected in different
    >>> contexts (e.g. L1 before one set of vowels and L2 before another set
    >>> of vowels). However, when writing a foreign word L2 is always used,
    >>> regardless of context.
    >>>
    >> You are convincing me even more that these variants should have
    >> been encoded as separate characters, that should have separate
    >> shaping properties.
    >>
    >
    > There are some things about the Mongolian encoding model that I really
    > do not like, and which I think go against Unicode's fundamental
    > encoding principles, but variation selectors are not one of them. Your
    > suggestion of encoding contextual glyph variants separately goes
    > against both the character-glyph model and the Mongolian's own sense
    > of what letters their script is composed of.
    >
    That defines the issue very clearly.
    > Just to reiterate, variation selectors for Mongolian are used sparsely
    > in ordinary running text as the rendering system can select the
    > correct glyph form of a letter from context in most cases, and the
    > user (or IME) only needs to enter a VS when the context is ambiguous
    > or needs to be overridden.
    >
    This is no different from the use of ZWNJ in Persian to get a
    disconnected shape. The character-glyph model correctly distinguishes
    between underlying entities (characters) and their shapes (glyphs),
    whether these shapes are selected by font switching or by a complex
    shaping algorithm.

    Algorithms other than display tend to process text based on the
    underlying entities, not their surface appearance. Using variation
    selectors (or other modifiers, such as ZWNJ) therefore allows those
    algorithms to arrive at the correct results without costly mapping
    tables, simply by treating the modifier as transparent.
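
    As a minimal sketch of what "treating the modifier as transparent"
    could look like for a matching algorithm, assuming a hand-written
    list of modifiers (the set TRANSPARENT and the function names are
    illustrative, and the Mongolian sequence is an arbitrary example
    that has not been checked against the list of standardized
    variants):

```python
# Hypothetical, hand-written set of "transparent" modifiers. A real
# implementation would derive this from Unicode property data instead.
TRANSPARENT = (
    {0x200C, 0x200D}                 # ZWNJ, ZWJ
    | set(range(0x180B, 0x180E))     # Mongolian FVS1..FVS3
    | set(range(0xFE00, 0xFE10))     # VS1..VS16
)


def skeleton(text: str) -> str:
    """Return text with the transparent modifiers removed."""
    return "".join(ch for ch in text if ord(ch) not in TRANSPARENT)


def same_underlying_text(a: str, b: str) -> bool:
    """Compare two strings by their underlying characters only."""
    return skeleton(a) == skeleton(b)


# MONGOLIAN LETTER A + MONGOLIAN LETTER GA, without and with FVS1.
plain = "\u1820\u182D"
with_fvs = "\u1820\u182D\u180B"

# A Persian word written without and with ZWNJ after its prefix.
joined = "\u0645\u06CC\u062E\u0648\u0627\u0647\u0645"
with_zwnj = "\u0645\u06CC\u200C\u062E\u0648\u0627\u0647\u0645"

print(same_underlying_text(plain, with_fvs))
print(same_underlying_text(joined, with_zwnj))
```

    No table mapping variant shapes back to base letters is involved;
    the only data needed is the list of code points to skip.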

    If the Greek small letter final sigma had been realized with some sort
    of modifier, either a VS or perhaps even ZW(N)J, instead of being
    encoded separately, it would be possible to round-trip words between
    lower and upper case.
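
    The round-trip point can be illustrated with a small sketch. The
    use of U+FE00 after sigma below is purely hypothetical: no such
    variation sequence is defined, and it only stands in for "some
    sort of modifier". The helpers map one character at a time so that
    no contextual casing rule is applied.

```python
# Hypothetical marker standing in for "some sort of modifier".
# Unicode does not define this sequence for sigma.
HYPOTHETICAL_FINAL_MARK = "\uFE00"


def simple_upper(text: str) -> str:
    """Context-free uppercase: map each character on its own."""
    return "".join(ch.upper() for ch in text)


def simple_lower(text: str) -> str:
    """Context-free lowercase: map each character on its own."""
    return "".join(ch.lower() for ch in text)


# As encoded today: the word ends in U+03C2 (final sigma).
actual = "\u03BF\u03B4\u03BF\u03C2"

# Hypothetical encoding: ordinary sigma U+03C3 followed by the marker.
hypothetical = "\u03BF\u03B4\u03BF\u03C3" + HYPOTHETICAL_FINAL_MARK

# Both sigmas uppercase to U+03A3, so the distinction is lost ...
print(simple_lower(simple_upper(actual)) == actual)
# ... while a modifier simply rides along through both mappings.
print(simple_lower(simple_upper(hypothetical)) == hypothetical)
```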

    Now, for Greek, where the history of computer implementations is firmly
    rooted in 8-bit, single-language implementations, that would have meant
    dragging a lot of complexity into editing and rendering for a single
    character.

    (There's also the issue that this shape has been used contrastively in
    various notations that simply plundered the Greek type case, which
    makes a VS-based approach less useful.)

    For Mongolian, which not only has many characters affected by this
    issue, but already has one of the most complex shaping algorithms and
    needed de novo implementations of rendering software in any case, those
    arguments don't apply, and FVS is much preferable to burdening all the
    other algorithms with unnecessary complexity.

    In fact, the FVS solution makes it possible for most generic
    implementations of algorithms to handle Mongolian data without having
    to carry a mapping table.
    > Incidentally, there are a couple of cases for Mongolian where
    > variation selectors are used to select simple glyph variants, which I
    > agree should better have been encoded as separate characters:
    >
    > U+1880 MONGOLIAN LETTER ALI GALI ANUSVARA ONE
    > U+1881 MONGOLIAN LETTER ALI GALI VISARGA ONE
    >
    > In fact, I think that the spurious "ONE" in the names of these
    > characters must be a relic of an early draft which included MONGOLIAN
    > LETTER ALI GALI ANUSVARA ONE, MONGOLIAN LETTER ALI GALI ANUSVARA TWO,
    > MONGOLIAN LETTER ALI GALI VISARGA ONE and MONGOLIAN LETTER ALI GALI
    > VISARGA TWO (just my hypothesis, but if Ken or anyone can confirm or
    > deny it ...).
    >
    I can't confirm that, but this reasoning is plausible. These are not of
    the same nature as the overridden automatic shapes.

    >> It's not really too late yet, I think, to deprecate
    >> the FVSs..
    >>
    >
    > Well, yes it is.
    >
    >
    I would have to firmly agree with Andrew on this conclusion. For the
    use in overriding automatic shaping, there's no way that we would
    deprecate the FVS.

    However, we have never ruled out adding additional character codes in
    cases where there is a *semantic* difference between two variant
    shapes. For example, should some of the mathematical variants come to
    be used in a (future) development of notation such that they acquire
    strongly contrastive meaning, we reserve the right to add a character
    code with the same representative shape as was previously covered by a
    variation sequence. This flexibility is necessary to guarantee that
    algorithms can continue to *ignore* VS (and FVS).

    Users relying on strong semantic differentiation would therefore need
    an actual character code, and we need to affirm our right to be able to
    accommodate future users in that regard. Existing documents, however,
    would simply continue to display correctly; for them, the alternate
    shape would not carry the same semantic distinction.

    Ordinary variation selectors are a solution to a coding problem: what
    to do when the nature of the use of glyph variants is not well known or
    is ill-defined. They allow the (possibly) stylistic difference to be
    marked in the text, in case there is a semantic difference to the
    reader.

    This is not the same problem as overriding automatic shaping, which is
    the primary role of the FVS. Here, there is normally no semantic
    difference, but there may be an orthographic distinction (foreign
    words). In these cases *some* Mongolian-specific algorithms
    (spell-checkers, etc.) would need to process the FVS, while many others
    (sorting, etc.) would have no need.
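
    The split between the two kinds of algorithm might be sketched as
    follows. Everything here is an assumption for illustration: the
    lexicon is a made-up one-entry set, the letter sequence is
    arbitrary rather than a real word, and a real sort would use
    proper collation rather than code point order.

```python
# Mongolian free variation selectors FVS1..FVS3.
FVS = {"\u180B", "\u180C", "\u180D"}


def sort_key(word: str) -> str:
    """Generic algorithm: ignore FVS and order by the letters alone."""
    return "".join(ch for ch in word if ch not in FVS)


def spell_check(word: str, lexicon: set) -> bool:
    """Script-specific algorithm: FVS is part of the orthography."""
    return word in lexicon


# Arbitrary letter sequence, without and with FVS1 after the second letter.
plain = "\u1820\u182D\u1820"
with_fvs = "\u1820\u182D\u180B\u1820"

# Hypothetical lexicon in which only the FVS spelling is listed,
# as might be the case for a foreign word.
lexicon = {with_fvs}

print(sort_key(plain) == sort_key(with_fvs))   # the sort sees no difference
print(spell_check(with_fvs, lexicon))          # listed spelling
print(spell_check(plain, lexicon))             # unlisted spelling
```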

    A./


