Variation Selectors

From: Asmus Freytag
Date: Mon Mar 27 2006 - 03:32:45 CST

    On 3/25/2006 6:09 PM, Richard Wordingham wrote:
    > At 00:15 +0000 2006-03-26, Richard Wordingham wrote:
    >>> Does anyone care to expound the theory of variation selectors? There
    >>> may be words in white in the TUS saying 'only for unifying CJK
    >>> variants that the Chinese (or Japanese, especially with surnames)
    >>> insist are different.'
    > I have [read TUS]; or at least, I have read TUS 4.0 Section 15.6
    > 'Variation Selectors'. Several times. (I can find no indication that
    > it is different to TUS 4.1 Section 15.6.) I have the nagging feeling
    > that I have missed something.
    > Richard.
    I don't know what you mean by a theory of variation selectors. However, I
    think it might be useful to summarize some of the facts that can be
    gathered from reading TUS (and not only Section 15.6), and to add some
    observations along the way:

    Variation selectors work best when you have two shapes that can clearly
    be substituted for each other in the majority of cases, but where there
    are some (non-predictable) instances in which it is required to use only
    one of them to the exclusion of the other.

    Variation selectors are best considered a solution of last resort. It
    would be inappropriate to have them occur very frequently. That is not
    just because of the space they take up, but also because there will
    always be implementations of processes that do not handle them
    correctly (i.e. that fail to ignore them).

    So far, variation sequences have been *standardized* for Math and
    Mongolian (apparently an "M" is required at the start of the name of the
    writing system ;-).

    For math, the variations allowed us to claim that certain minor shape
    variations are not semantically meaningful, without having to prove that
    proposition rigorously (by fully unifying the characters). [Rigorously
    establishing unifications in math can verge on the impossible, because
    the writing system is fundamentally open-ended.] At the same time, the
    variation sequences allow mapping to existing entity sets and character
    sets. So, in a way, they were primarily used to avoid creating
    compatibility characters and the need to map between them. Instead, if
    you just ignore the variation selector, the two base characters are
    already the same character - no cross mapping needed.
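
    A minimal sketch of that last point, in Python. The helper names are
    hypothetical, and the code assumes the generic variation selectors
    occupy U+FE00..U+FE0F and U+E0100..U+E01EF. The sample sequence
    U+2229 U+FE00 (INTERSECTION with serifs) is used here only as an
    illustration of a math variation sequence.

```python
# Code point ranges assumed for the generic variation selectors:
# VS1..VS16 and VS17..VS256.
VARIATION_SELECTOR_RANGES = (
    (0xFE00, 0xFE0F),
    (0xE0100, 0xE01EF),
)


def is_variation_selector(ch: str) -> bool:
    """Return True if ch is one of the generic variation selectors."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VARIATION_SELECTOR_RANGES)


def strip_variation_selectors(text: str) -> str:
    """Drop every variation selector, leaving only the base characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def same_ignoring_variants(a: str, b: str) -> bool:
    """Compare two strings as a process that ignores variation selectors."""
    return strip_variation_selectors(a) == strip_variation_selectors(b)


plain = "A \u2229 B"              # INTERSECTION
with_serifs = "A \u2229\uFE00 B"  # INTERSECTION followed by VS1

print(plain == with_serifs)                        # False
print(same_ignoring_variants(plain, with_serifs))  # True
```

    The naive comparison treats the two strings as different, while the
    comparison that ignores the selector needs no cross-mapping table at
    all.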

    For Mongolian, the Free Variation Selectors (FVS) are needed to override
    the shaping mechanism in unusual cases. Think of them as super ZWJ/ZWNJ,
    just as Mongolian shaping is Arabic-style shaping on steroids. By making
    the FVS script-specific,
    we give additional context: Mongolian layout engines need to consider
    them, while practically all other processes ignore them (or let them pass
    through).
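
    The split between the two kinds of process can be sketched as follows,
    in Python. The function names are hypothetical, and the code assumes
    the three FVS code points U+180B..U+180D. A real layout engine would
    do far more than pair each letter with its selector.

```python
# Assumed code points for Mongolian FVS1, FVS2 and FVS3.
MONGOLIAN_FVS = {"\u180B", "\u180C", "\u180D"}


def pair_with_fvs(text):
    """What a layout engine needs: each character with its FVS, if any.

    Returns a list of (character, fvs_or_None) tuples. An FVS with no
    preceding base character is kept as its own entry.
    """
    pairs = []
    for ch in text:
        can_attach = (
            ch in MONGOLIAN_FVS
            and pairs
            and pairs[-1][1] is None
            and pairs[-1][0] not in MONGOLIAN_FVS
        )
        if can_attach:
            base, _ = pairs[-1]
            pairs[-1] = (base, ch)
        else:
            pairs.append((ch, None))
    return pairs


def drop_fvs(text):
    """What most other processes can do: ignore the FVS entirely."""
    return "".join(ch for ch in text if ch not in MONGOLIAN_FVS)


# MONGOLIAN LETTER A + FVS1, then MONGOLIAN LETTER NA.
word = "\u1820\u180B\u1828"

print(pair_with_fvs(word))
print(drop_fvs(word) == "\u1820\u1828")
```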

    The role of variants in the CJK system is a particularly well-understood
    one, and the variation selector mechanism models that understanding
    directly, which, in a sense, can be considered a good thing. As there
    may be many variants for each character, a major issue in the CJK
    environment is cataloging - we eventually came to the conclusion that
    standardization of variation sequences along the model outlined in
    Section 15.6 is a futile exercise. UTS#37 provides a way to register
    sets of variants.

    UTC and WG2 have left open the future use of variation selectors. If
    equally compelling needs arise (compared to the ones I've summarized
    above) then variation selectors could once again be part of the
    solution. All things being equal, any solution that does not require
    them will be automatically preferred.

    Hope you find this useful,
