From: Asmus Freytag (email@example.com)
Date: Mon Mar 27 2006 - 03:32:45 CST
On 3/25/2006 6:09 PM, Richard Wordingham wrote:
> At 00:15 +0000 2006-03-26, Richard Wordingham wrote:
>>> Does anyone care to expound the theory of variation selectors? There
>>> may be words in white in the TUS saying 'only for unifying CJK
>>> variants that the Chinese (or Japanese, especially with surnames)
>>> insist are different.'
> I have [read TUS]; or at least, I have read TUS 4.0 Section 15.6
> 'Variation Selectors'. Several times. (I can find no indication that
> it is different to TUS 4.1 Section 15.6.) I have the nagging feeling
> that I have missed something.
I don't know what you mean by a "theory of variation selectors". However, I
think it might be useful to summarize some of the facts that can be
gathered from reading TUS (and not only Section 15.6) and to add some
observations along the way:
Variation selectors work best when you have two shapes that can clearly
be substituted for each other in the majority of cases, but where there
are some (non-predictable) instances in which it is required to use only
one of them to the exclusion of the other.
Variation selectors are best considered a solution of last resort. It
would be inappropriate to have them occur very frequently, not just
because of the space they take up, but also because there will always
be implementations of processes that do not handle them correctly
(i.e. that fail to ignore them).
So far, variation sequences have been *standardized* for Math and
Mongolian (apparently an "M" is required at the start of the name of the
writing system ;-).
For math, the variations allowed us to claim that certain minor shape
variations are not semantically meaningful, without having to prove that
proposition rigorously (by fully unifying the characters). [Rigorously
establishing unifications in math can verge on the impossible, because
the writing system is fundamentally open-ended.] At the same time, the
variation sequences allow mapping to existing entity sets and character
sets. So, in a way, they were primarily used to avoid creating
compatibility characters and the need to map between them. Instead, if
you just ignore the variation selector, the two base characters are
already the same character - no cross mapping needed.
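To make the "just ignore the variation selector" point concrete, here is a minimal sketch in Python. The function names are made up for illustration; the code point ranges are those of VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF). The example uses U+2229 INTERSECTION followed by VS1, one of the standardized math sequences.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 and VS17..VS256 (hypothetical helper name)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop every variation selector, leaving the base characters untouched."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


plain = "A \u2229 B"           # INTERSECTION
variant = "A \u2229\uFE00 B"   # INTERSECTION + VS1

# The two strings differ as stored, but compare equal once the selector
# is ignored, so no cross mapping between "variants" is needed.
print(plain == variant)                             # False
print(strip_variation_selectors(variant) == plain)  # True
```

A process that has no interest in the variant shape can compare, search or sort on the stripped form and never needs a mapping table between two separately encoded characters.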
For Mongolian, the FVS (Free Variation Selectors) are needed to override
the shaping mechanism in unusual cases. Think of them as super ZWJ/ZWNJ,
just as Mongolian shaping is Arabic-style shaping on steroids. By making
the FVS script-specific, we give additional context: Mongolian layout
engines need to consider them, while practically all other processes
ignore them (or let them pass through).
The role of variants in the CJK system is a particularly well-understood
one, and the variation selector mechanism models that understanding
directly, which, in a sense, can be considered a good thing. As there
may be many variants for each character, a major issue in the CJK
environment is cataloging - we eventually came to the conclusion that
standardization of variation sequences along the model outlined in
Section 15.6 is a futile exercise. Instead, UTS #37 provides a way to
register sets of variants.
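As a rough illustration of what cataloging involves at the text level, the following Python sketch pulls (base, selector) pairs out of a string, looking only at VS17..VS256 (U+E0100..U+E01EF), the range used for ideographic variation sequences. The function names and the sample string are assumptions made for the example. Whether a given pair is actually registered would have to be checked against the registry itself, which this sketch does not consult.

```python
def is_ivs_selector(ch: str) -> bool:
    """True for VS17..VS256 (hypothetical helper name)."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def ideographic_variation_sequences(text: str) -> list[tuple[str, str]]:
    """Return (base, selector) pairs in U+XXXX notation.

    A selector only counts when it directly follows a base character;
    a stray selector with no base in front of it is skipped.
    """
    pairs = []
    base = None
    for ch in text:
        if is_ivs_selector(ch):
            if base is not None:
                pairs.append((f"U+{ord(base):04X}", f"U+{ord(ch):04X}"))
            base = None  # a selector is never itself a base
        else:
            base = ch
    return pairs


# Illustrative input: the same ideograph with two different selectors,
# then once more with no selector at all.
sample = "\u845B\U000E0100\u845B\U000E0101\u845B"
print(ideographic_variation_sequences(sample))
# [('U+845B', 'U+E0100'), ('U+845B', 'U+E0101')]
```

The output is the kind of list one would look up in a registered collection; the unmarked third occurrence contributes nothing, since without a selector it is just the base character.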
UTC and WG2 have left open the future use of variation selectors. If
equally compelling needs arise (compared to the ones I've summarized
above) then variation selectors could once again be part of the
solution. All things being equal, any solution that does not require
them will automatically be preferred.
Hope you find this useful,