Re: Variant selectors in Mongolian

From: Kenneth Whistler (
Date: Wed Jul 10 2002 - 19:47:39 EDT

Martin Heijdra asked:

> The statement "For example, in languages employing the Mongolian script,
> sometimes a specific variant range of glyphs is needed for a specific
> textual purpose for which the range of "generic" glyphs is considered
> inappropriate" could be taken to mean this solution.


> However, the Mongolian table is very glyph-based, and says "The valid
> combinations are exhaustively listed and described in the following table."
> It seems to imply that medial dotted "n" is ALWAYS denoted by "n-/" (as is
> undotted initial "n"). That is, regular "ana" (dotted) would be "a-n-/-a",
> regular "anda" would be "a-n-d-a" (undotted), irregular "aNa" would be encoded
> "a-n-a" (undotted), and irregular "aNda" (dotted) would be "a-n-/-d-a". That
> is, there would be regular formations marked with the variant selector, and
> irregular ones unmarked.

No, I don't think that is the intent for Mongolian.

> Which of the two cases is meant by Unicode?

Mongolian variants *are* very confusing, and I'm not sure what the
best way to describe them is. Part of the problem is that there is
still some tension in the UTC regarding just how to define the effect
of the variation selectors.

Position A: A variation selector selects a particular, defined glyph.

That position would, for Mongolian, tend to support your second
interpretation. However, ...

Position B: A variation selector selects a variant form of a character,
which has a distinct rendering from that specified for the character
without a variant specification.

When applied to Mongolian (or in principle any script like Mongolian),
where a character is subject to positional shaping rules, you have
to consider that character X is associated with, for example, a
*set* of glyphs X -> {G1, G2, G3, G4} depending on positional contexts.
A variant of character X might be associated with a variant *set*
of glyphs, some of which could overlap, e.g. X-/ -> {G1, G2', G3', G4},
so that the glyphs for the variant might not contrast in all
positional (or other) contexts.
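
To make Position B concrete, here is a minimal sketch in Python of the
two glyph sets above. The glyph names G1..G4 and the variant set come
from the example; the assignment of each glyph to a named position
(isolate, initial, medial, final) and the function names are
assumptions made only for illustration, not actual Mongolian data.

```python
# Positional contexts, in an assumed order matching G1..G4 above.
POSITIONS = ("isolate", "initial", "medial", "final")

# Key: (character, has_variation_selector). Values are hypothetical glyph names.
GLYPHS = {
    ("X", False): {"isolate": "G1", "initial": "G2", "medial": "G3", "final": "G4"},
    ("X", True): {"isolate": "G1", "initial": "G2'", "medial": "G3'", "final": "G4"},
}


def select_glyph(char, position, variant=False):
    """Pick the glyph for a character (plain or variant) in a given position."""
    return GLYPHS[(char, variant)][position]


def contrasting_positions(char):
    """List the positions where the variant glyph differs from the plain one."""
    plain = GLYPHS[(char, False)]
    marked = GLYPHS[(char, True)]
    return [p for p in POSITIONS if plain[p] != marked[p]]
```

With this made-up data, X and X-/ render identically in the isolate
and final positions, and contrast only in the initial and medial ones.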

The reason the variation selectors were encoded in the first place
for Mongolian, I believe, was to try to preserve an Arabic-like
model, where the base character would get a character encoding,
and it would then be mapped to positionally determined glyphs.
But exceptional patterns of that positional determination required
some method of marking. The alternative which people saw would have
been to just encode all the glyphs: G1, G2, G2', G3, G3', G4, in
the above example -- and that approach would have radically departed
from the model of how Unicode should encode text. It would also
have complicated Mongolian text processing significantly further,
it seems to me, since distinct letters, in some positions, have
glyphic neutralizations. (Not that it is easy, anyhow!)
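
The point about glyphic neutralizations can be sketched roughly as
follows. The letters X and Y, their glyph names, and the helper
function are all hypothetical; the only thing assumed from the
discussion above is that two distinct letters may share a glyph in
some position.

```python
# Hypothetical data: letters X and Y share the glyph G4 in final position.
CHAR_TO_GLYPHS = {
    "X": {"initial": "G2", "medial": "G3", "final": "G4"},
    "Y": {"initial": "H2", "medial": "H3", "final": "G4"},
}


def letters_for_glyph(glyph):
    """Return every letter that can be rendered with the given glyph."""
    found = set()
    for letter, by_position in CHAR_TO_GLYPHS.items():
        if glyph in by_position.values():
            found.add(letter)
    return sorted(found)
```

If text were encoded as glyphs, a stored G4 would no longer say
whether the underlying letter was X or Y, so searching and sorting
would have to recover the letter from context. Encoding the letter
itself avoids that ambiguity.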


