Variant selectors in Mongolian

From: Martin Heijdra
Date: Wed Jul 10 2002 - 11:26:39 EDT

A few years ago I asked about the way variant selectors are supposed to work
with Mongolian. In Unicode 3.2 there is a general explanation of variant
selectors, with a table of Mongolian variants. I must confess they left me
confused: it seems to me that the general explanation would point to one
solution which I would call intuitive, character-based (and, in the few
applications I have seen, existent), while the table would do it exactly the
other way around, and be more or less glyph-based.

Simply put, my question is: are the variant selectors to be used only when a
particular character is to be displayed with a glyph which is an exception
to the general rules of Mongolian writing, OR is the variant selector always
to be used with a particular glyph variant in a particular position, whether
that glyph is predictable or not?

To give an easy example (I suppose most (all?) cases would be similar):

In Mongolian, a medial "n" is regularly displayed with a dot before a vowel,
and without a dot before a consonant. The "n" in "ana" would be dotted (as
would be the "n" in initial "na"), the "n" in "anda" would not. A typical
Mongolian application would display those variants automatically, of course.
However, there are a few words/cases (foreign names, place names, or
actually grammar books when explaining Mongolian orthography etc.) where
this rule breaks down; for the sake of argument, let's say there is a word
"aNda" where the "n" would be dotted (I write "N" here for the unexpected
case). In a typical Mongolian application, the user would have to make a
special effort (different key/variant) to get at the right display. In
theory also an undotted "n" in "aNa" might occur.
(For some real examples, see where the capital
characters are used, as here, for irregular formations.)

Now, if "/" would be a sign of the variant selector, and "N" the sign of the
unexpected variant of "n", I would have expected the variant selector to be
used only in the unexpected cases, i.e., "N" would have the encoding "n-/".
Regular "ana" and "anda" would be unmarked (even if they display the "n"
with different glyphs), irregular "aNa" and "aNda" would be encoded
"a-n-/-a" and "a-n-/-d-a" (again, even if the "n-/" sequence would denote
different glyphs).

The statement "For example, in languages employing the Mongolian script,
sometimes a specific variant range of glyphs is needed for a specific
textual purpose for which the range of "generic" glyphs is considered
inappropriate" could be taken to mean this solution.

However, the Mongolian table is very glyph-based, and says "The valid
combinations are exhaustively listed and described in the following table."
It seems to imply that medial dotted "n" is ALWAYS denoted by "n-/" (as is
undotted initial "n"). That is, regular "ana" (dotted) would be "a-n-/-a",
regular "anda" would be "a-n-d-a" (undotted), irregular "aNa" would be encoded
"a-n-a" (undotted), and irregular "aNda" (dotted) would be "a-n-/-d-a". That
is, there would be regular formations marked with the variant selector, and
irregular ones unmarked.
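
To make the contrast concrete, here is a minimal Python sketch of the two
readings. It is a toy model only: "/" stands in for the variant selector as
above, "N" marks the unexpected form, and the vowel set, the function names
and the scheme labels "exception" and "glyph" are my own assumptions for
illustration. It handles only the "n" case discussed here and is not taken
from the actual Unicode 3.2 table.

```python
VOWELS = set("aeiou")  # assumed vowel set, for this toy example only


def regularly_dotted(word, i):
    """Regular rule from above: 'n' is dotted before a vowel, undotted otherwise."""
    following = word[i + 1] if i + 1 < len(word) else ""
    return following.lower() in VOWELS


def encode(word, scheme):
    """Encode a word, writing '/' after 'n' where the chosen reading requires it.

    scheme="exception": '/' only when the glyph is NOT the predictable one.
    scheme="glyph":     '/' whenever a medial 'n' is dotted or an initial 'n'
                        is undotted, predictable or not.
    """
    out = []
    for i, ch in enumerate(word):
        if ch not in "nN":
            out.append(ch)
            continue
        regular = regularly_dotted(word, i)
        # 'N' means the displayed glyph is the opposite of the regular one.
        dotted = regular if ch == "n" else not regular
        if scheme == "exception":
            marked = ch == "N"
        else:
            marked = (not dotted) if i == 0 else dotted
        out.append("n")
        if marked:
            out.append("/")
    return "-".join(out)


for w in ("ana", "anda", "aNa", "aNda"):
    print(w, encode(w, "exception"), encode(w, "glyph"))
```

Under the first reading the selector appears only in "aNa" and "aNda"; under
the second it appears in regular "ana" and irregular "aNda", which is exactly
the asymmetry that puzzles me.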

Which of the two cases is meant by Unicode?

Martin Heijdra
Chinese Bibliographer
East Asian Library and the Gest Collection
Frist Campus Center, Room 317
Princeton University
33 Frist Campus Center
Princeton, NJ 08544 USA

This archive was generated by hypermail 2.1.2 : Wed Jul 10 2002 - 09:36:10 EDT