Re: Mongolian (was RE: Syriac and Mongolian joining behavior)

From: Kenneth Whistler
Date: Tue Jan 04 2000 - 15:05:37 EST

Martin Heijdra asked:

> Since there have been questions re Mongolian, I thought I'd ask two questions
> I have long had:
> 1. I assume Kalmuck and Todo Mongolian are included in the Mongolian standard;
> but is Manchu (and Sibe) fully included? I can't see Manchu, with a huge
> written legacy, in any pipeline chart, but its treatment as an extended
> Mongolian might be the cause of that.

All of the known Mongolian extensions required to cover Sibe, Manchu, and
Ali Gali (Mongolian used to write Sanskrit texts) are included in the

See the UnicodeData.txt file and search for "SIBE", "MANCHU", "TODO", and
"ALI GALI" to find the extensions.
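For readers who would rather not grep the file by hand, the same search can be
sketched with Python's standard unicodedata module, which carries the character
names from UnicodeData.txt. This is only an illustration: the range
U+1800..U+18AF (the Mongolian block) and the helper name find_extensions are
assumptions of this sketch, and the results depend on the Unicode version
bundled with the Python build in use.

```python
import unicodedata

# Keywords suggested above for locating the extension letters.
KEYWORDS = ("SIBE", "MANCHU", "TODO", "ALI GALI")

def find_extensions(start=0x1800, end=0x18AF):
    """Group code points in [start, end] by keyword found in their names.

    The default range is assumed to be the Mongolian block; unassigned
    code points have no name and are skipped.
    """
    hits = {kw: [] for kw in KEYWORDS}
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")
        for kw in KEYWORDS:
            if kw in name:
                hits[kw].append((cp, name))
    return hits

extensions = find_extensions()
for kw, entries in extensions.items():
    print(f"{kw}: {len(entries)} characters")
    for cp, name in entries[:3]:  # show only the first few per keyword
        print(f"  U+{cp:04X} {name}")
```

A name such as "MANCHU ALI GALI ..." matches two keywords and is therefore
listed under both.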

> 2. What would the standard solution be for the Mongolian conundrum, that what
> otherwise would function as one and the same character (e.g. "o") in very rare
> circumstances has 3 different possible final glyphs (namely "foreign" o, "cut
> off" o, full round o); or alternatively, middle n, which in a few middle,
> pre-consonant cases has non-predictable behavior?

These are covered by the use of the Mongolian "free variation selectors",
U+180B, U+180C, and U+180D. The exact list of which combination of base letter
plus which free variation selector corresponds to which form can be derived
from the Mongolian Reference Table. Eventually we will get that worked up
as a Unicode Technical Report or otherwise as part of the standard, so that
Mongolian implementations will know how to use these variant marks.
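As a rough illustration of the mechanism, a variant form is requested in the
text stream by following the base letter with one of the three selectors. The
Python sketch below only builds such a sequence; the helper with_variant is
hypothetical, and the pairing of MONGOLIAN LETTER O (U+1823) with the first
selector is an arbitrary example, not a statement of which glyph that sequence
selects. The actual mapping has to come from the Mongolian Reference Table.

```python
import unicodedata

# The three Mongolian free variation selectors named above.
FVS = {1: "\u180B", 2: "\u180C", 3: "\u180D"}

def with_variant(base: str, selector: int) -> str:
    """Return base letter + free variation selector (hypothetical helper).

    This only constructs the character sequence; which glyph a given
    pair selects is defined elsewhere and is not encoded here.
    """
    if len(base) != 1:
        raise ValueError("expected a single base letter")
    return base + FVS[selector]

# Illustrative pairing only: MONGOLIAN LETTER O plus the first selector.
seq = with_variant("\u1823", 1)
for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The stored text stays character-based: the letter is still the ordinary "o",
and the selector is a separate mark that a renderer may use to override the
predicted glyph.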

> It seems in general that Mongolian is quite special in having rules of the
> type "in 99% of the cases, variant glyphs of one character are fully
> predictable, but in a very few cases they are not, yet not interchangeable."
> For a glyph-based approach this constitutes no problem (just give the user
> access to override the predicted glyph with another, separately encoded
> glyph), but in a Unicode character-based, glyph-rendered approach it actually
> creates quite unexpected problems: are such glyphs one or two characters? They
> are, after all, in those few cases not predictable but prescribed.

This is a known issue that the Chinese and Mongolian experts discussed at
great length. The three free variation selectors were added as the most
parsimonious way to handle this problem in a standard way.

