From: Martin Heijdra (firstname.lastname@example.org)
Date: Mon Dec 16 2002 - 09:33:10 EST
A small group has been working on these and other questions for a while now,
after the last group of questions raised on Mongolian on this list. I will
get in contact with you separately with some of our work.
For the moment, in short: yes, use the TR170 document, especially its
detailed examples (which are fuller than the textual explanations, and have
implications not explicitly stated); there is a Chinese book called
Mengguwen bianma which in parts is fuller and more explicit. There are still
some rare cases not covered by either.
----- Original Message -----
From: "Andrew C. West" <andrewcwest@alumni.Princeton.EDU>
Sent: Monday, December 16, 2002 8:40 AM
Subject: Mongolian Encoding
> As promised, here are some questions on the encoding of Mongolian that have
> arisen whilst writing an input method for the Mongolian script (the questions
> are relevant to the Todo, Manchu and Sibe scripts as well, but I'll restrict
> myself to Mongolian for the moment). I don't know if anyone is able to answer
> all of my questions, but I hope that someone on the list will be able to offer
> some much needed advice.
> 1. Documentation
> Section 11.4 of the Unicode Standard notes that a group of experts from
> Mongolia, China and the West are to publish a document called "User's
> Convention for System Implementation of the International Standard on Mongolian
> Encoding", which will explicitly define Mongolian character shaping behaviour.
> WG2 document N1980 (http://std.dkuug.dk/jtc1/sc2/WG2/docs/n1980.doc) also
> states that Mongolian, Chinese and English versions of the "User's Convention"
> are to be prepared by Mongolia and China. I have been unable to locate this
> document on the internet. Does it exist, and if so, can it be made publicly
> available?
> Without the aid of such a document it seems almost impossible to correctly
> implement the Unicode encoding of Mongolian.
> In its stead I have been using the document "Traditional Mongolian Script in
> the ISO/IEC 10646 and Unicode Standards" (UNU/IIST Report No. 170, August
> [...]), written by Myatav Erdenechimeg, Richard Moore and Yumbayar Namsrai as a
> guide to Mongolian character shaping behaviour. It seems to provide all the
> information I would expect to see in the "User's Convention", but I am not sure
> how authoritative this paper is, and what its relationship is to the "User's
> Convention" (if any).
> 2. Free Variation Selectors
> The Mongolian Free Variation Selectors (U+180B, U+180C and U+180D) are used to
> distinguish variant graphic forms of the same positional form of a letter. I
> would say that there are three categories of variant forms governed by the
> variation selectors:
> A. Non-contextual variants, such as variant forms of letters that are used in
> foreign words (e.g. the use of a "reclining" letter D -- U+1833 + FVS1 -- in
> foreign words), and graphic variations that are due to differences between
> traditional and modern orthography. Such variants must be explicitly marked
> with the appropriate variation selector so that the rendering engine selects
> the correct form.
> B. Contextual variants that are determined by the overall composition of the
> word in which they are found, such as the use of the long-toothed forms of the
> letters OE and UE (U+1825/1826 + FVS1) in the first syllable of a word, or
> the use of the feminine form of the letter G (U+182D + FVS3) between [...]
> or the letter I (which is neutral) in a feminine word. In these cases I
> imagine that it is too much to ask the rendering engine to work out the
> variant form, and the correct variant should be explicitly encoded using the
> appropriate variation selector.
> C. Contextual variants that can be determined from their neighbouring letters,
> such as the medial form of the letter G with two dots that is used before a
> vowel (U+182D + FVS2), or the form of the letter A that is written with a
> forward tail when occurring finally after the letters B, P, F and K (U+1820 +
> FVS1). In these cases, is it necessary to explicitly encode the variant form
> with the appropriate variation selector? The Standard says "For cases in which
> the contextual sequence of basic letters is not sufficient for a rendering
> system to uniquely determine the appropriate glyph for a particular letter,
> format characters are provided so that the typist may specify the desired
> rendering". Should we assume that the rendering engine will correctly select
> the dotted form of medial G before a vowel and the dotless form before a
> consonant, or would it be wiser to explicitly encode the appropriate
> variation selector anyway?
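A minimal Python sketch of the distinction drawn above, assuming the code points and selector assignments quoted in this message (DA + FVS1 for the reclining D, GA + FVS2 for the dotted medial G, A + FVS1 for the forward-tail final A), and assuming that B, P, F and K correspond to U+182A, U+182B, U+1839 and U+183A. Category A and B variants are spelled out explicitly as variation sequences, while `infer_selector` is a hypothetical rule showing how a renderer might derive the two category C variants from neighbouring letters alone. It is not a real shaping engine and covers only these two examples.

```python
from typing import Optional

# Mongolian Free Variation Selectors.
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"

LETTER_A = "\u1820"
LETTER_GA = "\u182D"
LETTER_DA = "\u1833"

# The vowels A, E, I, O, U, OE, UE, EE occupy U+1820..U+1827.
VOWELS = {chr(cp) for cp in range(0x1820, 0x1828)}
# Assumed mapping of B, P, F, K to BA, PA, FA, KA.
FORWARD_TAIL_TRIGGERS = {"\u182A", "\u182B", "\u1839", "\u183A"}


def variation_sequence(base: str, selector: str) -> str:
    """Categories A and B: encode the variant explicitly as base + FVS."""
    return base + selector


def infer_selector(word: str, i: int) -> Optional[str]:
    """Hypothetical category C rule: the FVS implied by word[i]'s neighbours.

    Assumes `word` contains only Mongolian letters (and possibly selectors).
    """
    ch = word[i]
    prev = word[i - 1] if i > 0 else None
    nxt = word[i + 1] if i + 1 < len(word) else None
    # Medial G (something on both sides) followed by a vowel: dotted form.
    if ch == LETTER_GA and prev is not None and nxt in VOWELS:
        return FVS2
    # Word-final A after B, P, F or K: forward-tail form.
    if ch == LETTER_A and nxt is None and prev in FORWARD_TAIL_TRIGGERS:
        return FVS1
    return None


def add_inferred_selectors(word: str) -> str:
    """Insert, after each letter, the selector infer_selector would imply."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        selector = infer_selector(word, i)
        if selector is not None:
            out.append(selector)
    return "".join(out)


if __name__ == "__main__":
    reclining_d = variation_sequence(LETTER_DA, FVS1)
    print([f"U+{ord(c):04X}" for c in reclining_d])
    # NA + GA + A: the medial G gets FVS2 because a vowel follows it.
    print([f"U+{ord(c):04X}" for c in add_inferred_selectors("\u1828\u182D\u1820")])
```

If a renderer reliably applied rules like these, an explicit FVS would be redundant for category C; encoding it anyway keeps the text independent of that assumption.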
> 3. Mongolian Vowel Separator
> The Mongolian Vowel Separator (U+180E) is used to separate the vowels A and E
> from certain preceding consonants (e.g. ...N + MVS + A = [...]).
> After MVS the vowels A and E use the forward tail variant, which is offset
> from the preceding consonant by narrow whitespace. These variant forms of A
> and E are selected by the presence of a preceding MVS, and there appears to be
> no need to otherwise select the variant A or E by means of a variation
> selector.
> However, not only does the MVS affect the following A or E, but the preceding
> consonant may also take a variant form when followed by an offset A or E. This
> is the case for the letters N, Q, G, J, Y and W. The variant forms of these
> letters when preceding an offset A or E are given in Unicode's Standardized
> Variants document (N, Q, G, J and Y are given as medial variants, but W is
> given as a final variant, which is perhaps wrong). My question is, should the
> variant form of the consonant preceding the offset A or E be explicitly encoded
> with the appropriate variation selector, or is the presence of the following
> MVS sufficient for the rendering engine to select the correct variant form?
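A hedged sketch of where this question bites: the hypothetical helper below locates consonants that sit before an offset A or E, i.e. N, Q, G, J, Y or W followed by MVS and then A or E (taking these letters to be NA U+1828, QA U+182C, GA U+182D, JA U+1835, YA U+1836 and WA U+1838). It only flags the positions where either an explicit FVS or an MVS-aware renderer would have to choose a variant; it does not decide which variant is correct.

```python
from typing import List

MVS = "\u180E"  # MONGOLIAN VOWEL SEPARATOR
FVS = {"\u180B", "\u180C", "\u180D"}
OFFSET_VOWELS = {"\u1820", "\u1821"}  # A, E
# N, Q, G, J, Y, W as listed above (assumed code point mapping).
MVS_SENSITIVE = {"\u1828", "\u182C", "\u182D", "\u1835", "\u1836", "\u1838"}


def consonants_before_offset_vowel(text: str) -> List[int]:
    """Indices of N/Q/G/J/Y/W followed (optionally via one FVS) by MVS + A/E."""
    hits = []
    for i, ch in enumerate(text):
        if ch not in MVS_SENSITIVE:
            continue
        j = i + 1
        # Tolerate a selector that has already been encoded on the consonant.
        if j < len(text) and text[j] in FVS:
            j += 1
        if text[j:j + 1] == MVS and text[j + 1:j + 2] in OFFSET_VOWELS:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # NA + MVS + A: the NA at index 0 precedes an offset A.
    print(consonants_before_offset_vowel("\u1828\u180E\u1820"))
```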
> 4. Variant forms of the Mongolian Birga
> Appendix A of "Traditional Mongolian Script in the ISO/IEC 10646 and Unicode
> Standards" lists four variant forms of the Mongolian Birga (U+1800):
> 1st variant form = U+1800 + FVS1
> 2nd variant form = U+1800 + FVS2
> 3rd variant form = U+1800 + FVS3
> 4th variant form = U+1800 + ZWJ
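Written out as code point sequences, the four forms listed above look as follows. This sketch simply restates Appendix A as quoted here and uses Python's standard `unicodedata` module to name the character after the Birga, which makes the point of the ZWJ question below concrete: the first three sequences end in a Mongolian Free Variation Selector, while the fourth ends in ZERO WIDTH JOINER, which is not a variation selector. The helper name is illustrative.

```python
import unicodedata

BIRGA = "\u1800"  # MONGOLIAN BIRGA

# The four forms as given in Appendix A of the UNU/IIST report (quoted above).
BIRGA_VARIANTS = {
    1: BIRGA + "\u180B",  # + FVS1
    2: BIRGA + "\u180C",  # + FVS2
    3: BIRGA + "\u180D",  # + FVS3
    4: BIRGA + "\u200D",  # + ZWJ
}


def uses_free_variation_selector(seq: str) -> bool:
    """True if the character following the base is a Mongolian FVS."""
    return unicodedata.name(seq[1]).startswith("MONGOLIAN FREE VARIATION SELECTOR")


if __name__ == "__main__":
    for n, seq in BIRGA_VARIANTS.items():
        codes = " ".join(f"U+{ord(c):04X}" for c in seq)
        print(n, codes, unicodedata.name(seq[1]))
```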
> Unicode's Standardized Variants document
> (http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html) does not list
> any variants for the Mongolian Birga. Moreover, it warns "All combinations not
> listed here are unspecified and are reserved for future standardization; no
> conformant process may interpret them as standardized variants." This means
> that these Birga variants should not currently be recognised. But given that
> the Birga does occur in a number of forms, Unicode should either define
> variants for them or add some new characters to represent them.
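A sketch of the conformance point in the quoted warning: a process should treat a base + selector pair as a standardized variant only if that exact pair is listed. The parser assumes a semicolon-separated text layout like that of the StandardizedVariants.txt data file (hex code points in the first field, `#` starting a comment); that layout, the function names and the sample line are illustrative assumptions, not a copy of the real file.

```python
from typing import Iterable, Set


def parse_listed_sequences(lines: Iterable[str]) -> Set[str]:
    """Collect the variation sequences named in the first field of each data line."""
    listed = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        first_field = data.split(";", 1)[0]
        listed.add("".join(chr(int(h, 16)) for h in first_field.split()))
    return listed


def is_standardized(seq: str, listed: Set[str]) -> bool:
    """Unlisted combinations are unspecified and must not be treated as variants."""
    return seq in listed


# Illustrative input only (assumed format, not taken from the real file).
SAMPLE = [
    "# sequence; description; ...",
    "1833 180B; variant form # MONGOLIAN LETTER DA",
]

if __name__ == "__main__":
    listed = parse_listed_sequences(SAMPLE)
    # Birga + FVS1 does not appear in the sample, so it is not a standardized variant.
    print(is_standardized("\u1800\u180B", listed))
```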
> Nevertheless, assuming that Appendix A of "Traditional Mongolian Script" is
> correct in providing a mechanism for distinguishing four variant forms of the
> Mongolian Birga, is it acceptable to use the ZWJ as a variant selector (as is
> the case for the 4th variant Birga)? Its usage here seems a little [...]
This archive was generated by hypermail 2.1.5 : Mon Dec 16 2002 - 10:08:25 EST