Re: Mongolian Encoding

From: Martin Heijdra (mheijdra@princeton.edu)
Date: Mon Dec 16 2002 - 09:33:10 EST


    Andrew:

    A small group has been working on these and other questions for a while now,
    after the last group of questions raised on Mongolian on this list. I will
    get in contact with you separately with some of our work.

    For the moment, in short: yes, use the TR170 document, especially its
    detailed examples (which are fuller than the textual explanations, and have
    implications not explicitly stated); there is a Chinese book called
    Mengguwen bianma which in places is fuller and more explicit. There are still
    some rare cases not covered by either.

    Martin Heijdra

    ----- Original Message -----
    From: "Andrew C. West" <andrewcwest@alumni.Princeton.EDU>
    To: <unicode@unicode.org>
    Sent: Monday, December 16, 2002 8:40 AM
    Subject: Mongolian Encoding

    > As promised, here are some questions on the encoding of Mongolian that
    > have arisen whilst writing an input method for the Mongolian script (the
    > questions are relevant to the Todo, Manchu and Sibe scripts as well, but
    > I'll restrict myself to Mongolian for the moment). I don't know if anyone
    > is able to answer all of my questions, but I hope that someone on the
    > list will be able to give me some much needed advice.
    >
    > 1. Documentation
    > Section 11.4 of the Unicode Standard notes that a group of experts from
    > Mongolia, China and the West are to publish a document called "User's
    > Convention for System Implementation of the International Standard on
    > Mongolian Encoding" which will explicitly define Mongolian character
    > shaping behaviour in full. WG2 document N1980
    > (http://std.dkuug.dk/jtc1/sc2/WG2/docs/n1980.doc) also states that
    > Mongolian, Chinese and English versions of the "User's Convention" will
    > be prepared by Mongolia and China. I have been unable to locate this
    > document on the internet. Does it exist, and if so can it be made
    > publicly available? Without the aid of such a document it seems almost
    > impossible to correctly implement the Unicode encoding of Mongolian.
    > In its stead I have been using the document "Traditional Mongolian
    > Script in the ISO/IEC 10646 and Unicode Standards" (UNU/IIST Report
    > No. 170, August 1999) written by Myatav Erdenechimeg, Richard Moore and
    > Yumbayar Namsrai as a guide to Mongolian character shaping behaviour. It
    > seems to provide all the information I would expect to see in the
    > "User's Convention", but I am not sure how authoritative this paper is,
    > and what its relationship is to the "User's Convention" (if any).
    >
    > 2. Free Variation Selectors
    > The Mongolian Free Variation Selectors (U+180B, U+180C and U+180D) are
    > used to distinguish variant graphic forms of the same positional form of
    > a character. I would say that there are three categories of variant
    > forms governed by the variation selectors:
    > A. Non-contextual variants, such as variant forms of letters that are
    > used in foreign words (e.g. the use of a "reclining" letter D, U+1833 +
    > FVS1, in foreign words), and graphic variations that are due to
    > differences between traditional and modern orthography. Such variants
    > must be explicitly encoded by use of the appropriate variation selector
    > in order for the correct form to be selected by the rendering engine.
    > B. Contextual variants that are determined by the overall composition of
    > the word in which they are found, such as the use of the long-toothed
    > forms of the letters OE and UE (U+1825/1826 + FVS1) in the first
    > syllable of a word only, or the use of the feminine form of the letter G
    > (U+182D + FVS3) between consonants or the letter I (which is neutral) in
    > a feminine word. In these cases I would imagine that it is too much to
    > ask the rendering engine to work out the correct variant form, and the
    > correct variant should be explicitly encoded using the appropriate
    > variation selector.
    > C. Contextual variants that can be determined from their neighbouring
    > letters, such as the medial form of the letter G with two dots that is
    > used before a vowel (U+182D + FVS2), or the form of the letter A that is
    > written with a forward tail when occurring finally after the letters B,
    > P, F and K (U+1820 + FVS1). In these cases is it necessary to explicitly
    > encode the variant form with the appropriate variation selector? The
    > Standard says "For cases in which the contextual sequence of basic
    > letters is not sufficient for a rendering engine to uniquely determine
    > the appropriate glyph for a particular letter, additional format
    > characters are provided so that the typist may specify the desired
    > rendering". Should we assume that the rendering engine will correctly
    > select the dotted form of medial G before a vowel and the dotless form
    > before a consonant, or would it be wiser to explicitly encode the
    > appropriate variation selector anyway?
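A minimal sketch of the "encode it explicitly anyway" option for category C, in which the input method inserts the selectors itself. The helper name `make_variants_explicit` is hypothetical, and it implements only the two rules described above (FVS2 after a medial GA that is followed by a vowel, FVS1 after a word-final A preceded by BA, PA, FA or KA), using the code points of the Unicode Mongolian block. It assumes its input is a single word and is not a complete set of shaping rules.

```python
# Hypothetical helper: makes two category-C contextual variants explicit by
# inserting free variation selectors. Only the two rules from the post are covered.
FVS1, FVS2 = "\u180B", "\u180C"
FVS_ALL = {"\u180B", "\u180C", "\u180D"}  # FVS1, FVS2, FVS3
A, GA = "\u1820", "\u182D"
VOWELS = set("\u1820\u1821\u1822\u1823\u1824\u1825\u1826\u1827")  # A, E, I, O, U, OE, UE, EE
B_P_F_K = {"\u182A", "\u182B", "\u1839", "\u183A"}  # BA, PA, FA, KA


def make_variants_explicit(word: str) -> str:
    """Insert FVS2 after medial GA before a vowel, FVS1 after final A after B/P/F/K."""
    out = []
    n = len(word)
    for i, ch in enumerate(word):
        out.append(ch)
        nxt = word[i + 1] if i + 1 < n else ""
        if nxt in FVS_ALL:
            continue  # the typist already chose a variant; leave it alone
        if ch == GA and 0 < i < n - 1 and nxt in VOWELS:
            out.append(FVS2)  # dotted medial G before a vowel
        elif ch == A and i == n - 1 and i > 0 and word[i - 1] in B_P_F_K:
            out.append(FVS1)  # forward-tailed final A after B, P, F or K
    return "".join(out)


# BA + A: the final A gets FVS1 appended
print([hex(ord(c)) for c in make_variants_explicit("\u182A\u1820")])
# prints ['0x182a', '0x1820', '0x180b']
```

Leaving an existing selector untouched keeps the helper from overriding a deliberate choice by the typist; whether a rendering engine actually needs the inserted FVS is exactly the open question.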
    >
    > 3. Mongolian Vowel Separator
    > The Mongolian Vowel Separator (U+180E) is used to separate the vowels A
    > and E from certain preceding consonants (e.g. ...N + MVS + A =
    > U+1828, U+180E, U+1820). After MVS the vowels A and E use the
    > forward-tail variant, which is physically offset from the preceding
    > consonant by narrow whitespace. These variant forms of A and E are
    > selected by the presence of a preceding MVS, and there appears to be no
    > need to otherwise select the variant A or E by means of a variation
    > selector.
    > However, not only does the MVS affect the following A or E, but the
    > preceding consonant may also take a variant form when followed by an
    > offset A or E. This is the case for the letters N, Q, G, J, Y and W. The
    > variant forms of these letters when preceding an offset A or E are given
    > in Unicode's Standardized Variants document (N, Q, G, J and Y are given
    > as medial variants, but W is given as a final variant, which is perhaps
    > wrong). My question is: should the variant form of the consonant
    > preceding the offset A or E be explicitly encoded using the appropriate
    > variation selector, or is the presence of the following MVS sufficient
    > for the rendering engine to select the correct variant form?
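A minimal sketch of how an input method might locate the consonants whose form is in question, i.e. N, Q, G, J, Y or W directly followed by MVS and then A or E. The helper name `consonants_before_offset_vowel` is hypothetical, and allowing an optional variation selector between the consonant and the MVS is an assumption about how an explicitly encoded variant would be typed. It only finds the positions; it does not decide which glyph to use.

```python
MVS = "\u180E"  # MONGOLIAN VOWEL SEPARATOR
A, E = "\u1820", "\u1821"
FVS_ALL = {"\u180B", "\u180C", "\u180D"}  # FVS1, FVS2, FVS3

# The consonants named in the post as taking a variant form before an offset A or E.
AFFECTED = {
    "\u1828": "NA",
    "\u182C": "QA",
    "\u182D": "GA",
    "\u1835": "JA",
    "\u1836": "YA",
    "\u1838": "WA",
}


def consonants_before_offset_vowel(word: str) -> list[tuple[int, str]]:
    """Return (index, name) for each listed consonant followed by MVS + A or MVS + E."""
    hits = []
    for i, ch in enumerate(word):
        if ch not in AFFECTED:
            continue
        j = i + 1
        if word[j:j + 1] in FVS_ALL:  # skip a selector the typist already supplied
            j += 1
        if word[j:j + 2] in (MVS + A, MVS + E):
            hits.append((i, AFFECTED[ch]))
    return hits


# The example from the post: ...N + MVS + A = U+1828, U+180E, U+1820
print(consonants_before_offset_vowel("\u1828\u180E\u1820"))
# prints [(0, 'NA')]
```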
    >
    > 4. Variant forms of the Mongolian Birga
    > Appendix A of "Traditional Mongolian Script in the ISO/IEC 10646 and
    > Unicode Standards" lists four variant forms of the Mongolian Birga
    > (U+1800):
    > 1st variant form = U+1800 + FVS1
    > 2nd variant form = U+1800 + FVS2
    > 3rd variant form = U+1800 + FVS3
    > 4th variant form = U+1800 + ZWJ
    >
    > Unicode's Standardized Variants document
    > (http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html) does
    > not list any variants for the Mongolian Birga. Moreover, it warns "All
    > combinations not listed here are unspecified and are reserved for future
    > standardization; no conformant process may interpret them as
    > standardized variants." This clearly means that these Birga variants
    > should not currently be recognised. But given that the Birga does occur
    > in a number of forms, either Unicode should define standardized variants
    > for them, or add some new characters to represent them.
    > Nevertheless, assuming that Appendix A of "Traditional Mongolian Script"
    > is correct in providing a mechanism for distinguishing four variant
    > forms of the Mongolian Birga, is it acceptable to use the ZWJ as a
    > variant selector (as is the case for the 4th variant Birga)? Its usage
    > here seems a little suspect to me.
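A minimal sketch of why the ZWJ case stands apart: a variation sequence is a base character followed by a single variation selector, and U+200D ZERO WIDTH JOINER is not a variation selector (those are the Mongolian FVS1 to FVS3 at U+180B..U+180D, VS1 to VS16 at U+FE00..U+FE0F and VS17 to VS256 at U+E0100..U+E01EF). The helper names and classification labels below are hypothetical; the set of standardized sequences is supplied by the caller, and the empty set in the usage lines only mirrors the statement above that the Standardized Variants document listed no Birga variants.

```python
BIRGA, ZWJ = "\u1800", "\u200D"
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"

# The four Birga sequences as listed in Appendix A of the UNU/IIST report.
BIRGA_VARIANTS = {1: BIRGA + FVS1, 2: BIRGA + FVS2, 3: BIRGA + FVS3, 4: BIRGA + ZWJ}


def is_variation_selector(ch: str) -> bool:
    """True for Mongolian FVS1-FVS3, VS1-VS16 and VS17-VS256."""
    cp = ord(ch)
    return 0x180B <= cp <= 0x180D or 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def classify(seq: str, standardized: set[str]) -> str:
    """Classify a base+selector pair against a caller-supplied set of standardized sequences."""
    sel = seq[1:]
    if len(sel) != 1 or not is_variation_selector(sel):
        return "not a variation sequence"
    if seq in standardized:
        return "standardized variant"
    return "unspecified (reserved)"


for n, seq in BIRGA_VARIANTS.items():
    print(n, classify(seq, set()))
# prints:
# 1 unspecified (reserved)
# 2 unspecified (reserved)
# 3 unspecified (reserved)
# 4 not a variation sequence
```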
    >
    > Andrew
    >
    >



    This archive was generated by hypermail 2.1.5 : Mon Dec 16 2002 - 10:08:25 EST