Re: Variant selectors in Mongolian

From: Martin Heijdra (mheijdra@princeton.edu)
Date: Tue Jul 16 2002 - 09:03:22 EDT


Timothy:

I also cannot call myself an expert in Mongolian, although I have worked
with it to a limited extent and have some access to people who know more.
The person you mentioned, Oliver Corff, is also behind the gif I mentioned
the other day, and I have used that in the past. (See
http://userpage.fu-berlin.de/~corff/im/MLS/overview.MLS.html) Another
program I have used is Xenotype (http://www.xenotypetech.com/). However,
since the script is mainly in actual use in Inner Mongolia, and since I
cannot currently find details (though I will try) on what I know is the
most used program there, I would not feel comfortable writing something up
based upon Western programs only. (For the Chinese program, see
http://www.founder.com.cn/ics/gb/content/2001-10/31/content_310.htm)

I have been trying to collect enough information. In one sense, information
about *regular* formation is common enough, as is information about the
somewhat grey area of "predictable irregular" behaviour. Exhaustive
information on what happens in foreign or archaic words (or sometimes, just
to distinguish between homophones) etc. is much more difficult to come by;
the Japanese entry for Mongolian in the writing volume of the Sekai gengo
daijiten does have some pertinent remarks and examples, and by looking
through dictionaries one can find others.

One problem I have already encountered: in Unicode, Manchu (and Sibe etc.)
is considered part of Mongolian. That is, most "Mongolian letters" defined
as such are used for Manchu as well; those called "Manchu" are simply the
small subset not used for Mongolian at all.
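
As a rough illustration of that naming convention, the following Python
sketch groups the letters of the Mongolian block by the language label that
appears in their character names, using the standard unicodedata module. The
block range U+1800..U+18AF and the helper name letters_by_label are choices
made for this example only, and the exact counts depend on the Unicode
version bundled with the interpreter.

```python
import unicodedata

PREFIX = "MONGOLIAN LETTER "
LABELS = ("TODO", "SIBE", "MANCHU")


def letters_by_label(start=0x1800, end=0x18AF):
    """Group letters in the Mongolian block by the label in their names.

    Letters whose names carry no TODO/SIBE/MANCHU label are filed under
    "UNLABELLED"; these are the ones simply called "Mongolian letters".
    """
    groups = {"UNLABELLED": []}
    for label in LABELS:
        groups[label] = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")
        if not name.startswith(PREFIX):
            continue  # punctuation, digits, selectors, unassigned points
        first_word = name[len(PREFIX):].split()[0]
        key = first_word if first_word in LABELS else "UNLABELLED"
        groups[key].append(cp)
    return groups


if __name__ == "__main__":
    for label, code_points in letters_by_label().items():
        print(label, len(code_points))
```

The grouping only reflects how the characters are named; it says nothing
about which of the unlabelled letters are actually used for Manchu.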

For prescribed behaviour, and the use of variants, should one take Manchu
and Mongolian as a whole? That is, to give a real example, if there is a
final "n" and a final "N", and the latter takes two different glyph variants
in Manchu and Mongolian, is one variant selector sufficient (with meaning:
in Mongolian G', in Manchu G"), or are two necessary?

While the first would seem sufficient, the fact that there are THREE variant
selectors, together with the letters they are listed with in the table, makes
me wonder whether it is not the latter that Unicode intends. In practice, when
these little-used scripts are used, they are likely to be used by the same
people in the same contexts and the same documents, and perhaps even with the
same fonts, so the latter makes some practical sense even if it is not
strictly necessary in theory.
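
To make the two options concrete, here is a minimal sketch in Python. The
three selector code points are real; everything else is a placeholder: "N"
is just a label for the letter in question, G' and G" are the glyph names
used above, and the pairing of selectors with glyphs is invented for the
example rather than taken from the table in the standard.

```python
# Mongolian free variation selectors (actual code points).
FVS1, FVS2, FVS3 = "\u180B", "\u180C", "\u180D"

# "N" is a placeholder label for the letter, not a real code point.
# Glyph names and selector assignments below are hypothetical.

# Option 1: one selector whose meaning depends on the language.
ONE_SELECTOR = {
    ("N", FVS1): {"mongolian": "G'", "manchu": 'G"'},
}

# Option 2: two selectors, each with a fixed, language-independent meaning.
TWO_SELECTORS = {
    ("N", FVS1): "G'",
    ("N", FVS2): 'G"',
}


def glyph_one_selector(letter, selector, language):
    """Resolve a variant; needs the language from outside the text."""
    return ONE_SELECTOR[(letter, selector)][language]


def glyph_two_selectors(letter, selector):
    """Resolve a variant from the character sequence alone."""
    return TWO_SELECTORS[(letter, selector)]
```

Under the first option the renderer has to learn the language from somewhere
other than the plain text (a font choice or markup, say), whereas under the
second the character sequence alone fixes the glyph, which is what matters if
both languages appear in one document with one font.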

Martin Heijdra

----- Original Message -----
From: "Timothy Partridge" <timpart@perdix.demon.co.uk>
To: <kenw@sybase.com>
Cc: <mheijdra@Princeton.EDU>; <book@unicode.org>; <everson@evertype.com>
Sent: Monday, July 15, 2002 1:53 PM
Subject: Re: Variant selectors in Mongolian

> You recently said:
>
> > > I believe Unicode should take an explicit position on this as it has
> > > important implications for successful rendering of plain text on
> > > various platforms.
> >
> > I think Tim Partridge and Martin Heijdra and anyone else actually
> > working on Mongolian with some implementation experience should
> > write up a technical note on this and present it to the UTC
> > so that it *could* take an explicit position based on input from
> > experts. Right now there *are* no Mongolian script experts involved
> > in the UTC or the editorial committee, and that is one of the
> > fundamental reasons why the text seen in the standard isn't very
> > clear yet.
> > It is going to take *somebody* who understands the Unicode text
> > model *and* Mongolian to write up such text.
>
> I am most definitely not a Mongolian expert, I simply have an interest in
> writing systems in general. I am familiar with the Unicode text model and
> I am willing to work with script experts to develop something.
>
> I would suggest we could produce the following:
>
> A description of the normal behaviour of the Mongolian writing systems,
> with examples of Unicode character sequences and their visual appearance.
> This would be at a similar level to the other script descriptions,
> concentrating on rendering information rather than linguistic details.
>
> Some examples of unusual behaviours and how to obtain those results using
> appropriate character sequences.
>
> A machine readable cross reference between the characters and their
> various glyphs (presentation forms). This would include the valid
> combinations of variation selectors. Many characters share the same glyph
> and I think it would help implementers to be certain that two glyphs were
> indeed the same rather than having a subtle difference.
>
> A proposed algorithm for converting character sequences to glyphs. Ideally
> this would
>   - Be simple to implement
>   - Cover all the normal behaviour correctly without use of variant
>     selectors
>   - Behave in a way that is readily predictable by someone familiar with
>     the script
>
> A simple prototype implementation of the algorithm would be useful for
> checking it behaved as expected.
>
> Martin, are you interested in working on such a project? It would be
> helpful if one or more of the authors of UNU/IIST report 170 (Myatav
> Erdenechimeg, Richard Moore and Yumbayar Namsrai) were available for
> consultation. A brief web search reveals that Oliver Corff has been working
> on rendering Mongolian using TeX and he has tried some experimental Unicode
> support. Is anyone familiar with his work?
>
> Regards,
>
> Tim
>
> --
> Tim Partridge. Any opinions expressed are mine only and not those of my
> employer
>
>


