Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Mon Jan 19 2004 - 09:43:29 EST

    On Mon, 19 Jan 2004 05:23:31 +0000, jameskass@att.net wrote:
    >
    > Dean Snyder wrote,
    >
    > > Tom Gewecke wrote at 2:26 PM on Sunday, January 18, 2004:
    > > ...
    > > >
    > > >Agreed. I can't imagine that anyone who has ever tried to actually do
    > > >anything with Unicode Mongolian would recommend variation selectors as an
    > > >encoding technique, unless perhaps they wanted to make sure the encoding
    > > >was never implemented.
    > >
    > > Could you please elaborate? Has this model not been implemented? Either
    > > via Unicode or otherwise?
    > >
    >
    > Here's how it works: there are three factions involved. The OS and
    > rendering-engine developers, the editor/processor/input developers,
    > and the font developers. Each faction considers that the fancy stuff
    > needed for Mongolian rendering should properly be handled through
    > a combination provided by the other two factions.
    >

    An analogy for those not familiar with the Mongolian script is the much beloved
    long s, which is a positional glyph variant of the ordinary letter s for some
    languages at some periods of time. The long s does not need to be encoded as a
    separate character as there are well-known rules for when an s should be written
    long and when it should be written short (although these rules may vary from
    locale to locale and from time to time). If, for example, the rule for a given
    locale is short s finally and medially after another s, and long s initially and
    medially except after another s, then the user could type in a word using the
    ordinary letter s throughout, and the rendering system would select the long or
    short s glyph as appropriate depending on its position within the word. But if
    the user wanted to go against the rendering rules and write a long s in a
    position that is normally rendered as a short s, or wanted to refer to the
    long s in isolation, then this is where an FVS would come in. The FVS could be
    applied to the letter s to override its normal glyph shape, and force a long s
    even where the rules state that it should be a short s (and vice versa for short
    s).
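
    To make the long s analogy concrete, here is a minimal sketch in Python of
    the example rule just described. Everything named in it is an assumption for
    illustration only: the function render_s is hypothetical, and the override
    marker is an arbitrary private-use character standing in for an FVS, since
    no such selector is actually defined for the Latin letter s.

```python
LONG_S = "\u017F"    # LATIN SMALL LETTER LONG S
OVERRIDE = "\uE000"  # hypothetical stand-in for an FVS (private-use character)


def render_s(word: str) -> str:
    """Pick long or short s by position; an override marker flips the default."""
    # Pair each letter with a flag saying whether an override marker follows it.
    letters = []
    for ch in word:
        if ch == OVERRIDE and letters:
            letters[-1] = (letters[-1][0], True)
        else:
            letters.append((ch, False))

    out = []
    last = len(letters) - 1
    for i, (ch, forced) in enumerate(letters):
        if ch != "s":
            out.append(ch)
            continue
        is_final = i == last
        after_s = i > 0 and letters[i - 1][0] == "s"
        # Example rule: short s finally and after another s, long s elsewhere.
        use_long = not (is_final or after_s)
        if forced:
            use_long = not use_long  # the override goes against the rule
        out.append(LONG_S if use_long else "s")
    return "".join(out)
```

    With this sketch, render_s("success") uses the long form for the first and
    second s and the short form for the final one, while render_s("s" + OVERRIDE)
    forces a long s in isolation, where the rule alone would give a short s.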

    Now the Latin alphabet only has this one example (as far as I know) of a letter
    that has positional or contextual variant forms, and so it is simpler to just
    encode the long s separately. However, almost every letter in Mongolian and its
    related scripts has at least two positional and/or contextual forms, and some
    letters have up to four or five glyph forms. Encoding all the various glyph
    forms of each letter separately would be an unnecessary burden on the user, who
    would have to manually select the correct glyph form for each letter even though
    they are conceived of as the same letter. It is far simpler (for the end-user at
    least) to let the rendering engine apply a set of rules to determine which glyph
    form is required in which position (isolate, initial, medial or final) or in
    which context (e.g. in "feminine" or "masculine" words). As Asmus pointed out,
    the Mongolian FVSs would normally only be needed to override the rules, for
    example to display a particular glyph form in isolation (e.g. in metalanguage
    descriptions of the Mongolian script), or to write foreign words (which in
    Mongolian typically use unexpected glyph forms for certain letters); and so in
    normal running text with no foreign words the user would rarely need to use an
    FVS (and with a good IME the user probably wouldn't even need to know of their
    existence).
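
    As a rough sketch of the first step a rendering engine would take, the Python
    below classifies each letter of a word as isolate, initial, medial or final,
    and records any FVS (U+180B to U+180D) that follows it. The function name
    positional_forms is hypothetical, and this is a deliberate simplification: it
    does not attempt the contextual rules (such as "feminine" versus "masculine"
    words), and it says nothing about which glyph a given FVS selects.

```python
# The three Mongolian Free Variation Selectors.
FVS_NAMES = {"\u180B": "FVS1", "\u180C": "FVS2", "\u180D": "FVS3"}


def positional_forms(word: str):
    """Return (letter, position, fvs_name_or_None) for each letter in word."""
    # Attach each FVS to the letter immediately before it.
    units = []
    for ch in word:
        if ch in FVS_NAMES and units:
            units[-1][1] = FVS_NAMES[ch]
        else:
            units.append([ch, None])

    result = []
    count = len(units)
    for i, (letter, fvs) in enumerate(units):
        if count == 1:
            position = "isolate"
        elif i == 0:
            position = "initial"
        elif i == count - 1:
            position = "final"
        else:
            position = "medial"
        result.append((letter, position, fvs))
    return result
```

    A font or rendering engine would then map each (letter, position) pair to a
    default glyph, and consult the recorded FVS only where the user has asked to
    override that default.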

    The reason why everybody who has had anything to do with Mongolian encoding
    (including myself) shudders in fear at the mere mention of "Free Variation
    Selector" is not that they are a bad thing per se -- their use in Mongolian is
    very fit and proper -- but that the rules for selecting the appropriate
    positional or contextual form in Unicode have never been clearly formalised, and
    without the rules it's impossible to know how to correctly render running text
    or to know which FVS to apply to a given letter to override the rules. Once the
    rules have been established (hopefully soon), and incorporated into the fonts,
    rendering engines and IMEs, then everything should work like a well-oiled
    machine.

    Knowing nothing about Cuneiform, I can't say whether FVSs are a suitable option
    for Cuneiform or not, but if Dean is thinking about using FVSs like ordinary
    Variation Selectors (i.e. applied manually by the user to select a distinct
    character), then I agree with Michael that this is "pseudo-coding" and probably
    not appropriate.

    Andrew


