Myanmar Redesign Proposal (was: AA versus TALL AA)

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Mar 25 2006 - 18:15:15 CST


    Michael Everson wrote:

    >>Is this a Unicore escape?
    >
    > It was a typo on Ken's part. Discussion of this proposal with members of
    > the UTC is fairly intense.

    I'm not surprised. There are issues of principle involved and the proposal
    sets dangerous precedents. The debate could be worse than the row over
    Phoenician. I was a bit slow to understand the 'double hockeysticks' in
    this thread's original title.

    I've posted this to both lists because there are matters of principle,
    matters of fact, and matters of Myanmar script practicalities. I think any
    replies on matters of principle belong on the general list, while replies
    about the other two belong on the SEAsia list. (Is it being moderated at
    the weekend? I don't think general list moderation status automatically
    transfers to the SE Asia list.)

    > I don't personally have the time or energy to get into too much of it on
    > this list.

    I'll list the major issues *I* can see. There may be others.

    Issue 1: Point of Principle/Pride

    The Burmese want their glyph-based input, and it's proposed that they get
    it, subject to the following restrictions:
    (a) 'Logical' order. (I'm pretty sure it isn't actually phonetic order!)
    (b) Smart fonts sort out positional variation and combination of subscript
    forms (and this concession is actually a clear gain)
    (c) Pali-only / archaic subscript and superscript consonants are written as
    subjoining character plus normal character and similar methods.
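    To make point (a) concrete: glyph-based input means the user types in the
    order the glyphs appear, while storage stays in logical order. A minimal
    sketch, assuming only the pre-base vowel sign E (U+1031) needs reordering;
    the function name keystrokes_to_logical is hypothetical, and medials (which
    would also sit between the consonant and E in storage) are ignored.

```python
from typing import List

# Minimal sketch of glyph-order typing stored in logical order.
# Assumption: only the pre-base vowel sign E (U+1031) is reordered; medials,
# which would also precede E in storage, are ignored here.
VOWEL_SIGN_E = "\u1031"  # MYANMAR VOWEL SIGN E, drawn to the left of its consonant


def is_consonant(ch: str) -> bool:
    """True for the basic Myanmar consonants U+1000..U+1021."""
    return "\u1000" <= ch <= "\u1021"


def keystrokes_to_logical(keys: List[str]) -> str:
    """Turn keystrokes typed in visual order into logical storage order."""
    out: List[str] = []
    pending_e = False  # E typed, but its consonant not yet seen
    for ch in keys:
        if ch == VOWEL_SIGN_E:
            pending_e = True
            continue
        out.append(ch)
        if pending_e and is_consonant(ch):
            out.append(VOWEL_SIGN_E)  # store E after the consonant it belongs to
            pending_e = False
    if pending_e:
        out.append(VOWEL_SIGN_E)  # dangling E at the end: keep it, do not drop it
    return "".join(out)


# Typing E then KA (the order the glyphs appear) stores KA + E.
print([f"U+{ord(c):04X}" for c in keystrokes_to_logical(["\u1031", "\u1000"])])
# prints ['U+1000', 'U+1031']
```

    The reordering lives entirely in the input method, so the stored text is
    the same whichever way it was typed.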

    Issue 2: Disunifying AA - Point of Principle

    > And then there are differences between Burmese and Mon, where Mon
    > regularly uses TALL AA with PHA, though modern Burmese doesn't. (An 1840
    > Burmese Bible does, however.)

    > It is proposed to disunify TALL AA from AA because (1) adding a
    > non-variable TALL AA for Karen use would introduce ambiguous encounters

    And be pointless. S'gaw Karen just uses different glyph variants to
    Burmese, and the tall AA and short AA glyphs happen to be the same.

    I can see two types of argument that would justify disunifying the variants.

    Single language argument: The rules for choosing between tall AA and short
    AA cannot reasonably be implemented in a rendering system. Cf. the two
    forms of Latin small 's' and the two forms of Greek small sigma, for which
    I think minimal pairs actually exist. The single language need not be
    Burmese; Mon would also do.

    Mixed-language argument: There are (or will be) many documents displaying
    two or more of Burmese, Mon and S'gaw Karen together, with the text in each
    language using that language's preferred system for selecting between the AA
    forms. (This is akin to demonstrating that there are three separate
    scripts, and then saying that corresponding characters should mostly be
    unified.)
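    A deliberately naive sketch of what contextual selection would look like,
    and why the mixed-language case bites: the same stored text gets a
    different AA shape depending on which language's rule table applies. The
    Burmese consonant set is the commonly cited one; the Mon set (Burmese plus
    PHA, following the quoted remark about Mon) is a hypothetical illustration,
    as is the function aa_shapes. Real rules would also have to look past
    medials and stacked consonants, which this ignores.

```python
# Deliberately naive sketch: choose the AA shape from the preceding consonant,
# with one rule table per language. The Burmese set is the commonly cited one;
# the Mon set (Burmese plus PHA) is a hypothetical illustration only. Real text
# also needs medials and stacked consonants handled, which this ignores.
AA = "\u102c"  # MYANMAR VOWEL SIGN AA

TALL_AFTER = {
    "burmese": set("\u1001\u1002\u1004\u1012\u1015\u101d"),    # KHA GA NGA DA PA WA
    "mon": set("\u1001\u1002\u1004\u1012\u1015\u1016\u101d"),  # hypothetical: adds PHA
    "sgaw_karen": set(),  # both shapes look the same, so no choice is needed
}


def aa_shapes(text: str, lang: str) -> list:
    """Return 'tall' or 'short' for each AA in text, using lang's rule table."""
    shapes = []
    prev = ""
    for ch in text:
        if ch == AA:
            shapes.append("tall" if prev in TALL_AFTER[lang] else "short")
        prev = ch
    return shapes


# The same code points (PHA + AA) get a different shape in each language.
pha_aa = "\u1016\u102c"
print(aa_shapes(pha_aa, "burmese"), aa_shapes(pha_aa, "mon"))
# prints ['short'] ['tall']
```

    If documents really mix such languages, the renderer needs reliable
    language tagging on every run of text; if tagging cannot be relied on,
    that is evidence for the mixed-language argument.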

    The easiest examples, if they existed, would be lists of people's names from
    both languages that used the 'aa' form appropriate to the person's language.
    This would support both the single language argument and the mixed-language argument.

    The proposal presents evidence for neither argument.

    > In scripts like Lanna and Myanmar, where it is really *not* possible to
    > contextually select the display, the only sensible thing is to encode both
    > AA and TALL AA and let users use the one they want when they want it.

    If you can justify that statement for the Myanmar script, then you have
    established the case for separate encoding of AA and TALL AA. That still
    leaves open the option of doing it by variation selectors, but they can be
    rendered pointless by the Burmese always using a variation selector.
    (Pressing an AA key can generate two characters - the generic AA and the
    appropriate variation selector.) Does anyone care to expound the theory of
    variation selectors? There may be words in black and white in TUS saying 'only
    for unifying CJK variants that the Chinese (or Japanese, especially with
    surnames) insist are different.'
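    To illustrate the variation-selector escape route: a keyboard can emit the
    generic AA plus a selector on every keypress, and normalization leaves the
    pair intact. A minimal sketch; the choice of VARIATION SELECTOR-1 and the
    KEYMAP table are hypothetical, not anything registered for Myanmar.

```python
import unicodedata

AA = "\u102c"   # MYANMAR VOWEL SIGN AA
VS1 = "\ufe00"  # VARIATION SELECTOR-1; which selector would be used is hypothetical

# Hypothetical keyboard table: the AA key always emits AA plus a selector.
KEYMAP = {"aa": AA + VS1}

typed = KEYMAP["aa"]
print(len(typed), [unicodedata.name(c) for c in typed])
# prints 2 ['MYANMAR VOWEL SIGN AA', 'VARIATION SELECTOR-1']

# Normalization leaves the selector in place, so the pair survives NFC.
print(unicodedata.normalize("NFC", typed) == typed)
# prints True
```

    If every Burmese AA carries the selector, the selector no longer
    distinguishes anything for Burmese text, which is the sense in which it
    can be rendered pointless.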

    At present there is the significant possibility of ISO/IEC opposition to
    this disunification.

    Issue 3: Abolition of Unicode Virama: Floodgate, Myanmar Stability, and
    Stability Pact

    The creation of ASAT in place of the 2-character visible virama and the
    restriction of virama to a subjoining role immediately invalidates most
    Unicoded Myanmar script text, including my paltry creations. In principle
    that's a BAD THING. For myself, I welcome it and look forward to the
    upgrade of SIL's Padauk font.
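    The conversion forced on existing text is mechanical, at least for the
    visible virama. A sketch, assuming only that change (the proposal also
    re-encodes medials and other sequences, which are not handled here); the
    function name is hypothetical, and U+103A is the code point later assigned
    to MYANMAR SIGN ASAT.

```python
# Sketch of the conversion the proposal would force on existing text.
# Assumption: only the visible virama changes; medials and other
# sequences that the proposal also re-encodes are not handled.
OLD_VISIBLE_VIRAMA = "\u1039\u200c"  # MYANMAR SIGN VIRAMA + ZERO WIDTH NON-JOINER
ASAT = "\u103a"                      # MYANMAR SIGN ASAT, the code point later assigned


def migrate_visible_virama(text: str) -> str:
    """Replace each two-character visible virama with the single ASAT."""
    return text.replace(OLD_VISIBLE_VIRAMA, ASAT)


old = "\u1000\u1039\u200c"  # KA with a visible killer, Unicode 4.1 style
print(f"{len(old)} -> {len(migrate_visible_virama(old))} code points")
# prints 3 -> 2 code points
```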

    However, I can see a clamour for other scripts to convert the Unicode virama
    to a historical footnote and separate the concepts of conjoining and visible
    virama. Unfortunately, this will create two canonically inequivalent ways of
    doing the same thing. They have to be canonically inequivalent, because
    virama + ZWNJ has to continue to be in Normal Form C. Such requests are
    likely to be refused as a point of principle.
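    The normalization point can be checked with Devanagari, where KA + VIRAMA +
    ZWNJ is the existing way to request a visible virama. A minimal sketch
    using Python's standard unicodedata module: the ZWNJ sequence is already
    normalized in every form and stays distinct from plain KA + VIRAMA, so any
    new single 'visible virama' character could not be made canonically
    equivalent to it.

```python
import unicodedata

KA, VIRAMA, ZWNJ = "\u0915", "\u094d", "\u200c"  # DEVANAGARI KA, SIGN VIRAMA, ZWNJ

explicit = KA + VIRAMA + ZWNJ  # existing spelling of a visible virama
plain = KA + VIRAMA            # ordinary conjoining virama

# The ZWNJ spelling is already normalized, and no form removes the ZWNJ,
# so the two spellings stay distinct under every normalization form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, explicit) == explicit
    assert unicodedata.normalize(form, explicit) != unicodedata.normalize(form, plain)
print("ZWNJ sequence is stable and distinct in all four forms")
```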

    Issue 4: Great SA: Mystification, precedent

    I am truly baffled as to why this conjunct needs its own encoding. It is
    currently encoded as SA, virama, SA, which will now come to represent the
    transparent, unligated form. However, no evidence was provided that this
    form actually occurs! If it did, I would suggest that SA, ZWNJ, VIRAMA, SA
    be the appropriate representation. (Remember that Myanmar VIRAMA would no
    longer be a Unicode virama!)

    This sets a dangerous precedent of allowing a separate character for every
    non-transparent conjunct. Codepoint for Devanagari KSHA? JNYA? Moreover,
    by the stability pact, these will be inequivalent to the current sequences.
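    The inequivalence is easy to see with the code point later assigned to the
    conjunct, U+103F MYANMAR LETTER GREAT SA. A minimal sketch with Python's
    unicodedata module: the atomic character has no decomposition, so no
    normalization form relates it to SA + VIRAMA + SA.

```python
import unicodedata

SA, VIRAMA = "\u101e", "\u1039"
sequence = SA + VIRAMA + SA  # current encoding of the great SA conjunct
atomic = "\u103f"            # code point later assigned to the conjunct

print(unicodedata.name(atomic))
# prints MYANMAR LETTER GREAT SA
print(unicodedata.decomposition(atomic) == "")
# prints True (no decomposition)
print(unicodedata.normalize("NFD", atomic) == sequence)
# prints False (not canonically equivalent)
```

    Search and collation would therefore have to treat the two spellings as
    equal by some other, tailored means.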

    Issue 5: Medial WA: Practical Issue

    This is partly a question of phonology. Medial WA and WA as part of a 'true
    conjunct' will be encoded differently. How does one tell them apart when
    entering them? Much of the time, how will one tell them apart visibly?
    And, finally, are the people of Burma consistent in deciding whether a 'WA'
    in the middle of a word is medial or the second part of a conjunct?
    Fortunately, 'true conjunct' WA is fairly rare, but it occurs in the sort of
    word liable to be learnt from a book rather than from speech.
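    The practical consequence for searching: the two encodings of WA are not
    unified by normalization, so software would need its own folding. A
    minimal sketch; U+103D is the code point later assigned to the medial sign,
    KA + VIRAMA + WA is only an illustrative 'true conjunct', and fold_wa is a
    hypothetical helper, not part of any standard.

```python
import unicodedata

KA, WA, VIRAMA = "\u1000", "\u101d", "\u1039"
MEDIAL_WA = "\u103d"  # MYANMAR CONSONANT SIGN MEDIAL WA, the code point later assigned

medial = KA + MEDIAL_WA      # KA with medial WA
conjunct = KA + VIRAMA + WA  # KA with WA as the second half of a 'true conjunct'

# Normalization does not bring the two together, so a search for one misses the other.
print(unicodedata.normalize("NFC", medial) == unicodedata.normalize("NFC", conjunct))
# prints False


def fold_wa(text: str) -> str:
    """Hypothetical search folding: treat stacked WA as medial WA."""
    return text.replace(VIRAMA + WA, MEDIAL_WA)


print(fold_wa(conjunct) == medial)
# prints True
```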

    Quibbles:

    The new way of encoding kinzi seems unnaturally complicated, and quite
    inappropriate for repha. I will have to re-read that section; it
    isn't making sense to me.
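    For reference, a sketch of kinzi as it was eventually standardized: NGA +
    ASAT + VIRAMA stored before the consonant that carries it, three code
    points for one small mark. The helper with_kinzi is hypothetical.

```python
import unicodedata

NGA, ASAT, VIRAMA = "\u1004", "\u103a", "\u1039"


def with_kinzi(consonant: str) -> str:
    """Hypothetical helper: NGA + ASAT + VIRAMA stored before the consonant it sits on."""
    return NGA + ASAT + VIRAMA + consonant


GA = "\u1002"
for ch in with_kinzi(GA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# prints:
# U+1004 MYANMAR LETTER NGA
# U+103A MYANMAR SIGN ASAT
# U+1039 MYANMAR SIGN VIRAMA
# U+1002 MYANMAR LETTER GA
```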

    I'm not sure that Graphite + Padauk is the only Unicode 4.1-compliant
    implementation of the Burmese script outside of Burma.

    Should U+1039 MYANMAR SIGN VIRAMA be the conjoiner or the visible sign? The
    (immutable) name implies the visible sign, but the proposal makes it the
    conjoiner.

    Richard.


