RE: Malayalam vowel sign AU

From: Kent Karlsson
Date: Sat Apr 01 2006 - 09:57:35 CST

    James Kass wrote:
    > > This has nothing to do with font switching at all. Not even
    > > Font switching must NEVER change apparent spelling.
    > Reproduce Table 9-11 on page 248 of TUS4.0 in plain text. The table
    > illustrates Malayalam Orthographic Reform.

    Note the table heading, which says *ORTHOGRAPHIC* (spelling) reform.
    What is not said is how the difference in orthography is encoded in
    the character stream. Since it is an orthographic reform, there must
    be some difference in the character stream. One plausible way is to
    use ZWJ/ZWNJ to mark the spelling difference. (Ideally, IMO there
    should have been OLD U/NEW U, OLD UU/NEW UU characters, rather
    than overloading U and UU with both old and new orthography.)

    This has NOTHING to do with font selection. Not at all! (Besides: that
    figure does not include AU.)

    When a new orthography was announced for German a few years ago,
    did you go and make two Latin fonts then, one for the old and one for
    the new orthography? I guess (and hope) not... When Finnish
    started to use š and ž instead of sh and zh, did you go and make a
    font that displays sh as š and zh as ž? I guess and hope not.

    > > > What has been clear all along is that
    > > > U+0D57 should never be included in running text,
    > >
    > > I don't know where that idea comes from. ...
    > It comes from TUS4.0 page 249:
    > "U+0D57 MALAYALAM AU LENGTH MARK is provided as an encoding for
    > the right side of the two-part vowel U+0D4C MALAYALAM
    > So, if I wanted to encode the right side of this two part vowel, as in

    Including the modern spelling for the AU vowel (of course).

    How did you (and some others) manage to miss the rather clear
    statements (in several places) that 0D4C is a **TWO-PART** vowel??
    There is no such thing as a sometimes two-part, sometimes one-part
    vowel mark (nor should there be).

    > a plain text stand-alone representation of it, I'd use
    > U+0D57. But there's
    > only one MALAYALAM VOWEL SIGN AU *character* in the standard.

    That one is for the traditional spelling of the AU vowel. The modern
    spelling uses just U+0D57 MALAYALAM AU LENGTH MARK. (The name
    of the character does not matter for this.)

    So there are two "au vowel sign" characters for Malayalam, one which is
    called MALAYALAM VOWEL SIGN AU and the other happens to be called
    MALAYALAM AU LENGTH MARK. Granted that the second name does not
    catch that character's modern use, just its traditional use.
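
    For reference, the two character names can be read straight from the
    Unicode Character Database. A minimal sketch, assuming Python 3 and its
    standard unicodedata module (neither is mentioned in this thread; the
    variable names are only illustrative):

```python
import unicodedata

# The two Malayalam characters under discussion.
VOWEL_SIGN_AU = "\u0D4C"
AU_LENGTH_MARK = "\u0D57"

# Map each code point label to its formal Unicode character name.
names = {
    f"U+{ord(ch):04X}": unicodedata.name(ch)
    for ch in (VOWEL_SIGN_AU, AU_LENGTH_MARK)
}

for code_point, name in names.items():
    print(code_point, name)
```

    The names are fixed labels; they say nothing by themselves about which
    spelling a character is used for.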

    > However, that same section points to a detailed discussion of these
    > two part vowels in the Tamil section. (on page 239) This states that
    > for Tamil, the single code point is the preferred form and is the form
    > in common use. But, it also says that the single code point
    > is equivalent to the string of two code points.

    Yes, no problem there. Precomposed versions tend to be preferred when
    available (NCF and such).

    > There is nothing, far as I can tell, suggesting that the single code
    > is equivalent to the other single code point. In other words, U+0BCA
    > equivalent to U+0BC6 plus U+0BBE.


    > It does not necessarily follow that U+0BCA is equivalent to U+0BBE.

    Recte: "not necessarily" -> "not". Indeed, they are not equivalent in
    any sense of the word.
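
    The Tamil case can be checked mechanically. A minimal sketch, assuming
    Python 3 and its standard unicodedata module (not something used in this
    thread); canonically_equivalent is a hypothetical helper that compares
    NFD forms:

```python
import unicodedata

TAMIL_SIGN_O = "\u0BCA"   # TAMIL VOWEL SIGN O (two-part vowel sign)
TAMIL_SIGN_E = "\u0BC6"   # TAMIL VOWEL SIGN E (left part)
TAMIL_SIGN_AA = "\u0BBE"  # TAMIL VOWEL SIGN AA (right part)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The single code point versus the two-code-point sequence.
pair_equivalent = canonically_equivalent(TAMIL_SIGN_O, TAMIL_SIGN_E + TAMIL_SIGN_AA)

# The single code point versus the right part alone.
right_part_equivalent = canonically_equivalent(TAMIL_SIGN_O, TAMIL_SIGN_AA)

# NFC recomposes the pair into the precomposed character.
recomposed = unicodedata.normalize("NFC", TAMIL_SIGN_E + TAMIL_SIGN_AA)

print(pair_equivalent, right_part_equivalent, recomposed == TAMIL_SIGN_O)
```

    Comparing NFD forms is one common way to test canonical equivalence;
    comparing NFC forms would work equally well here.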

    > You seem to be suggesting this equivalence (for Malayalam),

    I do not, and they are not. It is the old spelling vs. the modern
    spelling. They are not equivalent. They apparently denote the same
    phoneme, but that is not the same as being equivalent.

    > and if such is the case, it should be plainly stated in the standard.

    They are not, and it should not (since they are not).

    > Googling KA + AU(vs) gives two pages of hits, mostly purporting
    > to be Unicode Malayalam text. Searching KA + AU(lm) gives only
    > eight hits, none of which are Malayalam text.

    The dire effects of one buggy (ill-made) system.

    > Quoting from:

    >> ? should not have the ? symbol in the left (eg: ??). 'AU length-marker' is just
    >> for creating that symbol alone in all kinds of fonts. Or think it this way - if there
    >> a AnjaliNewLipi how would you avoid ? symbol in the left.

    Well, that's misleading. And probably a result of being misled.

    > And the response was,
    >> Its the responsibility of the unisribe to put the AU marker. font is not doing
    >> anything to put symbols on both sides, itd automatically done by
    >> let me see if i can check that behaviour of uniscribe.

    I'm not sure what this tries to say.

    Anyhow, using 0D4C is to have exactly the same effect as using <0D46,
    Letting 0D4C display as just 0D57 is effectively to say that 0D46 is an
    character. And it is not, it's a *visible* ("graphic") character.
    Letting it sometimes be visible, sometimes not, is not tenable.
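
    In the Unicode Character Database, U+0D4C has the canonical
    decomposition <U+0D46, U+0D57>, so the two-part nature of 0D4C is part
    of the character data and not a font matter. A minimal sketch of that
    check, assuming Python 3 and its standard unicodedata module (not used
    anywhere in this thread); nfd is a hypothetical helper:

```python
import unicodedata


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


KA = "\u0D15"              # MALAYALAM LETTER KA
SIGN_E = "\u0D46"          # MALAYALAM VOWEL SIGN E (left part)
SIGN_AU = "\u0D4C"         # MALAYALAM VOWEL SIGN AU (two-part)
AU_LENGTH_MARK = "\u0D57"  # MALAYALAM AU LENGTH MARK (right part)

# 0D4C decomposes into both parts, never into the right part alone.
two_part_matches = nfd(SIGN_AU) == SIGN_E + AU_LENGTH_MARK

# KA with the old spelling versus KA with the modern spelling.
old_spelling = KA + SIGN_AU
modern_spelling = KA + AU_LENGTH_MARK
spellings_equivalent = nfd(old_spelling) == nfd(modern_spelling)

print(two_part_matches, spellings_equivalent)
```

    Under normalization, the two spellings of KA + AU remain distinct
    strings, so a search for one will not match the other unless a system
    adds its own folding on top.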

    >Quoting from
    >(section titled Unicode: Redefining AU length marker U+0D57)

    >> Current meaning of the two AU signs are described below:
    >> Two part symbol of AU is not used now-a-days.

    Recte: "of" -> "for", but otherwise ok.

    >> Could be represented by two part symbol in fonts supporting old

    This is not font dependent, and cannot be when correctly implemented. It
    is ALWAYS two-part. This would be the case even if it didn't have a

    >> Could be represented by right part alone in fonts supporting new

    That is completely wrong, for a number of reasons; but I'm getting a bit
    tired of having to repeat them again and again.

    >> Should not be used as MALAYALAM VOWEL SIGN AU.

    The capital letters there are misleading. 0D57 *is* the modern spelling
    for AU in Malayalam.

    >> Represents the right half of 0D4C irrespective of the orthography
    >> by the font.

    That sentence does not make sense.

    >> Only required when the right part alone need to be specifically mentioned. eg: in
    >> a grammar book.

    Or when using the modern spelling for AU in Malayalam.

    >> Common day-to-day texts need not use this symbol at all.

    Of course it should.

    >> This assignment of meaning to these symbol causes lots of confusion.

    And the text you quoted adds to that confusion.

    >> Also, it can potentially violate Uniqueness Rule when people

    "Uniqueness Rule"???

    >> use 0D4C and 0D57 to denote AU symbol in new orthography.

    0D4C is old orthography for AU, 0D57 ("alone") is modern orthography for
    AU.

    > The user community, far as I can tell, shuns the notion that U+0D4C
    > U+0D57 are equivalent.

    They are NOT equivalent. They are DIFFERENT spellings of AU in
    Malayalam.

                    /kent k
