Re: Malayalam vowel sign AU

From: James Kass (jameskass@worldnet.att.net)
Date: Sun Apr 02 2006 - 02:00:44 CST

    Kent Karlsson wrote,

    >> Reproduce Table 9-11 on page 248 of TUS4.0 in plain text. The table
    >> illustrates Malayalam Orthographic Reform.
    >
    > Note the table heading, which says *ORTHOGRAPHIC* (spelling) reform.

    Spelling rules are only a subset of orthography.

    > What is not said is how the difference in orthography is encoded in
    > the character stream.

    It's not. By design. By the script's main users. To the best of my
    knowledge.

    > Since it is an orthographic reform, there must
    > be some difference in the character stream. One plausible way is to
    > use ZWJ/ZWNJ to mark the spelling difference.

    If Table 9-11 were reproduced in plain text using ZWJ/ZWNJ, it
    should display fine... as long as a font supporting the traditional
    orthography was used. The chart could not be displayed using a
    font supporting the reformed orthography because such a font
    would not include the ligatures needed for the traditional column.

    (An OpenType font supporting the reformed orthography could
    probably be made to include ligature glyphs referenced with
    ZWJ look-ups. Some font developer in the user community would
    have to consider the effort worthwhile, though. So, until then...)
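
For illustration only, here is a minimal Python sketch of what marking the spelling difference with a joiner would do to the character stream. The placement of ZWNJ between the consonant and the vowel sign is an assumption made for this example, not a convention defined by the standard, and whether any font or rendering engine would honor it is a separate question. The helper names `code_points` and `strip_joiners` are hypothetical.

```python
# Sketch only: the joiner placement below is an assumption for illustration,
# not a convention defined by the Unicode Standard.
KA = "\u0d15"      # MALAYALAM LETTER KA
SIGN_U = "\u0d41"  # MALAYALAM VOWEL SIGN U
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"     # ZERO WIDTH JOINER

plain = KA + SIGN_U          # the font alone decides which form is shown
marked = KA + ZWNJ + SIGN_U  # hypothetical request for one particular form


def code_points(text: str) -> list[str]:
    """Return the text as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


def strip_joiners(text: str) -> str:
    """Remove ZWJ/ZWNJ so marked and unmarked spellings compare equal."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


print(code_points(plain))              # ['U+0D15', 'U+0D41']
print(code_points(marked))             # ['U+0D15', 'U+200C', 'U+0D41']
print(plain == marked)                 # False
print(plain == strip_joiners(marked))  # True
```

The two strings differ as code point sequences, so any search or comparison would have to fold the joiners away before matching.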

    > (Ideally, IMO there
    > should have been OLD U/NEW U, OLD UU/NEW UU characters, rather
    > than overloading U and UU with both old and new orthography.)

    From a plain text computer encoding viewpoint, you may be right...

    > This has NOTHING to do with font selection. Not at all! (Besides: that
    > figure does not include AU.)

    ... but the user community insists that the same encoded binary
    strings be displayed in either traditional or reformed style based
    upon the user's font choice. The disadvantage of not being able
    to display both forms of the script in plain text may have been
    outweighed by the advantages: no transcoding, no need to maintain
    two sets of all web pages on a web site, easier implementation of
    searching, sorting, and so forth, libraries not having to
    maintain doubled databases, etc.
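
The searching advantage can be sketched in Python. This is a toy comparison under stated assumptions: U+E000 is a Private Use Area code point standing in for a separately encoded "NEW U", which does not exist in Unicode, and `fold` is a hypothetical helper showing the extra step a split encoding would need.

```python
# Hypothetical stand-in: U+E000 (Private Use Area) plays the role of a
# separately encoded "NEW U". No such Malayalam character exists in Unicode.
KA = "\u0d15"          # MALAYALAM LETTER KA
SIGN_U = "\u0d41"      # MALAYALAM VOWEL SIGN U, shared by both orthographies
NEW_SIGN_U = "\ue000"  # hypothetical reformed-only vowel sign U

query = KA + SIGN_U

# Unified encoding: one stored string serves traditional and reformed fonts,
# so a plain substring search works whichever font displays the text.
unified_document = "x" + KA + SIGN_U + "y"
print(query in unified_document)  # True

# Split encoding: text typed in the reformed style no longer matches.
split_document = "x" + KA + NEW_SIGN_U + "y"
print(query in split_document)  # False

# A split encoding would need a folding table applied to both sides.
FOLD = str.maketrans({NEW_SIGN_U: SIGN_U})


def fold(text: str) -> str:
    """Map the hypothetical reformed character onto the shared one."""
    return text.translate(FOLD)


print(fold(query) in fold(split_document))  # True
```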

    > When a new orthography was announced for German a few years ago,
    > did you go and make two Latin fonts then, one for the old and one for
    > the new orthography? I guess (and hope) not... When one for Finnish
    > started to use ? and ? instead of sh and zh, did you go and make a
    > font that displays sh as ? and zh as ?? I guess and hope not.

    Of course not. I've always figured that if anybody wants to
    represent the "sh" sound with a question mark, they should just
    use the question mark character at U+003F.

    (My browser settings munged your message.)

    > How did you (and some others) manage to miss the rather clear
    > statements (in several places) that 0D4C is a **TWO-PART** vowel??

    Explanatory text about U+0D4C specifically should be added to the
    standard. Since the standard currently offers no direction with
    respect to U+0D4C and the orthographic reform, people speculate
    and form divergent opinions as to proper implementation methods.

    The user community apparently considers that U+0D4C is only a
    two-part vowel sign in the traditional orthography. It is a one
    part vowel sign in the reformed orthography. Try thinking of
    it as a unification.
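
For reference, the "two-part" status of U+0D4C is visible in the Unicode Character Database, which Python exposes through its standard `unicodedata` module. A short sketch follows. It shows only what the character data records, not how a reformed-orthography font should draw the sign.

```python
import unicodedata

SIGN_AU = "\u0d4c"  # MALAYALAM VOWEL SIGN AU

# Canonical decomposition mapping recorded for U+0D4C.
print(unicodedata.decomposition(SIGN_AU))  # 0D46 0D57

# NFD applies that mapping, yielding the left part and the right part.
parts = [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", SIGN_AU)]
print(parts)  # ['U+0D46', 'U+0D57']
```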

    >> Quoting from:
    >> http://varamozhi.blogspot.com/2004/09/unresolved-issues-in-anjali-unicode.html
    >>> Its the responsibility of the unisribe to put the AU marker. font is
    >>> not doing anything to put symbols on both sides, itd automatically
    >>> done by uniscribe.
    >>> let me see if i can check that behaviour of uniscribe.
    >
    > I'm not sure what this tries to say.

    The font developer is responding to a bug report in which the user
    does not find the expected behavior of the left side of U+0D4C getting
    dropped. The font developer (correctly) identifies the problem as
    being caused by the rendering engine rather than the font and offers
    to look into the rendering engine's behavior.
     
    >>> Also, it can potentially violate Uniqueness Rule when people interchangably
    >
    > "Uniqueness Rule"???

         "Two different encodings should not render same,
          irrespective of the font or joiners used."

    http://varamozhi.blogspot.com/2005/07/unicode-uniqueness-rule-on-encoding.html

    >> The user community, far as I can tell, shuns the notion that U+0D4C
    >> and U+0D57 are equivalent.
    >
    > They are NOT equivalent.

    Good! They shouldn't be. The text Kenneth Whistler submitted from
    5.0 could be construed to suggest that they will become equivalent in
    Unicode 5.0, though. That's why I asked and what started this thread.
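
As a quick check of what "equivalent" means in normalization terms, the sketch below compares strings after NFD using Python's `unicodedata` module. The helper name `canonically_equivalent` is made up for this example, and the result reflects the character data bundled with the Python build in use.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """True if the two strings have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


KA = "\u0d15"              # MALAYALAM LETTER KA
SIGN_E = "\u0d46"          # MALAYALAM VOWEL SIGN E
SIGN_AU = "\u0d4c"         # MALAYALAM VOWEL SIGN AU
AU_LENGTH_MARK = "\u0d57"  # MALAYALAM AU LENGTH MARK

# U+0D4C and U+0D57 on their own are not canonically equivalent.
print(canonically_equivalent(KA + SIGN_AU, KA + AU_LENGTH_MARK))  # False

# U+0D4C is canonically equivalent to the sequence U+0D46 U+0D57.
print(canonically_equivalent(KA + SIGN_AU, KA + SIGN_E + AU_LENGTH_MARK))  # True
```

So normalization alone never turns text spelled with U+0D57 into text spelled with U+0D4C, or the other way round.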

    > They are DIFFERENT spellings of AU in Malayalam.

    The Malayalam user community is better qualified to judge this than
    you or I.

    Quoting from...
    http://www.supersoftweb.com/Unicode.htm

         "Thoolika2005 have both Reformed Malayalam and
          Traditional Malayalam Open Type Unicode fonts. In
          Unicode the code points for Traditional Malayalam
          script and Reformed Malayalam script is same. So,
          the changing of script from Traditional to Reformed
          and vice-versa can be achieve simply by selecting the
          font name."

    The government of India, in a special report on Indic scripts and
    Unicode (relevant section: http://tdil.mit.gov.in/Malya-guj.pdf )
    says right in their own version of the Malayalam code chart,
    "0D57 ... MALAYALAM VOWEL SIGN AU LENGTH MARK
    (new line, bullet) Not in modern use. (new line, bullet)
    already given at 0D4C".

    (In fairness please note that the tdil pages are a bit outdated
    now and others have pointed out misconceptions in various
    sections of those PDFs to this list and other lists.)

    It's my impression that one of the reasons that the actual
    users require a common encoding for either traditional or
    reformed orthography text display is that, although the
    script reform movement started some forty years ago,
    not everyone has "bought into it" yet.

    If it is Unicode's official position that traditional Malayalam
    use U+0D4C and that reformed Malayalam must use U+0D57,
    then Malayalam rendering engineers may recommend that
    traditional Malayalam fonts be designed with traditional
    AU glyphs at both code positions and reformed fonts with
    reformed glyphs there. Then they'd lobby operating system
    marketers to support their requirements while implementing
    same in OpenSource...

    Best regards,

    James Kass
    Apologies for length


