Re: minimizing size (was Re: allocation of Georgian letters)

From: Doug Ewell
Date: Sat Feb 09 2008 - 13:03:24 CST


    Sinnathurai Srivas <sisrivas at blueyonder dot co dot uk> wrote:

    > Tamil need not be a CTL script. It can work 100% and work better than
    > CTL enabled Tamil. Why then is Tamil classed as CTL script? What is
    > the criteria?

    The entire group of nine Indic scripts was considered to be structurally
    related when they were encoded in Unicode 1.0. This included
    classifying all of them as CTL, although that term was not used in
    Unicode 1.0. The following passages from TUS 1.0, page 53 describe the
    encoding model:

    "The Unicode standard follows the ISCII (Indian Standard Code for
    Information Interchange) code standard in treating all nine of the
    official Indian scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya,
    Tamil, Telugu, Kannada, and Malayalam) in a parallel way."


    "The graphemic syllable is built up of alphabetic pieces, the actual
    letters of the Devanagari script. These consist of three major types:
    consonants, dependent vowels, and independent vowels."

    This is the reason why Tamil is encoded in Unicode the way it is.
    Whether or not anyone agrees that it should have been encoded that way
    is a different matter.
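    To make the quoted model concrete: the Tamil word தமிழ் ("Tamil") shows
    three syllables but is stored as five code points. The sketch below uses
    only Python's standard `unicodedata` module (Python 3.9+ for the
    `list[...]` annotation). The `classify` and `pieces` helpers and their
    code-point ranges are illustrative assumptions based on the layout of
    the Tamil block. They are not defined in TUS 1.0.

```python
import unicodedata

def classify(ch: str) -> str:
    """Assign a Tamil code point to one of the TUS 1.0 piece types (plus virama).

    The ranges are an illustrative assumption based on the Tamil block layout.
    """
    cp = ord(ch)
    if 0x0B85 <= cp <= 0x0B94:
        return "independent vowel"
    if 0x0B95 <= cp <= 0x0BB9:
        return "consonant"
    if 0x0BBE <= cp <= 0x0BCC:
        return "dependent vowel"
    if cp == 0x0BCD:
        return "virama (pulli)"
    return "other"

def pieces(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, character name, piece type) for each code point."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), classify(ch))
            for ch in text]

# "தமிழ்": three visible syllables, five code points.
for row in pieces("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"):
    print(*row, sep="  ")
```

    The vowel sign and the virama are not standalone letters. Rendering
    software has to combine each one with the preceding consonant (and, for
    vowel signs such as U+0BC6, display it to the left of that consonant).
    That is the kind of processing the CTL label refers to.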

    > As for publishing, attempt to use Unicode Tamil fails. If it is
    > achievable, when will it be ready?

    This question misstates the concept "Some publishing applications that
    use Unicode Tamil are broken" as "Unicode Tamil is broken for
    publishing." All that is necessary to disprove the latter is to show at
    least one publishing application which uses Unicode Tamil and generates
    correct results, and John Jenkins has already done that.

    > Again what is the criteria for stopping Tamil using workable solution
    > and what is the criteria for enforcing non-working solution?

    The criterion is that duplicate encodings will not be created. Hangul
    was re-encoded in the mid-1990s (the old encoding was actually removed),
    when Unicode was very new and supported by very few systems, and that
    experience is why. Read Section 3 of RFC 2279: "The incident has been
    dubbed the 'Korean mess', and the relevant committees have pledged to
    never, ever again make such an incompatible change."

    > I think we can at least move fast, if we introduce all necessary
    > canonical forms now, most of the publishing s/w may work with
    > canonical forms.

    Read the ISO "Principles and Procedures" document at to see why duplicate
    encodings are no longer allowed. Reinventing TUNE as a question of
    "canonical forms" and "non-canonical forms" doesn't change this. If you
    want software to work with a different Tamil model, use the Private Use
    Area (PUA).
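    The existing Tamil block already contains a small instance of the
    "canonical forms" idea. U+0BCA TAMIL VOWEL SIGN O has a canonical
    decomposition into U+0BC6 TAMIL VOWEL SIGN E followed by U+0BBE TAMIL
    VOWEL SIGN AA, so the syllable கொ (ko) has two valid spellings. This
    minimal Python sketch uses only the standard `unicodedata.normalize`.
    It shows that the two spellings differ as raw code points and match only
    after normalization. The `same_text` helper is a hypothetical name for
    illustration.

```python
import unicodedata

# Two spellings of the syllable "கொ" (ko).
precomposed = "\u0B95\u0BCA"       # KA + VOWEL SIGN O
decomposed = "\u0B95\u0BC6\u0BBE"  # KA + VOWEL SIGN E + VOWEL SIGN AA

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(same_text(precomposed, decomposed))                       # True
```

    Every pair of equivalent spellings forces comparison, searching, and
    sorting code to normalize first. As a rough illustration, adding
    precomposed syllables as "canonical forms" for all of Tamil would repeat
    this one case for every syllable covered.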

    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14

    This archive was generated by hypermail 2.1.5 : Sat Feb 09 2008 - 13:05:12 CST