Re: Tamil Collation vs Transliteration/Transcription Encoding

From: N. Ganesan (naa.ganesan@gmail.com)
Date: Mon Jun 27 2005 - 06:26:58 CDT

    Richard Wordingham (richard.wordingham@ntlworld.com)
    >Have you read and understood the Unicode Collation
    >Algorithm ( http://www.unicode.org/reports/tr10/ )?
    >What you most importantly need to propose is a
    >re-ordering of the weights (basically 195C.0020.0002
    >to 1972.0020.0002) assigned to the Tamil consonants
    >(U+0B95 to U+0BB9) in http://www.unicode.org/Public/UCA/latest/allkeys.txt
    >(currently Version 4.1.0). If you can demonstrate
    >that your proposed weights give the correct order,
    >I don't see why the change shouldn't be accepted.
    >If you can fix any other collation 'errors' at the
    >same time, I think so much the better.

    >There is no explicit undertaking that the default
    >Unicode Collation Algorithm is correct for any language,
    >but I am not aware of any reason that it would be
    >wrong to make it work properly for the collation
    >of items in the Tamil script.
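
    A rough sketch of what re-ordering the consonant weights means,
    in Python. The order used here (the 18 native consonants first,
    then the Grantha letters) is an assumption made for illustration
    and is not copied from allkeys.txt or from the chart linked below;
    RANK and consonant_key are hypothetical names.

```python
# Assumed traditional order for the consonants U+0B95..U+0BB9:
# native Tamil consonants first, Grantha letters (JA, SHA, SSA, SA, HA) last.
# This list is an illustration only, not data taken from allkeys.txt.
TRADITIONAL_ORDER = [
    0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
    0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
    0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9,
    0x0B9C, 0x0BB6, 0x0BB7, 0x0BB8, 0x0BB9,
]

# Map each consonant to its position; this plays the role of a primary weight.
RANK = {chr(cp): i for i, cp in enumerate(TRADITIONAL_ORDER)}


def consonant_key(ch):
    """Hypothetical helper: rank of a Tamil consonant in the assumed order."""
    return RANK[ch]


# Code-point order, i.e. what you get when weights simply follow the code chart.
consonants = [chr(cp) for cp in sorted(TRADITIONAL_ORDER)]

# The same letters sorted with the re-ordered weights.
reordered = sorted(consonants, key=consonant_key)
```

    In code-point order JA (U+0B9C) falls between CA and NYA; with the
    assumed weights it moves after NNNA (U+0BA9), the last native consonant.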

    Please see a collation chart for Tamil:
    http://nganesan.thamizamuthu.com/docs/TamilCollationChart.html
    Or, in PDF form:
    http://www.geocities.com/thamizh@sbcglobal.net/TamilCollationChart.pdf

    i.e.
    http://www.geocities.com/thamizh[AT]sbcglobal.net/TamilCollationChart.pdf

    I'd love to know when Uniscribe will be
    updated so that SHA (U+0BB6) works correctly
    in Windows. Is fixing Uniscribe to render
    the SHA series in Tamil script a job for
    companies like Microsoft?

    In another e-mail, R. Wordingham wrote:
    >But in this case, distinguishing the Tamil
    >script from its sister script Malayalam
    >facilitates the exclusion of letters from
    >the ancestral Grantha script!

    Tamil Grantha is a separate script. For the
    differences between the Tamil script and the
    Tamil Grantha script, see:
    http://www.unicode.org/mail-arch/unicode-ml/y2005-m05/0071.html
    Good references are (1) R. Gruenendahl and
    (2) P. Visalakshy. Many Sanskrit books are
    printed in the Tamil Grantha script, and there
    are thousands of books in that script in the
    Adyar Theosophical Library, Chennai (Madras),
    Tamil Nadu, India. Like the Devanagari script,
    the Tamil Grantha script has many conjuncts,
    and the two scripts share the same sort order.
    I've written a draft of the Tamil Grantha
    code page proposal.

    <<<
    The default weights already address this. The current
    weight entries for VOWEL SIGN O and its
    decomposition are given in the table by:

    0BCA ; [.197B.0020.0002.0BCA] # TAMIL VOWEL SIGN O
    0BC6 0BBE ; [.197B.0020.0002.0BCA] # TAMIL VOWEL SIGN O

    Note that the sorting algorithm will treat them as identical.
    A similar entry for 'ksh' would start '0B95 0BCD 0BB7'.

    I'm not sure these canonical decompositions are breaches
    of architecture any more than other canonical expansions.
    I can't get worked up about this issue because for Thai,
    for example, only the decomposed form is available.
    >>>
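
    The canonical equivalence behind the two table entries quoted
    above can be checked with Python's standard unicodedata module.
    This is only a sketch of the normalization step; it does not read
    allkeys.txt or compute collation keys, and the variable names are
    illustrative.

```python
import unicodedata

composed = "\u0BCA"          # TAMIL VOWEL SIGN O
decomposed = "\u0BC6\u0BBE"  # TAMIL VOWEL SIGN E + TAMIL VOWEL SIGN AA

# Canonical decomposition of U+0BCA as recorded in the Unicode Character Database.
mapping = unicodedata.decomposition(composed)

# After NFD normalization both spellings become the same code point sequence,
# which is why a collation implementation can give them identical weights.
nfd_of_composed = unicodedata.normalize("NFD", composed)
same_after_nfd = nfd_of_composed == unicodedata.normalize("NFD", decomposed)
```

    The 'ksh' sequence 0B95 0BCD 0BB7 has no such single-character form,
    so an entry for it would be a contraction in the weight table
    rather than something normalization handles.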

    As with Thai, in the majority of Tamil usage,
    across a wide class of applications (e.g.,
    loans from English, the West or the Islamic world),
    "ksh" appears only as a non-conjunct. So we at INFITT
    are discussing a proposal to make the
    non-conjunct KSHA the default, and to form the
    conjunct ksha with ZWJ. Non-conjunct ksha is the
    majority behaviour in Tamil, but it is not
    known in other Indic scripts. It is a Tamil
    speciality.

    N. Ganesan


