Re: Tamil Collation vs Transliteration/Transcription Enc

From: Michael Everson
Date: Sat Jun 25 2005 - 04:10:30 CDT

    At 20:06 +0100 2005-06-24, Sinnathurai Srivas wrote:

    >Any implementation would initially attempt for a natural sort order
    >for a language, where by the default hex order of codes would be a
    >natural sort order of that language.

    This is not how modern sorting of Unicode text works. Except
    for very simple scripts like Cherokee or Phoenician, hex order can
    rarely be considered to work -- and it doesn't work ANYWAY the
    instant you mix European digits or punctuation with them.
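
    To illustrate the point about digits and punctuation, here is a
    minimal sketch in Python, whose default string sort compares code
    points. The word list and the helper name dictionary_key are made
    up for illustration; the key shows one possible set of rules
    (ignore punctuation, compare digit runs by value), not the rules of
    any particular standard.

```python
import re

# Sample strings mixing letters with European digits and punctuation (illustrative only).
words = ["item10", "item9", "co-op", "coat"]

def dictionary_key(s):
    """Hypothetical key: drop punctuation, compare digit runs numerically."""
    cleaned = "".join(ch for ch in s if ch.isalnum())
    parts = re.findall(r"\d+|\D+", cleaned)
    # Digit runs become (0, value, ""), letter runs become (1, 0, text),
    # so every pair of parts is comparable.
    return [(0, int(p), "") if p.isdecimal() else (1, 0, p) for p in parts]

# Plain code point order: "-" (U+002D) sorts before "a", and "1" before "9".
print(sorted(words))
# ['co-op', 'coat', 'item10', 'item9']

# Order under the hypothetical rules above.
print(sorted(words, key=dictionary_key))
# ['coat', 'co-op', 'item9', 'item10']
```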

    >The question now is why Unicode decided to deny this natural
    >facility to Tamil, in its implementation strategy.

    You assume (1) that there is something "natural" to be served, and
    (2, again) that Tamil is "broken" somehow in Unicode, which it is not.

    >The answer is, in Unicode's consideration there is another
    >requirement that was considered more important than sorting order of
    >Tamil. The requirement was, the transliteration properties of code
    >order of all Indian languages must be the same and sort order was
    >considered a minute matter in comparison to sort order.

    It is the case that the Indic blocks (for the major scripts) have
    one-to-one positional equivalences. This was unnecessary, and
    wasteful of space -- but it was inherited from ISCII, so you can go
    and blame them if you don't like it. Having said that, even though it
    was unnecessary and wasteful of space, it was in no way harmful to
    any of the Indic scripts.
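
    The positional equivalence can be seen by shifting a Devanagari
    code point by the distance between the two block starts (U+0900 and
    U+0B80). This is only a sketch of the layout; the function name is
    invented here, and the shift is not a transliteration scheme. Where
    Tamil has no corresponding letter, the position is simply empty.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900  # start of the Devanagari block
TAMIL_BASE = 0x0B80       # start of the Tamil block

def same_position_in_tamil(ch):
    """Return the character at the same block offset in the Tamil block."""
    return chr(ord(ch) - DEVANAGARI_BASE + TAMIL_BASE)

for ch in ["\u0915", "\u0916", "\u092e"]:  # DEVANAGARI KA, KHA, MA
    counterpart = same_position_in_tamil(ch)
    # unicodedata.name() returns the supplied default for unassigned code points.
    print(unicodedata.name(ch), "->", unicodedata.name(counterpart, "<unassigned>"))
# DEVANAGARI LETTER KA -> TAMIL LETTER KA
# DEVANAGARI LETTER KHA -> <unassigned>
# DEVANAGARI LETTER MA -> TAMIL LETTER MA
```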

    >Unicode decided that writing softwares to transliterate between
    >different Indic languages is a more daunting task than writing
    >software to collate a language.

    ISO/IEC 14651 and the Unicode Collation Algorithm can sort anything
    correctly, so long as the sort is algorithmic.
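
    As a rough picture of what "algorithmic" means here: both standards
    compare strings by weights at several levels rather than by code
    point. The toy key below imitates that idea with three levels (base
    letters, then accents, then case). It is an assumption-laden
    sketch, not the Unicode Collation Algorithm or its weight tables,
    and toy_key is a name invented for this example.

```python
import unicodedata

def toy_key(s):
    """Toy three-level key: base letters, then accents, then case."""
    decomposed = unicodedata.normalize("NFD", s)
    base = [c for c in decomposed if not unicodedata.combining(c)]
    primary = "".join(c.casefold() for c in base)                    # letters only
    secondary = [c for c in decomposed if unicodedata.combining(c)]  # accents
    tertiary = [c.isupper() for c in base]                           # lowercase first
    return (primary, secondary, tertiary)

words = ["resume", "Resume", "r\u00e9sum\u00e9", "rose", "Zoo", "apple"]

# Code point order puts every capital first and the accented word last.
print(sorted(words))
# Multi-level order keeps the three "resume" spellings together.
print(sorted(words, key=toy_key))
```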

    >However, Devanagari had it's upper hand in getting it natural sort
    >order encoded,

    This is inappropriate rhetoric. Devanagari is not a godlike force
    looking for superiority over Tamil, Redjang, Tibetan, and Lepcha.

    >while other languages were forced to abandon the natural sort order
    >in favour of transliteration code order.

    Not only is this unsubstantiated, but it is untrue.

    >All these other languages now face the task of implementing fixes to
    >get the collation working.

    Languages face no tasks. Implementors of the Unicode Collation
    Algorithm and ISO/IEC 14651 have to tailor those standards to meet
    their needs.
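
    A tailoring is, in essence, an ordering table that overrides the
    default. The sketch below sorts four Tamil consonants first by code
    point and then by a table. TAILORED_ORDER is an assumption (the
    commonly taught order of the eighteen basic consonants), and
    tailored_key is a made-up helper; a real tailoring would be written
    against the standards' own tables and would also have to cover
    vowels, vowel signs, and the pulli.

```python
# Assumed ordering of the 18 basic Tamil consonants, for illustration only.
TAILORED_ORDER = [
    "\u0b95", "\u0b99", "\u0b9a", "\u0b9e", "\u0b9f", "\u0ba3",  # KA NGA CA NYA TTA NNA
    "\u0ba4", "\u0ba8", "\u0baa", "\u0bae", "\u0baf", "\u0bb0",  # TA NA PA MA YA RA
    "\u0bb2", "\u0bb5", "\u0bb4", "\u0bb3", "\u0bb1", "\u0ba9",  # LA VA LLLA LLA RRA NNNA
]
RANK = {ch: i for i, ch in enumerate(TAILORED_ORDER)}

def tailored_key(word):
    """Rank characters by the table; anything else sorts after, by code point."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]

letters = ["\u0bb2", "\u0ba9", "\u0bb1", "\u0baa"]  # LA, NNNA, RRA, PA

# Code point order: NNNA (U+0BA9), PA (U+0BAA), RRA (U+0BB1), LA (U+0BB2).
by_code_point = sorted(letters)
# Table order: PA, LA, RRA, NNNA.
by_table = sorted(letters, key=tailored_key)

print([f"U+{ord(c):04X}" for c in by_code_point])
print([f"U+{ord(c):04X}" for c in by_table])
```

    The position of a letter in the code chart never enters into the
    second sort; only the table does.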

    >Unlike Latin based languages, each Indic languages use alphabet of
    >their own. For this reason abandoning natural sort order in favour
    >of transliteration sort order was not a technical but a political
    >decision by Unicode.

    Nonsense. (1) The order of the characters in a code table is
    irrelevant with regard to sorting, and (2) the order of the
    characters in the Tamil code table follows ISCII.

    >Unicode did understand the damage it made to the suffering
    >languages, but decided to go along with it's political decision,
    >forcing minority languages to obey orders.

    This allegation is outrageous and entirely untrue.

    >Software routines to do transliteration is a simple task, compared
    >to software routines to collate a scrambled encoding. Unicode still
    >decided to enforce its political agenda over a technical requirement.

    Not a thing you have said makes any sense whatsoever -- it is you,
    sir, who are being political. Nothing you are saying here makes
    either technical or linguistic sense. Please read the Unicode

    >Unicode transliteration scheme does not work.

    Unicode has no transliteration scheme. Your belief that it does
    because ISCII had a particular structure in its code tables is

    >The saddest thing of all is that the transliteration does not work
    >as Unicode hoped it. There never was a simple transliteration
    >mechanism suitable for encoding different languages. For example,
    >Tamil writing system is based on phonemic based Alphabet system,
    >while Devanagari is based on phonemic only system.

    These terms ("phonemic based alphabet system" and "phonemic only
    system") are non-technical and inaccurate with regard to the
    structure of the writing systems.

    >In Tamil k = k, h, g, x, q, c (mahaL, magan, makkan, quil, xavier,
    >etc..). In Devanagari individual glyph shapes represent each of
    >these phonemes.

    Tamil has complex reading rules because it lost original Brahmic
    letters. So what?

    >In Tamil aspirated and many other sounds are written using a single
    >modulating indicator called Aytham, yet an unacceptably high number
    >of code points allocated for Tamil is deprecated and made unusable
    >because of this transliteration encoding that never works.

    If you mean the empty spaces are wasteful, yes they are. They are
    however not harmful.

    >It is important to understand that a superior architecture like
    >Unicode, made inferior by misguided political requirement is not
    >going to be an easy task to resolve.

    Gosh, Tamil seems to be implemented here on my Macintosh running OS
    X. Looks like someone has solved it anyway.

    >There fore it is very important that we start work on fixing the bug
    >caused by transliteration based encoding to do the collation as
    >required. We will analyse the collation techniques available to fix
    >the problem caused by transliteration based encoding bug.

    You are not going to get anywhere as long as you are stuck on this
    idea that Unicode has anything to do with transliteration.

    >To be continued....

    Write a UTN on Tamil sorting if you think that is necessary. Such a
    document would be useful, perhaps. But your apparent political agenda
    about the superiority and uniqueness of Tamil is tiresome at best.

    Michael Everson * Everson Typography
