Tamil Collation vs Transliteration/Transcription Encoding

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Fri Jun 24 2005 - 14:06:40 CDT


    (Draft version)

    Tamil Collation vs Transliteration/Transcription Encoding

    Though it suffers from numerous implementation problems, Unicode is based on
    a highly sophisticated technical architecture. This article examines how
    Unicode mishandled Tamil collation and analyses the alternative solutions
    available to attain Tamil collation.

    Any implementation would initially attempt to provide a natural sort order
    for a language, whereby the default hexadecimal order of the code points is
    the natural sort order of that language. The question now is why Unicode
    decided to deny this natural facility to Tamil in its implementation
    strategy. The answer is that, in Unicode's consideration, there was another
    requirement deemed more important than the sorting order of Tamil: the
    transliteration properties of the code order of all Indian languages had to
    be the same, and sort order was treated as a minor matter in comparison to
    transliteration. Unicode decided that writing software to transliterate
    between different Indic languages is a more daunting task than writing
    software to collate a language.
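
    To make this concrete, here is a minimal sketch (in Python, used here only
    for illustration; the letters and the expected order are my own example, not
    taken from any standard document). Sorting a few Tamil consonants by raw
    code point gives the reverse of their traditional order, which is exactly
    the natural facility that was denied:

        # Raw code-point ("hex") order vs. traditional Tamil order.
        # The expected sequence for these five consonants in the traditional
        # Tamil alphabet is: va, zha, La, Ra, na.
        letters = ["வ", "ழ", "ள", "ற", "ன"]       # traditional order

        codepoint_sort = sorted(letters)          # Python compares str by code point
        print([hex(ord(c)) for c in codepoint_sort])
        # ['0xba9', '0xbb1', '0xbb3', '0xbb4', '0xbb5']
        print(codepoint_sort)                     # ['ன', 'ற', 'ள', 'ழ', 'வ'] -- reversed

        # The default code-point order does not reproduce the traditional
        # sequence, so plain sorted() cannot serve as Tamil collation.
        assert codepoint_sort == list(reversed(letters))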

    However, Devanagari had the upper hand in getting its natural sort order
    encoded, while the other languages were forced to abandon their natural sort
    order in favour of the transliteration code order. All these other languages
    now face the task of implementing fixes to get collation working.

    Unlike Latin-based languages, each Indic language uses an alphabet of its
    own. For this reason, abandoning the natural sort order in favour of the
    transliteration sort order was not a technical but a political decision by
    Unicode. Unicode did understand the damage it did to the suffering
    languages, but decided to go along with its political decision, forcing
    minority languages to obey orders. Software routines to do transliteration
    are a simple task compared to software routines to collate a scrambled
    encoding. Unicode still decided to enforce its political agenda over a
    technical requirement.

    The Unicode transliteration scheme does not work. The saddest thing of all
    is that the transliteration does not work as Unicode hoped. There never was
    a simple transliteration mechanism suitable for encoding different
    languages. For example, the Tamil writing system is based on a phonemic
    alphabet, while Devanagari assigns a separate letter to each sound. In Tamil
    the letter k stands for k, h, g, x, q and c (mahaL, magan, makkan, quil,
    xavier, etc.). In Devanagari, individual glyph shapes represent each of
    these phonemes. In Tamil, aspirated and many other sounds are written using
    a single modulating indicator called the Aytham; yet an unacceptably high
    number of the code points allocated to the Tamil block are left unassigned
    and unusable because of this transliteration encoding that never works.
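
    To show where the parallel code layout idea breaks down for Tamil, here is a
    small sketch (my own illustration, not any algorithm published by Unicode).
    The Indic blocks were laid out in parallel following ISCII, so the naive
    Devanagari-to-Tamil transliteration is simply "add the block offset 0x280";
    the sketch shows that the aspirated and voiced consonants then land on
    unassigned positions in the Tamil block, because Tamil writes all those
    sounds with a single letter:

        import unicodedata

        OFFSET = 0x0B80 - 0x0900          # Devanagari block -> Tamil block

        def devanagari_to_tamil(ch):
            """Naive parallel-layout transliteration of one Devanagari letter."""
            target = chr(ord(ch) + OFFSET)
            try:
                unicodedata.name(target)  # raises ValueError if unassigned
                return target
            except ValueError:
                return None               # hole in the Tamil block

        for ch in "कखगघ":                 # KA, KHA, GA, GHA
            result = devanagari_to_tamil(ch)
            print(f"{ch} (U+{ord(ch):04X}) -> {result or 'no Tamil code point'}")

        # Only क (KA) maps to க; ख, ग and घ fall into unassigned positions,
        # since Tamil writes k, kh, g and gh with the one letter க.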

    It is important to understand that a superior architecture like Unicode,
    made inferior by a misguided political requirement, is not going to be easy
    to fix. Therefore it is very important that we start work on fixing the bug
    caused by the transliteration-based encoding so that collation works as
    required. We will analyse the collation techniques available to fix the
    problem caused by the transliteration-based encoding bug.
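
    As a first flavour of such a fix, here is a minimal sketch of a tailored
    collation in Python: each letter is mapped to its rank in the traditional
    Tamil sequence and strings are sorted by that key instead of by raw code
    point. The table below is deliberately partial (it lists only the eighteen
    basic consonants, omitting vowels, vowel signs and grantha letters) and is
    purely illustrative; a production fix would normally use full library
    collation support, for example ICU with a Tamil locale, rather than a
    hand-written table.

        # Tailored collation sketch: sort by traditional rank, not code point.
        TAMIL_ORDER = "கஙசஞடணதநபமயரலவழளறன"   # 18 basic consonants, traditional order
        RANK = {ch: i for i, ch in enumerate(TAMIL_ORDER)}

        def tamil_key(text):
            """Collation key: traditional rank for known letters,
            code-point fallback for everything else."""
            return [RANK.get(ch, len(TAMIL_ORDER) + ord(ch)) for ch in text]

        letters = ["ன", "ற", "ள", "ழ", "வ"]
        print(sorted(letters))                  # code-point order: ன ற ள ழ வ
        print(sorted(letters, key=tamil_key))   # traditional order: வ ழ ள ற ன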

    To be continued....


