Re: Tamil Collation vs Transliteration/Transcription Enc

From: Michael Everson
Date: Sat Jun 25 2005 - 08:20:39 CDT


    At 11:46 +0100 2005-06-25, Sinnathurai Srivas wrote:

    >>It is the case that the Indic blocks (for the major scripts) have
    >>one-to-one positional equivalences. This was unnecessary, and
    >>wasteful of space -- but it was inherited from ISCII, so you can go
    >>and blame them if you don't like it. Having said that, even though
    >>it was unnecessary and wasteful of space, it was in no way harmful
    >>to any of the Indic scripts.
    >I can at least talk to Unicode, I do not think ISCII would have such
    >traditions. Any way we are not talking about ISCII Code, we are
    >talking about Unicode.

    Be serious. Your argument is that Unicode chose a suboptimum code
    table layout for Tamil (as regards binary hex sorting). If the table
    is suboptimum, it is because ISCII chose that layout, not because
    Unicode (or WG2) chose it.

    >>ISO/IEC 14651 and the Unicode Collation Algorithm can sort anything
    >>correctly, so long as the sort is algorithmic.
    >Of course, as we did not choose the natural sort order, a fix, like a
    >bug fix, needs to be devised to do the sorting.

    Incorrect. EVERY script has to tailor for correct sorting. There is
    no BUG to fix. Whether Tamil were in the code table in your "natural"
    sorting order or whether the Tamil characters were scattered randomly
    in that code table you would STILL have to tailor for correct sorting.
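
    As a minimal sketch of what tailoring means here (not an
    implementation of the Unicode Collation Algorithm or ISO/IEC 14651),
    the Python below sorts the 18 basic Tamil consonants twice: once by
    raw code point, and once through a weight table. The names
    TRADITIONAL, WEIGHT and collation_key are illustrative assumptions,
    and the consonant order used is the commonly cited traditional one.

```python
# Traditional order of the 18 basic Tamil consonants (assumption of this
# sketch): ka nga ca nya tta nna ta na pa ma ya ra la va llla lla rra nnna.
TRADITIONAL = (
    "\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
    "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9"
)

# Weight table: rank in the traditional order, independent of code point.
WEIGHT = {ch: rank for rank, ch in enumerate(TRADITIONAL)}


def collation_key(word):
    """Map each character to a tailored weight.

    Characters missing from the table sort after all tailored ones,
    ordered by code point (a simplification for this sketch).
    """
    return [(0, WEIGHT[ch]) if ch in WEIGHT else (1, ord(ch)) for ch in word]


letters = list(TRADITIONAL)
by_code_point = sorted(letters)                     # binary (code table) order
by_tailoring = sorted(letters, key=collation_key)   # tailored order

print("code point:", " ".join(by_code_point))
print("tailored:  ", " ".join(by_tailoring))
```

    The two outputs differ (for example, U+0BA9 sorts before U+0BAA in
    binary order but last in the tailored order), and only the weight
    table decides the second result.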

    >>This is inappropriate rhetoric. Devanagari is not a godlike force
    >>looking for superiority over Tamil, Redjang, Tibetan, and Lepcha.
    >Why a transliteration based encoding, when it does not work. Do
    >you have a plan to change these languages to toe the Devanagari
    >line, some time in the long distance.

    Oh, stop it. Blame ISCII, if you must, for the arrangement of the
    characters in the code table. Everyone else is fine with it, and as
    Michael Kaplan has pointed out to us, the majority of the Tamil
    community is fine with it as well. Perhaps you should talk to your
    colleagues and learn why.

    "Toeing the Devanagari line" is absolute RUBBISH. Anyway it would be
    good King Ashoka's line.

    >Devanagari is encoded in sort order form,

    It isn't. Devanagari has to tailor just as the others do.

    >while others are encoded in transliteration form, that does not work.

    that? Now you've got someone else to blame for what "doesn't work".

    >>Languages face no tasks. Implementors of the Unicode Collation
    >>Algorithm and ISO/IEC 14651 have to tailor those standards to meet
    >>their needs.
    >Even after 30 years of existence, implementors could not get the
    >sorting to work in Unicode.

    I'm sure that this is incorrect.

    >The natural sort order, which never needed any significant input by
    >developers was abandoned in favour of transliteration based sort
    >order, which never works. We now have sort order not working at
    >least for the foreseeable future.

    I'm sure that this is incorrect.

    >>Nonsense. (1) The order of the characters in a code table is
    >>irrelevant with regard to sorting, and (2) the order of the
    >>characters in the Tamil code table follows ISCII.
    >Why then is transliteration based encoding.

    Please learn to listen; I have answered this already. The order of
    the characters in the Unicode code table follows the order of the
    characters in the ISCII code table, for reasons which are largely
    irrelevant today.

    >Why not natural sorting based encoding.

    It doesn't matter in what order the characters appear in the code
    table. They could all be ordered BACKWARDS from the "natural"
    encoding and correct sorting could still be done with the correct
    tailoring.
    >Developers need not doing any significant work to get the sorting
    >working with natural sorting.

    They are tailoring the default template just as Tamil developers do.

    >Unicode is a transliteration based encoding, except for Devanagari,
    >still there is no resemblance in character shapes between languages,
    >which all needed their own code points.
    >Do you see what sounds outrageous?

    What I see, quite frankly, is that you don't know what you are talking about.

    >Transliteration based encoding make sense.
    >Natural sort order based encoding for Devanagari make sense.
    >Transliteration based encoding never works for what it is intended
    >for, make sense.
    >Who rules who makes sense, and that is political.
    >Abandoning natural sort order in favor of toeing Devanagari line
    >that never work is a political decision. Because I say that is
    >political, are you correct in saying I'm political?

    The Unicode encoding is based on ISCII, not transliteration.
    Brahmic scripts all have the same structure, Tamil included, though
    Tamil lost some of the original Brahmic letters.
    The encoding is based on ISCII, not transliteration.
    The encoding is what it is and cannot and will not be changed, so
    what is it that you are trying to achieve by repeating this mantra of
    transliteration and politics?
    "Toeing the Devanagari line" is simply nonsense.

    >>Unicode has no transliteration scheme. Your belief that it does
    >>because ISCII had a particular structure in its code tables is
    >Yes, it does. The Unicode encoding is based on a transliteration scheme
    >and I do not think you would ask me to spend any more time with this
    >obvious fact.

    Unicode is based on ISCII. If ISCII is based on a transliteration
    scheme, then go and complain to the creators of ISCII. It won't make
    any difference there, either.

    >>Tamil has complex reading rules because it lost original Brahmic
    >>letters. So what?
    >No sir, Brahmic was not there, when initial Tamil Grammars were there.

    Excuse me? The Tamil script is descended from Brahmi, just as all the
    major scripts of India are. The Tamil language, of course, is a
    Dravidian language, unrelated to the Sanskrit and Prakrit written by

    >I do not think you know about Tamil Grammar.

    No, I know about the structure of the world's writing systems.

    >The alphabet structure has its Grammar rules.

    Writing systems have structure. Orthographies organize the elements
    of writing systems. "Grammar" is not a word that can be applied here.

    >I do not think anyone in Unicode will ever want to read about at
    >least the rules on alphabets,

    You are well mistaken.

    >but dying to make alphabet based and other Grammatical based
    >decisions for Tamil.

    We use Brahmic encoding principles for Brahmic scripts.

    >I know we are power less to stop that. At least you could have some
    >considerations towards our traditions, instead of trying to change
    >something that you do not know what it is.

    We have not changed the Tamil writing system.

    >>If you mean the empty spaces are wasteful, yes they are. They are
    >>however not harmful.

    I'll take that as agreement.

    >>Gosh, Tamil seems to be implemented here on my Macintosh running OS
    >>X. Looks like someone has solved it anyway.
    >Are you talking about collation in Mac?

    I assume that Tamil collates correctly on OS X. Is there a test list somewhere?

    >>You are not going to get anywhere as long as you are stuck on this
    >>idea that Unicode has anything to do with transliteration.
    >Well understanding what you got is the best way to resolve what you need.

    What we have has nothing to do with "transliteration". The order of
    the characters in the code table has NOTHING to do with sorting
    order. That is specified in a DIFFERENT way.

    >It is not superior that I claim, I claim it is sophisticated. There
    >are other languages that are very sophisticated in their Grammar
    >too. Tamil has a sophisticated Grammar, that is probably the oldest
    >written, but still surviving Grammar in the world. It is
    >sophisticated, you need to understand it before trying to change it
    >for your needs. I never claim it is superior.

    The Tamil encoding wastes some code space because of the early
    reliance on ISCII. That is the ONLY thing that is regrettable about
    the Tamil encoding.

    Michael Everson * Everson Typography
