Re: Tamil Collation vs Transliteration/Transcription Enc

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Sat Jun 25 2005 - 11:45:31 CDT

    ----- Original Message -----
    From: "Michael Everson" <everson@evertype.com>
    To: "Sinnathurai Srivas" <sisrivas@blueyonder.co.uk>
    Cc: "Unicode List" <unicode@unicode.org>
    Sent: Saturday, June 25, 2005 2:20 PM
    Subject: Re: Tamil Collation vs Transliteration/Transcription Enc

    > At 11:46 +0100 2005-06-25, Sinnathurai Srivas wrote:
    >
    >>>It is the case that the Indic blocks (for the major scripts) have
    >>>one-to-one positional equivalences. This was unnecessary, and wasteful of
    >>>space -- but it was inherited from ISCII, so you can go and blame them if
    >>>you don't like it. Having said that, even though it was unnecessary and
    >>>wasteful of space, it was in no way harmful to any of the Indic scripts.
    >>>
    >>I can at least talk to Unicode; I do not think ISCII would have such
    >>traditions. Anyway, we are not talking about the ISCII code, we are
    >>talking about Unicode.
    >
    > Be serious. Your argument is that Unicode chose a suboptimum code table
    > layout for Tamil (as regards binary hex sorting). If the table is
    > suboptimum, it is because ISCII chose that layout, not because Unicode (or
    > WG2) chose it.
    >

    I take the point that the layout was designed by ISCII and that Unicode
    implemented the ISCII design.

    >>>ISO/IEC 14651 and the Unicode Collation Algorithm can sort anything
    >>>correctly, so long as the sort is algorithmic.
    >>>
    >>Of course, as we did not choose the natural sort order, a fix, like a bug
    >>fix, needs to be devised to do the sorting.
    >
    > Incorrect. EVERY script has to tailor for correct sorting. There is no BUG
    > to fix. Whether Tamil were in the code table in your "natural" sorting
    > order or whether the Tamil characters were scattered randomly in that code
    > table you would STILL have to tailor for correct sorting.
    >
    >>>This is inappropriate rhetoric. Devanagari is not a godlike force looking
    >>>for superiority over Tamil, Redjang, Tibetan, and Lepcha.
    >>>
    >>Why a transliteration-based encoding, when it does not work? Do you
    >>have a plan to change these languages to toe the Devanagari line at
    >>some time in the distant future?
    >
    > Oh, stop it. Blame ISCII, if you must, for the arrangement of the
    > characters in the code table. Everyone else is fine with it, and as
    > Michael Kaplan has pointed out to us, the majority of the Tamil community
    > is fine with it as well. Perhaps you should talk to your colleagues and
    > learn why.
    >
    > "Toeing the Devanagari line" is absolute RUBBISH. Anyway it would be good
    > King Ashoka's line.
    >
    >>Devanagari is encoded in sort order form,
    >
    > It isn't. Devanagari has to tailor just as the others do.
    >

    You mean in the minute details?

    >>while others are encoded in transliteration form, that does not work.
    >
    > ALL INDIC SCRIPTS ARE TRANSLITERATIONS OF ASHOKA'S BRAHMI. How about that?
    > Now you've got someone else to blame for what "doesn't work".
    >
    >>>Languages face no tasks. Implementors of the Unicode Collation Algorithm
    >>>and ISO/IEC 14651 have to tailor those standards to meet their needs.
    >>>
    >>Even after 30 years of existence, implementors could not get the sorting
    >>to work in Unicode.
    >
    > I'm sure that this is incorrect.
    >
    >>The natural sort order, which never needed any significant input from
    >>developers, was abandoned in favour of a transliteration-based sort
    >>order, which never works. We now have sorting that will not work, at
    >>least for the foreseeable future.
    >
    > I'm sure that this is incorrect.
    >
    >>>Nonsense. (1) The order of the characters in a code table is irrelevant
    >>>with regard to sorting, and (2) the order of the characters in the Tamil
    >>>code table follows ISCII.
    >>>
    >>Why then is it a transliteration-based encoding?
    >
    > Please learn to listen; I have answered this already. The order of the
    > characters in the Unicode code table follows the order of the characters
    > in the ISCII code table, for reasons which are largely irrelevant today.
    >

    Well, I do not call for changing the Tamil encoding. I only write about the
    foundation before going into fixing the sorting. (There are calls to change
    the encoding. I do not subscribe to that. However, if they manage to do
    that, I'll be the first one to accept it.)

    Knowing and accepting reality and moving forward is my way.

    >>Why not a natural-sorting-based encoding?
    >
    > It doesn't matter what order the characters appear in in the code table.
    > They could all be ordered BACKWARDS from the "natural" encoding and
    > correct sorting could still be done with the correct tailoring.
    >
    Yes, it will have to be done this way now.
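
The point being conceded here, that sort order can be imposed by tailoring regardless of where the characters sit in the code table, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Unicode Collation Algorithm or ISO/IEC 14651: the names `TRADITIONAL`, `RANK` and `tailored_key` are made up for the sketch, and the order table covers only the 18 basic consonants in their traditional sequence.

```python
# Traditional order of the 18 basic Tamil consonants, written as escapes so
# the code points are explicit: ka nga ca nya tta nna ta na pa ma ya ra
# la va llla lla rra nnna. (Illustrative table; vowels, vowel signs and
# Grantha letters are deliberately left out.)
TRADITIONAL = (
    "\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
    "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9"
)

# Hypothetical tailoring table: character -> position in the traditional order.
RANK = {ch: i for i, ch in enumerate(TRADITIONAL)}


def tailored_key(word):
    """Build a sort key from the tailoring table instead of raw code points.

    Characters missing from the table sort after all tabled characters,
    in code point order.
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in word]


letters = list(TRADITIONAL)

# Plain binary (code point) order: follows the ISCII-derived table layout.
by_code_point = sorted(letters)

# Tailored order: follows the table above, whatever the code points are.
by_tailoring = sorted(letters, key=tailored_key)
```

In code point order U+0BA9 (nnna) lands directly after U+0BA8 (na), while the tailored sort puts it last, as the traditional order requires. The same key function would give the same result if the code points were assigned backwards or at random, because only the table is consulted.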

    >>Developers need not do any significant work to get sorting working
    >>with a natural sort order.
    >
    > They are tailoring the default template just as Tamil developers do.
    >
    >>Unicode is a transliteration-based encoding, except for Devanagari; still,
    >>there is no resemblance in character shapes between the languages, which
    >>all needed their own code points.
    >>Do you see what sounds outrageous?
    >
    > What I see, quite frankly, is that you don't know what you are talking
    > about.
    >

    Do you want me to spend time explaining such an obvious matter?

    Map out the two encodings and compare. That will tell you it is based on
    a transliteration encoding and the Devanagari encoding.

    >>Transliteration-based encoding makes sense.
    >>Natural-sort-order-based encoding for Devanagari makes sense.
    >>Transliteration-based encoding never working for what it is intended for
    >>makes sense.
    >>Who rules whom makes sense, and that is political.
    >>Abandoning the natural sort order in favour of toeing a Devanagari line
    >>that never works is a political decision. Because I say that is political,
    >>are you correct in saying I'm political?
    >
    > The Unicode encoding is based on ISCII, not transliteration.
    > Brahmic scripts all have the same structure, Tamil included, though Tamil
    > lost some of the original Brahmic letters.
    > The encoding is based on ISCII, not transliteration.
    > The encoding is what it is and cannot and will not be changed, so what is
    > it that you are trying to achieve by repeating this mantra of
    > transliteration and politics?
    > "Toeing the Devanagari line" is simply nonsense.
    >
    But it is the reality.
    Transliteration does not work. Then what is the reason behind it?

    ISCII or whatever, now it is Unicode, and it is encoded in a
    transliteration encoding.

    >>>Unicode has no transliteration scheme. Your belief that it does because
    >>>ISCII had a particular structure in its code tables is mistaken.
    >>>
    >>Yes, it does. The Unicode encoding is based on a transliteration scheme,
    >>and I do not think you would ask me to spend any more time on this
    >>obvious fact.
    >
    > Unicode is based on ISCII. If ISCII is based on a transliteration scheme,
    > then go and complain to the creators of ISCII. It won't make any
    > difference there, either.
    >

    You mean Unicode is unable to help Unicode?

    >>>Tamil has complex reading rules because it lost original Brahmic letters.
    >>>So what?
    >>>
    >>No sir, Brahmic was not there when the initial Tamil Grammars were there.
    >
    > Excuse me? The Tamil script is descended from Brahmi, just as all the
    > major scripts of India are. The Tamil language, of course, is a Dravidian
    > language, unrelated to the Sanskrit and Prakrit written by Ashoka.
    >

    Tamil used various character shapes, but always followed Grammar in
    using the characters.

    >>I do not think you know about Tamil Grammar.
    >
    > No, I know about the structure of the world's writing systems.
    >

    Are you separating Grammar from the writing system?

    >>The alphabet structure has its Grammar rules.
    >
    > Writing systems have structure. Orthographies organize the elements of
    > writing systems. "Grammar" is not a word that can be applied here.
    >
    No, the first chapter in Tamil Grammar deals with alphabet and phonology,
    before diving into that extraordinary dimension of analysing and
    regulating language. To me it seems unimaginable that in those ancient
    times humans existed whose mindsets were so advanced that they could sit
    down and write such a detailed Grammar. But it did happen.

    For Tamil the writing system is based on grammar and not any abstract ideas.

    >>I do not think anyone in Unicode will ever want to read about even
    >>the rules on alphabets,
    >
    > You are well mistaken.

    What is matrai in Tamil? (The earliest written Grammar defines matrai.)

    >
    >>but they are dying to make alphabet-based and other Grammar-based
    >>decisions for Tamil.
    >
    > We use Brahmic encoding principles for Brahmic scripts.
    >
    Tamil Grammar is the important thing to consider when encoding Tamil
    writing. Tamil Grammar lays down rules for the writing system. Of course,
    Tamil Grammar deals with the necessary characters and not their shapes.

    I think you seriously misunderstand some important points about the theory
    of characters in relation to Tamil. Which sounds, which base characters,
    timing, modulation and change are all part of manipulating them to achieve
    phonology to the best effect. It is also a fundamental principle in Tamil
    that we keep the number of characters to a minimum and make it extremely
    easy to expand. This is in vast contrast to other philosophies, which
    create a vast array of alphabets for each and every requirement.

    Also, it is tradition that is important here. Tamil wishes to stay simple
    and sophisticated.
    I think it is Unicode's duty to understand, and not to dictate that it is
    Brahmic. It is not Brahmic. It is based on Tamil Grammar.

    The shapes of characters changed throughout history, but the rules on
    character usage have not changed. I hope you take your time to understand
    this. The Grammar deals with abstract images, but presents rules for
    character usage.

    What I mean by this is that there is no Grammar in Tamil that deals with
    what shape a character should have. It is only tradition and, at times,
    need that decided the shapes. But the rules on what is needed and how to
    use the alphabet have not changed. They did not change because it is a
    sophisticated phonology. For this reason, please consider that the Grammar
    on Alphabet is far more important than the shapes forcing Tamil to take on
    alien phenomena.

    >>I know we are powerless to stop that. At least you could have some
    >>consideration for our traditions, instead of trying to change
    >>something when you do not know what it is.
    >
    > We have not changed the Tamil writing system.
    >
    >>>If you mean the empty spaces are wasteful, yes they are. They are however
    >>>not harmful.
    >>>
    >>mm
    >
    > I'll take that as agreement.
    >
    >>>Gosh, Tamil seems to be implemented here on my Macintosh running OS X.
    >>>Looks like someone has solved it anyway.
    >>>
    >>Are you talking about collation in Mac?
    >
    > I assume that Tamil collates correctly on OS X. Is there a test list
    > somewhere?
    >
    I'll see if anyone responds.

    Mind that I can run a sort in Microsoft Excel and get a result. That is
    not the intended sorting, but it does something with the default, natural
    sort order. Could you make sure that you are not misreading OS X in a
    similar way?

    >>>You are not going to get anywhere as long as you are stuck on this idea
    >>>that Unicode has anything to do with transliteration.
    >>>
    >>Well, understanding what you have got is the best way to resolve what
    >>you need.
    >
    > What we have has nothing to do with "transliteration". The order of the
    > characters in the code table has NOTHING to do with sorting order. That is
    > specified in a DIFFERENT way.
    >

    I experimented with Excel and modified code points, and sorting worked
    fine. There was no need for any additional software.

    Now we do not have even complicated software that can sort Tamil.
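
The kind of experiment described here, re-coding the text so that plain binary sorting yields the wanted order, might look roughly like the following. It is a hedged reconstruction, not the actual Excel test: the Private Use Area base `PUA_BASE`, the table `TRADITIONAL` and the function `sort_via_recoding` are assumptions made for the sketch, and only the 18 basic consonants are re-coded.

```python
# Traditional order of the 18 basic Tamil consonants (illustrative table):
# ka nga ca nya tta nna ta na pa ma ya ra la va llla lla rra nnna.
TRADITIONAL = (
    "\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
    "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9"
)

# Hypothetical "modified code points": consecutive Private Use Area values
# assigned in traditional order, so that binary order equals traditional order.
PUA_BASE = 0xE000
TO_ORDERED = {ord(ch): PUA_BASE + i for i, ch in enumerate(TRADITIONAL)}
FROM_ORDERED = {new: old for old, new in TO_ORDERED.items()}


def sort_via_recoding(words):
    """Re-code, sort by raw code point, then map back to Unicode Tamil.

    Characters outside the table (vowels, vowel signs, other scripts) are
    left as they are, so this sketch is only meaningful for strings made
    of the 18 tabled consonants.
    """
    recoded = sorted(word.translate(TO_ORDERED) for word in words)
    return [word.translate(FROM_ORDERED) for word in recoded]


sample = ["\u0BA9", "\u0BAA", "\u0BA8"]      # nnna, pa, na
plain = sorted(sample)                        # raw Unicode code point order
recoded_order = sort_via_recoding(sample)     # order after re-coding
```

Re-coding and tailoring give the same result for this sample; the difference is that tailoring keeps the stored text in standard Unicode and moves the ordering knowledge into a table used only at sort time.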

    Of course Unicode has fundamental instructions ready to help a sorting
    software. We will investigate it. I'm talking about missed opportunity. We
    are ready to create a new opportunity in this front.

    >>It is not superiority that I claim; I claim it is sophisticated. There are
    >>other languages that are very sophisticated in their Grammar too. Tamil
    >>has a sophisticated Grammar, that is probably the oldest written, but
    >>still surviving Grammar in the world. It is sophisticated, you need to
    >>understand it before trying to change it for your needs. I never claim it
    >>is superior.
    >
    > The Tamil encoding wastes some code space because of the early reliance on
    > ISCII. That is the ONLY thing that is regrettable about the Tamil
    > encoding.
    > --

    I take this point.

    > Michael Everson * * Everson Typography * * http://www.evertype.com
    >
    >



    This archive was generated by hypermail 2.1.5 : Sat Jun 25 2005 - 11:46:17 CDT