Re: Tamil Collation vs Transliteration/Transcription Enc Version2

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sat Jun 25 2005 - 19:46:36 CDT


    Sinnathurai Srivas wrote:

    > Unfortunately, on the issue of collation, due to designs of ISCII, Unicode
    > has to abandon the sorting based encoding of Tamil in favour of
    > transliteration based encoding.

    > For example Tamil K will indicate k, h, g, q, x and other related phoneme
    > while Devanagari would have individual character shapes representing
    > individual phonemes. Tamil is based on Alphabet based phonemic system,
    > while Devanagari is based on phonemic system.

    I think you mean that Tamil spelling uses digraphs for consonants while
    Devanagari uses single letters. Unless the Tamil digraphs are sorted like
    single letters, this happens to be irrelevant for Unicode.
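    As an illustration of what "sorted like single letters" would involve,
    here is a minimal Python sketch. It uses the traditional Spanish alphabet
    (where "ch" and "ll" were ordered as letters in their own right) purely as
    a stand-in, since the Tamil digraphs in question are not spelled out in
    this thread. The names ORDER, collation_elements and sort_key are invented
    for the example, and this is only the contraction idea, not the UCA itself.

```python
# Illustrative alphabet: traditional Spanish order, where the digraphs
# "ch" and "ll" count as single letters. A stand-in, not Tamil data.
ORDER = (list("abc") + ["ch"] + list("defghijkl") + ["ll"]
         + list("mn") + ["ñ"] + list("opqrstuvwxyz"))
RANK = {unit: i for i, unit in enumerate(ORDER)}
MAX_LEN = max(len(unit) for unit in ORDER)


def collation_elements(word):
    """Split a word into sorting units, preferring the longest match."""
    elements = []
    i = 0
    while i < len(word):
        for n in range(MAX_LEN, 0, -1):
            chunk = word[i:i + n]
            if chunk in RANK:
                elements.append(chunk)
                i += len(chunk)
                break
        else:
            # Unknown character: keep it as its own unit.
            elements.append(word[i])
            i += 1
    return elements


def sort_key(word):
    """Known units sort by rank; unknown ones sort after them, by code point."""
    return [(0, RANK[e]) if e in RANK else (1, ord(e))
            for e in collation_elements(word)]


words = ["chico", "cuna", "dama"]
print(sorted(words))                # digraph ignored: chico, cuna, dama
print(sorted(words, key=sort_key))  # "ch" sorts after "c": cuna, chico, dama
```

    If the digraphs are not ordered this way, the longest-match step is
    unnecessary and a plain per-letter ordering suffices.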

    > If Unicode changes it's policy from the unimportant and non functioning
    > transliteration based encoding to one of natural sorting based encoding
    > would be a superior solution. However, expecting Unicode to change it's
    > encoding philosophy of ISCII based transliteration encoding to one of
    > natural sorting based encoding is not going to be easy.

    You may care to view the UCA weights as a temporary conversion to a
    sorting-based encoding.

    > We will need to work on what is imposed on Tamil and find software
    > solutions to resolve sorting requirements.

    If Tamil sorting can be expressed purely as an ordering of consonants and
    vowels, then the answer for sorting words is simply to rearrange the
    weights of the vowels and consonants in the default UCA table to accord
    with this ordering.
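    The rearrangement described above can be sketched in a few lines of
    Python. This is a toy under stated assumptions, not a UCA implementation:
    it has a single weight level, ignores vowel signs and the pulli, and
    compares against raw code point order rather than the default UCA table.
    The consonant order used is the customary order of the 18 Tamil
    consonants; make_sort_key and tamil_key are made-up names.

```python
# Customary order of the 18 Tamil consonants, written as escapes so the
# code points are unambiguous. Assumed here for illustration only.
TRADITIONAL_CONSONANTS = (
    "\u0B95"  # KA
    "\u0B99"  # NGA
    "\u0B9A"  # CA
    "\u0B9E"  # NYA
    "\u0B9F"  # TTA
    "\u0BA3"  # NNA
    "\u0BA4"  # TA
    "\u0BA8"  # NA
    "\u0BAA"  # PA
    "\u0BAE"  # MA
    "\u0BAF"  # YA
    "\u0BB0"  # RA
    "\u0BB2"  # LA
    "\u0BB5"  # VA
    "\u0BB4"  # LLLA
    "\u0BB3"  # LLA
    "\u0BB1"  # RRA
    "\u0BA9"  # NNNA
)


def make_sort_key(order):
    """Return a key function that weights characters by position in `order`."""
    rank = {ch: i for i, ch in enumerate(order)}

    def key(word):
        # Characters outside `order` sort after listed ones, by code point.
        return [(0, rank[ch]) if ch in rank else (1, ord(ch)) for ch in word]

    return key


tamil_key = make_sort_key(TRADITIONAL_CONSONANTS)

NNNA, PA, RRA, LA = "\u0BA9", "\u0BAA", "\u0BB1", "\u0BB2"
letters = [NNNA, PA, RRA, LA]

by_code_point = sorted(letters)               # NNNA, PA, RRA, LA
by_tailoring = sorted(letters, key=tamil_key)  # PA, LA, RRA, NNNA
print(by_code_point)
print(by_tailoring)
```

    The stored text is unchanged; only the weights used for comparison are
    rearranged, which is the sense in which the weights act as a temporary
    sorting-based encoding.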

    > Tamil Grammar, probably the worlds oldest written and a sophisticated
    > Grammar, clearly defines authography for Tamil. Here again Unicode does
    > not seem to beleive that a language can have Grammar defining it's
    > authography. In this regard it is not too late to bring to the attention
    > of Unicode consortium that how authography is defined and how sorting is
    > used.

    Does the Tolkappiyam specify the use of Grantha letters? If it doesn't,
    then it doesn't specify the orthography (note spelling) of Tamil. However,
    orthography is often totally irrelevant for collation, as it is for English
    and Thai.

    > We will analise the requirements to be able to collate Tamil, by ways of
    > software fixes.

    Just look at tailoring the UCA.

    > To be continued....

    I hope with some constructive suggestions.

    Richard.


