Re: Tamil Collation vs Transliteration/Transcription Enc Version2

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Sun Jun 26 2005 - 03:38:48 CDT

    ----- Original Message -----
    From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
    To: <unicode@unicode.org>
    Sent: Sunday, June 26, 2005 1:46 AM
    Subject: Re: Tamil Collation vs Transliteration/Transcription Enc Version2

    > Sinnathurai Srivas wrote:
    >
    >> Unfortunately, on the issue of collation, due to designs of ISCII,
    >> Unicode has to abandon the sorting based encoding of Tamil in favour of
    >> transliteration based encoding.
    >
    >> For example Tamil K will indicate k, h, g, q, x and other related phoneme
    >> while Devanagari would have individual character shapes representing
    >> individual phonemes. Tamil is based on Alphabet based phonemic system,
    >> while Devanagari is based on phonemic system.
    >
    > I think you mean that Tamil spelling uses digraphs for consonants while
    > Devanagari uses single letters. Unless the Tamil digraphs are sorted like
    > single letters, this happens to be irrelevant for Unicode.
    >

    No, not if by digraphs you mean
    http://www.deltatranslator.com/delta/diagraphs.htm.

    Each letter of the alphabet represents a set of related phonemes.

    examples,

    See how the symbol k makes up the phonemes h, g, k, q, x, c, etc.:
    mahaL=makaL
    magan=makan
    makkaL=mqkkaL
    kuyil = quyil
    lukshmi=Luxmi
    kaN=caN

    See how the symbol a makes up a^, a', etc.:

    Ammaa = A`mmaa`, A`mbrella
    Annai = A^nna^i, A^merica

    >> If Unicode changed its policy from the unimportant and non-functioning
    >> transliteration-based encoding to a natural sorting-based encoding, that
    >> would be a superior solution. However, expecting Unicode to change its
    >> encoding philosophy from ISCII-based transliteration encoding to natural
    >> sorting-based encoding is not going to be easy.
    >
    > You may care to view the UCA weights as a temporary conversion to a
    > sorting-based encoding.
    >
    Can you give some pointers?

    >> We will need to work on what is imposed on Tamil and find software
    >> solutions to resolve sorting requirements.
    >
    > If Tamil sorting can be expressed purely by a sorting order of consonants
    > and vowels, then the answer for sorting words is simply to rearrange the
    > weights on vowels and letters in the default UCA to accord with this
    > ordering.
    >
    99% yes.

    Simply, the pulli (virama!), the dependent vowels, vowels and Aytham need to
    be weighted and that's it.
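
The weighting idea can be sketched as a sort key in Python. This is only an
illustration under stated assumptions, not the UCA and not a real tailoring:
the letter order (vowels, Aytham, then the eighteen consonants), the choice to
rank a consonant with pulli before the same consonant with its inherent vowel,
and all names (`tamil_sort_key`, `SLOTS`, `OTHER`) are assumptions made for
the example. Grantha letters are not handled and simply sort after everything
else.

```python
import unicodedata

# Assumed alphabetical order; code points are written as escapes to avoid ambiguity.
VOWELS = "\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94"
AYTHAM = "\u0b83"
CONSONANTS = ("\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
              "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9")
PULLI = "\u0bcd"
# Dependent vowel signs, in the same order as VOWELS[1:] (aa, i, ii, ... au).
SIGNS = ["\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2", "\u0bc6",
         "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]

SLOTS = 2 + len(SIGNS)                 # pulli, inherent a, then each vowel sign
FIRST_CONSONANT = len(VOWELS) + 1      # weights after the vowels and Aytham
OTHER = FIRST_CONSONANT + len(CONSONANTS) * SLOTS


def tamil_sort_key(word):
    """Return a list of integer weights, one per vowel, Aytham or syllable."""
    word = unicodedata.normalize("NFC", word)
    key = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            key.append(VOWELS.index(ch))
            i += 1
        elif ch == AYTHAM:
            key.append(len(VOWELS))
            i += 1
        elif ch in CONSONANTS:
            base = FIRST_CONSONANT + CONSONANTS.index(ch) * SLOTS
            if nxt == PULLI:           # pure consonant: lowest weight in its group
                key.append(base)
                i += 2
            elif nxt in SIGNS:         # consonant + dependent vowel
                key.append(base + 2 + SIGNS.index(nxt))
                i += 2
            else:                      # consonant with inherent a
                key.append(base + 1)
                i += 1
        else:                          # anything unhandled sorts last, by code point
            key.append(OTHER + ord(ch))
            i += 1
    return key


words = ["\u0b95\u0ba3\u0bae\u0bcd", "\u0b95\u0ba3\u0bcd"]
print(sorted(words, key=tamil_sort_key))
```

Sorting with `key=tamil_sort_key` orders words by these weights instead of by
code point, which is the rearrangement being discussed.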

    However, by Grammar, because of the puLLi/virama there should not be any
    conjuncts in Tamil. Yet Unicode has decided that Tamil has one conjunct (not
    hundreds, but one). Instead of treating the Grantham ksh as x, Unicode
    insists that ksh is a conjunct. There are no other complications. So we may
    need to spend a vast amount of money to work around this insistence by
    Unicode. It does not matter whether Tamil has one conjunct or a thousand in
    the form of ksh: if collation is to be implemented as in the Tamil design,
    Tamil needs to accept the Unicode design and work with it.
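
One way software can work with this is to treat the ksh sequence as a single
unit before assigning weights, which collation algorithms generally call a
contraction. The sketch below is only an illustration: the function name
`split_units` and the choice of KA + pulli + SSA (U+0B95 U+0BCD U+0BB7) as
the only contraction are assumptions made for the example.

```python
# KA + PULLI + SSA, the sequence that renders as the ksh conjunct.
KSSA = "\u0b95\u0bcd\u0bb7"


def split_units(text, contractions=(KSSA,)):
    """Split text into units, keeping each listed contraction as one unit."""
    units = []
    i = 0
    while i < len(text):
        for seq in contractions:
            if text.startswith(seq, i):
                units.append(seq)
                i += len(seq)
                break
        else:
            # No contraction starts here, so the code point stands alone.
            units.append(text[i])
            i += 1
    return units


# "lakshmi": LA, KA, PULLI, SSA, PULLI, MA, VOWEL SIGN I
print(split_units("\u0bb2\u0b95\u0bcd\u0bb7\u0bcd\u0bae\u0bbf"))
```

Once ksh is a single unit, it can be given its own weight like any other
letter, so the rest of the weighting stays simple.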

    There is also another problem that was created by Unicode for Tamil.

    There are double encodings of some phenomena. Unicode violated its own
    policy of standardising language by double encoding in the name of
    canonism. This is also a violation of Unicode architecture, whereby it
    violates the linear and ligature philosophy by misunderstanding canonism.
    See http://www.geocities.com/avarangal/rfc/RFC-TA-content_Tamil.html
    This unwanted inclusion may cut the 99% simple algorithm to about 80% simple
    plus 20% extremely complicated and back-breaking algorithm, which might
    cause problems for a long time to come.
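
Assuming the double encodings meant here include the two-part vowel signs
(for example, VOWEL SIGN O, U+0BCA, which can also be written as VOWEL SIGN E
followed by VOWEL SIGN AA), the usual software answer is to normalize text
before comparing or weighting it. The sketch below uses Python's standard
`unicodedata` module; the helper name `canonically_equal` is an assumption
made for the example.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


KO_ONE_PART = "\u0b95\u0bca"          # KA + VOWEL SIGN O
KO_TWO_PART = "\u0b95\u0bc6\u0bbe"    # KA + VOWEL SIGN E + VOWEL SIGN AA

print(KO_ONE_PART == KO_TWO_PART)                   # raw code points differ
print(canonically_equal(KO_ONE_PART, KO_TWO_PART))  # equal after normalization
```

If every string is normalized once on input, the later weighting step only
ever sees one spelling of each syllable.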

    Hence the violation by Unicode of the puLLi/virama as defined in Grammar,
    and the violation of Unicode architecture in the name of canonism, are the
    main problems holding back the simple solutions required by Tamil Grammar.

    >> Tamil Grammar, probably the worlds oldest written and a sophisticated
    >> Grammar, clearly defines authography for Tamil. Here again Unicode does
    >> not seem to beleive that a language can have Grammar defining it's
    >> authography. In this regard it is not too late to bring to the attention
    >> of Unicode consortium that how authography is defined and how sorting is
    >> used.
    >
    > Does the Tolkappiyam specify the use of Grantha letters? If it doesn't,
    > then it doesn't specify the orthography (note spelling) of Tamil.
    > However, orthography is often totally irrelevant for collation, as it is
    > for English and Thai.
    >
    >> We will analise the requirements to be able to collate Tamil, by ways of
    >> software fixes.
    >
    > Just look at tailoring the UCA.
    >

    Tholkappiyam defines characters as abstract. It does not specify a character
    shape.
    There were many different character shapes, all conforming to the same rule.
    Grantham was a late arrival.
    There are probably some links to sindu shapes and probably some links to
    kuami shapes. But the point is that Grammar deals with orthography and
    phonology.

    Grantham looks to be based on a phonemic-only system. Tamil Grammar is based
    on an alphabet-based phonemic system. Devanagari uses Grantham principles.
    Tamil does not use Grantham principles, but has a well-defined orthography as
    part of Grammar.

    >> To be continued....
    >
    > I hope with some constructive suggestions.
    >
    > Richard.
    >
    >
    >


