Re: Tamil Collation vs Transliteration/Transcription Enc Version2

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Sun Jun 26 2005 - 03:38:48 CDT

Next message: Richard Wordingham: "Re: Tamil Collation vs Transliteration/Transcription Enc Version2"

Previous message: Richard Wordingham: "Re: Tamil Collation vs Transliteration/Transcription Enc Version2"
In reply to: Richard Wordingham: "Re: Tamil Collation vs Transliteration/Transcription Enc Version2"
Next in thread: Richard Wordingham: "Re: Tamil Collation vs Transliteration/Transcription Enc Version2"

----- Original Message -----
From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
To: <unicode@unicode.org>
Sent: Sunday, June 26, 2005 1:46 AM
Subject: Re: Tamil Collation vs Transliteration/Transcription Enc Version2

> Sinnathurai Srivas wrote:
>
>> Unfortunately, on the issue of collation, due to the design of ISCII,
>> Unicode has had to abandon the sorting-based encoding of Tamil in favour
>> of a transliteration-based encoding.
>
>> For example, Tamil K will indicate k, h, g, q, x and other related
>> phonemes, while Devanagari would have individual character shapes
>> representing individual phonemes. Tamil is based on an alphabet-based
>> phonemic system, while Devanagari is based on a phonemic system.
>
> I think you mean that Tamil spelling uses digraphs for consonants while
> Devanagari uses single letters. Unless the Tamil digraphs are sorted like
> single letters, this happens to be irrelevant for Unicode.
>

No, not if by digraphs you mean
http://www.deltatranslator.com/delta/diagraphs.htm.

Each letter of the alphabet represents several related phonemes.

Examples:

See how the symbol k covers the phonemes h, g, k, q, x, c, etc.:
mahaL=makaL
magan=makan
makkaL=mqkkaL
kuyil = quyil
lukshmi=Luxmi
kaN=caN

See how the symbol a covers a^, a', etc.:

Ammaa = A`mmaa`, A`mbrella
Annai = A^nna^i, A^merica

>> If Unicode changed its policy from the unimportant and non-functioning
>> transliteration-based encoding to a natural sorting-based encoding, that
>> would be a superior solution. However, expecting Unicode to change its
>> encoding philosophy from ISCII-based transliteration encoding to natural
>> sorting-based encoding is not going to be easy.
>
> You may care to view the UCA weights as a temporary conversion to a
> sorting-based encoding.
>
Can you give some pointers?

>> We will need to work on what is imposed on Tamil and find software
>> solutions to resolve sorting requirements.
>
> If Tamil sorting can be expressed purely by a sorting order of consonants
> and vowels, then the answer for sorting words is simply to rearrange the
> weights on vowels and letters in the default UCA to accord with this
> ordering.
>
99% yes.

Simply, the pulli (virama!), the dependent vowels, vowels and Aytham need to
be weighted and that's it.
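
The weighting described above can be sketched in a few lines of Python. This
is only an illustration, not a UCA implementation: the names (PRIMARY, SIGN,
tamil_sort_key) are made up for the example, Grantha letters are left out,
and the choice to sort a consonant with puLLi before the same consonant with
its inherent vowel is an assumption that would need checking against
dictionary practice.

```python
# Traditional letter order, written as code points to avoid ambiguity.
VOWELS = "\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94"
AYTHAM = "\u0b83"
# ka nga ca nya tta nna ta na pa ma ya ra la va zha lla rra nnna
CONSONANTS = ("\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
              "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9")
PULLI = "\u0bcd"
# Dependent vowel signs, in the same order as VOWELS[1:] (aa, i, ii, ...).
VOWEL_SIGNS = "\u0bbe\u0bbf\u0bc0\u0bc1\u0bc2\u0bc6\u0bc7\u0bc8\u0bca\u0bcb\u0bcc"

# Weight of each base letter: vowels, then aytham, then consonants.
PRIMARY = {ch: i for i, ch in enumerate(VOWELS + AYTHAM + CONSONANTS)}

# Weight of what follows a base letter.
# 0 = puLLi (assumed to sort first), 1 = nothing (inherent a), 2.. = vowel signs.
SIGN = {PULLI: 0}
SIGN.update({ch: i + 2 for i, ch in enumerate(VOWEL_SIGNS)})


def tamil_sort_key(word):
    """Return a list of (letter weight, sign weight) pairs for sorting."""
    key = []
    for ch in word:
        if ch in PRIMARY:
            key.append([PRIMARY[ch], 1])      # base letter, no sign seen yet
        elif ch in SIGN and key:
            key[-1][1] = SIGN[ch]             # sign modifies the previous letter
        else:
            key.append([len(PRIMARY) + ord(ch), 0])  # anything else sorts last
    return [tuple(pair) for pair in key]
```

Passing tamil_sort_key as the key argument of sorted() reorders the words by
these weights instead of by code point. For instance, U+0BA9 precedes U+0BAA
in code point order, but follows it under this table.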

However, according to the Grammar, because of the puLLi/virama there should
be no conjuncts in Tamil. Unicode has nevertheless decided that Tamil has one
conjunct (not hundreds, but one). Instead of treating the Grantham ksh as x,
Unicode insists that ksh is a conjunct. There are no other complications. So
we may need to spend a vast amount of money to work around this insistence by
Unicode. It does not matter whether there is one conjunct or a thousand:
Tamil now has a conjunct in the form of ksh, and if collation is to be
implemented as in the Tamil design, Tamil needs to accept the Unicode
design and work with it.
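
One way to work with that design is to treat the ksh sequence as a single
unit before weights are assigned, which is what the UCA calls a contraction.
The following is a rough sketch only; the names KSSA and collation_units are
hypothetical, and a real tailoring would express this in the collation rules
rather than in hand-written code.

```python
# KA + PULLI + SSA, the sequence that is rendered as the ksh conjunct.
KSSA = "\u0b95\u0bcd\u0bb7"


def collation_units(word):
    """Split a word into units, keeping the ksh sequence together as one unit."""
    units = []
    i = 0
    while i < len(word):
        if word.startswith(KSSA, i):
            units.append(KSSA)       # the whole conjunct becomes one unit
            i += len(KSSA)
        else:
            units.append(word[i])    # every other code point is its own unit
            i += 1
    return units
```

Each unit can then be given a single weight, so ksh can be placed wherever
the sorting order requires.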

There is also another problem that Unicode has created for Tamil.

Some phenomena are encoded twice. Unicode violated its own policy of
standardising a language by introducing double encodings in the name of
canonical equivalence. This is also a violation of the Unicode architecture,
whereby it violates the linear and ligature philosophy by misunderstanding
canonical equivalence.
See http://www.geocities.com/avarangal/rfc/RFC-TA-content_Tamil.html
This unwanted inclusion may turn the 99% simple algorithm into one that is
about 80% simple plus 20% extremely complicated and back-breaking, which
might cause problems for a long time to come.
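
As an illustration, and assuming the double encodings meant here are the
two-part vowel signs (the linked RFC is not quoted in this message), the
sketch below shows the same vowel sign written in two ways and how
normalising both to one form makes them compare equal. Only the Python
standard library is used; the name same_text is made up for the example.

```python
import unicodedata

SIGN_O = "\u0bca"              # TAMIL VOWEL SIGN O as one code point
SIGN_E_PLUS_AA = "\u0bc6\u0bbe"  # VOWEL SIGN E followed by VOWEL SIGN AA


def same_text(a, b):
    """Compare two strings after bringing both to normalisation form NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


KA = "\u0b95"
one_code_point = KA + SIGN_O
two_code_points = KA + SIGN_E_PLUS_AA
```

The two strings differ when compared code point by code point, but
same_text(one_code_point, two_code_points) treats them as the same text. A
sorting routine would need this step before it assigns weights.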

Hence the violation by Unicode of the puLLi/virama as defined in the Grammar,
and the violation of the Unicode architecture in the name of canonical
equivalence, are the main problems holding back the simple solutions that
Tamil Grammar requires.

>> Tamil Grammar, probably the world's oldest written and a sophisticated
>> Grammar, clearly defines authography for Tamil. Here again Unicode does
>> not seem to believe that a language can have a Grammar defining its
>> authography. In this regard it is not too late to bring to the attention
>> of the Unicode consortium how authography is defined and how sorting is
>> used.
>
> Does the Tolkappiyam specify the use of Grantha letters? If it doesn't,
> then it doesn't specify the orthography (note spelling) of Tamil.
> However, orthography is often totally irrelevant for collation, as it is
> for English and Thai.
>
>> We will analyse the requirements to be able to collate Tamil, by way of
>> software fixes.
>
> Just look at tailoring the UCA.
>

Tholkappiyam defines characters as abstract. It does not specify a character
shape.
There were many different character shapes, all conforming to the same rule.
Grantham was a late arrival.
There are probably some links to sindu shapes and probably some links to
kuami shapes. But the point is that the Grammar deals with orthography and
phonology.

Grantham appears to be based on a phonemic-only system. Tamil Grammar is
based on an alphabet-based phonemic system. Devanagari uses Grantham
principles. Tamil does not use Grantham principles, but has a well-defined
orthography as part of its Grammar.

>> To be continued....
>
> I hope with some constructive suggestions.
>
> Richard.
>
>
>


This archive was generated by hypermail 2.1.5 : Sun Jun 26 2005 - 03:41:39 CDT