From: N. Ganesan (email@example.com)
Date: Mon Jun 27 2005 - 06:26:58 CDT
Richard Wordingham (firstname.lastname@example.org) wrote:
>Have you read and understood the Unicode Collation
>Algorithm (
>What you most importantly need to propose is a
>re-ordering of the weights (basically 195C.0020.0002
>to 1972.0020.0002) assigned to the Tamil consonants
>(U+0B95 to U+0BB9) in http://www.unicode.org/Public/UCA/latest/allkeys.txt
>(currently Version 4.1.0). If you can demonstrate
>that your proposed weights give the correct order,
>I don't see why the change shouldn't be accepted.
>If you can fix any other collation 'errors' at the
>same time, I think so much the better.
>There is no explicit undertaking that the default
>Unicode Collation Algorithm is correct for any language,
>but I am not aware of any reason that it would be
>wrong to make it work properly for the collation
>of items in the Tamil script.
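To make the request concrete: at the primary level, a word's position follows the sequence of primary weights of its letters, so re-ordering the weights assigned to U+0B95..U+0BB9 re-orders the words. The sketch below compares primary weights only (no secondary/tertiary levels, no contractions). The weight values are hypothetical placeholders, not the values in allkeys.txt, and `sort_key`/`collate` are illustrative names.

```python
# Illustrative sketch: the weights below are HYPOTHETICAL placeholders,
# not the real values from allkeys.txt.

def sort_key(word, primary):
    """Map each character to its primary weight; characters missing
    from the table get a large weight so they sort last."""
    return tuple(primary.get(ch, 0xFFFF) for ch in word)

def collate(words, primary):
    """Sort words by their sequences of primary weights only."""
    return sorted(words, key=lambda w: sort_key(w, primary))

KA, JA, SA = "\u0B95", "\u0B9C", "\u0BB8"  # TAMIL LETTER KA, JA, SA

# Hypothetical "current" table: weights follow code point order.
current = {KA: 0x195C, JA: 0x195F, SA: 0x1970}
# Hypothetical re-weighting: JA is given a weight after SA.
proposed = {**current, JA: 0x1971}

words = [JA, SA, KA]
print(collate(words, current) == [KA, JA, SA])   # True
print(collate(words, proposed) == [KA, SA, JA])  # True
```

Comparing tuples of weights gives a lexicographic, primary-level comparison; a full implementation would break ties with the secondary and tertiary weights.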
Please see a collation chart for Tamil:
Or, in PDF form:
I'd love to know when Uniscribe will be updated
so that SHA (U+0BB6) works correctly in Windows.
Fixing Uniscribe to render the SHA series in the
Tamil script: is that a job to be done by companies
like Microsoft?
In another e-mail, R. Wordingham wrote:
>But in this case, distinguishing the Tamil
>script from its sister script Malayalam
>facilitates the exclusion of letters from
>the ancestral Grantha script!
The Tamil Grantha script is a different script.
See the differences between the Tamil script and
the Tamil Grantha script:
Good references are those by (1) R. Gruenendahl
and (2) P. Visalakshy. Many Sanskrit books are
being printed in the Tamil Grantha script, and
there are thousands of books in that script in the
Adyar Theosophical Library, Chennai (Madras),
Tamil Nadu, India. Like the Devanagari script,
the Tamil Grantha script has many conjuncts,
and the two scripts have the same sort order.
I've written a draft of the Tamil Grantha
code page proposal.
The default weights already address this. The current
weight entries for VOWEL SIGN O and its
decomposition are given in the table as:
0BCA ; [.197B.0020.0002.0BCA] # TAMIL VOWEL SIGN O
0BC6 0BBE ; [.197B.0020.0002.0BCA] # TAMIL VOWEL SIGN O
Note that the sorting algorithm will treat them as identical.
A similar entry for 'ksh' would start '0B95 0BCD 0BB7'.
I'm not sure these canonical decompositions are breaches
of architecture any more than other canonical expansions.
I can't get worked up about this issue because, for Thai,
for example, only the decomposed form is available.
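The point that the algorithm treats precomposed and decomposed VOWEL SIGN O identically can be checked with a small sketch. It parses the two allkeys.txt lines for U+0BCA quoted in this thread and looks up collation elements with a greedy longest match, in the spirit of the UCA's matching step, so `0BC6 0BBE` is consumed as one contraction. The helper names are hypothetical, the parser only handles this simple line format, and a real implementation would normalize input to NFD first.

```python
import re
import unicodedata

# The two table lines quoted in this thread.
ALLKEYS_SAMPLE = """\
0BCA ; [.197B.0020.0002.0BCA] # TAMIL VOWEL SIGN O
0BC6 0BBE ; [.197B.0020.0002.0BCA] # TAMIL VOWEL SIGN O
"""

LINE_RE = re.compile(r"^([0-9A-F]+(?: [0-9A-F]+)*)\s*;\s*(\S+)")
CE_RE = re.compile(r"\[[.*]([0-9A-F]+)\.([0-9A-F]+)\.([0-9A-F]+)")

def parse_allkeys(text):
    """Map each code point sequence (as a str) to its list of
    (primary, secondary, tertiary) weights; other fields are ignored."""
    table = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        key = "".join(chr(int(cp, 16)) for cp in m.group(1).split())
        table[key] = [tuple(int(w, 16) for w in ce)
                      for ce in CE_RE.findall(m.group(2))]
    return table

def collation_elements(s, table):
    """Greedy longest-match lookup over the table."""
    longest = max(len(k) for k in table)
    out, i = [], 0
    while i < len(s):
        for n in range(min(longest, len(s) - i), 0, -1):
            chunk = s[i:i + n]
            if chunk in table:
                out.extend(table[chunk])
                i += n
                break
        else:
            raise KeyError(f"no entry for U+{ord(s[i]):04X} in this sample")
    return out

table = parse_allkeys(ALLKEYS_SAMPLE)
precomposed = "\u0BCA"
decomposed = unicodedata.normalize("NFD", precomposed)  # "\u0BC6\u0BBE"
print(collation_elements(precomposed, table) ==
      collation_elements(decomposed, table))  # True
```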
Like Thai, Tamil in the majority of cases, and in
a wide class of applications (e.g., loanwords from
English, the West, or the Islamic world), uses
"ksh" only as a non-conjunct. So we at INFITT
are discussing a proposal to make the
non-conjunct KSHA the default, and to form the
conjunct ksha with ZWJ. Using ksha mostly as a
non-conjunct is specific to Tamil; the non-conjunct
ksha is not known in other Indic scripts. It is a
Tamil special.
This archive was generated by hypermail 2.1.5 : Mon Jun 27 2005 - 06:27:53 CDT