Re: "markers" codepoints for some combining letter sets in Dravidian scripts

From: N. Ganesan (naa.ganesan@gmail.com)
Date: Wed Apr 12 2006 - 21:34:34 CST

Maybe in reply to: N. Ganesan: ""markers" codepoints for some combining letter sets in Dravidian scripts"

The need for "markers" in the Dravidian scripts
of India was mentioned earlier.
Two "markers" for Telugu (a Telugu abbreviation marker and
a Telugu alveolar marker for TCA and TJA) and three "markers"
for the Malayalam script were shown as illustrative
examples.

Let us take the case of possible Malayalam
code points:

(a) Malayalam gemination marker:
It has a ramp/saw-tooth shape which ligates at the
bottom in conjuncts such as cca and rvva.
In transliteration, the gemination marker
can be represented with a dot, e.g. cca as c.ca.

(b) Malayalam short u marker:

Unicode has a Virama-based model
where the Virama normally deletes/"kills" the
inherent "a" in "consonants"/akSarams like [ka].

So, to write akSarams with short u,
there is no need to stack a Virama after
[consonant] + [vowel modifier u].
Doing so would break the normal Unicode meaning of Virama
in Indic scripts, and create an unusual function
for Virama in Malayalam alone. Similarly,
there is no need to use [ku], ZWxJ followed by Virama
for saMvRthokaram u in Malayalam.
Typically, saMvRthokaram u is transliterated
as [consonant] + u with breve (U+016D).
http://homepage.ntlworld.com/stone-catend/trinotes.htm
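
The spelling argued against in (b), and the Roman transliteration,
can be listed code point by code point. A minimal Python sketch,
assuming only the standard unicodedata module; the variable names
and the sample syllable "ku" are illustrative choices, not part of
any proposal:

```python
import unicodedata

# Encoded Malayalam characters used in the discussion of (b).
KA = "\u0D15"            # MALAYALAM LETTER KA
VOWEL_SIGN_U = "\u0D41"  # MALAYALAM VOWEL SIGN U
VIRAMA = "\u0D4D"        # MALAYALAM SIGN VIRAMA


def describe(seq):
    """Return one 'U+XXXX NAME' string per code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


# [ku]: consonant plus vowel sign u.
ku = KA + VOWEL_SIGN_U

# The spelling the post argues against: a Virama stacked after [ku].
ku_virama = ku + VIRAMA

# Roman transliteration: consonant + u with breve (U+016D).
translit = "k\u016D"

for label, seq in [("[ku]", ku), ("[ku]+Virama", ku_virama), ("translit", translit)]:
    print(label)
    for line in describe(seq):
        print("   ", line)
```

In the second sequence the Virama follows a vowel sign rather than a
bare consonant, which is the departure from the usual "vowel killer"
role that (b) objects to.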

As in (b), where Virama properties would need to be changed
for Malayalam alone if we do not have a "short u"
marker code point and a corresponding combining sign,
it is better not to use special properties
for ZWxJ and Virama in the case of Malayalam cillus.
Antoine Leca mentioned a cillu-y today;
possibly there are some more cillus (that will
be brought to attention). So, the question is:
does UTC want to encode, say, 10, 12 or 14
code points for cillus (which would divorce them
from their root consonants, which is not good
linguistically)?

Please note the distinct cillu-m (which is not in the shape of
the Malayalam anuswaram). The distinct shapes
of Malayalam cillus (Ref.: R. Gruenendahl)
are given in the pdf attachment in:
http://groups.google.com/group/CTamil/msg/f5ac450e80b33bfb
(click download, save to desktop to open the pdf).
Transliteration of cillu is done with a : sign.
See note 12 in
http://homepage.ntlworld.com/stone-catend/trinotes.htm

In the pdf file, cillus are shown for 9 consonants:
ka, na, nna, ma, ra, ta, la, lla, llla.
Note the llla-specific cillu in the pdf.
Also, the cillu_l and cillu_t can be differentiated
by the glyphs given. Take, for example,
the third glyph for cillu_t and the second one
for cillu_l. This distinction is also observed in
the Library of Congress ALA-LC romanization table:
http://www.loc.gov/catdir/cpso/romanization/malayala.pdf
Of course, the code points for cillu_l and cillu_t
are different.

A. Leca wrote:
>Until now, it is not known if cillu-l (and,
>as far as I can see, your putative cillu-t as well)
>should be encoded as <0D31, 0D4D, 200D>
>or U+0D7B. But nothing more.

Please note that there is *no* separate cillu_rr,
so the question of a code point for a Malayalam cillu
based on 0D31 does *not* arise. Refer to the ALA-LC
romanization or ISO 15919, etc. In word-final position, cillu_r
is pronounced as the Malayalam letter RR. So,
in word-final position, cillu_r is transliterated
as _r (r with an underline) in Roman script.
But it is still a cillu_r like all other instances of cillu_r.
ISO 15919, the ALA-LC tables, and other books
do not give any cillu_rr.
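
To make the point about 0D31 concrete, the sequence quoted from
A. Leca can be decoded by character name. A small Python sketch,
assuming the standard unicodedata module. The RA-based sequence is
only an illustration of the claim made here (that the cillu belongs
with RA rather than RRA); it is not a sequence given in either
message:

```python
import unicodedata

# Sequence quoted in A. Leca's message: <0D31, 0D4D, 200D>.
quoted = "\u0D31\u0D4D\u200D"

# Illustrative assumption: the same sequence shape built on RA (U+0D30),
# the root consonant this post assigns to cillu_r.
ra_based = "\u0D30\u0D4D\u200D"


def names(seq):
    """Return the Unicode character name of each code point in seq."""
    return [unicodedata.name(ch) for ch in seq]


print(names(quoted))
print(names(ra_based))
```

The two sequences differ only in the first code point: MALAYALAM
LETTER RRA in the quoted one, MALAYALAM LETTER RA in the other.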

In Unicode, the sign that forms the cillu letters of Malayalam
could be named "Malayalam prepausal consonant marker"
or "Malayalam cillu marker".
This combining sign, with properties like those of
anusvara, would be shown with a dotted circle.
A cillu marker code point is highly recommended because
(1) it avoids imposing new properties on ZWxJ
just for Malayalam among Indic scripts, and
(2) cillus are too many to be given separate code
points (the future may throw up more cillus),
which would move them away from their root consonants
(Chitrajakumar/Gangadharan document).
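
The marker model can be compared with the ZWJ-based spelling for the
nine consonants listed from the pdf. This is only a sketch under
stated assumptions: no cillu marker is encoded, so a Private Use Area
code point (U+E000) stands in for it, and the mapping from the
transliterated names (ka, na, nna, ...) to Malayalam letters is taken
from the Unicode character names, not from the pdf itself:

```python
import unicodedata

# HYPOTHETICAL: placeholder for the proposed cillu marker.
# U+E000 is a Private Use Area code point, not an encoded marker.
CILLU_MARKER = "\uE000"

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Assumed mapping of the nine root consonants to Malayalam letters.
BASES = {
    "ka": "\u0D15", "na": "\u0D28", "nna": "\u0D23",
    "ma": "\u0D2E", "ra": "\u0D30", "ta": "\u0D24",
    "la": "\u0D32", "lla": "\u0D33", "llla": "\u0D34",
}


def marker_model(base):
    """Root consonant followed by the (hypothetical) cillu marker."""
    return base + CILLU_MARKER


def zwj_model(base):
    """Root consonant + Virama + ZWJ, the shape of the quoted sequence."""
    return base + VIRAMA + ZWJ


def code_points(seq):
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


for name, base in BASES.items():
    print(f"{name:5} marker: {code_points(marker_model(base)):14} "
          f"zwj: {code_points(zwj_model(base))}")
```

In both spellings the root consonant stays the first code point, and
the marker model needs one new code point however many cillus turn up,
whereas separate cillu letters would need one new code point each.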

N. Ganesan

