Re: "markers" codepoints for some combining letter sets in Dravidian scripts

From: N. Ganesan (naa.ganesan@gmail.com)
Date: Wed Apr 12 2006 - 21:34:34 CST


    The need for "markers" in the Dravidian scripts
    of India was mentioned. Two "markers" for Telugu
    (a Telugu abbreviation marker and a Telugu alveolar
    marker for TCA and TJA) and three "markers"
    for Malayalam were shown as illustrative
    examples.

    Let us take the case of the possible Malayalam
    code-points:

    (a) Malayalam gemination marker:
    It has a ramp/saw-tooth shape which ligates at the
    bottom in conjuncts like cca, rvva, etc.
    In transliteration, the gemination marker
    can be represented with a dot, e.g. c.ca for cca.

    (b) Malayalam short u marker:

    Unicode has a Virama based model
    where the Virama normally deletes/"kills"
    inherent "a" in "consonants"/akSarams like [ka].

    So, in order to write akSarams with short u,
    there is no need to stack a Virama after
    [consonant] + [vowel modifier u].
    That would break the normal Unicode meaning of Virama
    in Indic scripts, and create an unusual function
    for Virama in Malayalam alone. Similarly,
    there is no need to use [ku], ZWxJ followed by Virama
    for saMvRthokaram u in Malayalam.
    Typically, saMvRthokaram u is transliterated
    as [consonant] + u with breve (U+016D).
    http://homepage.ntlworld.com/stone-catend/trinotes.htm
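
    To make the sequences in (b) concrete, here is a minimal
    Python sketch. It only lists the code points of the
    [ku] + Virama sequence argued against above and builds the
    breve transliteration. The names SHORT_U_MARKER, describe
    and transliterate_short_u are made up for illustration, and
    the private-use character U+E000 is only a stand-in for the
    suggested marker, which has no assigned code point.

```python
import unicodedata

KA = "\u0d15"            # MALAYALAM LETTER KA
VOWEL_SIGN_U = "\u0d41"  # MALAYALAM VOWEL SIGN U
VIRAMA = "\u0d4d"        # MALAYALAM SIGN VIRAMA

# Assumption: no "short u" marker code point exists, so a
# private-use character stands in for it here.
SHORT_U_MARKER = "\ue000"


def describe(text):
    """List each code point with its Unicode name (or a fallback)."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
        for ch in text
    ]


def transliterate_short_u(consonant):
    """Romanize as consonant + u with breve (U+016D)."""
    return consonant + "\u016d"


ku_then_virama = KA + VOWEL_SIGN_U + VIRAMA  # the sequence argued against
ka_with_marker = KA + SHORT_U_MARKER         # illustrative alternative

for line in describe(ku_then_virama):
    print(line)
print(transliterate_short_u("k"))
```

    With a marker, Virama would keep its usual role of killing
    the inherent "a", and the short u would be carried by its
    own combining sign.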

    (c) Malayalam 'cillu' (prepausal consonant) marker:

    As in (b), where the Virama properties would have to be
    changed for Malayalam alone if we do not have a "short u"
    marker code-point and a corresponding combining sign,
    it is better not to give special properties
    to ZWxJ and Virama for the Malayalam cillus.
    Antoine Leca mentioned a cillu-y today;
    possibly there are some more cillus (that will
    be brought to attention). So, the question is:
    does UTC want to encode, say, 10, 12 or 14
    code-points for cillus (which will divorce them
    from their root consonants, which is not good
    linguistically)?

    Please note the distinct cillu-m (which is not in the shape of
    the Malayalam anuswaram). The distinct shapes
    of Malayalam cillus (Ref. : R. Gruenendahl)
    are given in the pdf attachment in:
    http://groups.google.com/group/CTamil/msg/f5ac450e80b33bfb
    (click download, save to desktop to open the pdf).
    Transliteration of cillu is done with a : sign.
    See Note 12 in
    http://homepage.ntlworld.com/stone-catend/trinotes.htm

    The pdf file shows cillus for 9 consonants:
    ka, na, nna, ma, ra, ta, la, lla, llla.
    Note the llla-specific cillu in the pdf.
    Also, the cillu_l and cillu_t can be differentiated
    with the glyphs given. Take for example,
    the third glyph for cillu_t and the second one
    for cillu_l. This is also adhered to in
    the Library of Congress ALA-LC romanization table:
    http://www.loc.gov/catdir/cpso/romanization/malayala.pdf
    Of course, the codepoints for cillu_l and cillu_t
    are different.

    A. Leca wrote:
    >Until now, it is not known if cillu-l (and,
    >as far as I can see, your putative cillu-t as well)
    >should be encoded as <0D31, 0D4D, 200D>
    >or U+0D7B. But nothing more.

    Please note that there is *no* separate cillu_rr,
    so the question of a code point for a Malayalam cillu
    based on 0D31 does *not* arise. Refer to the ALA-LC
    romanization, ISO 15919, etc. In word-final position, cillu_r
    is pronounced as the Malayalam letter RR. So,
    in word-final position, cillu_r is transliterated
    as _r (r with an underline) in Roman script.
    But it is still a cillu_r like the rest of the cillu_r's.
    ISO 15919, the ALA-LC tables, and other books
    do not give any cillu_rr.
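
    For reference, the sequence quoted from A. Leca can be
    spelled out code point by code point. The short Python
    sketch below only decodes <0D31, 0D4D, 200D> and checks
    which base letter it starts with; the helper names are
    illustrative, and it takes no position on U+0D7B, which is
    a proposed code point in this discussion.

```python
import unicodedata

# <0D31, 0D4D, 200D> as quoted above
LECA_SEQUENCE = "\u0d31\u0d4d\u200d"
RA = "\u0d30"  # MALAYALAM LETTER RA, the root consonant argued for here


def code_points(text):
    """Hex code points of each character, without the U+ prefix."""
    return [f"{ord(ch):04X}" for ch in text]


def names(text):
    """Unicode character names for each character."""
    return [unicodedata.name(ch) for ch in text]


def base_letter(sequence):
    """The first character of the sequence, i.e. its base consonant."""
    return sequence[0]


print(code_points(LECA_SEQUENCE))
print(names(LECA_SEQUENCE))
print(base_letter(LECA_SEQUENCE) == RA)
```

    The base of the quoted sequence is the letter RRA (0D31),
    not RA (0D30), which is the point of disagreement above.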

    In Unicode, the sign for the cillu letters of Malayalam
    can be called "Malayalam prepausal consonant marker"
    or "Malayalam cillu marker".
    This combining sign, with properties like
    anusvara, will have a dotted circle.
    A cillu marker code-point is highly recommended because
    (1) it does not impose new properties on ZWxJ
    just for Malayalam among Indic scripts, and
    (2) cillus are too many to be given separate code
    points (the future may throw up more cillus),
    which would move them away from their root consonants
    (Chitrajakumar/Gangadharan document).
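
    The marker model can be sketched in a few lines of Python,
    assuming a single combining cillu marker that follows the
    root consonant. The marker has no assigned code point, so
    the private-use character U+E000 is used purely as a
    placeholder; CILLU_MARKER, encode_cillu and root_consonant
    are made-up names. The nine base letters are the ones listed
    for the pdf above.

```python
# Assumption: U+E000 (private use) stands in for the proposed
# "Malayalam cillu marker"; no such code point is assigned.
CILLU_MARKER = "\ue000"

# Root consonants of the nine cillus shown in the pdf.
CILLU_BASES = {
    "ka": "\u0d15",
    "na": "\u0d28",
    "nna": "\u0d23",
    "ma": "\u0d2e",
    "ra": "\u0d30",
    "ta": "\u0d24",
    "la": "\u0d32",
    "lla": "\u0d33",
    "llla": "\u0d34",
}


def encode_cillu(name):
    """Represent a cillu as root consonant + marker."""
    return CILLU_BASES[name] + CILLU_MARKER


def root_consonant(cillu):
    """Recover the root consonant from a marker-based cillu."""
    if not cillu.endswith(CILLU_MARKER):
        raise ValueError("not a marker-based cillu")
    return cillu[:-len(CILLU_MARKER)]


for name in CILLU_BASES:
    cillu = encode_cillu(name)
    print(name, [f"U+{ord(ch):04X}" for ch in cillu])
```

    In this sketch a newly found cillu needs only a new entry
    for its root consonant, not a new code point, and the root
    consonant stays recoverable from the encoded text.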

    N. Ganesan


