"markers" codepoints for some combining letter sets in Dravidian scripts

From: N. Ganesan (naa.ganesan@gmail.com)
Date: Tue Apr 11 2006 - 19:06:18 CST


    This note is about the allocation of "markers" for
    the Dravidian scripts such as Tamil, Telugu,
    Kannada, and Malayalam in the Unicode code charts. The "markers"
    will be very useful for working with historic letters,
    and especially the cillakSaram consonants in Malayalam,
    since special cillu markers will free up ZWJ/ZWNJ
    functionality, and no special properties for ZWJ/ZWNJ
    just for the case of Malayalam in the family of
    Indian scripts will be needed in TUS or in implementations.
    Here are 4 examples of marker codepoints:

    (1) Telugu alveolar marker

    There are two historic Telugu letters not used
    in current printed books, but found in grammar books.
    They are usually called TCA and TJA because they are
    alveolar modifications of the Telugu letters CA and JA.

    TCA and TJA can be generated by a
    "Telugu alveolar marker" sign with
    an annotation something like
    "This Telugu sign works only on CA and JA".
    This combining sign, with properties like
    anusvara, will have a dotted circle.
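
    The restriction in the annotation above can be sketched as a small
    check. This is only an illustration under stated assumptions: the
    marker has no assigned codepoint, so a Private Use Area value stands
    in for it, and the names ALVEOLAR_MARKER and alveolar_marker_errors
    are hypothetical. U+0C1A and U+0C1C are the existing Telugu letters
    CA and JA.

```python
# Hypothetical placeholder: no such codepoint is assigned. A Private Use
# Area value stands in for the proposed "Telugu alveolar marker".
ALVEOLAR_MARKER = "\uE000"

TELUGU_CA = "\u0C1A"  # TELUGU LETTER CA
TELUGU_JA = "\u0C1C"  # TELUGU LETTER JA
ALLOWED_BASES = {TELUGU_CA, TELUGU_JA}


def alveolar_marker_errors(text: str) -> list[int]:
    """Return the indices of markers that do not directly follow CA or JA."""
    errors = []
    for i, ch in enumerate(text):
        if ch != ALVEOLAR_MARKER:
            continue
        # A marker at the start of the text, or after any other letter,
        # violates the "works only on CA and JA" annotation.
        if i == 0 or text[i - 1] not in ALLOWED_BASES:
            errors.append(i)
    return errors
```

    For example, CA followed by the marker gives an empty error list,
    while KA (U+0C15) followed by the marker is reported at index 1.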

    (2) Telugu abbreviation marker

    In Telugu script, words are abbreviated
    and shown as the first letter (abugida or vowel) of the word
    followed immediately by two closely spaced vertical
    lines. This combining sign, with properties like
    anusvara, with a dotted circle followed by
    two vertical lines II will be "Telugu
    contraction (or abbreviation) marker".

    There are many example words with
    Telugu contraction marker listed in books.

    (3) Malayalam cillu marker

    The Indic list has received several inputs
    on this implementation problem.
    I did some research, and I do not recall a glyph for cillu m
    that is in line with cillu n, cillu nn, etc.
    I will give this shape from published evidence
    tomorrow. Does UTC have the cillu m glyph?

    In Unicode, the sign for the cillu letters of Malayalam
    can be called "Malayalam prepausal consonant marker"
    or "Malayalam cillu marker".

    This combining sign, with properties like
    anusvara, will have a dotted circle.
    Cillus can come at the end of words
    and word-medially. The consonants that take cillu forms
    are ka, na, nna, la, lla, llla, r, t and m.
    Please note that one Unicode cillu marker
    will provide cillus for all these 9 consonants
    (the alternative is 9 separate codepoints, which
    would detach them from their genetic root
    consonants, and that is linguistically unacceptable).
    The cillu marker will also do away with any special
    rules for ZWJ/ZWNJ just for Malayalam
    among Indian scripts.

    In usual transliteration, cillu_consonant = consonant (roman) followed by :
    So, e.g., cillu_nna (in Malayalam) = n:na in roman; otherwise it is nna only.
    In the Rachana document on cillu letters, cillu_nma set
    in pg. 2 of L2/05-210, can be transliterated as "n:ma"
    with : representing the cillu-marker codepoint.
    In word-final position, cillu_r = Malayalam letter RR.
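
    The transliteration convention above can be sketched in a few
    lines. This is a minimal illustration, not a full transliteration
    scheme: the function name split_cillu is hypothetical, and the
    sketch only splits a romanized string on ":" and flags the pieces
    that end in a cillu consonant.

```python
def split_cillu(roman: str) -> list[tuple[str, bool]]:
    """Split a romanized string on ':' and flag pieces ending in a cillu.

    Every piece that was followed by ':' is flagged True (it ends in a
    cillu consonant). The final piece, which has no ':' after it, is
    flagged False.
    """
    pieces = roman.split(":")
    result = [(piece, True) for piece in pieces[:-1]]
    # Skip an empty tail, which occurs when the string ends with ':'.
    if pieces[-1]:
        result.append((pieces[-1], False))
    return result
```

    With this sketch, "n:ma" splits into a cillu "n" followed by a
    plain "ma", while "nna" stays one piece with no cillu.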

    (4) Malayalam short u marker

    Sometimes, especially in North Kerala,
    words ending in u have a short u
    indicated in the orthography. The word-terminal
    [consonant] + short u is shown visibly
    using a virama. This is called saMvRttokaram
    in Sanskrit/Malayalam, and is used in Malayalam.

    It can be encoded as "Malayalam short u marker",
    which works with words ending in [consonant] + u.
    This "Malayalam short u marker" is the last codepoint/sign
    in a word. There is a separate document on the
    importance of Samvruttokaram in Malayalam script
    authored by Drs. Chitrajakumar and Gangadharan.
    This combining "short u marker" in Malayalam Unicode,
    with properties like anusvara, will have a dotted circle.
    It has to work with [consonant]+ [u-vowel modifier].
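
    The placement rule above (the marker comes last in the word, after
    [consonant] + [u-vowel modifier]) can be sketched as a check. The
    assumptions are loud ones: the marker has no assigned codepoint, so
    a Private Use Area value stands in for it; the names SHORT_U_MARKER
    and short_u_marker_ok are hypothetical; and the consonant test
    simply uses the Malayalam block range U+0D15..U+0D39 (KA through
    HA). U+0D41 is the existing Malayalam vowel sign U.

```python
# Hypothetical placeholder: no such codepoint is assigned. A Private Use
# Area value stands in for the proposed "Malayalam short u marker".
SHORT_U_MARKER = "\uE001"

MALAYALAM_VOWEL_SIGN_U = "\u0D41"  # MALAYALAM VOWEL SIGN U


def is_malayalam_consonant(ch: str) -> bool:
    """Rough test (an assumption): U+0D15 KA through U+0D39 HA."""
    return "\u0D15" <= ch <= "\u0D39"


def short_u_marker_ok(word: str) -> bool:
    """Check that the marker, if present, is word-final after consonant + u."""
    if SHORT_U_MARKER not in word:
        return True  # nothing to check
    # The marker must occur exactly once and be the last sign in the word.
    if word.count(SHORT_U_MARKER) != 1 or not word.endswith(SHORT_U_MARKER):
        return False
    # It needs a consonant and the u vowel sign immediately before it.
    if len(word) < 3:
        return False
    consonant, vowel = word[-3], word[-2]
    return is_malayalam_consonant(consonant) and vowel == MALAYALAM_VOWEL_SIGN_U
```

    A word ending in KA + vowel sign U + marker passes this check; the
    marker directly after a bare consonant, or anywhere other than the
    end of the word, does not.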

    These 4 examples are given to illustrate the use
    of "marker" codepoints in Unicode among Dravidian scripts.
    There are some more markers that can be added in Unicode
    over time.

    N. Ganesan
