on Public Review Issue #66: Encoding of Chillu Forms in Malayalam

From: Cibu (cibucj@gmail.com)
Date: Tue Mar 22 2005 - 18:23:49 CST

    Hi,

    Since Chillu-NA and NA + visible VIRAMA can give different meanings to
    a word, we cannot let the rendering system choose between them.
    Therefore, here are my preferences in decreasing order:

    1) Explicitly encode Chillu characters. Various issues are discussed
    in detail below.
    2) <NA, VIRAMA> (without any joiner) should be mapped to NA with a
    visible Virama, because this enforces uniformity. That is, Consonant
    + VIRAMA will produce a visible Virama symbol, irrespective of whether
    the consonant is capable of forming a Chillu or not. For example, both
    SA + VIRAMA and NA + VIRAMA will show a visible Virama symbol.

    Issues in current representation of a Chillu letter as Consonant + Virama + ZWJ

    1) ZWJ and ZWNJ are supposed to be font directives, directing a font
    to select among two or more semantically identical renderings. In the
    case of Malayalam, this is no longer true. ZWJ becomes an alien
    construct, introduced into Malayalam by Unicode to produce Chillu
    letters. Thus, it is possible to produce two semantically different
    words that differ only by ZWJ in their Unicode representation.
    Example: അവന്‍ (avan – meaning 'he') & അവന്‌ (avan~ - meaning 'for
    him')
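
    The point can be checked at the code point level. The following is a
    minimal sketch, assuming the two example words are spelled
    <A, VA, NA, VIRAMA, ZWJ> and <A, VA, NA, VIRAMA, ZWNJ>; the helper names
    (code_points, differing_positions) are made up for this illustration.

```python
import unicodedata

# Code points used in the example (written as escapes so the invisible
# joiners are visible in the source).
NA, VIRAMA = "\u0D28", "\u0D4D"
ZWJ, ZWNJ = "\u200D", "\u200C"
STEM = "\u0D05\u0D35"  # MALAYALAM LETTER A + MALAYALAM LETTER VA

# Assumed spellings of the two words from the example above.
AVAN_HE = STEM + NA + VIRAMA + ZWJ        # avan, 'he' (Chillu-NA)
AVAN_FOR_HIM = STEM + NA + VIRAMA + ZWNJ  # avan~, 'for him' (visible Virama)


def code_points(text):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def differing_positions(a, b):
    """Return (index, char_in_a, char_in_b) wherever two strings differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for word in (AVAN_HE, AVAN_FOR_HIM):
        print(code_points(word))
    # The only difference is the final joiner character.
    print(differing_positions(AVAN_HE, AVAN_FOR_HIM))
```

    Under these assumptions the two words share every visible letter and
    differ only in the trailing joiner.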

    2) When a word is searched for in Unicode text, the search algorithm
    should ignore ZWJ and ZWNJ, because it should not care about the
    rendering of the word. Given the first issue, this does not hold for
    Malayalam: ignoring them conflates semantically different words.
    However, if the search algorithm does not ignore ZWJ and ZWNJ, it is
    sure to miss some words that are semantically the same but rendered
    differently by using or omitting ZWJ/ZWNJ.
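
    The dilemma can be sketched with two naive search strategies. This is
    only an illustration, not how any particular search engine works; the
    function names are hypothetical, and the spellings of the sample words
    (with and without ZWNJ) are assumptions.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"


def strip_joiners(text):
    """Remove ZWJ and ZWNJ, as a joiner-insensitive search would."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def contains_exact(haystack, needle):
    """Joiner-sensitive search: plain substring match."""
    return needle in haystack


def contains_ignoring_joiners(haystack, needle):
    """Joiner-insensitive search: compare after stripping ZWJ/ZWNJ."""
    return strip_joiners(needle) in strip_joiners(haystack)


# Two different words that differ only by the joiner (assumed spellings).
AVAN_HE = "\u0D05\u0D35\u0D28\u0D4D" + ZWJ
AVAN_FOR_HIM = "\u0D05\u0D35\u0D28\u0D4D" + ZWNJ

# One word spelled two ways: conjunct form vs. visible Virama forced by ZWNJ.
SABDAM_CONJUNCT = "\u0D36\u0D2C\u0D4D\u0D26\u0D02"
SABDAM_VISIBLE_VIRAMA = "\u0D36\u0D2C\u0D4D" + ZWNJ + "\u0D26\u0D02"

if __name__ == "__main__":
    # Ignoring joiners finds 'he' in a text that only contains 'for him'.
    print(contains_ignoring_joiners(AVAN_FOR_HIM, AVAN_HE))
    # Respecting joiners misses a mere rendering variant of the same word.
    print(contains_exact(SABDAM_VISIBLE_VIRAMA, SABDAM_CONJUNCT))
```

    Neither strategy is right for both pairs, which is the problem
    described in this item.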

    3) Chillu of a consonant is different from its C1-conjoining form
    without inherent അ (A).

    3.1) Phonetic differences
    Consider the combination: Vow + CC + Con.
    Vow - a vowel
    CC - a consonant capable of forming Chillu
    Con - a consonant

    When CC takes its Chillu form, it joins more closely with Vow. This
    produces a noticeable small stop between CC and Con.

    When CC takes its C2/C1-conjoining form without inherent അ (A), it is
    pronounced closer to Con.

    Examples:
    ഉണര്‍വ്‌ ഉണര്വ്‌ (unlike its pair, not a meaningful word)
    കല്‍വിളക്ക്‌ വില്വാദ്രി
    കണ്‍വട്ടം കണ്വന്‍

    4) Chillu of a consonant can be treated as Anusvara
    A. R. Raja Raja Varma states in his Keralapanineeyam (the foremost
    grammar book of Malayalam) that "Anusvara is the Chillu of MA".
    Thus, we can say that Malayalam has more than one Anusvara: there is
    an Anusvara for MA, and there are Anusvaras for NA, NNA, LA, etc.
    This is essentially the same as saying that Malayalam has a number
    of Chillus, including those of MA, NA, LA, etc.

    If we look closely, the phonetic rules are also the same for Anusvara
    and the other Chillus, most importantly the half-stop property (please
    see Appendix A) when it occurs in the middle of a word. Examples:

    സംയുക്തം സാമ്യം
    കല്‍വിളക്ക്‌ വില്വാദ്രി
    കണ്‍വട്ടം കണ്വന്‍

    Essentially, this means Unicode should do one of the following:
    1. Include separate character locations for Chillu characters
    - solves the confusion of ല്‍ (Chillu of LA/TA) (see below)
    - addresses the Chillu representation issues mentioned above
    2. Allow Anusvara to be encoded as MA + Virama + ZWJ
    - does not change the existing encoding for Chillu
    - does not address the Chillu representation issues explained above

    Background
    ----------
    A) Overloading of visible Virama in Malayalam

    Following are its functions:
    A.1) at end of a word, it acts as quarter vowel ഉ (U). Example: അവന്‌ (avan~)
    A.2) In the middle of a word, it means the consonant before is forming
    a conjunct with consonant after. Example: ശബ്‌ദം (Sabdam) In this
    context, it does not produce any sound whatsoever.
    Function (A.2) was added to this grapheme when the typesetting-friendly
    new orthography was introduced. Unicode recognizes only function (A.2)
    for the visible Virama of Malayalam.
    This contributes to the problem that the Unicode representations of
    അവന്‍ (avan) and അവന്‌ (avan~) differ only by ZWJ/ZWNJ.
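
    The overloading can be made concrete with a small classifier. This is
    a rough sketch under simplifying assumptions: it treats VIRAMA + ZWJ as
    the current Chillu representation (true only for consonants that can
    form a Chillu), it looks only at position within a single word, and the
    function name and role labels are invented for this illustration.

```python
VIRAMA = "\u0D4D"
ZWJ, ZWNJ = "\u200D", "\u200C"


def virama_roles(word):
    """Return (index, role) for every VIRAMA in a single word."""
    roles = []
    i = 0
    while i < len(word):
        if word[i] != VIRAMA:
            i += 1
            continue
        nxt = i + 1
        # An optional joiner may directly follow the Virama.
        joiner = word[nxt] if nxt < len(word) and word[nxt] in (ZWJ, ZWNJ) else ""
        if joiner:
            nxt += 1
        if joiner == ZWJ:
            role = "chillu"               # current Chillu representation
        elif nxt >= len(word):
            role = "A.1 quarter vowel"    # Virama at the end of the word
        else:
            role = "A.2 conjunct marker"  # Virama between two consonants
        roles.append((i, role))
        i = nxt
    return roles


if __name__ == "__main__":
    print(virama_roles("\u0D05\u0D35\u0D28\u0D4D\u200C"))        # avan~
    print(virama_roles("\u0D36\u0D2C\u0D4D\u200C\u0D26\u0D02"))  # Sabdam
    print(virama_roles("\u0D05\u0D35\u0D28\u0D4D\u200D"))        # avan
```

    Note that roles A.1 and A.2 are told apart here only by position in
    the word, not by anything in the encoding itself.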

    B) Evolution & Confusion of ല്‍ (Chillu LA/TA)
    For Sanskrit words used in Malayalam, ത (TA) is pronounced as written
    only when a vowel or semi-vowel comes after it. In all other cases, it
    is pronounced as ല (LA).

    An example would be ഉത്സവം (ulsavam). Even though its Sanskrit
    originated form is ഉത്‌സവം (uthsavam), it is pronounced in Malayalam
    as ഉല്‌സവം (ulsavam).

    This means the Chillu form of ത (TA) should be pronounced as if it were the
    Chillu form of ല (LA). Thus, ല്‍ (chillu LA/TA) is in a very curious
    situation:

    B.1) Grapheme level: Graphically it is Chillu of ത (TA).
    B.2) Character level: It can represent either of the characters ത (TA) or ല (LA).
    B.3) Phoneme level: Its pronunciation is the Chillu of ല (LA).

    Reference: കേരളപാണിനീയം (kEraLapaaNineeyam), പീഠിക (peeThika) - A. R.
    Raja Raja Varma

    thanks,
    Cibu

    -- 
    More about me: http://www.blogger.com/profile/1246232
    

