RE: Malayalam and unicode

From: Marco.Cimarosti@icl.com
Date: Thu Nov 18 1999 - 06:31:15 EST


In Unicode, a sequence like:

        U+0D15 U+0D15

simply represents a sequence of two ka's (pronounced, I guess, "kaka").

If you want to spell "kka", you should use the sequence:

        U+0D15 U+0D4D U+0D15

where U+0D4D is the Malayalam virama (also known as halant). A good-quality
application should not actually show the virama, though: the whole
three-character sequence should be displayed as the single glyph for "kka".
(This conjunct is not encoded as a separate Unicode character, but it should
nevertheless exist as a specific glyph in a Malayalam font.)
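
To make the difference between the two sequences concrete, here is a small
Python sketch that builds both strings and lists their code points. The
helper name describe() is only for illustration; the character names come
from Python's standard unicodedata module. Whether "kka" actually appears as
a single conjunct glyph depends on the font and rendering engine, which this
sketch does not test.

```python
import unicodedata

KA = "\u0D15"      # MALAYALAM LETTER KA
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


kaka = KA + KA          # two independent ka's
kka = KA + VIRAMA + KA  # ka + virama + ka: a request for the conjunct "kka"

if __name__ == "__main__":
    for label, text in (("kaka", kaka), ("kka", kka)):
        print(label, describe(text))
```

The stored text always keeps the three separate code points for "kka";
choosing the conjunct glyph is left to the display layer.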

When you actually want the virama sign to be shown, I think you should use
the sequence:

        U+0D15 U+0D4D U+200C U+0D15

U+200C is a format control character called "zero width non-joiner" or "ZWNJ"
(pronounced "zweenj") that is used for several purposes in different
scripts. In Indic scripts like Malayalam, it "reveals" a virama that would
normally be invisible.
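
As a rough sketch of how an application could switch between the two
spellings, the Python snippet below inserts U+200C after a virama that sits
between two consonants, following the order used in the sequence above. The
function names reveal_virama() and hide_virama() are hypothetical, and the
consonant range U+0D15..U+0D39 (KA through HA) is an assumption made to keep
the example short.

```python
import re

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
# Assumption: Malayalam consonants KA..HA occupy U+0D15..U+0D39.
CONSONANT = "[\u0D15-\u0D39]"


def reveal_virama(text):
    """Insert ZWNJ after each virama that is directly followed by a consonant."""
    pattern = f"({CONSONANT}{VIRAMA})(?={CONSONANT})"
    return re.sub(pattern, r"\1" + ZWNJ, text)


def hide_virama(text):
    """Remove a ZWNJ that directly follows a virama, allowing the conjunct again."""
    return text.replace(VIRAMA + ZWNJ, VIRAMA)


if __name__ == "__main__":
    kka = "\u0D15\u0D4D\u0D15"
    shown = reveal_virama(kka)
    print([f"U+{ord(ch):04X}" for ch in shown])
```

Because the pattern only matches a virama immediately followed by a
consonant, applying reveal_virama() twice does not insert a second ZWNJ.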

(I am not totally sure about this use of U+200C; in particular, I don't
remember whether it goes before or after U+0D4D. Other readers may correct
me.)

I think that this policy of using the virama to build conjunct consonants is
not specific to Unicode: if I remember correctly, the mechanism is taken from
ISCII, an 8-bit Indian character set.

I hope this helps.

- Marco

> -----Original Message-----
> From: RajKumar [SMTP:raj2569@flashmail.com]
> Sent: 1999 November 18, Thursday 10.48
> To: Unicode List
> Subject: Malayalam and unicode
>
> hi all
>
> I am a newbie to the list and just wondering if anyone out there had done
> some work in malayalam.
>
> one of the problems that i have come across is that in malayalam when two
> consonants combine, for eg when 2 0d15 chars combine, we get a new glyph to
> represent the compound char.
>
> now my question is how can the new combined glyph be represented in
> unicode? we cannot simply replace every 2 occurrences of the 0d15 char with
> the new glyph since they can exist as two independent chars also. can the
> information that the two chars combined together form a new glyph be
> represented somehow?
>
> raj


