> As I understand Unicode, it is trying to represent a text in its deep
> structure, and it is the job of the font to convert that deep structure to
> surface structure, or actual glyphs of the text. This is exactly what
> transliteration also tries to do (at least in the case of Indic scripts).
> Finding out the rules to do this conversion is the core of both. What
> remains is assigning numbers in the case of Unicode, or assigning
> corresponding Latin characters in the case of transliteration. Both are
> reasonably trivial. So my questions are:
> 1. Is my theory correct? If not, in which way?
> 2. Are these rules for conversion from deep structure to surface structure
> documented somewhere, in the case of Malayalam?

Using the encodings of the Indic scripts, it's pretty easy to implement a
transliteration function between Indic scripts. Transliteration from the Indic
scripts to Roman isn't too difficult either. However, there are a number of
situations where it is extremely awkward. For example, converting from Roman
to the Indic scripts

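As a rough illustration of why script-to-script conversion is the easy case: the Unicode blocks for the Indic scripts follow a largely parallel layout, so a character can often be moved to another script by shifting its code point by the distance between the two blocks. The sketch below assumes that layout for Devanagari (U+0900) and Malayalam (U+0D00). The function name `transliterate_by_offset` is made up for this example, and the parallel is not exact, so characters with no counterpart are left unchanged.

```python
import unicodedata

# Start of each script's 128-code-point block in Unicode.
BLOCK_START = {
    "devanagari": 0x0900,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate_by_offset(text, source, target):
    """Move each `source`-script character to the same offset in `target`.

    Hypothetical helper: it relies only on the parallel block layout and
    knows nothing about script-specific spelling conventions.
    """
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # Keep the original character when the target slot is unassigned.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)  # spaces, punctuation, other scripts
    return "".join(out)


if __name__ == "__main__":
    # Devanagari MA, LA, YA, AA sign, LA, MA
    word = "\u092e\u0932\u092f\u093e\u0932\u092e"
    print(transliterate_by_offset(word, "devanagari", "malayalam"))
```

Even where a slot is assigned in both blocks, the two characters are not guaranteed to correspond, so a real converter would need per-script exception tables on top of this.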
The real issue here is the separation of visual elements and the raw data. It
sounds like you want to process text on the screen, and accept keyboard input
on characters that have been pushed through the transliteration process. What
this means is that you need to track where the encoded character boundaries
are located and map these to the Roman syllables, with correct cursor
movement - not a simple task.

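To make the boundary-tracking problem concrete, here is a minimal sketch of one way to do it: while transliterating, record which span of encoded characters produced each Roman syllable, then use those spans to snap a cursor position in the Roman text back to a boundary in the encoded text. Everything here is an assumption for illustration. The tables cover only a handful of Malayalam letters, the ASCII romanization is ad hoc rather than any standard scheme, and `segment` and `roman_to_source_offset` are invented names.

```python
# Tiny illustrative tables; NOT a complete or standard romanization scheme.
CONSONANTS = {
    "\u0d15": "k",  # KA
    "\u0d2e": "m",  # MA
    "\u0d32": "l",  # LA
    "\u0d2f": "y",  # YA
    "\u0d33": "L",  # LLA (ad hoc ASCII choice)
}
VOWEL_SIGNS = {"\u0d3e": "aa", "\u0d3f": "i"}
VIRAMA = "\u0d4d"
ANUSVARA = "\u0d02"


def segment(text):
    """Return (start, end, roman) triples covering `text` left to right."""
    segments = []
    i = 0
    while i < len(text):
        start = i
        ch = text[i]
        i += 1
        if ch in CONSONANTS:
            roman = CONSONANTS[ch]
            if i < len(text) and text[i] in VOWEL_SIGNS:
                roman += VOWEL_SIGNS[text[i]]
                i += 1
            elif i < len(text) and text[i] == VIRAMA:
                i += 1  # virama suppresses the inherent vowel
            else:
                roman += "a"  # inherent vowel
            if i < len(text) and text[i] == ANUSVARA:
                roman += "m"
                i += 1
        else:
            roman = ch  # pass through anything the tables do not cover
        segments.append((start, i, roman))
    return segments


def roman_to_source_offset(segments, roman_offset):
    """Snap a cursor offset in the Roman string to an encoded-text boundary."""
    pos = 0
    for start, _end, roman in segments:
        if roman_offset < pos + len(roman):
            return start  # inside (or at the start of) this syllable
        pos += len(roman)
    return segments[-1][1] if segments else 0


if __name__ == "__main__":
    text = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"
    segs = segment(text)
    print("".join(r for _, _, r in segs))
    print(roman_to_source_offset(segs, 5))
```

A cursor that lands in the middle of a Roman syllable is moved to the start of the encoded span that produced it, which is the kind of bookkeeping a display layer would have to do on every keystroke.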
This is a function that should be separated from the internal text and built
into whatever rendering engine you use.

However you look at it, this is beyond what the Unicode Standard provides. If
you want character encodings, Unicode provides this; for visual elements and
the behavior of your cursor keys, CDAC's implementation in Leap is probably
the de facto standard you should follow. I believe this is well documented
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT