Re: Unicode and transliteration

From: Cibu_Johny@3com.com
Date: Wed Aug 25 1999 - 20:23:52 EDT

Next message: schererm@us.ibm.com: "unicode standard text on cd"
Previous message: zé do rock: "Virus"
Maybe in reply to: Cibu_Johny@3com.com: "Unicode and transliteration"
Next in thread: Reynolds, Gregg: "RE: Unicode and transliteration"

brendan_murray@lotus.com on 08/23/99 11:39:50 PM

To: Unicode List <unicode@unicode.org>
cc: (Cibu Johny/MW/US/3Com)
Subject: Re: Unicode and transliteration

Cibu_johnnt@3com.com wrote:
<snip>
> As I understand Unicode, it is trying to represent a text in its deep structure,
> and it is the job of the font to convert that deep structure to surface elements,
> or actual glyphs of the text. This is exactly what transliteration is also trying
> to do (at least in the case of Indic scripts). Finding the rules for this
> conversion is the core of both. What remains is assigning code
> numbers in the case of Unicode, or assigning the corresponding Latin character
> sequence in the case of transliteration. Both are reasonably trivial. So my
> questions are:
>
> 1. Is my theory correct? If not, in which way?
> 2. Are these rules for conversion from deep structure to surface structure
> documented somewhere, in the case of Malayalam?

However you look at it, this is beyond what the Unicode Standard provides. If
you want character encodings, Unicode provides this; for visual elements and the
behavior of your cursor keys, CDAC's implementation in Leap is probably
the de facto standard you should follow. I believe this is well documented
in their manuals.

====> Thanks for your kind reply.

My question was not based on any implementations or other standards.

    Given an encoding for a language, a Unicode font should be able to reproduce
    all the glyphs existing in that language. As you all know, this is not that
    intuitive. For example, Malayalam 'ra' (0D31) can take at least three glyph
    forms depending on the context. Whether all these glyphs are reproducible
    from a Unicode font should be a concern. For this, rules should be defined,
    describing which combination produces which glyph. As an example,
    <ra> + <zwj> produces the 'cillu' glyph of 'ra'. Does such a rule set exist?
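
To make the question concrete, a rule set of this kind could be sketched as a
lookup table from code point sequences to glyph names, applied by longest match.
This is only an illustration, not anything defined by the Unicode Standard or
by an existing font: the glyph names, the RULES table and the shape() function
are all hypothetical, and the only rule taken from the message above is
<ra> + <zwj> giving the 'cillu' form.

```python
# Hypothetical rule table: code point sequences -> glyph names.
# Glyph names and rules are illustrative assumptions, not real font data.
ZWJ = "\u200D"   # ZERO WIDTH JOINER
RA = "\u0D31"    # the code point cited in the message for Malayalam 'ra'

RULES = {
    (RA, ZWJ): "ra.cillu",  # <ra> + <zwj> -> 'cillu' form, as in the message
    (RA,): "ra.base",       # assumed default form when no longer rule applies
}


def shape(text):
    """Map a string to glyph names using greedy longest-match over RULES."""
    max_len = max(len(key) for key in RULES)
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            key = tuple(text[i:i + n])
            if key in RULES:
                glyphs.append(RULES[key])
                i += n
                break
        else:
            # No rule matched: fall back to a per-code-point placeholder name.
            glyphs.append("u%04X" % ord(text[i]))
            i += 1
    return glyphs
```

With such a table, the question becomes whether the set of keys is complete
enough to reach every glyph the script needs, and where that set is documented.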

thanks,
Cibu


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT