RE: Reph and Khmer encoding model

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Tue Mar 04 2003 - 06:19:48 EST

Next message: David Oftedal: "Re: Need program to convert UTF-8 -> Hex sequences"

Previous message: Marco Cimarosti: "RE: Need program to convert UTF-8 -> Hex sequences"
In reply to: Mijan: "(no subject)"
Next in thread: Mijan: "RE: Reph and Khmer encoding model"
Reply: Mijan: "RE: Reph and Khmer encoding model"

> I understand that unicode is supposed to represent the
> language, not the way it is written.

No, Unicode is supposed to be able to represent the written
form. (Of course.)

...
> Let's consider the ra+virama+ya case. For the most part,
> ra+virama+ya is displayed as ya+reph. This obviously seems to be
> an instance of ambiguous interpretation, because ra+virama+ya
> could also represent ra+ja-phalaa. ya+reph and ra+ja-phalaa are
> used in different words and have different meanings.
> From this you see that ja-phalaa is not equivalent to virama-ya
> and is better as a separate letter in Unicode. We always thought
> of ya-phalaa as separate anyway.
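
To make the ambiguity described in the quoted text concrete, here is a
minimal sketch. It assumes the script under discussion is Bengali (the
quoted text does not name it), and the helper name describe() is only
illustrative. It shows that both intended readings are stored as the same
three code points, so the plain sequence alone cannot say which rendering
is wanted.

```python
import unicodedata

# Assumption: Bengali code points; the quoted text does not name the script.
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"


def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Both intended readings are typed as the same ra + virama + ya sequence.
ya_with_reph = RA + VIRAMA + YA
ra_with_ya_phalaa = RA + VIRAMA + YA

for code_point, name in describe(ya_with_reph):
    print(code_point, name)

# The two strings are identical, which is the ambiguity in question.
print(ya_with_reph == ra_with_ya_phalaa)
```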

> > >3. There are no other cases of a Vowel+Virama combination in the
> > >Unicode encoding model.
> >
> > Yes, there are. Khmer.
>
> I do not understand Khmer but I see that it does not use the
> same 'encoding model'. Please look, you will see that you were
> wrong to use Khmer as an example.

Khmer uses the same encoding model as most other Indic scripts,
except for one point: the "reph" is represented via a combining
character (which also means that it does not come in "logical order"
in the text representation), so the ambiguity you refer to does
not exist for Khmer. Also, Khmer could have been represented
in a "Tibetan-like" encoding model (but isn't). Finally, IIRC,
independent vowels can both be subscripted (before virama/coeng)
and be subscripts (after virama/coeng) in Khmer. The latter is
orthographically different from using dependent vowels.
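
As a rough illustration of the point about the Khmer "reph", the sketch
below looks up the Khmer robat and coeng characters in Python's
unicodedata module. The consonant-plus-robat string is only a made-up
example of storage order, not a real word, and the helper name info() is
an assumption of this sketch.

```python
import unicodedata

KA = "\u1780"      # KHMER LETTER KA, used here only as a sample base letter
ROBAT = "\u17CC"   # KHMER SIGN ROBAT, the Khmer "reph"
COENG = "\u17D2"   # KHMER SIGN COENG, the Khmer virama-like sign


def info(ch):
    """Return (code point, name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# The robat is a nonspacing combining mark (category Mn), so it is stored
# AFTER the letter it is drawn on, rather than before it as ra + virama
# would be in the other Indic scripts.
sample = KA + ROBAT  # illustrative sequence only

for ch in sample + COENG:
    print(*info(ch))
```

Because the robat has its own code point, a consonant followed by coeng
and another consonant never has to double as a "reph" spelling.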

/kent k

This archive was generated by hypermail 2.1.5 : Tue Mar 04 2003 - 07:12:00 EST