RE: Reph and Khmer encoding model

From: Kent Karlsson (
Date: Tue Mar 04 2003 - 06:19:48 EST


    > I understand that Unicode is supposed to represent the
    > language, not the way it is written.

    No, Unicode is supposed to be able to represent the written
    form. (Of course.)

    > Let's consider the ra+virama+ya case. For the most part,
    > ra+virama+ya is displayed as ya+reph. This is evidently an
    > instance of ambiguous interpretation, because ra+virama+ya
    > could also represent ra+ja-phalaa. Ya+reph and ra+ja-phalaa
    > are used in different words and have different meanings.
    > From this you can see that ja-phalaa is not equivalent to
    > virama+ya and would be better encoded as a separate letter in
    > Unicode. We always thought of ya-phalaa as a separate letter
    > anyway.
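    The ambiguity described in the quoted paragraph lives in a single
    stored sequence. The Python sketch below (standard library
    `unicodedata` only) lists the code points of RA + VIRAMA + YA; nothing
    in the sequence itself records whether ya+reph or ra+ja-phalaa was
    meant. Using the Bengali block is an assumption: the quoted message
    does not name its script, though ja-phalaa/ya-phalaa are Bengali
    terms. The constant and function names are hypothetical.

```python
import unicodedata

# Bengali RA + VIRAMA + YA (assumed script): one encoded sequence that,
# per the quoted message, may be read as ya+reph or as ra+ja-phalaa.
RA_VIRAMA_YA = "\u09B0\u09CD\u09AF"

def describe(seq: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]

if __name__ == "__main__":
    for line in describe(RA_VIRAMA_YA):
        print(line)
```

    The output is just the three character names; any choice between
    the two readings has to come from outside this plain sequence.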

    > > >3. There are no other cases of a Vowel+Virama combination in the
    > > >Unicode encoding model.
    > >
    > > Yes, there are. Khmer.
    > I do not understand Khmer, but I can see that it does not use
    > the same 'encoding model'. Please look; you will see that you
    > were wrong to use Khmer as an example.

    Khmer uses the same encoding model as most other Indic scripts,
    except for one point: the "reph" is represented by a combining
    character (which also means that it does not come in "logical order"
    in the text representation), so the ambiguity you refer to does
    not exist for Khmer. Moreover, Khmer could have been represented
    in a "Tibetan-like" encoding model (but isn't). Finally, IIRC,
    independent vowels in Khmer can both carry a subscript (when they
    come before virama/coeng) and be subscripts themselves (when they
    come after virama/coeng). The latter is orthographically different
    from using dependent vowels.
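    A small Python sketch of the two Khmer points above, using only the
    standard `unicodedata` module. The Khmer reph is KHMER SIGN ROBAT
    (U+17CC), a combining mark stored after the consonant it sits on,
    whereas a Devanagari reph is RA + VIRAMA stored before that
    consonant. The example spellings are assumptions chosen for
    illustration: Devanagari "dharma" as DHA RA VIRAMA MA, a Khmer
    spelling of the same word as THO MO ROBAT, and the common Khmer
    word for "give" as independent vowel QOO TYPE ONE + COENG + YO (an
    independent vowel carrying a subscript). All constant and function
    names are hypothetical.

```python
import unicodedata

RA_DEV = "\u0930"  # DEVANAGARI LETTER RA
ROBAT = "\u17CC"   # KHMER SIGN ROBAT: Khmer's combining reph
COENG = "\u17D2"   # KHMER SIGN COENG: plays the virama role in Khmer

# Devanagari "dharma" (assumed example): DHA, RA, VIRAMA, MA.
# The reph (RA + VIRAMA) is stored before MA, in logical order.
DEV_DHARMA = "\u0927\u0930\u094D\u092E"

# Khmer "dharma" (assumed example): THO, MO, ROBAT.
# The reph is a combining mark stored after MO, the consonant it sits on.
KHM_DHARMA = "\u1792\u1798\u17CC"

# Khmer "give" (assumed example): INDEPENDENT VOWEL QOO TYPE ONE, COENG, YO.
# An independent vowel followed by COENG and a subscript consonant.
KHM_GIVE = "\u17B1\u17D2\u1799"

def dump(label: str, seq: str) -> None:
    """Print each code point of seq with its general category and name."""
    print(label)
    for ch in seq:
        print(f"  U+{ord(ch):04X} {unicodedata.category(ch)} "
              f"{unicodedata.name(ch)}")

if __name__ == "__main__":
    dump("Devanagari dharma", DEV_DHARMA)
    dump("Khmer dharma", KHM_DHARMA)
    dump("Khmer give", KHM_GIVE)
```

    Comparing the dumps shows where each reph sits in storage: before
    the base consonant in Devanagari, after it in Khmer.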

                    /kent k

    This archive was generated by hypermail 2.1.5 : Tue Mar 04 2003 - 07:12:00 EST