Re: Tibetan/Burmese/Khmer

From: Maurice J Bauhahn (
Date: Sat Jan 18 1997 - 11:04:13 EST

Thank you Michael for the information you passed on. Thank you for
disclosing at what stage the Thai encoding was changed.

> >If Khmer was substituted on to an ISCII encoding, several consonants would
> >need to be added out of alphabetic order and some vowels added. For speed
> >of sorting that would be disadvantageous...but if economic considerations
> >made it necessary, Cambodians would have to put up with the
> >inconvenience.
> I can't imagine that it would be *that* expensive in terms of speed.
> Certainly the closer a script is to a Brahmic encoding, the cheaper it is
> to adapt from software built for an existing Brahmic encoding.

I wish I could calculate the theoretical limits to settle that question.
All I know is the difficulty which I have experienced in creating a
sorting algorithm for the language. There are five levels of dependencies
with up to 35 members in each dependency. Of course the real language does
not have all combinations but the variations are enough that a simple
dictionary lookup does not seem practical.
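
To make the five levels concrete, here is a minimal sketch in Python of a
per-syllable sort key. It is an illustration only, not a working Khmer
collation: the romanized labels and the weight tables are hypothetical
placeholders, and it assumes that an empty slot sorts before any filled
slot and that the levels are compared in the order base consonant, first
subscript, second subscript, vowel, sign.

```python
from typing import NamedTuple, Optional


class Syllable(NamedTuple):
    """One syllable split into the five slots that carry sorting weight."""
    base: str                    # base consonant (or implied glottal stop)
    sub1: Optional[str] = None   # first subscript consonant
    sub2: Optional[str] = None   # second subscript consonant
    vowel: Optional[str] = None  # vowel, treated as a single unit
    sign: Optional[str] = None   # sign


# Hypothetical weight tables: placeholder labels, not a real Khmer ordering.
CONSONANT_WEIGHT = {"ka": 1, "kha": 2, "ko": 3}
VOWEL_WEIGHT = {"aa": 1, "i": 2}
SIGN_WEIGHT = {"sign1": 1}


def weight(table: dict, item: Optional[str]) -> int:
    # Assumption: an empty slot gets weight 0, so it sorts first.
    return 0 if item is None else table[item]


def sort_key(s: Syllable) -> tuple:
    # Tuples compare element by element, so each level only matters
    # when all earlier levels are equal.
    return (
        weight(CONSONANT_WEIGHT, s.base),
        weight(CONSONANT_WEIGHT, s.sub1),
        weight(CONSONANT_WEIGHT, s.sub2),
        weight(VOWEL_WEIGHT, s.vowel),
        weight(SIGN_WEIGHT, s.sign),
    )


syllables = [
    Syllable("ka", vowel="i"),
    Syllable("ka", sub1="kha"),
    Syllable("kha"),
    Syllable("ka"),
]
ordered = sorted(syllables, key=sort_key)
```

With keys built this way no table of whole syllables is needed; only the
three small weight tables are stored, and the cost per syllable is five
lookups. Whether that is fast enough for millions of strings is exactly
what I cannot yet say.
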
> (I prefer to call the ISCII/Unicode/10646 "family of encoding-principles"
> Brahmic encoding generically. Tibetan would not have a Brahmic encoding,
> though historically it is a Brahmic script.)
> >One _can_ put proper weight to subscripts in this way...however at
> >additional time cost. I envision helping Cambodians to sort
> >millions of strings on their computers, and am fearful of the
> >implications of numerous compromises that reduce the efficiency of
> >sorting. Sorting in Khmer is similarly dependent on the root of the
> >syllable (thank you, Michael, for putting it that makes it more
> >understandable to the uninitiated).
> Maurice, will you please write a paper with examples and submit it to WG2
> and to the UTC for scrutiny? A paper in HTML format would be nice.

I would love to do that, but have no idea how to incorporate Khmer script
examples into HTML without a bunch of little GIFs! Are there any working
browsers which take advantage of the proposed RFC 2070
(Internationalization of the Hypertext Markup Language)?
> >In Khmer there are five different
> >weightings within a syllable: base consonant (or implied glottal stop
> >consonant), first subscript consonant, second subscript consonant, vowel,
> >and sign. It will be nice with Unicode to combine all the vowel glyphs
> >combinations into one character!
> None of this sounds like "root" in the sense in which Tibetan uses the term.

Please post a URL to a document which describes what 'root' does mean when
referring to Tibetan.

Thank you for pursuing this topic.

Maurice Bauhahn
