From: Philippe Verdy <>
Date: Mon, 15 Aug 2011 09:33:32 +0200

Thai, Khmer, Lao and Tai Viet are already exceptions to the Unicode
character encoding model. These exceptions should remain confined to
the native scripts of this region of Indochina. For the rest, all
Indic scripts using the logical encoding order (including those of
Burma and the Philippines) should have the same coherent behavior.

So the Khmer Coeng is not a good example, as Khmer does not behave
as, and is not encoded as, a regular Indic script, despite its
historic origin (there is a similar split of representation between
Semitic scripts and Greek/Coptic, even though they too share a common
historic origin). The split between Indochinese scripts and other
scripts of Brahmic origin is probably much more recent (and
justified by compatibility with legacy encodings), but it is
reasonable to place those Indochinese scripts in a class separate
from the "Indic" scripts, within the same large "Brahmic" family.

The so-called "Unicode character model" already distinguishes
classes for alphabetic scripts, abjads, abugidas (Indic),
syllabaries, and sinographic scripts, within the phonographic family,
plus logographic scripts. This just adds another class for Indochinese
abugidas (using the visual encoding order), which should probably be
formalized officially.
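
As a rough illustration of the two encoding orders discussed above,
here is a minimal Python sketch using only the standard unicodedata
module. The helper name describe and the two sample syllables (Thai
"ke" and Devanagari "ki") are illustrative choices, not taken from the
standard or from this thread. In both syllables the vowel is drawn to
the left of the consonant, but only Thai stores it first.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category) for each stored character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# Thai (visual order): the vowel SARA E is stored BEFORE the consonant.
thai_ke = "\u0E40\u0E01"

# Devanagari (logical order): VOWEL SIGN I is stored AFTER the consonant,
# even though it is rendered to the consonant's left.
deva_ki = "\u0915\u093F"

for label, text in (("Thai", thai_ke), ("Devanagari", deva_ki)):
    print(label)
    for code_point, name, category in describe(text):
        print(f"  {code_point} {name} ({category})")
```

The listing shows the Thai vowel as an ordinary letter (Lo) placed
first in the stored text, whereas the Devanagari vowel is a spacing
combining mark (Mc) following its base consonant.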


2011/8/14 Richard Wordingham <>:
> On Fri, 24 Jun 2011 18:24:01 +0530
> Shriramana Sharma <> wrote:
>> The point is that the sequence:
>> <la, virama, candrabindu, la>
>> is strictly speaking *the* sequence recommended *across* Indic
>> scripts for representation of Sanskrit clusters involving a nasal and
>> non-nasal "semivowel".
> Could you please quote me chapter and verse for this from the TUS or
> other relevant ruling.  It contradicts TUS 6.0 Section 11.4 Ordering of
> Syllable Components (p367), which treats U+17D2 KHMER SIGN COENG and
> its following consonant (or independent vowel) as inseparable.
> It also creates the further oddity that when using a 'consonant sign'
> (Tibetan, possibly Myanmar, and Tai Tham) one would have the sequence
> <la, candrabindu, subjoined la>.  (Alas, I don't have any relevant
> Sanskrit examples in those scripts.)
> The problem may be what is meant by an 'Indic script'?  Do you include
> Tibetan and Further Indian Indic scripts (e.g. Myanmar, Tai Tham and
> Khmer), or do you just mean Indian Indic scripts?
> Richard.
