Thai as Indic (was: letters that "complete the rectangle" in Indic scripts)

From: Richard Wordingham <>
Date: Fri, 20 Sep 2013 00:16:20 +0100

On Thu, 19 Sep 2013 10:42:43 +0200
Philippe Verdy <> wrote:

> So **within the UCS**, the Thai script is not an Indic script. There
> were so many existing documents encoded as in the TIS standards that
> preserving round-trip compatibility was judged more essential than
> adopting the logical Indic order for this script. This has
> consequences for some algorithms, notably for collation.

As I understand it, 'logical Indic order' is the order that makes
collation straightforward. Thai is one of the few major Indic script
languages for which the Unicode Collation Algorithm (I don't just
mean with the DUCET or CLDR default) readily delivers correct results.
Thai collation is actually very computer friendly, collating <SARA E, SO
SUA, LO LING, SARA AA> the same whether it is graphically one
syllable /sa lǎw/ 'beautiful' or two /sěː laː/ 'hill'. By contrast,
CLDR currently despairs of sorting Hindi correctly, and resorts to
brute force for Burmese.
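
To illustrate the point about Thai, here is a rough Python sketch of a toy sort key that treats a preposed vowel and the consonant after it as if they were swapped. It is not the UCA: a real collator works from a collation element table with multi-level weights, and this sketch ignores tone marks and other details. The names PREPOSED_VOWELS, is_thai_consonant and toy_sort_key are made up for the example, and the sample strings are only illustrations.

```python
# Toy illustration only; not an implementation of the UCA.
# Assumption: the preposed vowels are U+0E40..U+0E44 and the
# consonants are U+0E01..U+0E2E.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def is_thai_consonant(ch):
    """True for THAI CHARACTER KO KAI .. THAI CHARACTER HO NOKHUK."""
    return "\u0e01" <= ch <= "\u0e2e"


def toy_sort_key(text):
    """Swap each preposed vowel with the consonant that follows it,
    then use the resulting code points as a single-level key."""
    chars = list(text)
    i = 0
    while i + 1 < len(chars):
        if chars[i] in PREPOSED_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair just swapped
        else:
            i += 1
    return [ord(ch) for ch in chars]


words = [
    "\u0e40\u0e01",  # SARA E, KO KAI
    "\u0e02\u0e32",  # KHO KHAI, SARA AA
    "\u0e01\u0e32",  # KO KAI, SARA AA
]

by_code_point = sorted(words)                 # SARA E string sorts last
by_toy_key = sorted(words, key=toy_sort_key)  # both KO KAI strings first
```

The key depends only on the code point sequence, so <SARA E, SO SUA, LO LING, SARA AA> gets one key however it is divided into syllables.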

As I had it explained to me in this forum, 'logical order' for
Thai would have been achieved by swapping the 'logical order exception'
vowels with the following consonant. *<KO KAI, SARA O, RO RUA, THO
THONG> for โกรธ /kròːt/ 'angry' isn't what most people would think of as
'logical order'.
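
The swap described above can be sketched in a few lines of Python. This is only an illustration of that hypothetical reordering, not anything Unicode actually specifies for Thai; the function name swap_preposed and the code point ranges used for preposed vowels (U+0E40..U+0E44) and consonants (U+0E01..U+0E2E) are assumptions of the sketch.

```python
import unicodedata

# Assumption: Thai preposed vowels are U+0E40..U+0E44.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def swap_preposed(text):
    """Move each preposed vowel after the consonant that follows it."""
    chars = list(text)
    i = 0
    while i + 1 < len(chars):
        follows_consonant = "\u0e01" <= chars[i + 1] <= "\u0e2e"
        if chars[i] in PREPOSED_VOWELS and follows_consonant:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair just swapped
        else:
            i += 1
    return "".join(chars)


def short_names(text):
    """Character names without the 'THAI CHARACTER ' prefix."""
    return [unicodedata.name(ch).replace("THAI CHARACTER ", "") for ch in text]


encoded = "\u0e42\u0e01\u0e23\u0e18"  # the word as actually encoded
swapped = swap_preposed(encoded)

print(short_names(encoded))
print(short_names(swapped))
```

Note that the vowel lands after the first consonant only, which is why the result splits the initial cluster.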

Where Thai differs from most Indic scripts is that it has no
conjoining mechanism, and it is ambiguous as a result. This is a language
property rather than a script property. However, even Pali in the
'traditional' orthography, which uses U+0E3A THAI CHARACTER PHINTHU
as a visible virama, would not be simple to convert to a 'logical
order', for syllable division as evidenced by the placement of the
preposed vowels is not simple and is often erratic.

Received on Thu Sep 19 2013 - 18:19:09 CDT
