Re: Is UniCode's Thai character representation is acceptable by TISI or not?

From: Mark Davis
Date: Tue Jul 16 2002 - 11:43:20 EDT

Some comments below.

◄ “Eppur si muove” ►

----- Original Message -----
From: "Samphan Raruenrom"
To: "Asmus Freytag"
Cc: "Sreedhar M"; "Rick McGowan"
Sent: Tuesday, July 16, 2002 07:22
Subject: Re: Is UniCode's Thai character representation is acceptable
by TISI or not?

> Asmus Freytag wrote:
> > At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
> >> There're some mistakes in Unicode char.
> >> properties for Thai char. and you have to "code around" that.
> > And the mistakes are?
> I've discussed a few of them here in this list. I'll write
> a more formal report on the issue later. Here're some titles
> Problems from Unicode properties
> - error in combining class of vowel signs make normalization
> in some cases. This is important if you want to compare strings.

Meaning: the normalized forms of two strings are not equal in cases
where Thais would consider them equal, right?
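
One plausible reading of the combining-class point can be checked with a minimal sketch, assuming Python's standard unicodedata module and whatever Unicode data version ships with it. The sample strings are hypothetical, chosen only for illustration: a tone mark and a lower vowel sign have different non-zero combining classes, so normalization reorders them, while an upper vowel sign has class 0 and blocks reordering.

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI, a base consonant
SARA_I = "\u0e34"   # THAI CHARACTER SARA I, an upper vowel sign
SARA_U = "\u0e38"   # THAI CHARACTER SARA U, a lower vowel sign
MAI_EK = "\u0e48"   # THAI CHARACTER MAI EK, a tone mark

def same_after_nfc(a, b):
    """Compare two strings by their NFC normalized forms."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Canonical combining classes as reported by the local Unicode database.
classes = {name: unicodedata.combining(ch)
           for name, ch in [("SARA I", SARA_I), ("SARA U", SARA_U), ("MAI EK", MAI_EK)]}

# Lower vowel + tone mark: both classes are non-zero, so the two orders normalize alike.
lower_equal = same_after_nfc(KO_KAI + SARA_U + MAI_EK, KO_KAI + MAI_EK + SARA_U)

# Upper vowel + tone mark: the vowel sign has class 0, so the two orders stay distinct.
upper_equal = same_after_nfc(KO_KAI + SARA_I + MAI_EK, KO_KAI + MAI_EK + SARA_I)

print(classes)
print(lower_equal, upper_equal)
```

If the two upper-vowel strings compare unequal, that would be a case where the normalized forms differ even though a reader might treat the strings as the same text.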

> - decomposition of SARA AM add more problem to normalization

I don't recall seeing that note; I'll look forward to your report.
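
Pending that report, here is a minimal sketch of how the SARA AM decomposition might interact with normalization, again assuming Python's unicodedata module. The two spellings are hypothetical examples: one uses SARA AM after a tone mark, the other spells out NIKHAHIT, tone mark, SARA AA by hand.

```python
import unicodedata

NO_NU    = "\u0e19"  # THAI CHARACTER NO NU, a base consonant
MAI_THO  = "\u0e49"  # THAI CHARACTER MAI THO, a tone mark
SARA_AM  = "\u0e33"  # THAI CHARACTER SARA AM
NIKHAHIT = "\u0e4d"  # THAI CHARACTER NIKHAHIT
SARA_AA  = "\u0e32"  # THAI CHARACTER SARA AA

# Decomposition mapping of SARA AM as recorded in the local Unicode database.
decomp = unicodedata.decomposition(SARA_AM)

with_sara_am = NO_NU + MAI_THO + SARA_AM              # tone mark, then SARA AM
spelled_out  = NO_NU + NIKHAHIT + MAI_THO + SARA_AA   # NIKHAHIT before the tone mark

nfkd_sara_am = unicodedata.normalize("NFKD", with_sara_am)
nfkd_spelled = unicodedata.normalize("NFKD", spelled_out)

print(decomp)
print([f"U+{ord(c):04X}" for c in nfkd_sara_am])
print([f"U+{ord(c):04X}" for c in nfkd_spelled])
print(nfkd_sara_am == nfkd_spelled)
```

In this sketch the decomposed NIKHAHIT lands after the tone mark, and since NIKHAHIT has combining class 0 it is not reordered, so the two spellings do not converge under NFKD.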

> - some properties make grapheme cluster for Thai
> incompatible with the way Thai expect, e.g. PINTHU as
> virama, SARA AM not a combining character

In the last UTC, action was taken that is not yet in the draft TR on
boundaries. In particular, this affects Thai.

> Inaccuracy in the Unicode book
> - backspace 'always' use the same (grapheme cluster) character
> as Del and left/right arrow. Actually Thai use backspace to
> delete single
> character not the whole cluster. So character boundary for
> should be locale specific.

This text will be overridden by the TR.
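
To make the distinction concrete, here is a rough sketch of the two deletion behaviors. It assumes a deliberately simplified notion of a cluster (a base character plus any following nonspacing marks), which is not the boundary algorithm from the TR; the function names are hypothetical.

```python
import unicodedata

def simple_clusters(text):
    """Group each character with the nonspacing marks (category Mn) that follow it.

    This is only an approximation for illustration, not the TR's boundary rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def backspace_code_point(text):
    """Backspace as described for Thai: remove only the last code point."""
    return text[:-1]

def delete_last_cluster(text):
    """Cluster-based deletion: remove the whole last cluster."""
    return "".join(simple_clusters(text)[:-1])

word = "\u0e01\u0e34\u0e48"  # KO KAI + SARA I + MAI EK, one cluster in this sketch

after_backspace = backspace_code_point(word)  # the tone mark alone is removed
after_cluster_delete = delete_last_cluster(word)  # consonant and marks are all removed

print(len(simple_clusters(word)), len(after_backspace), len(after_cluster_delete))
```

Which of the two a given key should perform is the locale-dependent choice under discussion; the sketch only shows that the two are easy to keep separate in code.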

> - in Thai, zero width space is said to be able to expand in
> paragraph. Actually it is always zero width.

There may be some misunderstanding here. What is meant is: if you had
the sequence ABCD, and between the B and the C was a zero-width space,
AND you were inter-character spacing for justification, you would not
expect to see:


Instead, you would expect to see


That is, the zero-width space does not prevent the characters from
using inter-character spacing.
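
A toy sketch of that behavior, assuming fixed-width glyphs and a hypothetical justify_positions helper (real layout engines are far more involved): the zero-width space gets no advance of its own, yet the characters on either side of it still receive the extra justification spacing.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def justify_positions(text, char_width, extra_per_gap):
    """Return (character, x position) pairs for the visible characters.

    Every visible character advances by char_width plus extra_per_gap.
    ZWSP contributes no advance and does not suppress the extra spacing.
    """
    positions = []
    x = 0.0
    for ch in text:
        if ch == ZWSP:
            continue  # zero width: nothing is drawn and nothing is skipped
        positions.append((ch, x))
        x += char_width + extra_per_gap
    return positions

with_zwsp = justify_positions("AB" + ZWSP + "CD", char_width=1.0, extra_per_gap=0.5)
without_zwsp = justify_positions("ABCD", char_width=1.0, extra_per_gap=0.5)

print(with_zwsp)
print(with_zwsp == without_zwsp)
```

Under these assumptions the B-to-C gap is stretched exactly like the other gaps, with or without the zero-width space between them.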

> These are things you have to know after learning the Unicode
> if you plan to work with Thai language, to 'code around' the problem
> to make it acceptable for Thai people.
> I plan to write a formal report on the issue, not to change the
> but to note what is wrong and what have to be code around. So people
> who like to work with Thai language (like you) will know the right
> to do and not repeat the same mistake as in some softwares.
> --
> Samphan Raruenrom
> Information Research and Development Division,
> National Electronics and Computer Technology Center, Thailand.
