Re: Is UniCode's Thai character representation is acceptable by TISI or not?

From: Samphan Raruenrom (samphan@thai.com)
Date: Tue Jul 16 2002 - 10:22:17 EDT


Asmus Freytag wrote:
> At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
>> There're some mistakes in Unicode char.
>> properties for Thai char. and you have to "code around" that.
> And the mistakes are?

I've discussed a few of them here on this list. I'll write
a more formal report on the issue later. Here are some headings:

Problems from Unicode properties
- errors in the combining classes of vowel signs make normalization
   worthless in some cases. This matters if you want to compare strings.
- the decomposition of SARA AM adds more problems to normalization
- some properties make grapheme clusters for Thai
   incompatible with what Thai users expect, e.g. PHINTHU as a
   virama, SARA AM not being a combining character
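
As a rough illustration of the two normalization points, here is a minimal
Python sketch using the standard unicodedata module. The sample sequences
and the helper name same_after are chosen only for this example, and the
reading of which cases are meant is an assumption, not a statement of the
full problem. It shows that a tone mark and a below-vowel (SARA U, class 103)
are put into one canonical order, while a tone mark and an above-vowel
(SARA I, class 0) are not, and that the compatibility decomposition of
SARA AM leaves the tone mark in front of NIKHAHIT.

```python
import unicodedata

KO_KAI = "\u0e01"    # consonant
NO_NU = "\u0e19"     # consonant
SARA_I = "\u0e34"    # vowel sign above, combining class 0
SARA_U = "\u0e38"    # vowel sign below, combining class 103
MAI_EK = "\u0e48"    # tone mark, combining class 107
MAI_THO = "\u0e49"   # tone mark, combining class 107
SARA_AA = "\u0e32"
SARA_AM = "\u0e33"   # compatibility decomposition: NIKHAHIT + SARA AA
NIKHAHIT = "\u0e4d"  # combining class 0


def same_after(form, a, b):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Tone mark (107) and SARA U (103) are reordered into one canonical order,
# so both typing orders compare equal after normalization.
below_equal = same_after("NFC",
                         KO_KAI + MAI_EK + SARA_U,
                         KO_KAI + SARA_U + MAI_EK)

# SARA I has class 0, so it blocks reordering: the two typing orders
# stay different strings even after normalization.
above_equal = same_after("NFC",
                         KO_KAI + MAI_EK + SARA_I,
                         KO_KAI + SARA_I + MAI_EK)

# NFKD expands SARA AM in place, leaving the tone mark before NIKHAHIT.
decomposed = unicodedata.normalize("NFKD", NO_NU + MAI_THO + SARA_AM)

# The same syllable spelled out as NIKHAHIT, tone mark, SARA AA does not
# normalize to the same string.
spelled_equal = same_after("NFKD",
                           NO_NU + MAI_THO + SARA_AM,
                           NO_NU + NIKHAHIT + MAI_THO + SARA_AA)
```

Here below_equal is True while above_equal and spelled_equal are False, so
a plain compare-after-normalize is not enough for Thai string comparison.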

Inaccuracies in the Unicode book
- backspace is said to 'always' use the same (grapheme cluster) character
   boundary as Del and the left/right arrows. Actually, Thai users use
   backspace to delete a single character, not the whole cluster. So the
   character boundary for backspace should be locale-specific.
- in Thai, zero width space is said to be able to expand in fully justified
   paragraphs. Actually, it is always zero width.
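
The backspace point can be sketched in Python as follows. This is only an
approximation for illustration: the function names are made up here, and
the cluster rule (a base character plus any following characters whose
general category starts with "M") is a simplification, not the grapheme
cluster definition from the standard.

```python
import unicodedata


def last_cluster_start(text):
    """Index where the last (approximate) cluster of text begins."""
    i = len(text) - 1
    # Walk back over combining marks until a non-mark base is reached.
    while i > 0 and unicodedata.category(text[i]).startswith("M"):
        i -= 1
    return max(i, 0)


def delete_cluster(text):
    """Remove the whole last cluster (what Del or arrow keys step over)."""
    return text[:last_cluster_start(text)]


def backspace_code_point(text):
    """Remove only the last code point (what Thai users expect)."""
    return text[:-1]


# KO KAI + SARA II + MAI EK: one consonant carrying two marks.
word = "\u0e01\u0e35\u0e48"
after_backspace = backspace_code_point(word)  # tone mark removed only
after_cluster_delete = delete_cluster(word)   # whole cluster removed
```

With this input, after_backspace still holds the consonant and its vowel
sign, while after_cluster_delete is empty. An editor could choose between
the two functions depending on the key pressed and the locale.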

These are things you have to know, after learning the Unicode standard,
if you plan to work with the Thai language, so you can 'code around' the
problems and make your software acceptable to Thai people.
I plan to write a formal report on the issue, not to change the standard,
but to note what is wrong and what has to be coded around. Then people
who want to work with the Thai language (like you) will know the right thing
to do and will not repeat the mistakes made in some software.

-- 
Samphan Raruenrom
Information Research and Development Division,
National Electronics and Computer Technology Center, Thailand.
http://www.nectec.or.th/home/index.html


