Re: Is UniCode's Thai character representation is acceptable by TISI or not?

From: Mark Davis (mark@macchiato.com)
Date: Tue Jul 16 2002 - 11:43:20 EDT


Some comments below.

Mark
__________
http://www.macchiato.com
◄ “Eppur si muove” ►

----- Original Message -----
From: "Samphan Raruenrom" <samphan@thai.com>
To: "Asmus Freytag" <asmusf@ix.netcom.com>
Cc: "Sreedhar M" <sreedhar@cmcltd.com>; <unicode@unicode.org>; "Rick
McGowan" <rick@unicode.org>
Sent: Tuesday, July 16, 2002 07:22
Subject: Re: Is UniCode's Thai character representation is acceptable
by TISI or not?

> Asmus Freytag wrote:
> > At 12:06 PM 7/16/02 +0700, Samphan Raruenrom wrote:
> >> There're some mistakes in Unicode char.
> >> properties for Thai char. and you have to "code around" that.
> > And the mistakes are?
>
> I've discussed a few of them here on this list. I'll write
> a more formal report on the issue later. Here are some headings:
>
> Problems from Unicode properties
> - An error in the combining classes of the vowel signs makes normalization
> worthless in some cases. This matters if you want to compare strings.

Meaning: the normalized forms of two strings are not equal in cases
where Thais would consider them equal, right?
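
To make the question concrete, here is a minimal Python sketch of one case
I have in mind. It is only an illustration: which pairs Thai users actually
treat as equal is an assumption on my part, not something established in
this thread, and the helper name same_after_nfc is made up for the example.
It uses the standard unicodedata module.

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI (base consonant)
SARA_I = "\u0e34"   # vowel sign above, combining class 0
SARA_U = "\u0e38"   # vowel sign below, combining class 103
MAI_EK = "\u0e48"   # tone mark, combining class 107

def same_after_nfc(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Both marks have non-zero classes, so canonical ordering sorts them
# and the two spellings normalize to the same string.
below = same_after_nfc(KO_KAI + SARA_U + MAI_EK, KO_KAI + MAI_EK + SARA_U)

# SARA I has class 0, so nothing is reordered and the two spellings
# stay different after normalization.
above = same_after_nfc(KO_KAI + SARA_I + MAI_EK, KO_KAI + MAI_EK + SARA_I)

print(below, above)
```

If the second pair is one that Thais would consider equal, that would be an
instance of the problem.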

> - The decomposition of SARA AM adds further problems to normalization.

I don't recall seeing that note; I'll look forward to your report.
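
In the meantime, my guess at the kind of case involved, as a Python sketch.
SARA AM (U+0E33) has a compatibility decomposition to NIKHAHIT + SARA AA.
The "spelled_out" string below is an assumed alternative spelling chosen
for illustration; the report may well describe something different.

```python
import unicodedata

SARA_AM = "\u0e33"
print(unicodedata.decomposition(SARA_AM))   # '<compat> 0E4D 0E32'

# NO NU + MAI THO + SARA AM, as the word is normally typed.
typed = "\u0e19\u0e49\u0e33"

# Assumed alternative: NO NU + NIKHAHIT + MAI THO + SARA AA.
spelled_out = "\u0e19\u0e4d\u0e49\u0e32"

nfkd_typed = unicodedata.normalize("NFKD", typed)
nfkd_spelled = unicodedata.normalize("NFKD", spelled_out)

# NIKHAHIT has combining class 0, so it is never reordered around the
# tone mark: the two NFKD results keep different mark orders.
print([f"{ord(c):04X}" for c in nfkd_typed])
print([f"{ord(c):04X}" for c in nfkd_spelled])
print(nfkd_typed == nfkd_spelled)
```

Note that the canonical forms (NFC/NFD) leave SARA AM untouched; only the
compatibility forms expand it.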

> - Some properties make grapheme clusters for Thai
> incompatible with what Thai users expect, e.g. PHINTHU treated as
> a virama, and SARA AM not being a combining character.

At the last UTC meeting, action was taken that is not yet reflected in
the draft TR on boundaries; in particular, it affects Thai.
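
For readers following along, a rough Python sketch of why SARA AM is awkward
here. This is NOT the algorithm in the TR: naive_clusters is a hypothetical,
deliberately simplistic rule that attaches only nonspacing marks (general
category Mn) to the preceding character.

```python
import unicodedata

def naive_clusters(text):
    """Hypothetical approximation: a base plus any following Mn marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch      # attach the mark to the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# NO NU + MAI THO + SARA AM
word = "\u0e19\u0e49\u0e33"
parts = naive_clusters(word)
print(len(parts))   # 2: SARA AM is category Lo, so it starts its own cluster
```

Under such a rule the tone mark joins the consonant but SARA AM stands alone,
which, as I read the complaint, is not what Thai users expect.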

>
> Inaccuracies in the Unicode book
> - Backspace is said to 'always' use the same (grapheme cluster) character
> boundary as Del and the left/right arrow keys. Actually, Thai uses
> backspace to delete a single character, not the whole cluster. So the
> character boundary for backspace should be locale-specific.

This text will be overridden by the TR.
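
The two behaviours being contrasted can be sketched in Python as follows.
Both function names are hypothetical, and the cluster rule (strip trailing
Mn marks, then the base) is a simplification assumed for the example, not
the boundary definition from the TR.

```python
import unicodedata

def backspace_code_point(text):
    """Delete only the last code point (the behaviour described for Thai)."""
    return text[:-1]

def backspace_cluster(text):
    """Delete the last base character together with its trailing Mn marks."""
    i = len(text)
    while i > 0 and unicodedata.category(text[i - 1]) == "Mn":
        i -= 1                      # skip back over combining marks
    return text[:max(i - 1, 0)]     # then drop the base as well

# KO KAI + SARA I + MAI EK
syllable = "\u0e01\u0e34\u0e48"
after_cp = backspace_code_point(syllable)    # tone mark removed only
after_cluster = backspace_cluster(syllable)  # whole syllable removed
print(len(after_cp), len(after_cluster))
```

An editor could choose between the two per locale, while still using the
cluster boundary for Del and the arrow keys.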

> - In Thai, the zero width space is said to be able to expand in
> full-justified paragraphs. Actually, it is always zero width.

There may be some misunderstanding here. What is meant is: if you had
the sequence ABCD, and between the B and the C was a zero-width space,
AND you were using inter-character spacing for justification, you would
not expect to see:

A BC D

Instead, you would expect to see

A B C D

That is, the zero-width space does not prevent the characters from
using inter-character spacing.
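
As a toy Python sketch of that rule (letterspace is a made-up name, and a
literal space stands in for whatever extra advance a real layout engine
would add between glyphs):

```python
ZWSP = "\u200b"   # ZERO WIDTH SPACE

def letterspace(text, gap=" "):
    """Insert `gap` between visible characters; ZWSP is kept but ignored."""
    out = []
    seen_visible = False
    for ch in text:
        if ch == ZWSP:
            out.append(ch)          # keep the break opportunity, add no width
            continue
        if seen_visible:
            out.append(gap)         # gap before every visible char but the first
        out.append(ch)
        seen_visible = True
    return "".join(out)

spaced = letterspace("AB" + ZWSP + "CD")
print(spaced.replace(ZWSP, ""))     # A B C D
```

The zero-width space itself contributes no width; the gap between B and C
comes from the inter-character spacing, exactly as for the other pairs.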

>
> These are things you have to know after learning the Unicode standard
> if you plan to work with the Thai language, to 'code around' the problems
> and make the result acceptable to Thai people.
> I plan to write a formal report on the issue, not to change the standard,
> but to note what is wrong and what has to be coded around. That way, people
> who would like to work with the Thai language (like you) will know the right
> thing to do and will not repeat the mistakes made in some software.
>
> --
> Samphan Raruenrom
> Information Research and Development Division,
> National Electronics and Computer Technology Center, Thailand.
> http://www.nectec.or.th/home/index.html
>