Re: Is UniCode's Thai character representation is acceptable by TISI or not?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Jul 17 2002 - 20:38:27 EDT


The "SARA AM" problem seems to lie in the compatibility decomposition, which only NFKD and NFKC apply.
The NFK* forms change a lot of characters and strings - not just Thai - in various visible and functional ways, so they must be used with caution.
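
As a rough illustration of how NFK* affects non-Thai text too, here is a minimal sketch using Python's standard unicodedata module. The sample characters are my own picks for illustration, not taken from this thread:

```python
import unicodedata

# Hypothetical sample characters, chosen only for illustration.
samples = {
    "ligature fi": "\uFB01",      # LATIN SMALL LIGATURE FI
    "superscript two": "\u00B2",  # SUPERSCRIPT TWO
    "circled one": "\u2460",      # CIRCLED DIGIT ONE
    "fullwidth A": "\uFF21",      # FULLWIDTH LATIN CAPITAL LETTER A
}

# NFC leaves each character alone; NFKC folds it to a plain equivalent.
nfc_results = {k: unicodedata.normalize("NFC", v) for k, v in samples.items()}
nfkc_results = {k: unicodedata.normalize("NFKC", v) for k, v in samples.items()}

for label, original in samples.items():
    print(f"{label}: {original!r} -> NFC {nfc_results[label]!r}, "
          f"NFKC {nfkc_results[label]!r}")
```

In each case the NFKC result loses a distinction (ligature, superscript, enclosure, width) that may matter to the reader or to the application.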

markus

Samphan Raruenrom wrote:

> Mark Davis wrote:
> >>- decomposition of SARA AM adds more problems to normalization
> > I don't recall seeing that note; I'll look forward to your report.
>
> Please see my discussion with khun Peter Constable quoted below.

--- 8< ---

> 2)
>
> 0E32;THAI CHARACTER SARA AA;Lo;0
> 0E48;THAI CHARACTER MAI EK;Mn;107
> 0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> "NIKHAHIT" "SARA AA"
>
> There are two ways to represent the word KO KAI + MAI EK + SARA AM:
>
> (a) KO KAI + MAI EK + SARA AM
> (b) KO KAI + NIKHAHIT + MAI EK + SARA AA
>
> (b) must be in this sequence to get the intended look for
> the word (not that this is the valid sequence for Thai/WTT).
> That is, the mai-ek is on top of the nikhahit.
>
> The problem is with the NFKD/NFKC of (a), which is
>
> (c) KO KAI + MAI EK + NIKHAHIT + SARA AA
>
> This will be rendered with the nikhahit on top of the mai-ek,
> which is not the same as (a) and is not the intended look.
> So this means that the string changes its shape after
> normalization. Is this a violation of any principle?
>
> The problem also comes from the fact that the combining class of
> NIKHAHIT is 0, which makes reordering of (c) impossible.
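
The behaviour described in the quoted message can be checked with a short sketch using Python's standard unicodedata module. The variable names are assumptions made for this illustration, and the results reflect whatever Unicode data version the Python build ships with:

```python
import unicodedata

# Code points from the UnicodeData excerpt quoted above, plus
# KO KAI (U+0E01) and NIKHAHIT (U+0E4D).
KO_KAI, MAI_EK, SARA_AM = "\u0E01", "\u0E48", "\u0E33"
NIKHAHIT, SARA_AA = "\u0E4D", "\u0E32"

seq_a = KO_KAI + MAI_EK + SARA_AM             # (a)
seq_b = KO_KAI + NIKHAHIT + MAI_EK + SARA_AA  # (b)
seq_c = KO_KAI + MAI_EK + NIKHAHIT + SARA_AA  # (c)

def names(s):
    """Return the Unicode character names of a string, for readable output."""
    return [unicodedata.name(ch) for ch in s]

# Compatibility normalization expands SARA AM in place.
nfkd_a = unicodedata.normalize("NFKD", seq_a)
nfkc_a = unicodedata.normalize("NFKC", seq_a)

# Canonical normalization does not apply the <compat> mapping.
nfc_a = unicodedata.normalize("NFC", seq_a)
nfd_a = unicodedata.normalize("NFD", seq_a)

# Combining classes: MAI EK is 107, NIKHAHIT is 0 (a starter), so
# canonical reordering never moves NIKHAHIT in front of MAI EK.
ccc_mai_ek = unicodedata.combining(MAI_EK)
ccc_nikhahit = unicodedata.combining(NIKHAHIT)

print("NFKD(a):", names(nfkd_a))
print("equals (c):", nfkd_a == seq_c, "| equals (b):", nfkd_a == seq_b)
print("ccc MAI EK:", ccc_mai_ek, "| ccc NIKHAHIT:", ccc_nikhahit)
```

If the data matches the excerpt above, NFKD and NFKC of (a) both come out as (c) and never as (b), while NFC and NFD leave (a) untouched.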


