The "SARA AM" problem seems to lie in the compatibility normalization forms (NFKD and NFKC), not the canonical ones.
NFK* changes many characters and strings, not just Thai, in various visible and functional ways, and must be used with caution.
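
As a rough illustration of how broadly NFK* rewrites text outside Thai, here is a minimal Python sketch using the standard unicodedata module. The sample strings are my own picks for illustration, not taken from this thread.

```python
import unicodedata

# Illustrative samples (not from the thread): each contains a character
# that has a compatibility decomposition.
samples = {
    "fi ligature": "\uFB01",        # LATIN SMALL LIGATURE FI
    "superscript two": "x\u00B2",   # x followed by SUPERSCRIPT TWO
    "circled digit one": "\u2460",  # CIRCLED DIGIT ONE
    "fullwidth A": "\uFF21",        # FULLWIDTH LATIN CAPITAL LETTER A
}

# NFKC applies the compatibility mappings, then recomposes canonically.
folded = {name: unicodedata.normalize("NFKC", text)
          for name, text in samples.items()}

for name, text in samples.items():
    print(f"{name}: {text!r} -> {folded[name]!r}")
```

The superscript case shows the functional side: "x squared" turns into the plain string "x2".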
Samphan Raruenrom wrote:
> Mark Davis wrote:
> >>- decomposition of SARA AM adds more problems to normalization
> > I don't recall seeing that note; I'll look forward to your report.
> Please see my discussion with khun Peter Constable quoted below.
--- 8< ---
> 0E32;THAI CHARACTER SARA AA;Lo;0
> 0E48;THAI CHARACTER MAI EK;Mn;107
> 0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> "NIKHAHIT" "SARA AA"
> There are two ways to represent the word KO KAI + MAI EK + SARA AM
> (a) KO KAI + MAI EK + SARA AM
> (b) KO KAI + NIKHAHIT + MAI EK + SARA AA
> (b) must be in this sequence to get the intended look for
> the word (not that this is the valid sequence for Thai/WTT).
> That is, the mai-ek is on top of the nikhahit.
> The problem is with the NFKD/NFKC of (a), which is
> (c) KO KAI + MAI EK + NIKHAHIT + SARA AA
> Which will be rendered with nikhahit on top of mai-ek.
> Which is not the same as (a), and is not the intended look.
> So this means that the string changes its shape after
> normalization. Is this a violation of any principle?
> The problem also comes from the fact that the combining class of
> NIKHAHIT is 0, which makes reordering of (c) impossible.
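
The behaviour described in the quoted message can be checked with a short Python sketch using the standard unicodedata module. The variable names a, b, c mirror the labels in the quote; the helper names() is only for illustration, and the results depend on the Unicode data shipped with the interpreter.

```python
import unicodedata

KO_KAI   = "\u0E01"  # THAI CHARACTER KO KAI
SARA_AA  = "\u0E32"  # THAI CHARACTER SARA AA
SARA_AM  = "\u0E33"  # THAI CHARACTER SARA AM
MAI_EK   = "\u0E48"  # THAI CHARACTER MAI EK
NIKHAHIT = "\u0E4D"  # THAI CHARACTER NIKHAHIT

a = KO_KAI + MAI_EK + SARA_AM             # sequence (a)
b = KO_KAI + NIKHAHIT + MAI_EK + SARA_AA  # sequence (b)
c = KO_KAI + MAI_EK + NIKHAHIT + SARA_AA  # sequence (c)

def names(s):
    """Hypothetical helper: list character names without the Thai prefix."""
    return [unicodedata.name(ch).replace("THAI CHARACTER ", "") for ch in s]

nfkd_a = unicodedata.normalize("NFKD", a)
nfkc_a = unicodedata.normalize("NFKC", a)
nfc_a = unicodedata.normalize("NFC", a)

print("NFKD(a):", names(nfkd_a))
print("NFKD(a) == c:", nfkd_a == c)
print("NFKD(a) == b:", nfkd_a == b)
print("NFC(a) == a:", nfc_a == a)

# Canonical reordering cannot move MAI EK past NIKHAHIT, because a
# character with combining class 0 blocks the reordering.
print("ccc(MAI EK):", unicodedata.combining(MAI_EK))
print("ccc(NIKHAHIT):", unicodedata.combining(NIKHAHIT))
```

Only the compatibility forms turn (a) into (c); NFC leaves (a) unchanged, and no normalization form turns (c) into (b).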