Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

From: Richard Wordingham via Unicode <>
Date: Thu, 7 Jun 2018 08:36:06 +0100

On Tue, 5 Jun 2018 01:37:47 +0100
Richard Wordingham via Unicode <> wrote:

> The decomposed
> form that looks the same is นํ้า <U+0E19, U+0E4D, U+0E49, U+0E32>.
> The problem is that for sane results, <tone mark, SARA AM> needs
> special handling. This sequence is also often untypable - part of the
> protection against Thai homographs.

I've been misquoted on the Rust discussion topic - or the behaviour is
more diverse than I was aware of. In LibreOffice, with sequence
checking not disabled, typing <U+0E19, U+0E4D> blocks the typing of
U+0E49 or U+0E32 immediately afterwards. Another mechanism
is for typing another vowel to replace the U+0E4D. The problem here is
that in standard Thai, U+0E4D may not be followed by another vowel or
tone mark, so Wing Thuk Thi (WTT) rules cut in. (They're also quite
good at preventing one from typing Northern Khmer.) In LibreOffice,
typing the NFKC form <U+0E19, U+0E49, U+0E4D, U+0E32> is stopped at
attempting to type U+0E4D, though one can get back to the original by
typing U+0E33 instead. To the rule checker, that is mission
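
For anyone who wants to check the code point sequences above, here is a
minimal sketch using Python's standard unicodedata module. It assumes the
usual spelling <U+0E19, U+0E49, U+0E33> as the starting point; the variable
names are only illustrative.

```python
import unicodedata

# Assumed starting point: the usual spelling NO NU, MAI THO, SARA AM.
usual = "\u0E19\u0E49\u0E33"

# NFKC replaces SARA AM (U+0E33) with its compatibility decomposition
# <U+0E4D, U+0E32>; the tone mark stays in front of U+0E4D.
nfkc = unicodedata.normalize("NFKC", usual)
nfkc_code_points = [f"U+{ord(ch):04X}" for ch in nfkc]

# The sequence from the quoted message that looks the same as the usual
# spelling: U+0E4D comes before the tone mark.
visual = "\u0E19\u0E4D\u0E49\u0E32"

# Canonical combining classes: U+0E4D is a starter (class 0), so
# normalization never moves the tone mark across it.
combining_classes = [unicodedata.combining(ch) for ch in nfkc]

print(nfkc_code_points)
print(combining_classes)
print(nfkc == visual)
```

The last line shows whether the NFKC result is the same string as the
look-alike sequence; because U+0E4D has combining class 0, normalizing
either sequence leaves its order alone.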

Received on Thu Jun 07 2018 - 02:36:34 CDT
