Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Tue, 5 Jun 2018 01:37:47 +0100

On Mon, 4 Jun 2018 12:49:20 -0700
Manish Goregaokar via Unicode <unicode_at_unicode.org> wrote:

> Hi,
>
> The Rust community is considering
> <https://github.com/rust-lang/rfcs/pull/2457> adding non-ascii
> identifiers, which follow UAX #31
> <http://www.unicode.org/reports/tr31/> (XID_Start XID_Continue*, with
> tweaks). The proposal also asks for identifiers to be treated as
> equivalent under NFKC.

> (In general, are there other problems folks see with this proposal?)

There's the usual lurking issue that the Thai word for water, น้ำ
<U+0E19 THAI CHARACTER NO NU, U+0E49 THAI CHARACTER MAI THO, U+0E33 THAI
CHARACTER SARA AM>, becomes unacceptable, and often untypable and
uncopiable, when converted to NFKC: น้ํา <U+0E19, U+0E49, U+0E4D THAI
CHARACTER NIKHAHIT, U+0E32 THAI CHARACTER SARA AA>. The decomposed form
that looks the same as the original is นํ้า <U+0E19, U+0E4D, U+0E49,
U+0E32>. The problem is that for sane results, <tone mark, SARA AM>
needs special handling. This sequence is also often untypable, which is
part of the protection against Thai homographs.
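
For anyone who wants to see the effect, here is a minimal Python sketch
using the standard unicodedata module. It shows what NFKC does to the
word above, plus one possible form of the "special handling": rewriting
<tone mark, SARA AM> as <NIKHAHIT, tone mark, SARA AA> before
normalizing. The helper name thai_aware_nfkc and the choice of
U+0E48..U+0E4B as the tone marks are my own assumptions for
illustration, not something UAX #31 or the RFC specifies.

```python
import re
import unicodedata

def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# The word as normally typed: NO NU, MAI THO, SARA AM.
typed = "\u0e19\u0e49\u0e33"

# Plain NFKC decomposes SARA AM in place, leaving NIKHAHIT after the tone mark.
plain = unicodedata.normalize("NFKC", typed)

# Assumption: the four Thai tone marks U+0E48..U+0E4B followed by SARA AM.
TONE_THEN_SARA_AM = re.compile("([\u0e48-\u0e4b])\u0e33")

def thai_aware_nfkc(s):
    """Hypothetical helper: move NIKHAHIT before the tone mark, then apply NFKC."""
    reordered = TONE_THEN_SARA_AM.sub("\u0e4d\\1\u0e32", s)
    return unicodedata.normalize("NFKC", reordered)

handled = thai_aware_nfkc(typed)

print(codepoints(typed))    # U+0E19 U+0E49 U+0E33
print(codepoints(plain))    # U+0E19 U+0E49 U+0E4D U+0E32
print(codepoints(handled))  # U+0E19 U+0E4D U+0E49 U+0E32
```

Note that the two results are different strings even though both are
already in NFKC, so an identifier comparison based on plain NFKC would
not treat them as equal.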

Richard.
Received on Mon Jun 04 2018 - 19:38:17 CDT
