Re: Combining Class of Thai Nonspacing_Marks from Gerriet M. Denkmann on 2017-04-04 (Unicode Mail List Archive)

From: Gerriet M. Denkmann <gerrietm_at_icloud.com>
Date: Wed, 5 Apr 2017 10:00:25 +0700

> On 4 Apr 2017, at 00:00, Asmus Freytag <asmusf_at_ix.netcom.com> wrote:
>
> It is not possible to construct a set of secure network identifiers based on simply
> a) ensuring the string is in NFC
> b) otherwise allowing all of the Thai characters (insofar as they are PVALID in IDNA 2008 [RFC5892]).
>
> Considerable attention to allowable contexts is required. There is a group in Thailand working on this, but their results have not yet been made public.

Maybe this: Proposal for the Thai Script Root Zone Label Generation Rulesets <https://www.icann.org/en/system/files/files/proposal-thai-lgr-15dec16-en.pdf>

But the rules for Root Zone Labels are (rightly) much more restricted than what I want:

Any two strings which look (almost?) identical should be normalised into some canonical form.
Reason: not to have identical looking filenames in a filesystem.
With the current rules of normalisation there could be 8 different filenames all looking identical to “กินครึ่งทิ้งครึ่ง”.

E.g.:
- both NIKHAHIT + Sara Aa and Sara Am should be normalised into the same string (whatever this is)
- both top-vowel + tone-mark and tone-mark + top-vowel should be normalised into the same string (whatever this is).
etc.
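
A minimal sketch of the problem, using Python's standard unicodedata module. It assumes that the 8 spellings come from the three places in the example filename where a top vowel is directly followed by a tone mark; the helper name nfc_variants is hypothetical and used only for illustration. The sketch swaps every subset of those pairs and counts how many distinct strings survive NFC, then checks the two SARA AM spellings.

```python
import itertools
import unicodedata

# Escaped spelling of the example filename, so the code point order is explicit.
WORD = ("\u0e01\u0e34\u0e19\u0e04\u0e23\u0e36\u0e48\u0e07"
        "\u0e17\u0e34\u0e49\u0e07\u0e04\u0e23\u0e36\u0e48\u0e07")

TOP_VOWELS = "\u0e34\u0e35\u0e36\u0e37"  # SARA I, SARA II, SARA UE, SARA UEE
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"  # MAI EK, MAI THO, MAI TRI, MAI CHATTAWA


def nfc_variants(text):
    """Return the distinct NFC strings obtained by swapping any subset of
    adjacent (top vowel, tone mark) pairs in text. Hypothetical helper."""
    pairs = [i for i in range(len(text) - 1)
             if text[i] in TOP_VOWELS and text[i + 1] in TONE_MARKS]
    results = set()
    for mask in itertools.product((False, True), repeat=len(pairs)):
        chars = list(text)
        for swap, i in zip(mask, pairs):
            if swap:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        results.add(unicodedata.normalize("NFC", "".join(chars)))
    return results


SARA_AM = "\u0e33"
NIKHAHIT_SARA_AA = "\u0e4d\u0e32"

# Number of spellings that NFC keeps distinct.
print(len(nfc_variants(WORD)))

# Does NFC unify the two SARA AM spellings?
print(unicodedata.normalize("NFC", SARA_AM)
      == unicodedata.normalize("NFC", NIKHAHIT_SARA_AA))

# Does NFKC map SARA AM to NIKHAHIT + SARA AA?
print(unicodedata.normalize("NFKC", SARA_AM) == NIKHAHIT_SARA_AA)
```

NFC leaves the swapped pairs alone because the top vowels have canonical combining class 0, so canonical reordering never moves a tone mark across them. SARA AM has only a compatibility decomposition, which is why NFC and NFD leave it untouched while NFKC and NFKD expand it.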

If, as Richard Wordingham wrote: "Unicode combining classes cannot be changed. All that can be done is
to enforce the order of characters in normalised text." then the Unicode Normalisation algorithms should be updated.
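
Until then, the order can only be enforced by an application-level step on top of NFC. The following is a rough sketch of such a step, not a Unicode normalisation form: the function name fold_thai is hypothetical, the target spellings (SARA AM as one code point, top vowel before tone mark) are arbitrary choices, and only the two cases listed above are handled.

```python
import re
import unicodedata

TOP_VOWELS = "\u0e34\u0e35\u0e36\u0e37"  # SARA I, SARA II, SARA UE, SARA UEE
TONE_MARKS = "\u0e48\u0e49\u0e4a\u0e4b"  # MAI EK, MAI THO, MAI TRI, MAI CHATTAWA

# Matches a tone mark immediately followed by a top vowel.
_TONE_THEN_VOWEL = re.compile(f"([{TONE_MARKS}])([{TOP_VOWELS}])")


def fold_thai(text):
    """Hypothetical folding for filename comparison (assumed target forms)."""
    text = unicodedata.normalize("NFC", text)
    # Assumption: spell NIKHAHIT + SARA AA as the single code point SARA AM.
    text = text.replace("\u0e4d\u0e32", "\u0e33")
    # Assumption: store the top vowel before the tone mark.
    return _TONE_THEN_VOWEL.sub(r"\2\1", text)


# Both orders of SARA I and MAI EK on KO KAI fold to the same string.
print(fold_thai("\u0e01\u0e48\u0e34") == fold_thai("\u0e01\u0e34\u0e48"))

# Both spellings of SARA AM fold to the same string.
print(fold_thai("\u0e01\u0e4d\u0e32") == fold_thai("\u0e01\u0e33"))
```

A filesystem would compare the folded strings rather than the raw names. Other sequences that render alike, such as a tone mark placed between the consonant and NIKHAHIT, are not covered by this sketch.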

Kind regards,

Gerriet.
Received on Tue Apr 04 2017 - 22:00:58 CDT
