Re: String Ranges in Unicode Sets from Doug Ewell on 2015-09-08 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Tue, 08 Sep 2015 08:19:03 -0700

Mark Davis 🍱️ <mark at macchiato dot com> wrote:

>> TUS 8.0 Chapter 3 C6: "A process shall not assume that the
>> interpretations of two canonical-equivalent character sequences are
>> distinct."
>
> A compiler will take source code containing String x="á"; and compile
> it to a certain binary. If that same source code is NFD'd, the
> compiler will produce a different result.
>
> Do you really think that such compiler is not compliant to Unicode??
> If so, then we should add some more clarifications around C6.

I agree. The word "interpretations" in C6 can't have been intended to
include the interpretation of code points qua code points. That would
make a great many internal processes impossible.

I think of C6 as meaning that spell-checkers, for example, should not
treat José (NFC, four code points, ending in U+00E9) and José (NFD,
five code points, ending in U+0065 U+0301) as separate entries.
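
To make the distinction concrete, here is a minimal Python sketch,
assuming only the standard unicodedata module. The helper names
same_word and is_known and the one-word dictionary are hypothetical,
made up for illustration. It shows the two forms differing as code
point sequences (which a compiler or any other low-level process is
free to see) while a spell-checker-style lookup treats them as the
same entry.

```python
import unicodedata

# Build both forms explicitly so the example does not depend on how
# this message itself happens to be normalized.
nfc = unicodedata.normalize("NFC", "Jos\u00e9")  # ends in U+00E9
nfd = unicodedata.normalize("NFD", "Jos\u00e9")  # ends in U+0065 U+0301

# As raw code point sequences, the two strings are distinct.
raw_equal = (nfc == nfd)
lengths = (len(nfc), len(nfd))


def same_word(a: str, b: str) -> bool:
    """Hypothetical helper: compare up to canonical equivalence."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical spell-checker dictionary, stored in NFC.
dictionary = {unicodedata.normalize("NFC", w) for w in ["Jos\u00e9"]}


def is_known(word: str) -> bool:
    """Hypothetical lookup: normalize the query before checking."""
    return unicodedata.normalize("NFC", word) in dictionary


print(lengths, raw_equal, same_word(nfc, nfd), is_known(nfd))
```

Normalizing at the lookup boundary, rather than everywhere, is one
way to read C6 without forbidding code-point-level processing; other
designs are possible.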

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

Received on Tue Sep 08 2015 - 10:20:12 CDT
