Accumulated Feedback on PRI #333

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Fri Aug 19 07:09:31 CDT 2016
Name: Jonathan Warden
Report Type: Other Question, Problem, or Feedback
Opt Subject: Suggestions for Immutable Identifiers

I have a few suggestions regarding UAX31-R2 (Immutable Identifiers):

1. clarify that these recommendations:
  - allow unassigned characters, for which properties such as normal form,
    script, etc. are unknown, which means that identifiers:
    - can't be compared for NFC/NFKC/case-insensitive equality
    - can't be restricted per the TR39 recommendations
  - are meant for those cases (like XML) that can't update across versions
    of Unicode, and don't require information about normal form, script, etc.
  and that disallowing unassigned characters is recommended as a best practice
  *for cases that do require this information*.

2. point out the option of using a whitelist of allowed characters taken from a
specific version of Unicode and never upgrading it (such as the characters in
IdentifierStatus.txt under http://www.unicode.org//Public/security/9.0.0).
I write about this on my blog: http://jonathanwarden.com/2016/08/18/immutable-unicode-identifiers/

3. finally, the recommendation of allowing "any non-empty string of characters 
that contains no character having any of the following property values" would 
allow identifiers to start with (and contain only) digits.  An alternative
might be to use the Default Identifier syntax, but to define <Continue> as "no
character having any of the following property values", and <Start> as
<Continue> minus characters with General_Category M, N, or Pc.
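
Suggestions 2 and 3 can be illustrated with a minimal Python sketch. It is not
part of the original submission and not a reference implementation. All function
names are hypothetical. The exclusion list follows the UAX31-R2 wording as
understood here (Pattern_White_Space, Pattern_Syntax, private use, surrogate,
control, noncharacter), and Pattern_Syntax is only approximated for ASCII because
Python's standard library does not expose that property. The whitelist parser
assumes lines of the form "XXXX..YYYY ; Allowed # comment".

```python
import unicodedata

# Pattern_White_Space is a small, fixed set of code points.
PATTERN_WHITE_SPACE = frozenset(
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020,
     0x0085, 0x200E, 0x200F, 0x2028, 0x2029]
)


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_continue(ch: str) -> bool:
    """<Continue>: any character not excluded by the immutable-identifier rule."""
    cp = ord(ch)
    if cp in PATTERN_WHITE_SPACE or is_noncharacter(cp):
        return False
    if unicodedata.category(ch) in ("Cc", "Co", "Cs"):  # control, private use, surrogate
        return False
    # ASSUMPTION: Pattern_Syntax is approximated for ASCII only; a real
    # implementation would need the full property data from PropList.txt.
    if cp < 0x80 and not (ch.isalnum() or ch == "_"):
        return False
    return True  # note: unassigned (Cn) characters are allowed


def is_start(ch: str) -> bool:
    """<Start>: <Continue> minus General_Category M*, N*, and Pc."""
    cat = unicodedata.category(ch)
    return is_continue(ch) and cat[0] not in "MN" and cat != "Pc"


def is_identifier(s: str) -> bool:
    """Non-empty, first character is <Start>, the rest are <Continue>."""
    return bool(s) and is_start(s[0]) and all(is_continue(c) for c in s[1:])


def parse_allowed(text: str) -> frozenset:
    """Collect code points from 'XXXX[..YYYY] ; Allowed # comment' lines."""
    allowed = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split(";")
        if len(fields) < 2 or fields[1].strip() != "Allowed":
            continue
        first, _, last = fields[0].strip().partition("..")
        allowed.update(range(int(first, 16), int(last or first, 16) + 1))
    return frozenset(allowed)
```

With these definitions, is_identifier("1abc") is rejected because digits are
excluded from <Start>, while is_identifier("abc1") is accepted. A frozen
whitelist would be built once, by passing the text of a single chosen version of
the data file to parse_allowed, and then never regenerated.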

From: Patrik Fältström
Subject: Re: Prep for Unicode 10.0, liaison contact
Date: Wed, 29 Mar 2017 10:54:33 +0200

... 

I have mechanically checked the 10.0.0 derived attribute values, comparing
them with the 9.0.0 attribute values defined according to the IDNA2008
algorithm, and have not found any issues.

What I am concerned about, though, is the continued communication that UTS#46
is something that can be used in applications, when in reality that creates
confusion regarding which code points can be used in identifiers like domain
names. Specifically, normal users do not understand the various flags that
one must define (to get the same, predictable result), and UTS#46 does
not only recommend a certain mapping step (which IDNA2003 includes, but
IDNA2008 does not). Finally, according to my reading, UTS#46 and UAX#31 have
different sets of allowed characters, which creates further confusion, for
example when one looks at what normal people believe are "emojis".

I would like to encourage the Unicode Consortium to be clearer about its
intentions for the future recommended use of UTS#46 and UAX#31 in the context
of the IDNA2008 algorithm.

   Patrik Fältström
   IETF Liaison to Unicode Consortium

From: Mark Davis
Date: Thu, 6 Apr 2017 17:09:01 +0200
Subject: Re: Prep for Unicode 10.0, liaison contact

I agree with the suggestion to clarify the meaning and "default" values of the
different flags used in
http://www.unicode.org/reports/tr46/proposed.html#ToASCII  and
http://www.unicode.org/reports/tr46/proposed.html#ToUnicode 

As to UTS#46 and UAX#31, it was never a goal to make them align and they never
have aligned. The primary goal for UAX#31 is to extend identifiers such as
used in programming languages to Unicode (and UAX#31 defines several different
kinds of identifiers). The primary goal for UTS#46 is to provide a solution
for implementations that want to maintain backwards compatibility with
IDNA2003, while extending the repertoire to modern Unicode versions based on
the IDNA2003 principles.

Of course, any implementation can always apply additional filters on top of
UTS#46, including restricting to UAX#31 default identifiers, restricting to
the IDNA2008 repertoire, applying tests such as in UTS#39 for mixed scripts,
or applying ICANN rules. For IDNA2008, the data files in fact provide
information about what IDNA2008 would allow, and also reference certain
conditions in IDNA2008, such as ContextJ. (UTS#46 does project forward to the
current Unicode release — based on the IDNA2008 principles — since the version
of Unicode supported by IDNA2008 is old.)

Mark