Accumulated Feedback on PRI #347

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Thu Jan 5 10:29:43 CST 2017
Name: Alastair Houghton
Report Type: Error Report (UTS #46)
Opt Subject: IdnaTest.txt contains incorrect test cases

The test vectors for UTS #46, which can be found in
http://www.unicode.org/Public/idna/9.0.0/IdnaTest.txt, appear to have a few
errors.

For instance, line 74:

 B;	0à.\u05D0;	;	xn--0-sfa.xn--4db	#	0à.א

which should fail [B1] because the first character has Bidi property EN, not
L, R or AL, and line 93:

 B;	àˇ.\u05D0;	;	xn--0ca88g.xn--4db	#	àˇ.א

which should fail [B6] because “ˇ” has Bidi property ON, not L, EN or NSM.

This is quite a common problem in the file.

(I've already mentioned this on the Unicode mailing list and was asked by Mark
Davis to report it here.)
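
The two failures described in the report above can be reproduced with a short
sketch. It is not the UTS #46 reference implementation: it covers only
conditions 1 and 6 of the Bidi rule (RFC 5893, section 2), which are the ones
the report labels [B1] and [B6]. It reads Bidi_Class values from Python's
bundled unicodedata module, so they reflect that interpreter's Unicode version
rather than 9.0.0. The function names are hypothetical, and labels are assumed
to be non-empty.

```python
import unicodedata

# Bidi classes whose presence in any label makes a name a "Bidi domain name".
RTL_CLASSES = {"R", "AL", "AN"}


def bidi_classes(label):
    """Return the Bidi_Class of each character in a (non-empty) label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def is_bidi_domain(labels):
    """True if any label contains a character of class R, AL or AN."""
    return any(c in RTL_CLASSES for label in labels for c in bidi_classes(label))


def passes_b1(label):
    """Condition 1: the first character must be L, R or AL."""
    return bidi_classes(label)[0] in {"L", "R", "AL"}


def passes_b6(label):
    """Condition 6: an LTR label must end in L or EN, ignoring trailing NSM."""
    classes = bidi_classes(label)
    if classes[0] != "L":
        return True  # condition 6 only constrains LTR labels
    while classes and classes[-1] == "NSM":
        classes.pop()
    return bool(classes) and classes[-1] in {"L", "EN"}


def failed_rules(domain):
    """List the first failed condition (B1 or B6) for each offending label."""
    labels = domain.split(".")
    if not is_bidi_domain(labels):
        return []  # the Bidi rule applies only to Bidi domain names
    failures = []
    for label in labels:
        if not passes_b1(label):
            failures.append("B1")
        elif not passes_b6(label):
            failures.append("B6")
    return failures


print(failed_rules("0\u00E0.\u05D0"))       # line 74 of IdnaTest.txt
print(failed_rules("\u00E0\u02C7.\u05D0"))  # line 93 of IdnaTest.txt
```

Under these assumptions the first call reports B1 and the second reports B6,
which matches the report. The remaining conditions of the Bidi rule are not
checked here.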

From: Patrik Fältström
Subject: Re: Prep for Unicode 10.0, liaison contact
Date: Wed, 29 Mar 2017 10:54:33 +0200

... 

I have mechanically checked the 10.0.0 derived attribute values, compared
them with the 9.0.0 attribute values defined according to the IDNA2008
algorithm, and have not found any issues.

What concerns me, though, is the continued communication that UTS#46 is
something that can be used in applications, when in reality it creates
confusion about which code points can be used in identifiers such as domain
names. Specifically, ordinary users do not understand the various flags that
one must set (to get the same, predictable result), and UTS#46 does not only
recommend a certain mapping step (which IDNA2003 includes, but IDNA2008 does
not). Finally, according to my reading, UTS#46 and UAX#31 have different sets
of allowed characters, which creates further confusion, for example when one
looks at what ordinary people believe are "emojis".

I would like to encourage the Unicode Consortium to be clearer about its
intentions for the future recommended use of UTS#46 and UAX#31 in the context
of the IDNA2008 algorithm.

   Patrik Fältström
   IETF Liaison to Unicode Consortium

From: Mark Davis
Date: Thu, 6 Apr 2017 17:09:01 +0200
Subject: Re: Prep for Unicode 10.0, liaison contact

I agree with the suggestion to clarify the meaning and "default" values of the
different flags used in
http://www.unicode.org/reports/tr46/proposed.html#ToASCII  and
http://www.unicode.org/reports/tr46/proposed.html#ToUnicode 

As to UTS#46 and UAX#31, it was never a goal to make them align and they never
have aligned. The primary goal for UAX#31 is to extend identifiers such as
used in programming languages to Unicode (and UAX#31 defines several different
kinds of identifiers). The primary goal for UTS#46 is to provide a solution
for implementations that want to maintain backwards compatibility with
IDNA2003, while extending the repertoire to modern Unicode versions based on
the IDNA2003 principles.

Of course, any implementation can always apply additional filters on top of
UTS#46, including restricting to UAX#31 default identifiers, restricting to
the IDNA2008 repertoire, applying tests such as in UTS#39 for mixed scripts,
or applying ICANN rules. For IDNA2008, the data files in fact provide
information about what IDNA2008 would allow, and also reference certain
conditions in IDNA2008, such as ContextJ. (UTS#46 does project forward to the
current Unicode release — based on the IDNA2008 principles — since the version
of Unicode supported by IDNA2008 is old.)

Mark