Re: Proposal for change to UTS #46 Data
Date: 2013 Dec 12
From: Mark Davis

Consider adding an informative data field for currently-invalid IDNA2008 characters:
XV8 - is not valid in IDNA2008 for the corresponding version of Unicode, but was valid in some previous version of IDNA2008.


We define a field in the UTS #46 data file in http://unicode.org/reports/tr46/#Table_Data_File_Fields

3    IDNA2008 status: NV8. Only present if the status is valid but the character is excluded by IDNA2008 from all domain names for all versions of Unicode. This is not a normative field.

And according to that definition, it is correct that there is no value on:
19DA          ; valid                                  # 5.2  NEW TAI LUE THAM DIGIT ONE

That is because that character is valid in at least one version of IDNA2008. However, that field is easily misunderstood to mean "invalid in the current version", as in Michel's error report (on unicore): 

Currently says for 19DA:

19DA          ; valid                                  # 5.2  NEW TAI LUE THAM DIGIT ONE


But recent IDNA2008 (RFC 6452 at http://tools.ietf.org/html/rfc6452) says:



   The GeneralCategory for this character changes from Nd to No.  This
   implies that the derived property value changes from PVALID to DISALLOWED.

Accordingly, the entry for 19DA in IdnaMappingTable.txt should have either ‘disallowed’ or some migration functionality (‘deviation’?) so that the document could be used to create an IDNA2008 6.3 compatible process.
(I found that gem when comparing the IANA IDNA2008 table for 6.3 and Unicode equivalent. That was the only difference I found). Just to show that GC changes are not w/o consequences.

Because IDNA2008 does not guarantee the stability of valid characters, if people don't read the documentation carefully, there are two possible, reasonable meanings for NV8. Where UV is the corresponding version of Unicode:

A. A character is valid in UTS 46 (transitional) but is not valid according to IDNA2008 in UV, 
B. A character is valid in UTS 46 (transitional) but has never been valid according to IDNA2008 for any version of Unicode up to and including UV.

The data we have for NV8 is for B, not A.
  • B is the best data to use if your implementation needs to guarantee stability for IDNA2008, while
  • A is the best data to use if your implementation wants to conform precisely to IDNA2008 for UV.
There is one possible hiccough with providing A. We do not have a guarantee that the IETF will not retroactively change the characters valid in IDNA2008 for a specific version of Unicode. (If they want to grandfather a character in, they have to proactively propose a change to the spec, which takes some time.) But this is also an issue for B.
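
To make the difference between A and B concrete, here is a minimal sketch of how the NV8 value and the proposed XV8 value could be derived from per-version IDNA2008 validity sets. Everything in it is an assumption for illustration: the function name informative_status, the version list, and the toy code point sets are hypothetical and are not taken from the real UTS #46 or IANA tables (only the 19DA case mirrors the report above).

```python
from typing import Dict, List, Optional, Set

# Toy data (illustrative only, not the real tables).
VERSIONS: List[str] = ["5.2", "6.0", "6.3"]          # in release order
UTS46_VALID: Set[int] = {0x0061, 0x19DA, 0x2665}     # valid in UTS #46
PVALID: Dict[str, Set[int]] = {                      # valid in IDNA2008
    "5.2": {0x0061, 0x19DA},
    "6.0": {0x0061},                                 # 19DA dropped here
    "6.3": {0x0061},
}


def informative_status(cp: int, uv: str) -> Optional[str]:
    """Return 'NV8', 'XV8', or None for code point cp at Unicode version uv."""
    if cp not in UTS46_VALID:
        return None                                  # field only applies to valid entries
    upto = VERSIONS[: VERSIONS.index(uv) + 1]        # all versions up to and including UV
    ever_valid = any(cp in PVALID[v] for v in upto)
    if not ever_valid:
        return "NV8"                                 # meaning B: never valid
    if cp not in PVALID[uv]:
        return "XV8"                                 # valid earlier, not valid in UV
    return None


def invalid_in_uv(uv: str) -> Set[int]:
    """Meaning A: valid in UTS #46 but not valid in IDNA2008 for UV."""
    return {cp for cp in UTS46_VALID if informative_status(cp, uv) in ("NV8", "XV8")}
```

Under these assumptions, an implementation that wants stability (B) would filter on NV8 alone, while one that wants to conform to IDNA2008 for UV (A) would filter on NV8 and XV8 together, as invalid_in_uv does.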

I suggest that we document that in such cases we will issue a dot-dot release, such as Version 6.3.1, with modified XV8 and NV8 field values.