Proposed Update Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)

From: Colosi, John (jcolosi@verisign.com)
Date: Thu Sep 16 2010 - 16:59:51 CDT


    Hello all,

     

    I represent the VeriSign Domain Name Registry as an implementer of the latest IDNA specifications. The following four (4) questions arose while we were working through the conformance test.

     

     

    Question 1 of 4

    Line 204

    Input \u0646 \u0627 \u0645 \u0647 \u200C \u0627 \u06CC

    Reference Appendix A.1 of RFC 5892 (Tables) <https://trac.tools.ietf.org/html/rfc5892>

    Issue Per the reference, the ZWNJ (\u200C) is valid only if one of two conditions holds: either it is preceded by a character whose canonical combining class is Virama, or the characters around it match a specific pattern of joining types. This input does not appear to meet either condition, so it appears to be an invalid IDN label under the IDNA 2008 standards. There are ten (10) such lines in the input file.
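    For anyone who wants to test inputs against the two ZWNJ conditions of RFC 5892 Appendix A.1, here is a minimal Python sketch. Python's unicodedata module exposes the canonical combining class (Virama is class 9) but not Joining_Type, so the JOINING_TYPE table below is a hypothetical, deliberately incomplete stand-in that covers only the characters used in the example calls; any character missing from it is treated as U (non-joining). The function names are illustrative only.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"

# Hypothetical, deliberately tiny Joining_Type table for illustration.
# A real implementation would load the full Unicode joining-type data;
# characters missing here default to "U" (non-joining).
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH (dual-joining)
    "\u0627": "R",  # ARABIC LETTER ALEF (right-joining)
    "\u064e": "T",  # ARABIC FATHA (transparent)
}


def joining_type(ch: str) -> str:
    return JOINING_TYPE.get(ch, "U")


def zwnj_allowed(label: str, i: int) -> bool:
    """Sketch of the CONTEXTJ test for the ZWNJ at label[i]."""
    assert label[i] == ZWNJ
    # Condition 1: the preceding character has combining class Virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True
    # Condition 2: (Joining_Type L or D) T* ZWNJ T* (Joining_Type R or D).
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1  # skip transparent characters to the left
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1  # skip transparent characters to the right
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def label_zwnjs_ok(label: str) -> bool:
    """True if every ZWNJ in the label satisfies one of the two conditions."""
    return all(zwnj_allowed(label, i) for i, ch in enumerate(label) if ch == ZWNJ)


if __name__ == "__main__":
    print(label_zwnjs_ok("\u0915\u094d\u200c\u0937"))  # prints True (virama rule)
    print(label_zwnjs_ok("\u0628\u200c\u0627"))        # prints True (D ZWNJ R)
    print(label_zwnjs_ok("\u0627\u200c\u0628"))        # prints False (R before ZWNJ)
```

    Checking the virama condition first mirrors the order of the two rules in the appendix; the joining-type scan only skips transparent characters, so any non-transparent neighbour decides the outcome.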

     

     

    Question 2 of 4

    Line 319

    Input ...1234567890123456789012345678901234567890123456789012345678901234...

    Reference Sections 3.1 and 3.5 of RFC 1034 <http://www.ietf.org/rfc/rfc1034.txt>

    Issue Per the reference, DNS labels cannot contain more than 63 octets. This appears to be a deliberate test, since the first label is exactly 63 octets and the second label is 64 octets. The limit may not apply to other applications, but these lines of input are not valid DNS names. There are three (3) such lines in the input file.
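    The 63-octet rule is easy to check mechanically. The following is a minimal Python sketch, assuming the name is already in ASCII (A-label) form so that characters and octets coincide; oversized_labels is a hypothetical helper name, and only the per-label limit is checked.

```python
from typing import List

MAX_LABEL_OCTETS = 63  # per RFC 1034, sections 3.1 and 3.5


def oversized_labels(name: str) -> List[str]:
    """Return every label of an ASCII domain name longer than 63 octets."""
    # Encoding to ASCII yields the wire-format octets of each label
    # (and raises an error if the name is not pure ASCII).
    return [label for label in name.split(".")
            if len(label.encode("ascii")) > MAX_LABEL_OCTETS]


if __name__ == "__main__":
    label_63 = "1234567890" * 6 + "123"
    label_64 = "1234567890" * 6 + "1234"
    print([len(l) for l in oversized_labels(label_63 + "." + label_64)])  # prints [64]
```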

     

     

    Question 3 of 4

    Line 319

    Input U \u0308 . xn--tda

    Reference Section 4.1 of RFC 5891 (Protocol) <https://trac.tools.ietf.org/html/rfc5891>

    Issue Per the reference, input to the IDNA registration process "MUST be... in Normalization Form C". This input does not meet that requirement: the first label, U followed by \u0308 COMBINING DIAERESIS, is not in NFC (its NFC form is \u00DC). Implementations of IDNA 2008 registration should therefore expect an exception. There are four (4) such lines in the input file.
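    A registration front end could reject such input by comparing each label with its NFC form. This is a minimal sketch using Python's standard unicodedata module; non_nfc_labels is a hypothetical helper name. ASCII labels such as xn--tda are always in NFC, and the last usage line decodes that label with Python's built-in punycode codec to show it is U+00FC.

```python
import unicodedata
from typing import List


def non_nfc_labels(domain: str) -> List[str]:
    """Return the labels of a Unicode domain name that are not already in NFC."""
    return [label for label in domain.split(".")
            if unicodedata.normalize("NFC", label) != label]


if __name__ == "__main__":
    bad = non_nfc_labels("U\u0308.xn--tda")
    print([f"U+{ord(c):04X}" for c in bad[0]])                   # prints ['U+0055', 'U+0308']
    print(f"U+{ord(unicodedata.normalize('NFC', bad[0])):04X}")  # prints U+00DC
    print(f"U+{ord(b'tda'.decode('punycode')):04X}")             # prints U+00FC
```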

     

     

    Question 4 of 4

    Line 276

    Input xn--53h

    Reference Appendix B.1 of RFC 5892 (Tables) <https://trac.tools.ietf.org/html/rfc5892>

    Issue Per the reference, the character \u2615 (HOT BEVERAGE) is disallowed, since it falls within this range of the table:

    2460..26CD ; DISALLOWED # CIRCLED DIGIT ONE..DISABLED CAR

    Implementations should expect an exception. There are twenty (20) such lines in the input file.
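    The check amounts to decoding the A-label and looking up each code point. Below is a minimal Python sketch under two stated assumptions: DISALLOWED_RANGES holds only the single table line quoted above (a real implementation needs the full RFC 5892 derived-property data), and to_u_label and disallowed_code_points are hypothetical helper names. It uses the standard punycode codec directly, because Python's built-in "idna" codec implements IDNA 2003 rather than IDNA 2008. An A-label such as xn--53h decodes to U+2615.

```python
import codecs
from typing import List, Tuple

# Only the single RFC 5892 Appendix B.1 line quoted above (illustrative subset).
DISALLOWED_RANGES: List[Tuple[int, int]] = [
    (0x2460, 0x26CD),  # CIRCLED DIGIT ONE..DISABLED CAR
]


def to_u_label(label: str) -> str:
    """Decode an A-label ("xn--" + Punycode) to its U-label; other labels pass through."""
    if label[:4].lower() != "xn--":
        return label
    return codecs.decode(label[4:].encode("ascii"), "punycode")


def disallowed_code_points(label: str) -> List[str]:
    """List the decoded label's code points that fall in DISALLOWED_RANGES."""
    return [f"U+{ord(ch):04X}" for ch in to_u_label(label)
            if any(lo <= ord(ch) <= hi for lo, hi in DISALLOWED_RANGES)]


if __name__ == "__main__":
    print(disallowed_code_points("xn--53h"))  # prints ['U+2615']
```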

      

     

    Any input is appreciated,

    -- John

     

     

    John Colosi | Naming Services | VeriSign, Inc.
    703.948.3211 | 703.967.4062 | 703.421.8233




    This archive was generated by hypermail 2.1.5 : Thu Sep 16 2010 - 17:27:44 CDT