Re: Proposed Update Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)

From: Mark Davis ☕ (mark@macchiato.com)
Date: Sun Sep 19 2010 - 18:26:58 CDT

    Thanks for checking the data. I'm sorry for not responding earlier; I was on
    vacation, and am now working through my backlog of email.

    Some of the differences are because UTS#46 provides a compatibility 'bridge'
    between IDNA2003 and IDNA2008. For details of these particular cases, see
    below.

    Note that the current tests do not attempt to be exhaustive, e.g., to
    include a line for every character giving its status (valid or not).
    Such a test can be written using the main data file at
    http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt.
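
    As a rough sketch of what such an exhaustive test could look like, the
    Python below expands table entries into one (code point, status) pair per
    character. The two SAMPLE lines are illustrative stand-ins written in the
    layout of the data file (the range line is an assumption about how ranges
    appear there), and the function names are hypothetical, not part of any
    UTS#46 tooling.

```python
# Illustrative entries in the layout "code point(s) ; status ; mapping # comment".
# A real run would read every line of IdnaMappingTable.txt instead.
SAMPLE = """\
0061..007A ; valid # 1.1 LATIN SMALL LETTER A..LATIN SMALL LETTER Z
2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO
"""


def parse_line(line):
    """Return (first, last, status, mapping), or None for blank/comment lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    first, _, last = fields[0].partition("..")
    lo = int(first, 16)
    hi = int(last, 16) if last else lo
    status = fields[1]
    mapping = ""
    if len(fields) > 2 and fields[2]:
        mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
    return lo, hi, status, mapping


def per_character_status(lines):
    """Yield one (code point, status) pair for every character covered."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        lo, hi, status, _mapping = parsed
        for cp in range(lo, hi + 1):
            yield cp, status


if __name__ == "__main__":
    for cp, status in per_character_status(SAMPLE.splitlines()):
        print(f"{cp:04X} ; {status}")
```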

    Other test cases can be added for the future; if you (or others) have
    suggestions for good test lines, please let us know.

    Mark

    *— Il meglio è l’inimico del bene —*

    On Thu, Sep 16, 2010 at 14:59, Colosi, John <jcolosi@verisign.com> wrote:

    > Hello all,
    >
    >
    >
    > I represent the VeriSign Domain Name Registry as an implementer of the
    > latest IDNA specifications. The following four (4) questions arose during
    > our implementation of the conformance test.
    >
    >
    >
    >
    >
    > *Question 1 of 4*
    >
    > *Line* 204
    >
    > *Input* \u0646 \u0627 \u0645 \u0647 \u200C \u0627 \u06CC
    >
    > *Reference* Appendix A.1 of RFC 5892 (Tables) <https://trac.tools.ietf.org/html/rfc5892>
    >
    > *Issue* Per the reference, the ZWNJ (\u200C) must meet one of two
    > conditions: either it is preceded by a character with the Virama
    > combining class, or the characters in the label follow a certain pattern
    > of joining types. This input does not meet either of these criteria, and
    > appears to be an invalid IDN label with respect to the IDNA 2008 standards.
    > There are ten (10) such lines in the input file.
    >

    This is by design. UTS#46 does not have the contextual checks for ZWJ and
    ZWNJ.

    Background: While those are excellent checks to have, and are recommended,
    they only prevent a small fraction of the homoglyph exploits, so they are
    not required by UTS#46 and are not tested for in the file. (If you disagree
    with that approach, you should bring that up to the UTC for the next version
    of UTS#46.) UTS#46 does allow for implementations to be stricter if desired,
    so any implementation can apply those IDNA2008 checks.

    Note that we could add a field in the test file that indicated whether the
    input (or mapped input [see below]) was valid under IDNA2008. Do people
    think that would be helpful?

    >
    >
    >
    >
    > *Question 2 of 4*
    >
    > *Line* 319
    >
    > *Input* …
    > 1234567890123456789012345678901234567890123456789012345678901234…
    >
    > *Reference* Sections 3.1 and 3.5 of RFC 1034 <http://www.ietf.org/rfc/rfc1034.txt>
    >
    > *Issue* Per the reference, DNS labels cannot contain more than
    > 63 octets. It appears that this is a purposeful test, since the first label
    > is exactly 63 octets, and the second label is 64 octets. This does not
    > apply to other applications, but these lines of input are not valid for
    > DNS. There are three (3) such lines in the input file.
    >

    This appears to be a mistake in the conformance file generation. I'll look
    at it to see what is happening.
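
    For reference, the 63-octet limit from RFC 1034 that the question cites
    can be checked with a few lines of Python. This is only a sketch: it
    assumes the domain is already in its ASCII (A-label) form, and the
    function name is hypothetical.

```python
MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1034


def overlong_labels(domain):
    """Return the labels of an ASCII domain name that exceed 63 octets."""
    return [
        label
        for label in domain.split(".")
        if len(label.encode("ascii")) > MAX_LABEL_OCTETS
    ]


if __name__ == "__main__":
    label_63 = "1234567890" * 6 + "123"   # exactly 63 octets
    label_64 = label_63 + "4"             # one octet too long
    print(overlong_labels(label_63 + "." + label_64))
```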

    >
    >
    >
    >
    > *Question 3 of 4*
    >
    > *Line* 319
    >
    > *Input* U \u0308 . xn--tda
    >
    > *Reference* Section 4.1 of RFC 5891 (Protocol) <https://trac.tools.ietf.org/html/rfc5891>
    >
    > *Issue* Per the reference, input into the IDNA Registration
    > process “MUST be… in Normalization Form C”. This input does not meet these
    > standards. The first label is not properly normalized. Implementations of
    > IDNA 2008 for registration should expect an exception. There are four (4)
    > such lines in the input file.
    >

    Here is the situation:

       - IDNA2003 allows denormalized text as input; it requires that the text
       be normalized (and case-folded) in the process of generating the
       Punycode.
       - IDNA2008 disallows denormalized text per se; however, it allows a
       mapping phase for the input, which can perform normalization and case
       folding for consistency with IDNA2003.

    UTS#46 provides for a mapping that is consistent with IDNA2003 and allowed
    by IDNA2008. That mapping normalizes U\u0308 to a lowercase U-umlaut, which
    is valid.
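
    The effect on this particular input can be approximated with the Python
    standard library. This is a sketch under stated assumptions, not the
    UTS#46 algorithm itself: lower-casing followed by NFC stands in for the
    real mapping table, which covers many more characters, and both function
    names are hypothetical.

```python
import unicodedata


def approximate_map(label):
    """Lower-case, then normalize to NFC (a stand-in for the UTS#46 mapping)."""
    return unicodedata.normalize("NFC", label.lower())


def to_a_label(label):
    """Map a label and convert it to its ASCII-compatible (Punycode) form."""
    mapped = approximate_map(label)
    if mapped.isascii():
        return mapped
    return "xn--" + mapped.encode("punycode").decode("ascii")


if __name__ == "__main__":
    mapped = approximate_map("U\u0308")
    print([f"U+{ord(ch):04X}" for ch in mapped])
    print(to_a_label("U\u0308"))
```

    With this approximation the first label of the test input maps to
    U+00FC, whose ASCII form is the same xn--tda that appears as the second
    label.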

    >
    >
    >
    > *Question 4 of 4*
    >
    > *Line* 276
    >
    > *Input* xn--53h
    >
    > *Reference* Appendix B.1 of RFC 5892 (Tables) <https://trac.tools.ietf.org/html/rfc5892>
    >
    > *Issue* Per the reference, the character \u2615 is disallowed.
    >
    > 2460..26CD ; DISALLOWED # CIRCLED DIGIT ONE..DISABLED CAR
    >
    > Implementations should expect an exception. There are twenty (20) such
    > lines in the input file.
    >
    >
    >

    This is another instance where UTS#46 is mapping. See the following line
    in http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt. Such a
    mapping is permitted by IDNA2008.

    2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO
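
    To make the effect of such an entry concrete, the Python sketch below
    parses the line above and applies it to a string. The helper names are
    hypothetical, and a real implementation would load the whole table and
    also handle the other status values.

```python
ENTRY = "2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO"


def parse_mapped_entry(entry):
    """Return (source code point, replacement string) for a 'mapped' entry."""
    data = entry.split("#", 1)[0]
    source, status, target = (field.strip() for field in data.split(";"))
    if status != "mapped":
        raise ValueError(f"not a mapped entry: {entry!r}")
    replacement = "".join(chr(int(cp, 16)) for cp in target.split())
    return int(source, 16), replacement


def apply_mappings(text, table):
    """Replace each character that has a table entry; keep the rest as-is."""
    return "".join(table.get(ord(ch), ch) for ch in text)


if __name__ == "__main__":
    source, replacement = parse_mapped_entry(ENTRY)
    table = {source: replacement}
    print(apply_mappings("\u2461", table))
```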

    >
    > Any input is appreciated,
    >
    > -- John
    >
    >
    >
    >
    >
    > John Colosi | Naming Services | Veri*Sign*, Inc.
    > 703.948.3211 | 703.967.4062 | 703.421.8233


