Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Simon Josefsson (
Date: Thu Jan 27 2005 - 03:46:22 CST

    "Shawn Steele" <> writes:

    > "Simon" said:
    >> There is deployed code, and there are standards, that use the old interpretation.
    > There is deployed code that uses both interpretations.

    Right, so there is a practical problem.

    >> StringPrep and IDN will continue to use the old interpretation
    >> until they are updated to reference this update. There are no draft
    >> documents on that, as far as I know.
    > As far as I know (I could be wrong), StringPrep & IDN don't specify
    > which interpretation of the UAX is "correct" for those RFCs.

    Those specifications were published before the problem was discovered,
    so they couldn't have specified what to do.

    By referencing Unicode 3.2, StringPrep uses the old interpretation.
    Clarifying this would be good, because it is not universally accepted,
    but sadly that doesn't seem to be happening, leaving implementations
    with an interoperability problem.

    > Besides, these are not linguistically correct code points, so names
    > shouldn't really contain them. Additionally, IDN requires that
    > ToAscii(ToUnicode(x)) == x, which pretty much implies NFKC(x) == x
    > (ToAscii performs the NFKC step, and x should already be in NFKC). So
    > any name that would be broken by this clarification would be illegal
    > in IDN anyway.

    No, that is false. Let's say x = U+1100 U+0300 U+1161. ToUnicode(x)
    = x by definition (see section 4.2 of RFC 3490). ToAscii(ToUnicode(x))
    = xn--ksa1467f without the fix (i.e., how IDN is specified to work).
    You then get ToUnicode(ToAscii(ToUnicode(x))) = U+AC00 U+0300, which
    according to PR29 would be "wrong". With the proposed fix you would
    get U+1100 U+0300 U+1161 instead. There is nothing invalid about
    these IDN strings, although they supposedly do not occur naturally.
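To make that round trip concrete, here is a minimal Python sketch using only the standard library (`unicodedata` and the built-in `punycode` codec). It is an illustration under stated assumptions, not anyone's IDN implementation: it does not re-implement the pre-fix algorithm but takes the pre-fix NFKC result U+AC00 U+0300 from the paragraph above, and it models ToAscii as only the "xn--" prefix plus Punycode, skipping the rest of Nameprep and the other ToAscii checks. It also assumes Python's normalizer applies the corrected (post-PR29) behaviour for this input.

```python
import unicodedata

# The example from the message:
# U+1100 HANGUL CHOSEONG KIYEOK, U+0300 COMBINING GRAVE ACCENT,
# U+1161 HANGUL JUNGSEONG A.
x = "\u1100\u0300\u1161"

# Assumption: the pre-fix NFKC result, as stated in the message
# (the jamo compose to U+AC00 across the grave accent).
old_nfkc = "\uac00\u0300"

# Python's normalizer does not compose across the grave accent here,
# so x comes back unchanged, as with the proposed fix.
new_nfkc = unicodedata.normalize("NFKC", x)

# Simplified ToAscii: ACE prefix + Punycode of the pre-fix NFKC form.
ace_old = "xn--" + old_nfkc.encode("punycode").decode("ascii")

# Simplified ToUnicode: strip the prefix and decode the Punycode.
decoded = ace_old[4:].encode("ascii").decode("punycode")

print(ace_old)        # xn--ksa1467f
print(decoded == x)   # False: the round trip yields U+AC00 U+0300
print(new_nfkc == x)  # True
```

The decoded label is a perfectly valid string, but it is not the x that was fed in, which is the mismatch described above.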

    >> I wish this were only about punishing people who came to the
    >> "wrong conclusion". I believe the previous situation was perfectly
    >> clear, even if it was problematic in that the introductory text
    >> and example code were buggy. It seems to me that one problematic
    >> situation is being solved by creating other problems.
    > It's obvious that the text disagreed with itself and with the sample
    > code. Where the bug lies seems somewhat subjective; however,
    > NFKC(NFKC(x)) == NFKC(x) is obviously desirable and was explicitly
    > stated in the text. It is unfortunate that this test case wasn't
    > included in the test file :-)


    > Anyway, this has been well discussed already, and either way would
    > require some people to fix their code, so I wouldn't try to argue
    > against the update :-)

    It is not merely about fixing code. When/if StringPrep uses Unicode
    4.1 or later, with the fix, there will be an upgrade problem affecting
    interoperability and, at worst, with security implications.
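The upgrade concern can be illustrated with another minimal Python sketch: the same input x yields one ACE label under the pre-fix interpretation and a different one under the fix, so two implementations on either side of the upgrade would disagree about which label a name maps to. `ace_label` is a hypothetical helper that models only the Punycode step of ToAscii (no Nameprep mapping or other checks), and the pre-fix NFKC result U+AC00 U+0300 is taken from the message rather than computed.

```python
import unicodedata


def ace_label(nfkc_form: str) -> str:
    # Hypothetical helper: ACE prefix + Punycode only, not full ToAscii.
    return "xn--" + nfkc_form.encode("punycode").decode("ascii")


x = "\u1100\u0300\u1161"

# Label derived with the pre-fix interpretation (NFKC result from the message).
pre_fix = ace_label("\uac00\u0300")

# Label derived with the fix, where NFKC leaves x unchanged.
post_fix = ace_label(unicodedata.normalize("NFKC", x))

print(pre_fix)
print(post_fix)
print(pre_fix == post_fix)  # False: same input, two different ACE labels
```

Because Punycode is a one-to-one encoding, any difference in the NFKC step necessarily shows up as a different ACE label on the wire.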


    This archive was generated by hypermail 2.1.5 : Thu Jan 27 2005 - 03:49:04 CST