Re: IDN and Missed Normalisations

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 07 2007 - 16:34:58 CDT

    Philippe Verdy wrote on Monday, May 07, 2007 7:19 PM

    > Richard Wordingham wrote:
    >> The present standard for International Domain Name Processing (nameprep -
    >> RFC 3491 and stringprep - RFC 3454) currently operates with four steps:
    >> mapping, normalisation (NFKC), prohibition and bidi checking. Mapping
    >> replaces single characters by sequences, which may be empty. It is composed
    >> of two elements - deletion of default ignorables, and full case-folding,
    >> complicated because it is done before compatibility decomposition. (I may
    >> have missed some minor wrinkles in mapping.)

    > Isn't the Unicode normalization the first step to perform before performing
    > mappings and deletion?

    Not if http://tools.ietf.org/html/rfc3454 is the definition. I think the
    transformation sequence should have been:

    1) Conversion to NFKD
    2) Mapping
    3) Conversion to NFC

    but it isn't.
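
    As a minimal sketch of the three-step sequence listed above (which, to
    repeat, is not what RFC 3454 specifies), here it is using Python's
    standard unicodedata module. The name proposed_prep and the DELETED table
    are hypothetical; the table is a small illustrative subset of the
    characters that get deleted, not the full list, and str.casefold() is
    assumed to be an adequate stand-in for full case folding.

```python
import unicodedata

# Illustrative subset of characters that are "mapped to nothing".
# This is NOT the complete table; it is only enough to show the idea.
DELETED = dict.fromkeys([0x00AD, 0x200B, 0x200C, 0x200D, 0xFEFF])


def proposed_prep(text: str) -> str:
    """Hypothetical NFKD -> mapping -> NFC sequence (not the RFC 3454 order)."""
    decomposed = unicodedata.normalize("NFKD", text)   # 1) conversion to NFKD
    mapped = decomposed.translate(DELETED).casefold()  # 2) deletion + full case folding
    return unicodedata.normalize("NFC", mapped)        # 3) conversion to NFC


print(proposed_prep("Exa\u00ADmple"))  # soft hyphen removed, capital folded
```

    With this ordering the mapping step needs no knowledge of compatibility
    characters, because they have already been decomposed by step 1.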

    Mapping could then have been almost entirely deletion and standard full case
    mapping. Instead, some of the compatibility decompositions are addressed in
    mapping, e.g. mapping U+1D449 MATHEMATICAL ITALIC CAPITAL V directly to
    U+0076 LATIN SMALL LETTER V, instead of using the compatibility
    decomposition to U+0056 LATIN CAPITAL LETTER V, and then using the normal
    lower casing.
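
    The effect of the ordering on U+1D449 can be checked with a short sketch
    in Python's unicodedata module. Both function names are hypothetical, and
    str.casefold() is assumed to stand in for plain full case folding without
    any of the extra entries that the stringprep mapping table carries.

```python
import unicodedata

MATH_ITALIC_V = "\U0001D449"  # MATHEMATICAL ITALIC CAPITAL V


def fold_then_nfkc(text: str) -> str:
    """Plain case folding first, compatibility normalisation second."""
    return unicodedata.normalize("NFKC", text.casefold())


def nfkd_then_fold(text: str) -> str:
    """Compatibility decomposition first, then folding, then recomposition."""
    folded = unicodedata.normalize("NFKD", text).casefold()
    return unicodedata.normalize("NFC", folded)


print(repr(fold_then_nfkc(MATH_ITALIC_V)))
print(repr(nfkd_then_fold(MATH_ITALIC_V)))
```

    U+1D449 has no case folding of its own, so folding first leaves it
    untouched and NFKC then yields U+0056, still a capital. Decomposing first
    lets ordinary folding reach U+0076, which is why a fold-first mapping
    needs a dedicated entry for such characters.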

    > The nameprep result strings should be identical from all canonically
    > equivalent Unicode strings.
    >
    > The complication that you may have forgotten is that you must compute the
    > closure of these steps. Unicode provides a few closures for the combination
    > of standard normalization and standard case foldings.

    > For IDN purposes, where additional case mappings are performed, you need to
    > compute the extra closures.

    I would have hoped that the sequence I gave is idempotent (if that is what
    you mean by computing the closure). Do you know that some transformation is
    missing, or are you just being cautious?
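
    Idempotence of such a sequence can at least be spot-checked. The sketch
    below is only that: prep is a hypothetical stand-in (NFKD, full case
    folding via str.casefold(), NFC, with deletion omitted), and the sample
    list is an arbitrary handful of characters. An empty result says nothing
    about the characters that were not tried.

```python
import unicodedata


def prep(text: str) -> str:
    """Hypothetical stand-in: NFKD, full case folding, NFC (deletion omitted)."""
    folded = unicodedata.normalize("NFKD", text).casefold()
    return unicodedata.normalize("NFC", folded)


def non_idempotent(transform, samples):
    """Return the samples for which applying transform twice differs from once."""
    return [s for s in samples if transform(transform(s)) != transform(s)]


# Arbitrary samples: math italic V, sharp s, fi ligature, angstrom sign, E + acute.
SAMPLES = ["\U0001D449", "\u00DF", "\uFB01", "\u212B", "E\u0301"]

print(non_idempotent(prep, SAMPLES))
```

    To hunt for a genuinely missing transformation, the same helper could be
    run over every assigned code point instead of a hand-picked list.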

    > Given that NFKC is one member of the transformation, the
    > canonical equivalence of the nameprep result which is in normalized form
    > should be guaranteed, otherwise your nameprep implementation is bogus.

    Is nameprep a Unicode-compliant process? I would hope that it was, but then
    I thought default full upper-casing ought to be a Unicode-compliant process.

    Richard


