From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 07 2007 - 16:34:58 CDT
Philippe Verdy wrote on Monday, May 07, 2007 7:19 PM
> Richard Wordingham wrote:
>> The present standard for Internationalized Domain Name processing
>> (nameprep - RFC 3491, and stringprep - RFC 3454) currently operates in
>> four steps: mapping, normalisation (NFKC), prohibition and bidi checking.
>> Mapping replaces single characters by sequences, which may be empty. It is
>> composed of two elements - deletion of default ignorables, and full
>> case-folding, complicated because it is done before compatibility
>> decomposition. (I may have missed some minor wrinkles in mapping.)
> Isn't Unicode normalization the first step, to be performed before the
> mappings and deletion?
Not if http://tools.ietf.org/html/rfc3454 is the definition. I think the
transformation sequence should have been:
1) Conversion to NFKD
2) Mapping
3) Conversion to NFC
but it isn't.
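
For concreteness, here is a rough Python sketch of the sequence I have in mind. It is not what RFC 3454 specifies. The name proposed_prep and the IGNORABLE set are made up for illustration, the set is a deliberately incomplete stand-in for the default ignorables, and I am assuming that Python's str.casefold() is an acceptable stand-in for full case folding.

```python
import unicodedata

# Hypothetical, incomplete stand-in for the default ignorable code points.
IGNORABLE = {0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def proposed_prep(s):
    """Sketch of the sequence NFKD -> mapping -> NFC (not the RFC 3454 order)."""
    # 1) Conversion to NFKD, so compatibility characters are already
    #    decomposed before any mapping is applied.
    decomposed = unicodedata.normalize("NFKD", s)
    # 2) Mapping: delete ignorables, then apply full case folding.
    kept = "".join(ch for ch in decomposed if ord(ch) not in IGNORABLE)
    folded = kept.casefold()
    # 3) Conversion to NFC.
    return unicodedata.normalize("NFC", folded)


print(proposed_prep("Stra\u00DFe"))
print(proposed_prep("\U0001D449"))
```

With this ordering the mapping step needs no special entries for compatibility characters, because step 1 has already removed them.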
Mapping could then have been almost entirely deletion and standard full case
mapping. Instead, some of the compatibility decompositions are addressed in
mapping, e.g. mapping U+1D449 MATHEMATICAL ITALIC CAPITAL V directly to
U+0076 LATIN SMALL LETTER V, instead of using the compatibility
decomposition to U+0056 LATIN CAPITAL LETTER V, and then using the normal
lower casing.
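
The two routes for that character can be compared with Python's standard stringprep and unicodedata modules. This is only an illustration: stringprep.map_table_b2 is Python's rendering of RFC 3454 table B.2 (based on Unicode 3.2), and the variable names are mine.

```python
import stringprep
import unicodedata

V_ITALIC = "\U0001D449"  # MATHEMATICAL ITALIC CAPITAL V

# Route taken by RFC 3454 table B.2: a single direct mapping.
direct = stringprep.map_table_b2(V_ITALIC)

# Route argued for above: compatibility decomposition first,
# then ordinary lowercasing of the result.
decomposed = unicodedata.normalize("NFKD", V_ITALIC)
two_step = decomposed.lower()

for label, value in (("direct", direct),
                     ("decomposed", decomposed),
                     ("two_step", two_step)):
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in value))
```

Both routes end at the same character here; the difference is only in which table has to carry the knowledge.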
> The nameprep result strings should be identical for all canonically
> equivalent Unicode strings.
>
> The complication that you may have forgotten is that you must compute the
> closure of these steps. Unicode provides a few closures for the combination
> of standard normalization and standard case foldings.
> For IDN purposes, which perform additional case mappings, you need to
> compute the extra closures.
I would have hoped that the sequence I gave is idempotent (if that is what
you mean by computing the closure). Do you know that some transformation is
missing, or are you just being cautious?
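
One way to look for a counterexample is simply to apply the sequence twice and compare. The sketch below does that for a handful of strings. The names prep and counterexamples and the IGNORABLE set are hypothetical, the set is incomplete, and str.casefold() is assumed to stand in for full case folding, so a clean result on a small sample proves nothing in general.

```python
import unicodedata

# Hypothetical, incomplete stand-in for the default ignorable code points.
IGNORABLE = {0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def prep(s):
    """NFKD, then deletion plus case folding, then NFC."""
    decomposed = unicodedata.normalize("NFKD", s)
    kept = "".join(ch for ch in decomposed if ord(ch) not in IGNORABLE)
    return unicodedata.normalize("NFC", kept.casefold())


def counterexamples(transform, samples):
    """Return the samples for which a second application changes the result."""
    found = []
    for s in samples:
        once = transform(s)
        if transform(once) != once:
            found.append(s)
    return found


samples = ["\U0001D449", "Stra\u00DFe", "\uFB01"]
print(counterexamples(prep, samples))
```

To scan more widely, samples could be replaced by every code point outside the surrogate range, e.g. (chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF). I make no claim about what that scan would turn up.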
> Given that NFKC is one member of the transformation, the
> canonical equivalence of the nameprep result, which is in normalized form,
> should be guaranteed; otherwise your nameprep implementation is bogus.
Is nameprep a Unicode-compliant process? I would hope that it was, but then
I thought default full upper-casing ought to be a Unicode-compliant process.
Richard