Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms

From: Simon Josefsson (jas@extundo.com)
Date: Fri Jan 28 2005 - 15:09:41 CST

In reply to: Markus Scherer: "Re: Open Issue #61: Proposed Update UAX #15 Unicode Normalization Forms"

Markus Scherer <markus.scherer@jtcsv.com> writes:

> The interesting thing here is that ICU's IDN implementation was
> modified, in response to the earlier discussion on this list, to use
> the broken NFKC implementation, while ICU's normalization API
> provides the fixed implementation according to the corrigendum (and
> the sample code).

That is indeed interesting, thanks for the information.

>> It would be interesting to find out what percentage of the problem
>> sequences are unstable under NFKC.
>
> This might be difficult: There is an infinite number of such
> sequences since there can be more than one combining mark between
> the wrongly composing characters. A comparison would be on the order
> of how many even numbers there are compared to all integers.

I'd settle for the percentage of 10e6 (properly) randomized problem
sequences that are unstable under NFKC. The figure is only input to
discussions anyway.

> I propose that
> 1. Domain name registrars test new registrations for problematic
> domain names and reject them, ASAP.
> For example, ICU's internal flag could be used to normalize
> a string twice and check for differences.
> 2. Domain names that have already been registered be checked
> for problematic strings.
>
> Number 1. ensures that the problem does not grow, as far as domain names are concerned.
> I predict that number 2. will produce an empty set.

I agree completely.
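One reading of Markus's first proposal is to normalize a label twice, once with the old (pre-corrigendum) NFKC, for example ICU with its internal flag set, and once with the corrected NFKC, and to flag the label if the results differ. That reading is an assumption. The Python sketch below covers both proposals. `legacy` is a hypothetical stand-in for the old behaviour, which Python's `unicodedata` does not provide, and all function names are illustrative.

```python
import unicodedata
from typing import Callable, Iterable, List

Normalizer = Callable[[str], str]


def nfkc_fixed(label: str) -> str:
    """Corrected NFKC, as implemented by Python's unicodedata."""
    return unicodedata.normalize("NFKC", label)


def is_problematic(label: str, fixed: Normalizer, legacy: Normalizer) -> bool:
    """True if the old and corrected NFKC behaviours disagree on this label."""
    return fixed(label) != legacy(label)


def accept_registration(label: str, legacy: Normalizer,
                        fixed: Normalizer = nfkc_fixed) -> bool:
    """Proposal 1: reject new registrations on which the behaviours differ."""
    return not is_problematic(label, fixed, legacy)


def audit_registered(labels: Iterable[str], legacy: Normalizer,
                     fixed: Normalizer = nfkc_fixed) -> List[str]:
    """Proposal 2: list already-registered labels that are problematic."""
    return [label for label in labels if is_problematic(label, fixed, legacy)]
```

If `legacy` is simply `nfkc_fixed`, every label passes, so the check is only meaningful with a genuine pre-corrigendum normalizer plugged in. Markus predicts that `audit_registered` would return an empty list for real registrations.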

Libidn includes an API to test for the problem sequences, and I
recommend that everyone use it:

http://josefsson.org/libidn/manual/html_node/PR29-Functions.html

Thanks,
Simon
