Re: Public Review Issue #133: Proposed Draft UTS #46, Unicode IDNA Compatible Preprocessing

From: Troy (
Date: Thu Dec 11 2008 - 07:21:35 CST

The document states that allowing names to be interpreted differently by
different applications would cause a "huge interoperability problem."

Then, right after a table listing some examples, the document goes on to
say that "[An IDNA2008-conformant implementation] could even decide,
based on local linguistic mappings, to map #5 and #6 to different valid
domain names".

Do I understand correctly that it will now become acceptable to have
"huge interoperability problems," given that different applications are
certain to handle locales differently?

I see it as an improvement that invalid names are no longer allowed,
i.e. that any name which is not already normalized and in lower case
will be rejected. This makes it unambiguous which name is meant.

Therefore I find it really contradictory that software is allowed to use
"local mapping" to interpret a name in an unpredictable manner. Two
domain names, e.g. "ää.com" and "" can be owned by two different
entities, so it cannot be acceptable behavior that a name "Ää.com" can
be interpreted as "" by software running under the US locale, and
as "ää.com" or even "" by software running under the German locale.

I think software must interpret the name as "ää.com" and if it can't,
reject it as invalid.
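
To make the behavior I am asking for concrete, here is a minimal Python
sketch. The function names canonical_form and require_canonical are my
own, purely for illustration, and the rule shown (lower-case, then NFC)
is a simplification, not the full IDNA2008 or UTS #46 procedure. The
point is only that the result never depends on the user's locale.

```python
import unicodedata


def canonical_form(name: str) -> str:
    """Locale-independent mapping: lower-case, then NFC-normalize."""
    # str.lower() in Python 3 uses Unicode case data and ignores the
    # process locale, so every application gets the same answer.
    return unicodedata.normalize("NFC", name.lower())


def require_canonical(name: str) -> str:
    """Strict variant: accept only names already in canonical form."""
    if name != canonical_form(name):
        raise ValueError(f"not normalized lower case: {name!r}")
    return name


# "\u00c4\u00e4.com" is "Ää.com" written with escapes.
print(canonical_form("\u00c4\u00e4.com"))  # ää.com
```

An application would use either the mapping or the strict check, but in
both cases "Ää.com" can only ever refer to "ää.com".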

Wouldn't it be clearer to express "[\-a-zA-Z0-9]" as "[a-zA-Z0-9-]"?
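
The two spellings should match exactly the same characters, so the
change would be purely cosmetic. A quick check, assuming Python's re
syntax (other regex dialects may treat the hyphen differently); the
helper matched_ascii is hypothetical:

```python
import re

ESCAPED = re.compile(r"[\-a-zA-Z0-9]")   # hyphen escaped, listed first
TRAILING = re.compile(r"[a-zA-Z0-9-]")   # hyphen unescaped, listed last


def matched_ascii(pattern: re.Pattern) -> set:
    """Return the set of ASCII characters the character class accepts."""
    return {chr(cp) for cp in range(128) if pattern.fullmatch(chr(cp))}


# Both classes accept letters, digits and the hyphen, nothing else.
assert matched_ascii(ESCAPED) == matched_ascii(TRAILING)
```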

The following sentence seems a bit odd:
"Note also that some browsers allow characters like "_" in domain
RFC 1033 recommends a set of characters for domain name labels which
includes the underscore [a-zA-Z0-9_-]. Therefore it is no surprise that
labels containing an underscore are accepted as valid by browsers and
other software.

As an aside, why does the pattern of allowed characters exclude the
underscore character?
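
The practical difference between the two character sets is easy to show.
In this sketch the "+" quantifier and the label "my_host" are my own
additions for illustration; only the character classes come from the
discussion above.

```python
import re

# Character class from the draft, applied to a whole label.
WITHOUT_UNDERSCORE = re.compile(r"[a-zA-Z0-9-]+")
# Character class including the underscore, as in the RFC 1033 set.
WITH_UNDERSCORE = re.compile(r"[a-zA-Z0-9_-]+")


def accepts(pattern: re.Pattern, label: str) -> bool:
    """Return True if the whole label consists of allowed characters."""
    return pattern.fullmatch(label) is not None


print(accepts(WITHOUT_UNDERSCORE, "my_host"))  # False
print(accepts(WITH_UNDERSCORE, "my_host"))     # True
```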


Troy Korjuslommi
+358 40 570 9900
Tksoft Inc.
