Re: Public Review Issue #133: Proposed Draft UTS #46, Unicode IDNA Compatible Preprocessing

From: Mark Davis
Date: Fri Dec 12 2008 - 13:36:24 CST

Re #1. This document is written presuming that IDNA2008 is approved as it is
in the current draft. Whether or not I'd agree with you, if you have any
concerns about IDNA2008, this isn't the forum to voice them. (Note: I have a
personal document at,
but note that that is a personal contribution.)

To Subscribe:

The draft documents are at:

draft-ietf-idnabis-defs
draft-ietf-idnabis-bidi

Protocol is the key one. Rationale is intended to be informative background.

The archives are long - you can try searching for particular terms, e.g.
googling with a site search: eszett

Re #2. "[\-a-zA-Z0-9]" as "[a-zA-Z0-9-]"

This is a bit tricky, because "-" is a syntax character, and *only* by
convention is "-" at the start or end of the whole range treated as a
literal; but that convention can vary between regex engines. So the most
neutral expression is to quote it specifically.
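
A small sketch of the point, using Python's "re" module purely as an example
engine (the draft does not prescribe one, and the pattern names below are
made up for illustration). In this engine the escaped and the trailing
hyphen behave the same, while an unescaped hyphen between two characters
is read as a range operator:

```python
import re

# Escaped hyphen: "\-" is a literal hyphen wherever it appears in the class.
ESCAPED = re.compile(r"[\-a-zA-Z0-9]+")

# Trailing hyphen: a literal only because of the positional convention.
TRAILING = re.compile(r"[a-zA-Z0-9-]+")

# Unescaped hyphen between two characters: a range from "+" to "9",
# which also covers ",", ".", "/" and all the digits.
RANGE = re.compile(r"[+-9]")

# Escaped hyphen in the same position: only "+", "-" and "9" match.
LITERAL = re.compile(r"[+\-9]")


def fully_matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("xn--abc-1", "a_b", ","):
        print(
            repr(sample),
            fully_matches(ESCAPED, sample),
            fully_matches(TRAILING, sample),
            fully_matches(RANGE, sample),
            fully_matches(LITERAL, sample),
        )
```

Other engines may treat the unescaped forms differently, which is the
reason for preferring the quoted form in the document.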

Re #3. About the underbar - people wiser than I in the ways of ASCII domain
names have said that _ is prohibited, and the IDNA specs certainly disallow
it. You might raise this question on the IDNA list if you want an answer.
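
For illustration only, a minimal sketch of a label check that excludes the
underscore. It assumes the traditional letter-digit-hyphen (LDH) host name
rule: ASCII letters, digits and hyphen, no hyphen at either end, at most 63
characters. The function name is hypothetical, and this is not the
validation procedure of the IDNA drafts themselves:

```python
import re

# Assumption: classic LDH host name label rule, not the full IDNA2008 checks.
# One letter or digit, optionally followed by up to 61 letters, digits or
# hyphens and a final letter or digit (so 1 to 63 characters in total).
LDH_LABEL = re.compile(r"[a-zA-Z0-9](?:[\-a-zA-Z0-9]{0,61}[a-zA-Z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label satisfies the assumed LDH rule."""
    return LDH_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for label in ("example", "my_host", "-abc", "abc-", "a-b"):
        print(label, is_ldh_label(label))
```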


On Thu, Dec 11, 2008 at 05:21, Troy wrote:
> 1.
> The document states that allowing names to be interpreted differently by
> different applications would cause a "huge interoperability problem."
> Then, right after a table listing some examples, the document goes on to
> say that "[An IDNA2008-conformant implementation] could even decide,
> based on local linguistic mappings, to map #5 and #6 to different valid
> domain names".
> Do I understand correctly that it will now become acceptable to have
> "huge interoperability problems," as different applications are certain
> to handle locales differently.
> I see it as an improvement that invalid names are no longer allowed.
> I.e. any name which is not already normalized and in lower case will not
> be allowed. This makes it unambiguous as to which name is meant.
> Therefore I find it really contradictory that software is allowed to use
> "local mapping" to interpret a name in an unpredictable manner. Two
> domain names, e.g. "ää.com" and "" can be owned by two different
> entities, so it cannot be acceptable behavior that a name "Ää.com" can
> be interpreted as "" by software running under the US locale, and
> as "ää.com" or even "" by software running under the German
> locale.
> I think software must interpret the name as "ää.com" and if it can't,
> reject it as invalid.
> 2.
> Wouldn't it be clearer to express "[\-a-zA-Z0-9]" as "[a-zA-Z0-9-]"?
> 3.
> The following sentence seems a bit odd:
> "Note also that some browsers allow characters like "_" in domain
> names."
> RFC 1033 recommends a set of characters for domain name labels which
> includes the underscore [a-zA-Z0-9_-]. Therefore it is no surprise that
> they are accepted as valid labels by browsers and other software.
> As an aside, why does the pattern of allowed characters exclude the
> underscore character?
> Troy
> --
> Troy Korjuslommi
> +358 40 570 9900
> Tksoft Inc.
