IDN spoofing

From: Erik van der Poel (erik@vanderpoel.org)
Date: Fri Feb 18 2005 - 07:41:45 CST


    All,

    This email is being sent to both the Unicode and IDN mailing lists. I'm
    wondering how we can move forward with the IDN spoofing issue. Let me
    take a stab at it.

    Regarding the proposal to unify (or map) all the homographs, Doug Ewell
    wrote a humorous email illustrating how difficult such an effort would be:

    http://ops.ietf.org/lists/idn/idn.2002/msg00498.html

    John Klensin says that a "one label, one language" rule has been
    suggested to combat look-alike confusion. See section 1.5.1 in:

    http://www.ietf.org/internet-drafts/draft-klensin-reg-guidelines-06.txt

    Indeed, this label-based idea makes sense because DNS is
    administratively divided into labels. For example, the .com operator
    might be able to impose some restrictions on the 2nd level domain label,
    but if someone registers foo.com, then it's up to them to decide what
    will be allowed at the 3rd level (e.g. bar.foo.com). No?

    Recent discussion on the IDN mailing list has suggested that we might
    want to think more in terms of *script* than language. However, I note
    that there is a very diverse history of mixing scripts:

    http://ptolemy.tlg.uci.edu/~opoudjis/unicode/unicode_mixing.html

    But do we really need to allow for such rich script mixing in DNS? Some
    of the script mixing described in the document above is scholarly
    transliteration or "one-offs".

    So, instead, I propose that we start thinking of a "one label, one
    writing system" rule. The Unicode book defines "writing system" as "a
    set of rules for using one or more scripts to write a particular language".
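    Here is a rough Python sketch that makes the difference between a script
    and a writing system concrete. It is not part of any proposal. It guesses
    each character's script from the first word of its Unicode character name
    (e.g. "LATIN SMALL LETTER A" -> "LATIN"). That heuristic is an assumption
    for illustration only. A real check would use the Unicode Script property
    (UAX #24), which Python's standard unicodedata module does not expose.

```python
import unicodedata

# Assumption: digits and the hyphen are treated as script-neutral.
NEUTRAL = set("0123456789-")


def scripts_in_label(label: str) -> set[str]:
    """Return the approximate set of scripts used by the characters of a label.

    Heuristic (hypothetical): the script is taken to be the first word of the
    character's Unicode name, e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC".
    """
    scripts = set()
    for ch in label:
        if ch in NEUTRAL:
            continue
        name = unicodedata.name(ch, "UNKNOWN")
        scripts.add(name.split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """True if the label's letters come from at most one (approximate) script."""
    return len(scripts_in_label(label)) <= 1
```

    With this heuristic, "p\u0430ypal" is flagged as mixed because it has a
    Cyrillic U+0430 in place of the Latin "a". Ordinary Japanese such as
    "\u65e5\u672c\u3054" (kanji plus hiragana) is flagged as mixed too. That is
    why a rule phrased in terms of writing systems, not single scripts, seems
    necessary.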

    This makes a lot of sense for some of the ccTLDs. For example, the .jp
    domain could choose to allow the Japanese writing system in the 2nd
    level domain label.

    But what can we do about .com? It's clearly a worldwide TLD now. It
    should probably allow multiple writing systems. Perhaps the .com
    operator could specify that 2nd level domain labels must stick to one
    writing system, and that that writing system must be indicated in the
    RRP (Registry Registrar Protocol) in order to validate the 2nd level
    name against the table of characters allowed in that writing system.
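    A registry-side check along these lines might look like the following
    sketch. Several pieces are hypothetical: the writing-system tags
    ("latn-en", "jpan-ja", "cyrl-ru"), the tiny character tables and the
    validate_registration function. The sketch does not show how RRP would
    carry the tag, so the function simply receives it as an argument.

```python
import string


def _range(start: int, end: int) -> set[str]:
    """All characters with code points in [start, end]."""
    return {chr(cp) for cp in range(start, end + 1)}


LDH = set(string.ascii_lowercase + string.digits + "-")

# Hypothetical tables: writing-system tag -> characters permitted in a label.
# Real tables would be much larger and maintained by the registry.
WRITING_SYSTEM_TABLES: dict[str, set[str]] = {
    "latn-en": LDH,
    # Japanese: LDH, hiragana, katakana, the prolonged sound mark (U+30FC)
    # and the CJK Unified Ideographs block.
    "jpan-ja": LDH
    | _range(0x3041, 0x3096)
    | _range(0x30A1, 0x30FA)
    | {"\u30fc"}
    | _range(0x4E00, 0x9FFF),
    # Russian: digits, hyphen, lowercase Cyrillic a..ya (U+0430-U+044F) and yo.
    "cyrl-ru": set(string.digits + "-") | _range(0x0430, 0x044F) | {"\u0451"},
}


def validate_registration(label: str, writing_system: str) -> bool:
    """Accept the label only if every character is in the declared table."""
    table = WRITING_SYSTEM_TABLES.get(writing_system)
    if table is None:
        raise ValueError(f"unknown writing system tag: {writing_system!r}")
    return all(ch in table for ch in label.lower())
```

    A mixed label such as "p\u0430ypal" fails under both "latn-en" and
    "cyrl-ru". A label written purely in either writing system passes.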

    This would probably require a (new?) set of names for writing systems,
    somewhat similar to the language codes of ISO 639.

    Some people might point out that it is unfair to impose a writing system
    rule on domain labels since DNS has not had such restrictions in the
    past. Or has it? The DNS spec itself allows arbitrary octet values in
    labels, but the infrastructure and conventions (the letters, digits and
    hyphen of the traditional hostname rules) appear to be restricted to a
    small subset of ASCII, which I guess you could just call the English
    writing system, no?
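    For comparison, the de facto ASCII restriction fits in a short check. This
    sketch encodes the common letters-digits-hyphen (LDH) hostname convention
    plus the 63-octet label limit of RFC 1035. The function names are made up
    for illustration.

```python
import re

# LDH convention: ASCII letters, digits and '-', not starting or ending
# with '-', at most 63 characters per label (1 + 61 + 1).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if a single label follows the LDH hostname convention."""
    return LDH_LABEL.fullmatch(label) is not None


def is_ldh_name(name: str) -> bool:
    """Check every label of a dotted name; one trailing root dot is allowed."""
    if name.endswith("."):
        name = name[:-1]
    return all(is_ldh_label(label) for label in name.split("."))
```

    Python's built-in "idna" codec (IDNA 2003) turns "bücher" into the ACE
    label "xn--bcher-kva", which passes this check. That is how IDNs fit into
    the existing ASCII-only conventions.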

    Also, I'm guessing that any "one label, one writing system" rule cannot
    really be mandated, since TLD operators have historically been free to
    do whatever they want, to make as much money as they want. So this rule
    would just be a guideline (Klensin's document is titled "Suggested
    Practices ...") and the TLD operators could follow it, if they wish to
    combat the IDN spoofing problem more than they wish to make money (in
    the short term :-)

    Erik



    This archive was generated by hypermail 2.1.5 : Fri Feb 18 2005 - 07:42:51 CST