Re: [idn] IDN spoofing

From: Hans Aberg
Date: Tue Feb 22 2005 - 12:44:48 CST

    At 13:33 +1100 2005/02/22, George W Gerrity wrote:
    >it doesn't make sense for these rules to be part of a standard on how to extend
    >Domain names to use scripts other than Latin: they are much better handled as
    >(algorithmic where possible) regulations specified by the authority for a given
    >TLD, or set of TLDs, in the case of the universal TLDs.

    It seems simplest merely to require the names to be 8-bit bytes, encoded as
    UTF-8.

    >At the TLD itself, one can allow a limited, but finite number of character
    >strings to be equivalent, including the rule that script mixtures are
    >inadmissible, but maybe case folding will be allowed.

    Then if DNS name lookup software is not updated, only ASCII letter cases
    will be identified (treated as the same name), as before, but no other
    casings, not even for Latin script letters with diacritical marks. (In
    retrospect, when facing the full Unicode set, it might have been better to
    identify ASCII letter cases.)
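
    To make that consequence concrete, here is a minimal sketch in Python of a
    comparison that knows only about ASCII. The helper names ascii_fold and
    same_name are assumptions made for this illustration and are not taken from
    any DNS implementation. The point is only that every byte of a multi-byte
    UTF-8 sequence lies outside the A-Z range, so accented Latin letters keep
    their case distinction.

```python
def ascii_fold(name: str) -> bytes:
    """Lowercase only the ASCII letters A-Z in the UTF-8 encoding of name."""
    encoded = name.encode("utf-8")
    # Bytes 0x41-0x5A are ASCII 'A'-'Z'. Every byte of a multi-byte UTF-8
    # sequence is >= 0x80, so non-ASCII characters pass through untouched.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in encoded)


def same_name(a: str, b: str) -> bool:
    """Compare two names the way an ASCII-only lookup would (hypothetical)."""
    return ascii_fold(a) == ascii_fold(b)


# ASCII cases are identified, as before.
ascii_match = same_name("Example.COM", "example.com")
# U+00C9 (E with acute, upper case) and U+00E9 (lower case) are not.
accent_match = same_name("\u00c9cole.example", "\u00e9cole.example")
```

    Under these assumptions ascii_match is true and accent_match is false.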

    >it doesn't make sense for these rules to be part of a standard on how to
    >extend Domain names to use scripts other than Latin: they are much better
    >handled as (algorithmic where possible) regulations specified by the authority
    >for a given TLD, or set of TLDs, in the case of the universal TLDs.

    Then all confusable problems will be handled at the registry.

    >By using this approach, and starting off with a set of rules that disallow most
    >forms of script mixes, then where appeals to common sense and the wishes of a
    >reasonable number of potential clients suggest a loosening of the rules, this
    >can be done with little disruption to the existing state of affairs.

    If one uses the method I indicated to define equivalences, then script mixes
    can be allowed. If cases are not identified in scripts, then these
    equivalences will be between characters of different scripts. Thus, they
    should not cut down on manuscript names. (I want to avoid throwing in
    general equivalences such as that of casings, as different equivalences can
    combine to generate unwanted equivalence chains.)
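
    The chain effect can be sketched with a small union-find structure in
    Python. The class name Equivalences and the chosen character pairs are
    assumptions for illustration only. Latin "H" and Cyrillic capital EN
    (U+041D) look alike, but once casing is thrown in as well, Latin "h" ends
    up equivalent to Cyrillic small EN (U+043D), which it does not resemble.

```python
class Equivalences:
    """Union-find over characters; declared equivalences close transitively."""

    def __init__(self):
        self.parent = {}

    def find(self, ch):
        # Each unseen character starts as its own representative.
        self.parent.setdefault(ch, ch)
        while self.parent[ch] != ch:
            # Path halving keeps the lookup chains short.
            self.parent[ch] = self.parent[self.parent[ch]]
            ch = self.parent[ch]
        return ch

    def declare(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


# Homograph equivalence only: Latin H ~ Cyrillic capital EN.
homographs_only = Equivalences()
homographs_only.declare("H", "\u041d")

# The same homograph plus case equivalences in both scripts.
with_casing = Equivalences()
with_casing.declare("H", "\u041d")
with_casing.declare("H", "h")
with_casing.declare("\u041d", "\u043d")

chain_without_casing = homographs_only.equivalent("h", "\u043d")
chain_with_casing = with_casing.equivalent("h", "\u043d")
```

    Here chain_without_casing is false and chain_with_casing is true: the
    unwanted pair appears only when the two kinds of equivalence are combined.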

    >The problems for universal TLDs (<.com>, <.net>) are far more complex, because
    >they are required to accept all language scripts.

    If all language scripts are already decided admissible on these levels,
    these will be the battleground for confusables. So there might not be a
    point in restricting other levels. One should also note that country
    codes are not language codes or script codes, and most nations are
    multilingual today. It might be controversial to restrict country codes to
    just certain scripts.

    >c) At this point, the <> registrar will need to exercise some common
    >sense. For instance, it seems unreasonable that this domain should accept codes
    >outside the Latin and Cyrillic code blocks, and if they do, then mixes should
    >be strongly discouraged. Certainly, the use of, say, Hebrew vowel pointing with
    >Latin Codes, while perhaps acceptable in Israel TLD, should be unacceptable in
    >the Russia TLD. In fact, as a general rule, mixes of diacritics from one code
    >block with code points from another, should never be allowed.

    So this assumes that there are no Hebrews in Russia. This restriction might
    be interpreted politically as those speaking Hebrew in Russia should go to
    Israel, at least as far as defining their Internet domain names goes. It
    might be wise to avoid this kind of political controversy. :-)

    I think one can define a lot of homograph equivalences, which are then used
    only for an automated first check when attempting to register a new name.
    The cases that fail to register automatically will be reviewed by a
    human. One will then discover if one has defined too many equivalences. It
    might be wise to set up a report system, where the public can report
    confusable names. Then a committee will have to review those cases, and
    decide what to do about them.
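
    A minimal sketch of such an automated first check, in Python, might look
    as follows. The HOMOGRAPHS table is a tiny hand-picked sample of Cyrillic
    letters that resemble Latin ones, and the names skeleton and first_check
    are hypothetical; a real registry would need a far larger table and its
    own policy for what happens after a clash.

```python
# Illustrative sample only: Cyrillic letters mapped to Latin look-alikes.
HOMOGRAPHS = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(name):
    """Replace each character by the representative of its homograph class."""
    return "".join(HOMOGRAPHS.get(ch, ch) for ch in name)


def first_check(candidate, registered):
    """Return ("accept", None), or ("review", clashing_name) for a human."""
    wanted = skeleton(candidate)
    for existing in registered:
        if skeleton(existing) == wanted:
            return ("review", existing)
    return ("accept", None)


registered_names = ["paypal"]
# Second letter is Cyrillic U+0430, so this clashes with "paypal".
spoof_result = first_check("p\u0430ypal", registered_names)
plain_result = first_check("example", registered_names)
```

    With this sample table spoof_result is ("review", "paypal") and
    plain_result is ("accept", None). If too many harmless names land in the
    review queue, that is the signal that too many equivalences were defined.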

    (I also like the idea that sites that use a non-ASCII name must register a
    parallel ASCII name, for international access: It might be difficult to
    exercise proper control over sites if one has to be an expert on
    international scripts in order to access them. One easy way for a criminal
    to "hide away" a site might otherwise be to give it a strange name.)

      Hans Aberg
