Re: [idn] IDN spoofing

From: George W Gerrity (g.gerrity@gwg-associates.com.au)
Date: Mon Feb 21 2005 - 20:33:41 CST

    The two references below summarise much that has been said about the
    difficulty of dealing with the internationalisation of Domain Names.
    Let us agree once and for all:

    1. The completely general problem is mathematically and computationally
    intractable, even if we use fuzzy mapping;

    2. The problem is a typical engineering challenge: to find a workable
    solution, future-proofed as much as possible, which is minimally
    complex;

    3. If the engineers (us?) don't solve it, the lawyers will have a
    field day, the courts will find expensive solutions, the cost of running
    the web will blow out, and all of us will have mud all over our faces.

    4. Now is the time (while there are only a very few registered names
    with possible clashes) to do it, before we have to go through the
    painful process of unregistering names and upgrading TLD machine codes.

    So let's sketch out an approach, using <.com.ru> as an example.

    a) The <.com.ru> registrar only accepts Latin characters for that
    domain name, or only accepts Cyrillic characters, no mix, and maps the
    two as equivalent. Case-equivalence mapping may also be allowed, at a
    cost of more complexity. Let the registrar decide that, and let's be
    sure that, as far as possible, the issuing authority licensing the TLD
    to the registrar ensures legal protection for these arbitrary, but
    fixed decisions.

    b) The first filter selects, for special attention, name tags whose
    codes (including diacritics, etc.) are not all in the Cyrillic block or
    all in the Latin block(s).

    My guess is that at this point, only a few percent will require special
    attention.

    c) At this point, the <.com.ru> registrar will need to exercise some
    common sense. For instance, it seems unreasonable that this domain
    should accept codes outside the Latin and Cyrillic code blocks, and if
    it does, then mixes should be strongly discouraged. Certainly, the use
    of, say, Hebrew vowel pointing with Latin codes, while perhaps
    acceptable in the Israel TLD, should be unacceptable in the Russia TLD.
    In fact, as a general rule, mixes of diacritics from one code block
    with code points from another should never be allowed.
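    Steps (b) and (c) can be sketched roughly as follows. This is only an
    illustration under stated assumptions: Python's standard library does
    not expose the Unicode Script property, so the hypothetical helper
    script_of() guesses a script from the first word of the character
    name, and the names KNOWN_SCRIPTS, needs_attention() and
    has_foreign_diacritic() are invented here, not part of any standard.

```python
import unicodedata

# Assumption: the first word of a character's Unicode name approximates its
# script. This is a crude stand-in for the real Script property.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "HEBREW", "ARABIC"}


def script_of(ch):
    """Return a rough script name, or None for digits, hyphens and
    script-neutral combining marks such as U+0301."""
    first_word = unicodedata.name(ch, "").split(" ")[0]
    return first_word if first_word in KNOWN_SCRIPTS else None


def needs_attention(label, allowed=("LATIN", "CYRILLIC")):
    """First filter (b): True unless all scripted code points belong to
    one single allowed script."""
    scripts = {script_of(ch) for ch in label} - {None}
    return not (len(scripts) <= 1 and scripts <= set(allowed))


def has_foreign_diacritic(label):
    """Rule from (c): True if a combining mark tied to one script follows
    a base character of a different script."""
    base_script = None
    for ch in label:
        s = script_of(ch)
        if unicodedata.combining(ch) or unicodedata.category(ch) == "Mn":
            if s is not None and base_script is not None and s != base_script:
                return True
        elif s is not None:
            base_script = s
    return False
```

    With these definitions, an all-Latin or all-Cyrillic tag passes the
    first filter untouched, while a Latin letter carrying a Hebrew vowel
    point (for example "a" followed by U+05B7) is caught by the second
    check.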

    Further rules can limit legal sequences of the allowed mixes. For
    instance, in alphabetic scripts such as Latin and Cyrillic, isolated
    code points from one script found in another make no sense unless
    spoofing is intended. Earlier, I suggested that a code-point string of
    a single script found mixed with strings of other scripts, should be of
    minimum length 2. One can also limit the number of separate substrings
    of an alternate script found interspersed with a dominant (national?)
    script.
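    The two sequence rules just described (minimum run length of 2 for a
    foreign script, and a cap on the number of foreign runs) might look
    like the sketch below. The script detection is the same name-based
    guess as an assumption, and the default max_foreign_runs=1 is an
    arbitrary illustrative value; no particular limit is proposed above.

```python
import unicodedata
from itertools import groupby

# Assumption: first word of the Unicode character name stands in for the
# script; neutral characters (digits, hyphen, generic marks) are ignored.
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "HEBREW", "ARABIC"}


def script_of(ch):
    first_word = unicodedata.name(ch, "").split(" ")[0]
    return first_word if first_word in KNOWN_SCRIPTS else None


def script_runs(label):
    """Return [(script, run_length), ...] for consecutive same-script
    code points, skipping script-neutral characters."""
    scripted = [s for s in map(script_of, label) if s is not None]
    return [(s, len(list(group))) for s, group in groupby(scripted)]


def passes_run_rules(label, dominant, min_run=2, max_foreign_runs=1):
    """True if every run outside the dominant script has at least min_run
    code points and there are at most max_foreign_runs such runs."""
    foreign = [(s, n) for s, n in script_runs(label) if s != dominant]
    if any(n < min_run for _, n in foreign):
        return False
    return len(foreign) <= max_foreign_runs
```

    Under these rules a Latin tag with one embedded Cyrillic letter fails,
    as does <U+004B U+0049 U+039B> with Latin as the dominant script,
    because the Greek run has length 1.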

    These sorts of common-sense rules can be easily implemented and the
    computational overhead is minimal. Of course, owners of ridiculous
    trade marks (such as <U+004B U+0049 U+039B>, “KIΛ”, for the brand name
    of the automobile “KIA”) will disagree, but realism has to intrude
    somewhere into the free market economy.

    The problems for universal TLDs (<.com>, <.net>) are far more complex,
    because they are required to accept all language scripts. At the TLD
    itself, one can allow a limited number of character strings to be
    equivalent, including the rule that script mixtures are inadmissible,
    though case folding might be allowed.

    Once again, however, application of some judicious sieve filters and
    rules about how mixed scripts may be composed can simplify the
    handling of the name tags. There are also sieve rules that can
    immediately throw out most inadmissible combinations, such as the
    string length rule mentioned above. Those strings remaining can be
    tossed to a human, who will be required to be an expert in orthography
    (a nice new line of business for many on the Unicode list?).
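    One way to picture such a sieve is as an ordered list of cheap rules,
    each of which either decides a tag or passes it on, with whatever is
    left over going to the human expert. The sketch below is an assumption
    about how that could be arranged; the rule names, the verdict strings
    and the name-based script guess are all hypothetical.

```python
import unicodedata
from itertools import groupby

# Assumption: first word of the Unicode character name approximates script.
SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "HEBREW", "ARABIC"}


def scripts_in_order(label):
    """Script guess for each scripted code point, in label order."""
    names = (unicodedata.name(ch, "").split(" ")[0] for ch in label)
    return [n for n in names if n in SCRIPTS]


def single_script(label):
    """Accept tags that use at most one script."""
    return "accept" if len(set(scripts_in_order(label))) <= 1 else None


def isolated_code_point(label):
    """Reject mixed tags containing a script run shorter than 2."""
    runs = [len(list(g)) for _, g in groupby(scripts_in_order(label))]
    return "reject" if len(runs) > 1 and min(runs) < 2 else None


SIEVE = [single_script, isolated_code_point]


def classify(label, sieve=SIEVE):
    """Apply the rules in order; the first decisive rule wins, and tags
    no rule decides are left for a human."""
    for rule in sieve:
        verdict = rule(label)
        if verdict is not None:
            return verdict
    return "human review"
```

    A TLD authority could tighten or loosen its policy by editing the
    SIEVE list alone, which fits the idea of keeping these rules as
    per-TLD regulations.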

    Now, it doesn't make sense for these rules to be part of a standard on
    how to extend Domain names to use scripts other than Latin: they are
    much better handled as (algorithmic where possible) regulations
    specified by the authority for a given TLD, or set of TLDs, in the
    case of the universal TLDs.

    With this approach, we start off with a set of rules that disallows
    most forms of script mixing. Then, where appeals to common sense and
    the wishes of a reasonable number of potential clients suggest a
    loosening of the rules, this can be done with little disruption to the
    existing state of affairs.

    George
    ------

    On 22 Feb 2005, at 08:40, Doug Ewell wrote:

    > Hans Aberg <haberg at math dot su dot se> wrote:
    >
    >> The suggestion I made, was to use a function to detect confusables by
    >> declaring them equivalent, but retaining the full Unicode character
    >> set for representing the IDN's. If this is used at the registration
    >> level only, the only thing that happens when somebody enters a
    >> confusable, is that it is rejected. There is a problem only when an
    >> authority admits parallel, confusable names to be registered.
    >
    > Granted. The problem, as I have said so often, is determining what the
    > set of "confusables" is. Don't just say a/а and o/ο, either; that's
    > only the tip of the iceberg.
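
    The registration-time check Hans Aberg describes can be sketched as
    below, purely as an illustration. The table CONFUSABLE_TO_CANONICAL is
    a hypothetical, deliberately tiny stand-in: as Doug Ewell points out,
    deciding what belongs in the real table is the hard part. The Registry
    class and skeleton() function are likewise invented names.

```python
# Hypothetical toy table mapping confusable code points to one canonical
# representative of their equivalence class. A real table would be far
# larger and is exactly what is under dispute in this thread.
CONFUSABLE_TO_CANONICAL = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(label):
    """Map a label to the representative of its equivalence class."""
    return "".join(CONFUSABLE_TO_CANONICAL.get(ch, ch)
                   for ch in label.casefold())


class Registry:
    """Stores labels in full Unicode; equivalence is used only to refuse
    a new registration that collides with an existing one."""

    def __init__(self):
        self._by_skeleton = {}

    def register(self, label):
        key = skeleton(label)
        if key in self._by_skeleton:
            return False  # confusable with an already registered name
        self._by_skeleton[key] = label
        return True
```

    Note that the stored label keeps its original code points; only the
    lookup key is folded, so nothing changes in how the IDN itself is
    represented.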

    On 22 Feb 2005, at 07:03, Erik van der Poel wrote:

    > Hans Aberg wrote:
    >> Sure you can change it: One can make the equivalence classes smaller,
    >> whenever one wants.
    >
    > As a mathematician, one might be inclined to think that way. But here,
    > we're not talking about theoretical mathematics. We're talking about
    > network engineering. A totally different way of thinking.
    >
    > You can't just change the mapping whenever you want because there are
    > many (client and server) installations out there that can't be changed
    > overnight (what is known in network parlance as a "flag day").
    >
    > For example, even if a registry were to change their mapping, go
    > through their entire database, and delete the names that are
    > determined to be duplicates (however one might accomplish that), there
    > will be people with the old version of the app, which uses the old
    > mapping, and will not be able to find the name (since it has been
    > deleted).
    >
    > Now, this might be a good thing if the name is an evil spoof, but what
    > about innocent registrations? What if two separate parties have an
    > equally legitimate claim on a particular name? This happens a lot in
    > the ASCII DNS, and basically, whoever got there first (or is willing
    > to pay a lot of money) wins.
    >
    > One way to continue to support these innocent duplicates is to use a
    > different prefix (i.e. something other than xn--) in the new mapping,
    > and keep the old names (with the old prefix) in the database (instead
    > of deleting them). This way, the old clients continue to find the old
    > innocent names.
    >
    > But what about the new clients? Now they will suddenly end up on a
    > different Web site when the user clicks on a link. I suppose the user
    > will just have to update their client, or the domain name owner will
    > have to register a different name and update all the Web pages to
    > point to the different name (assuming that they even have control over
    > *all* of the Web pages that might contain a link to their site).
    >
    > And so on. Do you get it now? You can't just change the mapping
    > "whenever" you want. If you do this at all, you do it as few times as
    > possible.
    >
    > Now, you may point out that we are just getting started with IDN and
    > that not very many names have been registered (and I may even agree
    > with you), but it would still take a while to come up with a better
    > mapping (and reach consensus on it -- shudder), and in the meantime,
    > more names would be registered.
    >
    > And this still would not negate my main point, which is that you can't
    > do this "whenever" you want.
    >
    > Erik
    >
    >


