Re: [idn] IDN spoofing

From: George W Gerrity
Date: Mon Feb 21 2005 - 06:51:05 CST


    On 21 Feb 2005, at 22:36, William Tan wrote:

    > George W Gerrity wrote:
    >> For the second-level (or third-level where the top is a country code)
    >> domain tag, it should be the legal responsibility of the name
    >> authorities for the domain above to ensure that spoofed names cannot
    >> be registered (or if registered, all belong to one owner). In the
    >> Western world, if that is not already the case, then I'm sure that
    >> the first time a spoof of, say Coca-Cola (or Pepsi — let's be
    >> even-handed) is registered, then we can be certain that afterwards,
    >> the issuing authority will never do it again.
    > While it is true that TLDs are responsible for preventing the
    > registration of spoofs, commercial TLDs that have automated
    > registration systems never perform that check.

    I'm suggesting that pretty soon they will start, or get sued by the big
    boys. It would help if the terms of their licensing included a
    requirement to take all reasonable steps to disallow spoofs. Otherwise,
    if name handling is automatic, being a TLD Authority is just a license
    to print money.

    > Does registering prevent someone else from getting

    Quite obviously, it doesn't right now, and it won't until (not if) the
    “real” Coca-Cola comes down like a ton of bricks on the TLD Authority
    (rather than on the spoofer: the deepest-hip-pocket principle). I don't
    know whether this particular case has occurred, but there have certainly
    been court cases, fought successfully, against people who jumped in and
    registered a trade-marked name as part of a domain name. As the usage
    and law of domain names mature, a natural extension is to add spoofs,
    as well as trade marks, to the list of names that no one but the owner
    of the trade mark or of the original (non-spoof) tag may register.

    >> In the case of countries whose law systems are still a bit wild and
    >> wooly (The former Soviet Union?), then I suspect that for the time
    >> being it will remain ‘Caveat Emptor’. In either case, a domain name
    >> holder should be able to license all spoofs for free, in order to
    >> limit its exposure to spoofing, whether or not there is adequate
    >> legal recourse.
    > If the TLD operator is careful, there is no need to license spoofs to
    > protect one's domain from being spoofed. On the other hand, if the TLD
    > does not even perform that check (such as .com), then it is unlikely
    > that you get to license all spoofs for free anyway - you have to pay
    > for each and every permutation of it.

    Hence the case, in the short term, for allowing the owner of an
    original to register all spoofs free of charge. Currently, I believe
    that some big internationals are already doing that, i.e., registering
    all lookalikes in every conceivable domain. They can afford it:
    start-ups can't. When (not if) the lawsuits start to come in, TLD
    operators will be happy to license spoofs for free to the legit holder
    of a name (or at least add them to their tables of non-licensable
    names), because it will mean a smaller number of potential court

    >> The point I'm making is that while the authorities for or
    >> may do what they like, we can at least give them advice plus
    >> some tables that will detect many, if not most, spoofs. In the case
    >> where the authority allows (for whatever reason) a name with mixed
    >> orthographies, then clearly the first to apply whose signature is not
    >> a spoof for an (already well-established) trade-marked name or domain
    >> name, should get the license, and all other applicants with a similar
    >> name be refused. The name authority should be protected by the laws
    >> of the countries in which it operates from being sued for refusing to
    >> register confusable names.
    > This is a fairly interesting proposal, i.e. to use the bundling (see
    > draft-klensin-reg-guidelines or rfc3743) to solve the homograph
    > problem at the registry level, provided we can come up with a
    > satisfactory table of lookalikes.

    There is no reason to look too hard for homographs, which, as has
    already been said, depend so much on the actual fonts used. I can't
    imagine that, for instance, a hit when comparing Greek lowercase omega
    shouldn't always be treated as a homograph of Latin w. It is then up to
    the registry to determine whether it will allow isolated Greek omegas
    to appear in an otherwise Latin string, or even whether it will allow
    any sort of mixed
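    A minimal Python sketch of the kind of homograph table meant here. The
    tiny LOOKALIKES table and the helpers skeleton(), scripts() and
    is_homograph() are hypothetical illustrations, not any registry's
    actual rules; a real table would be far larger. Each label is folded
    to a Latin "skeleton", and two different labels with the same skeleton
    count as homographs, so a Greek omega always hits on Latin w. Whether
    the mixed-script label is then acceptable is a separate registry
    decision, which is why script detection is kept apart.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table: each entry maps a
# non-Latin character to the Latin letter it resembles in common fonts.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043a": "k",  # CYRILLIC SMALL LETTER KA
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03c9": "w",  # GREEK SMALL LETTER OMEGA
}

def skeleton(label: str) -> str:
    """Lower-case the label and replace each lookalike with its Latin twin."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())

def scripts(label: str) -> set[str]:
    """Rough script detection: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def is_homograph(candidate: str, registered: str) -> bool:
    """True if two different labels collapse to the same skeleton."""
    return candidate != registered and skeleton(candidate) == skeleton(registered)

spoof = "\u03c9idget"  # Greek omega followed by Latin "idget"
print(is_homograph(spoof, "widget"))  # True
print(sorted(scripts(spoof)))         # ['GREEK', 'LATIN']
```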

    > As an example, the word "coke" can be represented completely in
    > Cyrillic homographs, so one can generate 16 combinations of ASCII and
    > Cyrillic characters forming strings that look like "coke". When you
    > register "", the other 16 variants are automatically tied to
    > this domain (for free or for a fee). They can be either all activated
    > (put into the zone file) or simply blocked from registration.
    > The good thing about this is that the lookalikes mapping table does
    > not have to be set-in-stone at the protocol level, but individual
    > registries may choose to implement whatever makes sense for them.

    Exactly. But you can be certain that as the domain system matures, the
    rules at TLD registries will tighten up, if only so that they can
    automate the finding of spoofs of legitimate, already-registered names.
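    A rough sketch of the bundling scheme quoted above, assuming a
    hypothetical one-to-one Latin-to-Cyrillic table that covers only the
    letters of ‘coke’; variants() and register() are illustrative names,
    not part of RFC 3743 or of any registry system. Registering a label
    claims its whole bundle, and a later application for any member of an
    existing bundle is refused.

```python
from itertools import product

# Hypothetical Latin -> Cyrillic lookalike pairs, covering only "coke".
CYRILLIC_TWIN = {"c": "\u0441", "o": "\u043e", "k": "\u043a", "e": "\u0435"}

def variants(label: str) -> list[str]:
    """All strings obtained by swapping any subset of letters for a Cyrillic twin.

    The first element is always the unmodified label."""
    choices = [(ch, CYRILLIC_TWIN[ch]) if ch in CYRILLIC_TWIN else (ch,) for ch in label]
    return ["".join(combo) for combo in product(*choices)]

def register(label: str, owner: str, zone: dict[str, str]) -> bool:
    """Claim `label` and its whole bundle for `owner`, unless any member is held."""
    bundle = variants(label)
    if any(v in zone for v in bundle):
        return False  # a lookalike is already registered: refuse the application
    for v in bundle:
        zone[v] = owner  # every variant is tied to the same owner (blocked or activated)
    return True

zone: dict[str, str] = {}
print(len(variants("coke")))                       # 16
print(register("coke", "first-applicant", zone))   # True
print(register("\u0441oke", "late-comer", zone))   # False: already in the bundle
```

    With four swappable letters there are 2^4 = 16 strings, the registered
    label plus 15 lookalikes; a ten-letter label in which every letter had
    a twin would already produce 1024, which is why the number of variants
    gets out of hand so quickly.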

    > The problem with this is that the number of variants gets out of hand
    > pretty quickly, and most registry systems aren't equipped to deal with
    > bundles.

    Yep. That's why the rules will tighten up. For instance, the TLD might
    refuse names containing characters from the Greek or Coptic code
    groups, and might even refuse to register names that mix Cyrillic and
    Roman characters where any substring from one set is shorter than two
    characters, or where there are more than three mixed substrings. That
    reduces the size of the lookup tables considerably. In your example of
    ‘coke’, there are only two mixed combinations of substrings (each from
    one set, but differing from the other substrings) where all are longer
    than one character: Roman ‘co’ followed by Cyrillic ‘ke’, and the
    reverse.
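    A sketch of the run-length rule just described, under stated
    assumptions: scripts are guessed from the first word of each
    character's Unicode name, non-letters such as hyphens are simply
    ignored, and "more than three mixed substrings" is read as more than
    three single-script runs. The allowed() policy and its min_run and
    max_runs parameters are hypothetical. Applied to the 16 Latin/Cyrillic
    spellings of ‘coke’, it keeps the two single-script spellings plus the
    two mixed ones.

```python
import unicodedata
from itertools import groupby, product

def script_of(ch: str) -> str:
    """Rough script tag: first word of the Unicode character name, e.g. 'LATIN'."""
    return unicodedata.name(ch).split()[0]

def runs(label: str) -> list[str]:
    """Maximal single-script substrings of the label's letters (non-letters ignored)."""
    letters = [ch for ch in label if ch.isalpha()]
    return ["".join(group) for _, group in groupby(letters, key=script_of)]

def allowed(label: str, min_run: int = 2, max_runs: int = 3) -> bool:
    """Hypothetical policy: no Greek or Coptic; mixed labels need few, long runs."""
    if any(script_of(ch) in ("GREEK", "COPTIC") for ch in label if ch.isalpha()):
        return False
    parts = runs(label)
    if len(parts) <= 1:
        return True  # single-script label: nothing mixed to check
    return len(parts) <= max_runs and all(len(p) >= min_run for p in parts)

# The 16 Latin/Cyrillic spellings of "coke" (hypothetical twin table).
CYRILLIC_TWIN = {"c": "\u0441", "o": "\u043e", "k": "\u043a", "e": "\u0435"}
combos = ["".join(c) for c in product(*[(ch, CYRILLIC_TWIN[ch]) for ch in "coke"])]
mixed_ok = [c for c in combos if allowed(c) and len(runs(c)) > 1]
print(len(mixed_ok))  # 2
```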

    However, while these suggestions will be a help in forming rules, and
    the existence of lists or tables of homographs will be welcome, in the
    long run it will be up to the registries to get it right, or they will
    find themselves out of business due to legal costs.

