Re: Unicode and RFC 4690

From: Neil Harris (neil@tonal.clara.co.uk)
Date: Thu Oct 05 2006 - 11:53:37 CST

    Stephane Bortzmeyer wrote:
    > On Thu, Oct 05, 2006 at 09:32:08AM +0200,
    > Philippe Verdy <verdy_p@wanadoo.fr> wrote
    > a message of 10 lines which said:
    >
    >
    >> * avoiding labels using multiple scripts and informing users when an
    >> IDN label may contain confusable characters. This is for the
    >> immediate client-side need.
    >>
    >
    > And it is a terrible idea because, in many countries, Latin letters are
    > used together with the local script, at least in the computing domain
    > (Russia is a good example).
    >
    >
    >> * developing a standard within the DNS that allows each DNS server
    >> to specify which set of non confusable characters it accepts for
    >> registration as subdomain names.
    >>
    >
    > Warning: registration is not done at the DNS server but at the
    > registry system. There are much less registries than DNS servers so
    > the need of a standard is less obvious.
    >
    > Otherwise, there *is* a standard to express the list of authorized
    > characters (RFC 4290 and I attach a table at this syntax for the french
    > language).
    >
    > It does not address confusability issues because the entire area
    > is... confuse and has no solution. It is just a way for ICANN to step
    > in the registration policies of TLDs.
    >

    As RFC 4690 acknowledges, there can never be a perfect solution to the
    confusables problem; but that does not mean we shouldn't try to address
    it. As it happens, we can do rather well at reducing the possibilities
    for spoofing to very low levels.

    Character repertoires for DNS labels, combined with script-mixing rules,
    can _completely eliminate_ the possibility of mixed-script confusables,
    as well as vastly reducing the opportunities for within-script
    confusables. Doing this reduces the combinatorial opportunities for
    generating spoofs by many orders of magnitude. It similarly simplifies
    the task of constructing confusables lists, which greatly increases the
    chances of successfully blocking the remaining single-script and
    whole-script confusables by other means, such as homograph lists.
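
    As a minimal sketch of a per-label script-mixing rule (not a complete
    implementation): the Python standard library does not expose the Unicode
    Script property, so script_of() below approximates it from character
    names for three scripts only. That table and all function names are
    illustrative assumptions; a real registry would work from Scripts.txt
    and its own permitted repertoire.

```python
import unicodedata

# Assumption: a deliberately tiny script table. Real deployments would use
# the Unicode Script property (Scripts.txt) instead of character names.
SCRIPT_PREFIXES = ("LATIN", "CYRILLIC", "GREEK")


def script_of(ch):
    """Return a coarse script name, 'COMMON' for ASCII digits and hyphen."""
    if ch in "0123456789-":
        return "COMMON"
    name = unicodedata.name(ch, "")
    for prefix in SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return prefix
    return "UNKNOWN"


def label_scripts(label):
    """Set of scripts used in a single DNS label, ignoring COMMON."""
    return {script_of(ch) for ch in label} - {"COMMON"}


def is_single_script(label):
    """True if the label draws on exactly one known script."""
    scripts = label_scripts(label)
    return len(scripts) == 1 and "UNKNOWN" not in scripts
```

    A label such as "p\u0430ypal" (Cyrillic "а" among Latin letters) is
    rejected by this check, while an all-Cyrillic label passes. Whole-script
    confusables still get through, which is why the homograph lists remain
    necessary.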

    UTR #36 and UTR #39 have a very detailed treatment of all the issues
    involved.

    Note that implementing these constraints on a per-label basis has no
    bearing at all on script-mixing between different labels in an FQDN,
    which is not a security problem. Nor does anything in the above policy
    prevent labels drawn from a number of different individual character
    sets from being issued in the same zone, provided care is taken to
    block or bundle possible collisions.
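
    Collision detection within one zone could look roughly like the sketch
    below. The five-entry mapping table is a hypothetical stand-in for a
    reviewed confusables list, and skeleton() and find_collision() are names
    invented for illustration; whether a collision is then blocked or
    bundled is left to registry policy.

```python
# Assumption: a tiny hand-written homograph table. A real registry would
# derive this from a reviewed confusables list.
CONFUSABLE_TO_PROTOTYPE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label):
    """Map each character to its prototype so look-alikes compare equal."""
    return "".join(CONFUSABLE_TO_PROTOTYPE.get(ch, ch) for ch in label.lower())


def find_collision(new_label, registered):
    """Return an already-registered label that looks like new_label, if any."""
    target = skeleton(new_label)
    for existing in registered:
        if existing != new_label and skeleton(existing) == target:
            return existing
    return None
```

    With "cape" already in the zone, the all-Cyrillic label
    "\u0441\u0430\u0440\u0435" passes any single-script rule but is caught
    here, so the two labels can be blocked or bundled together.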

    Politics shouldn't be the issue here: individual domain operators and
    their users should all have a common interest in preventing homograph
    attacks, and these techniques can work effectively regardless of
    political issues.

    -- Neil


