Re: Unicode and RFC 4690

From: Jefsey_Morfin (jefsey@jefsey.com)
Date: Thu Oct 05 2006 - 18:20:26 CST


    The need (IDN) is being confused with one solution to it (IETF
    IDNA). Regulating the need will not correct the shortcomings of the
    solution. The solution must be foolproof. How? By converting all
    confusable strings into the same ACE (the "xn--" ASCII-compatible
    encoding).

    Is it possible? Yes. A grapheme is a graphic concept which can be
    mathematically documented. The problem is that Unicode assigns
    several numbers to the same graphic concept. So we need a
    Unicode/grapheme table. It can be built either by comparing the
    characters' mathematical descriptions through their integrals
    (graphemes), or by capitalizing on experience. Either way, the
    result is a table of the characters' graphic families (another way
    to list graphemes).

    A "super punycode" version would use this table to transcode all the
    characters of the same family into the same ASCII sequence. This
    removes none of the possibilities of the current solution, but it
    prevents two different ACEs from being seen in the same way, because
    only one ACE is possible for a given appearance. That single ACE can
    still fully support all the existing confusable labels. The
    disadvantages of not using this "super punycode" function will
    probably lead to its quick adoption. The drawback is that some
    existing names may be confused with other names. This is why the
    need is urgent: there is a limited number of IDNs so far and not
    many confusable ones (confusable labels are at higher levels). If
    this were a real difficulty, the proposed solution is to use a
    prefix other than "xn--" (this would also help address another type
    of problem).
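To make the idea concrete, here is a minimal sketch of such a "super punycode" step. It is only an illustration under stated assumptions: the table FAMILY_REPRESENTATIVE is a tiny hypothetical stand-in for the Unicode/grapheme table discussed above, the function names are invented here, and the encoding step relies on the punycode codec that ships with Python rather than on any full IDNA implementation.

```python
import unicodedata

# Hypothetical, deliberately tiny table of "graphic families": each
# look-alike character is mapped to one representative of its family.
# A real table would be far larger and would need careful review.
FAMILY_REPRESENTATIVE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_to_family(label):
    """Replace every character by the representative of its graphic family."""
    # Normalize and lowercase first so that equivalent spellings agree.
    normalized = unicodedata.normalize("NFKC", label).lower()
    return "".join(FAMILY_REPRESENTATIVE.get(ch, ch) for ch in normalized)


def super_ace(label, prefix="xn--"):
    """Return a single ASCII form shared by all look-alike spellings."""
    folded = fold_to_family(label)
    if all(ord(ch) < 128 for ch in folded):
        # Nothing left to encode: the folded label is already ASCII.
        return folded
    # Python's built-in punycode codec performs the RFC 3492 encoding.
    return prefix + folded.encode("punycode").decode("ascii")


if __name__ == "__main__":
    # Latin "paypal" and a spelling using Cyrillic "а" give the same result.
    print(super_ace("paypal"))
    print(super_ace("p\u0430yp\u0430l"))
    print(super_ace("b\u00fccher"))
```

With this sketch, two labels that differ only by characters of the same family collapse to one ASCII form, which is also where the drawback mentioned above shows up: names that were distinct before folding can collide afterwards. The prefix parameter only illustrates the suggestion of using a prefix other than "xn--".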

    jfc

    At 21:50 05/10/2006, Philippe Verdy wrote:

    >From: "Neil Harris" <neil@tonal.clara.co.uk>
    > > UTR #36 and UTR #39 have a very detailed treatment of all the issues
    > > involved.
    > >
    > > Notice that implementing these constraints on a per-label basis has no
    > > bearing at all on script-mixing between different labels in a FQDN,
    > > which is not a security problem, and that nothing in the above policy
    > > need stop labels from any of a number of different individual character
    > > sets from being issued in the same zone, providing care is taken to
    > > block or bundle possible collisions.
    > >
    > > Politics shouldn't be the issue here: individual domain operators and
    > > their users should all have a common interest in preventing homograph
    > > attacks, and these techniques can work effectively regardless of
    > > political issues.
    >
    >One problem of this RFC is that the current format for the database
    >of confusables supported as equivalents by a registry is NOT
    >integrated in the DNS so that it can scale widely.
    >
    >I would rather expect a format that can be integrated completely as
    >DNS records, possibly with a new DNS record type, simple to parse,
    >and which each DNS server can cache reliably by reference to an
    >authoritative DNS server maintained by the registry (or the domain
    >administrator if this is in a private domain).
    >
    >Such files do not address the need in local subdomains, and having a
    >single file per language will not resolve the issues regarding
    >security and ease of deployment.
    >
    >Note that even if a TLD registry does not support IDN, support for
    >IDN labels may be present (wanted, needed) within a subdomain for
    >various things such as user names, product names, book titles...
    >used as labels within a private subdomain.
    >
    >If every registry (or domain name authority) could specify its own
    >rules regarding acceptable characters and their IDN-canonical
    >equivalents, things would be simpler. The RFC just needs to address
    >the required features in the IDN implementation, i.e. the implicit
    >(non-negotiable) support for Unicode canonical equivalents (for
    >which it is NOT necessary to specify the list of all possible equivalents).
    >
    >Each TLD registry or each subdomain authority should provide a
    >default set of rules that will be applied in all subdomains,
    >unless one of the registered domains contains a record
    >referencing another rule set (which should be another domain name
    >that specifies the complete set of rules), or records specifying
    >overrides (for example, the support of more characters); one of the
    >common rule sets should be the one for the default reduced
    >ASCII-only subset (i.e. no support for IDN), and this should be
    >specified simply by referencing the domain name of the root registry
    >(if the root must remain ASCII-only), or a documented domain name
    >owned by the authority managing the root (for example
    >no-idn.iana.org), or some special subdomain (for example:
    >none.idn.arpa), where the data of the rule set is registered.
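The inheritance scheme described in the quoted message (a default rule set applied to all subdomains unless a domain carries a record referencing another one) can be sketched as a simple walk up the domain tree. This is only an illustration: no such DNS record type exists, so a plain dictionary stands in for the records, and every rule-set name except the message's own example none.idn.arpa is hypothetical.

```python
# Hypothetical rule-set records: domain name -> name of the rule set it
# references. The empty string stands for the root zone.
RULESET_RECORDS = {
    "": "none.idn.arpa",                   # root: ASCII-only, no IDN
    "example": "rules.example",            # hypothetical TLD-wide rules
    "shop.example": "rules.shop.example",  # hypothetical override
}


def effective_ruleset(domain, records=RULESET_RECORDS):
    """Return the rule set of the closest ancestor that declares one."""
    name = domain.lower().strip(".")
    labels = name.split(".") if name else []
    while True:
        candidate = ".".join(labels)
        if candidate in records:
            return records[candidate]
        if not labels:
            raise LookupError("no rule set declared, not even at the root")
        # Drop the leftmost label and try the parent domain.
        labels = labels[1:]


if __name__ == "__main__":
    print(effective_ruleset("www.shop.example"))
    print(effective_ruleset("mail.example"))
    print(effective_ruleset("foo.org"))
```

A domain without its own record simply inherits from its nearest ancestor, ending at the root's ASCII-only rule set. Overrides that extend a parent rule set (for example, support for more characters) are not modeled in this sketch.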


