Re: Unicode and RFC 4690

From: Jefsey_Morfin (jefsey@jefsey.com)
Date: Thu Oct 05 2006 - 18:20:26 CST

Next message: Neil Harris: "Re: Unicode and RFC 4690"

Previous message: Philippe Verdy: "Re: Unicode and RFC 4690"
In reply to: Philippe Verdy: "Re: Unicode and RFC 4690"
Next in thread: Neil Harris: "Re: Unicode and RFC 4690"
Reply: Neil Harris: "Re: Unicode and RFC 4690"

There is a confusion between the need (IDN) and a solution (IETF
IDNA). Regulating the need will not correct the shortcomings of the
solution. The solution must be foolproof. How? By having all the
confusable strings converted into the same ACE ("xn--" ASCII
Compatible Encoding).
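To illustrate the problem: under the current IDNA scheme, two labels that look alike on screen still produce different ACE strings. The sketch below uses Python's built-in `idna` codec (IDNA 2003 ToASCII); the sample labels and the helper name `to_ace` are illustrative assumptions.

```python
# Two labels that render almost identically: the second uses
# U+0430 CYRILLIC SMALL LETTER A instead of U+0061 LATIN SMALL LETTER A.
latin = "paypal"
mixed = "p\u0430ypal"

def to_ace(label: str) -> str:
    """Convert a label to its ASCII form with Python's IDNA 2003 codec,
    which adds the "xn--" prefix only when the label is not pure ASCII."""
    return label.encode("idna").decode("ascii")

if __name__ == "__main__":
    for label in (latin, mixed):
        print(ascii(label), "->", to_ace(label))
```

The first label stays `paypal`, while the look-alike gets an `xn--` string of its own, so nothing in the ACE reveals that the two are confusable.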

Is it possible? Yes. A grapheme is a graphic concept which can be
mathematically documented. The problem is that Unicode assigns
numbers to these concepts in a polyonymous manner (one shape may
receive several code points). So we need a Unicode/grapheme table,
built either by comparing the characters' mathematical descriptions
through their integrals (graphemes), or by capitalizing on experience,
so as to obtain a table of character graphic families (another way to
list graphemes).
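A minimal sketch of such a family table and how it would be applied, assuming a small hand-picked set of families. The entries and the names `FAMILY` and `family_key` are hypothetical; a usable table would have to be far more complete, along the lines of the confusables data published with UTS #39.

```python
import unicodedata

# Hypothetical excerpt of a "graphic family" table: each character maps to the
# representative of its family. Entries are illustrative, not authoritative.
FAMILY = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "0": "o",       # DIGIT ZERO
    "1": "l",       # DIGIT ONE
}

def family_key(label: str) -> str:
    """Replace each character of the lowercased, NFC-normalized label by the
    representative of its graphic family; unlisted characters stay as they are."""
    label = unicodedata.normalize("NFC", label.lower())
    return "".join(FAMILY.get(ch, ch) for ch in label)

if __name__ == "__main__":
    print(family_key("p\u0430yp\u0430l"))  # paypal
    print(family_key("g00gle"))            # google
```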

A "super punycode" version would use this table to transcode all the
characters of the same family into the same ASCII sequence. This would
remove none of the possibilities of the current solution, but it would
prevent two different ACEs from being displayed the same way, because
there would be only one possible ACE. This would not prevent that ACE
from fully supporting all the existing confusable labels. The
disadvantages of not using such a "super punycode" function will
probably lead to its quick adoption. The drawback is that some
existing names may then collide with other names. This is why the need
is urgent (there is still a limited number of IDNs and not many
confusable ones [confusable labels are at higher levels]). If this
were a real difficulty, the proposed solution is to use a prefix other
than "xn--" (this would also help address another type of problem).
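A rough sketch of the "super punycode" idea under stated assumptions: a label is lowercased and NFC-normalized, each character is replaced by its family representative from a hypothetical table, and only then is it encoded with Punycode (Python's built-in `punycode` codec). The prefix `xs--` and the function name `super_punycode` are hypothetical placeholders for the "other prefix" mentioned above.

```python
import unicodedata

# Hypothetical family table (illustrative excerpt only).
FAMILY = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c"}

# Hypothetical prefix, standing in for "a prefix other than xn--".
SUPER_PREFIX = "xs--"

def super_punycode(label: str) -> str:
    """Map every member of a graphic family to one and the same ASCII form."""
    folded = unicodedata.normalize("NFC", label.lower())
    folded = "".join(FAMILY.get(ch, ch) for ch in folded)
    if folded.isascii():
        # The whole label folded to ASCII: return it unchanged, so that it
        # collides (on purpose) with the plain ASCII label it imitates.
        return folded
    return SUPER_PREFIX + folded.encode("punycode").decode("ascii")

if __name__ == "__main__":
    for label in ("paypal", "p\u0430ypal", "\u0440\u0430yp\u0430l", "b\u00fccher", "bu\u0308cher"):
        print(ascii(label), "->", super_punycode(label))
```

With this folding, the Latin and Cyrillic spellings of "paypal" all yield `paypal`, while "bücher" (precomposed or decomposed) yields the single form `xs--bcher-kva`.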

jfc

At 21:50 05/10/2006, Philippe Verdy wrote:

>From: "Neil Harris" <neil@tonal.clara.co.uk>
> > UTR #36 and UTR #39 have a very detailed treatment of all the issues
> > involved.
> >
> > Notice that implementing these constraints on a per-label basis has no
> > bearing at all on script-mixing between different labels in a FQDN,
> > which is not a security problem, and that nothing in the above policy
> > need stop labels from any of a number of different individual character
> > sets from being issued in the same zone, providing care is taken to
> > block or bundle possible collisions.
> >
> > Politics shouldn't be the issue here: individual domain operators and
> > their users should all have a common interest in preventing homograph
> > attacks, and these techniques can work effectively regardless of
> > political issues.
>
>One problem of this RFC is that the current format for the database
>of confusables supported as equivalents by a registry is NOT
>integrated into the DNS, so it cannot scale widely.
>
>I would rather expect a format that can be integrated completely as
>DNS records, possibly with a new DNS record type, simple to parse,
>and which each DNS server may cache reliably by reference to an
>authoritative DNS server maintained by the registry (or the domain
>administrator if this is in a private domain).
>
>Such files do not address the need in local subdomains, and having a
>single file per language will not resolve the issues regarding
>security and ease of deployment.
>
>Note that even if a TLD registry does not support IDN, support for
>IDN labels may be present (wanted, needed) within a subdomain for
>various things such as user names, product names, book titles...
>used as labels within a private subdomain.
>
>If every registry (or domain name authority) could specify its own
>rules regarding acceptable characters and their IDN-canonical
>equivalents, things would be simpler. The RFC just needs to address
>the required features in the IDN implementation, i.e. the implicit
>(non-negotiable) support for Unicode canonical equivalents (for
>which it is NOT necessary to specify the list of all possible equivalents).
>
>Each TLD registry or each subdomain authority should provide a
>default set of rules that will be applied by default in all
>subdomains, unless one of the registered domains contains a record
>referencing another rule set (which should be another domain name
>that specifies the complete set of rules), or records specifying
>overrides (for example, the support of more characters); one of the
>common rule sets should be the default reduced
>ASCII-only subset (i.e. no support for IDN), and this should be
>specified simply by referencing the domain name of the root registry
>(if the root must remain ASCII-only), or a documented domain name
>owned by the authority managing the root (for example
>no-idn.iana.org), or some special subdomain (for example:
>none.idn.arpa), where the data of the rule set is registered.
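Neil Harris's quoted point about blocking or bundling possible collisions within one zone can be sketched as a toy per-label registry. This is a minimal illustration assuming a hypothetical family table and a simple policy: the first registrant of a family key owns it, further variants from the same registrant are bundled, and variants from anyone else are blocked. The class and method names are hypothetical.

```python
import unicodedata

# Hypothetical family table (illustrative excerpt only).
FAMILY = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p", "\u0441": "c"}

def family_key(label: str) -> str:
    """Fold a label to a key shared by its visually confusable variants."""
    label = unicodedata.normalize("NFC", label.lower())
    return "".join(FAMILY.get(ch, ch) for ch in label)

class Zone:
    """Toy registry for a single zone, working label by label: variants of an
    existing label are bundled for the same registrant, blocked for others."""

    def __init__(self) -> None:
        self.owner_by_key: dict[str, str] = {}
        self.bundles: dict[str, set[str]] = {}

    def register(self, label: str, registrant: str) -> str:
        key = family_key(label)
        owner = self.owner_by_key.get(key)
        if owner is None:
            self.owner_by_key[key] = registrant
            self.bundles[key] = {label}
            return "registered"
        if owner == registrant:
            self.bundles[key].add(label)
            return "bundled"
        return "blocked"

if __name__ == "__main__":
    zone = Zone()
    print(zone.register("paypal", "alice"))         # registered
    print(zone.register("p\u0430ypal", "alice"))    # bundled (U+0430)
    print(zone.register("\u0440aypal", "mallory"))  # blocked (U+0440)
```

Because each label is checked on its own, nothing here depends on which scripts the other labels of the FQDN use.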
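Philippe Verdy's suggestion of publishing the equivalence data as DNS records, "simple to parse", could look like the following. The record syntax is entirely hypothetical (no such record type is defined): each record string lists code points in `U+XXXX` form, the first one being the family representative. The function names are hypothetical too.

```python
def parse_codepoint(token: str) -> str:
    """Turn a token such as "U+0430" into the character it names."""
    if not token.upper().startswith("U+"):
        raise ValueError(f"bad code point token: {token!r}")
    return chr(int(token[2:], 16))

def parse_records(records: list[str]) -> dict[str, str]:
    """Build a character -> family representative map from record strings
    in the hypothetical "REPRESENTATIVE MEMBER..." syntax."""
    mapping: dict[str, str] = {}
    for record in records:
        chars = [parse_codepoint(tok) for tok in record.split()]
        if len(chars) < 2:
            raise ValueError(f"record needs at least two code points: {record!r}")
        representative, *members = chars
        for ch in members:
            mapping[ch] = representative
    return mapping

if __name__ == "__main__":
    # Record strings as a resolver might hand them over for a registry's domain.
    rdata = ["U+0061 U+0430", "U+006F U+043E U+03BF"]
    print(parse_records(rdata))
```

Since the data would live in ordinary DNS records, resolvers could cache it under the usual TTL rules, which is the scaling property the message asks for.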
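On the quoted point about implicit support for Unicode canonical equivalents: canonically equivalent strings can be detected by normalizing both sides, so no list of equivalents has to be published. A minimal sketch with the standard `unicodedata` module; the function name `same_label` is hypothetical. The last line relies on Python's `idna` codec, whose IDNA 2003 nameprep step already applies NFKC normalization.

```python
import unicodedata

def same_label(a: str, b: str) -> bool:
    """Return True when two labels are canonically equivalent (equal after NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "caf\u00e9"   # e-acute as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(precomposed == decomposed)            # False: different code points
    print(same_label(precomposed, decomposed))  # True: canonically equivalent
    # IDNA 2003 nameprep normalizes as well, so both spellings give one ACE.
    print(precomposed.encode("idna") == decomposed.encode("idna"))  # True
```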
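The rule-set inheritance described in the last quoted paragraph could work roughly as follows. This is a minimal sketch over an in-memory dictionary standing in for DNS data, with a rule set reduced to a set of allowed characters. The keys `rules`, `ref` and `extra`, the function names, and the sample domains (apart from `none.idn.arpa`, taken from the quoted example) are hypothetical.

```python
import string

# Default reduced ASCII-only rule set (letters, digits, hyphen).
ASCII_ONLY = frozenset(string.ascii_lowercase + string.digits + "-")

# Hypothetical DNS-like data, one entry per domain:
#   "rules": a complete rule set published by that domain
#   "ref":   another domain that publishes the complete rule set to use
#   "extra": characters added on top of the inherited rule set (overrides)
DATA = {
    "none.idn.arpa": {"rules": ASCII_ONLY},
    "example": {"ref": "none.idn.arpa"},
    "shop.example": {"extra": frozenset("äöüß")},
}

def resolve(domain: str, data: dict, depth: int = 0) -> frozenset:
    """Return the characters allowed in labels registered under `domain`,
    walking from the domain itself up towards its TLD."""
    if depth > 8:
        raise ValueError("rule-set reference loop")
    labels = domain.lower().rstrip(".").split(".")
    extra = set()
    for i in range(len(labels)):
        entry = data.get(".".join(labels[i:]), {})
        extra |= entry.get("extra", frozenset())
        if "rules" in entry:
            return frozenset(entry["rules"]) | extra
        if "ref" in entry:
            return resolve(entry["ref"], data, depth + 1) | extra
    # Nothing published anywhere up the tree: fall back to ASCII only.
    return ASCII_ONLY | extra

def label_allowed(label: str, parent: str, data: dict) -> bool:
    """Check a new label against the rule set in force for its parent domain."""
    return set(label.lower()) <= resolve(parent, data)

if __name__ == "__main__":
    print(label_allowed("m\u00fcller", "shop.example", DATA))  # True: override
    print(label_allowed("m\u00fcller", "example", DATA))       # False: ASCII only
```

Here `shop.example` inherits the ASCII-only set referenced by `example` and adds a few characters of its own, while any other subdomain of `example` stays ASCII-only.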

