In looking at how to implement IDNA2008 (and maintain backward compatibility), we are wrestling with some practical questions that we'd appreciate feedback on.
P1 => P2 => P3 => P4 => DNS
There are a lot of variables here:
Examples: IE6 handles only punycode, and won't do any validity checking. IE7 handles both punycode and Unicode. It checks the punycode, so a valid IDNA2008 IRI with a ZWJ will fail. There are still enough IE6 implementations around that we (and others) need to handle them, and there will be IE7 implementations around for years to come. Not to mention other browsers, emailers, word processors, etc. that handle URLs/IRIs based on IDNA2003.
Note: even if validity checking is done on an IRI, non-registries don't need to include the tests for BIDI or CONTEXT, so there is no guarantee that a punycode form is an A-Label or that a Unicode form is a U-Label.
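To make the ZWJ case concrete, here is a rough sketch in Python. It relies on Python's built-in "idna" codec, which implements IDNA2003 (Nameprep drops ZWJ as "mapped to nothing"), and uses a raw punycode encoding as a stand-in for what an IDNA2008 conversion would do with a label it had already accepted. The helper names and the Sinhala sample label are illustrative assumptions, and no IDNA2008 validity checking is performed here.

```python
ZWJ = "\u200d"

# Sinhala "sri": SHA + AL-LAKUNA (virama) + ZWJ + RA + vowel sign II.
# Assumed here to be a label where ZWJ is contextually allowed in IDNA2008.
WITH_ZWJ = "\u0dc1\u0dca" + ZWJ + "\u0dbb\u0dd3"
WITHOUT_ZWJ = WITH_ZWJ.replace(ZWJ, "")


def to_ascii_2003(label: str) -> str:
    """IDNA2003 ToASCII via Python's built-in codec (Nameprep removes ZWJ)."""
    return label.encode("idna").decode("ascii")


def to_ascii_unmapped(label: str) -> str:
    """Punycode-encode the label as-is, with no mapping step.

    Stand-in for an IDNA2008 conversion of an already-valid U-label;
    this helper does NOT perform any IDNA2008 validity checks.
    """
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    # Under IDNA2003 both spellings collapse to the same punycode form.
    print(to_ascii_2003(WITH_ZWJ) == to_ascii_2003(WITHOUT_ZWJ))
    # Without the mapping step the ZWJ survives, so the forms differ.
    print(to_ascii_unmapped(WITH_ZWJ) == to_ascii_unmapped(WITHOUT_ZWJ))
```

So the same Unicode IRI can end up as two different punycode strings depending on which generation of software did the conversion.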
1. Suppose that P2 is on Unicode 5.1, and the others are on Unicode 6.0. If P2 does a validity check, then it could prevent a perfectly valid IRI from being correctly looked up. To prevent this problem, does that mean that the best practice is for only P4 to do validity checking? Or should the others do some weaker form of validity checking, like skipping a check for UNASSIGNED?
2. Suppose P3 is a non-IDNA aware process, so IRIs should be converted to Punycode by P2 before sending. Should one do a validity check in P2? How do we avoid problem #1 in that case?
3. The current protocol spec appears to require validity checking only when converting to punycode. So when an IRI is already in punycode (which could have come from an IDNA2003 application), it might not undergo any checking at all on its way from P1 to the DNS, and everything depends on the registry's doing the right thing. Is it best to check anyway, or does that run into problem #1?
4. If P2 accepts an IRI in Unicode and passes it on to P3 in Unicode (never converting to punycode), should it do any validity checking?
5. When a search engine does indexing, it has to map together IRIs that are "equivalent" (resolving to the same logical location). When it provides an IRI to the user for a page, that IRI should go to the indexed page. However, because IDNA2003 and IDNA2008 browsers may go to different places with the same IRI, which do we provide? If we try to test for which browser the user has, that is clumsy and error-prone.
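Problem #1 can be reproduced in miniature with Python's unicodedata module. This is only a sketch under some assumptions: the module's frozen Unicode 3.2 database (unicodedata.ucd_3_2_0) stands in for a process on an older Unicode version such as P2, the interpreter's current database stands in for P4, and the function name and the skip_unassigned flag are hypothetical. Only the UNASSIGNED test is modelled, not the rest of the validity rules.

```python
import unicodedata

OLD_UCD = unicodedata.ucd_3_2_0  # stand-in for P2 on an older Unicode version
NEW_UCD = unicodedata            # stand-in for P4 on a newer Unicode version


def passes_unassigned_check(label, ucd, skip_unassigned=False):
    """Return True if no code point in `label` is unassigned (category Cn)
    according to the given Unicode database.

    With skip_unassigned=True the test is skipped entirely, modelling the
    "weaker form of validity checking" raised in question 1.
    """
    if skip_unassigned:
        return True
    return all(ucd.category(ch) != "Cn" for ch in label)


# U+0620 ARABIC LETTER KASHMIRI YEH was added in Unicode 6.0, so an older
# database sees it as unassigned while a newer one does not.
label = "\u0620"

if __name__ == "__main__":
    print("old process accepts:", passes_unassigned_check(label, OLD_UCD))
    print("new process accepts:", passes_unassigned_check(label, NEW_UCD))
    print("old process, UNASSIGNED skipped:",
          passes_unassigned_check(label, OLD_UCD, skip_unassigned=True))
```

The older process rejects a label that the newer one accepts, which is exactly the case where an intermediate check blocks a lookup that the final process would have allowed.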