From: Erik van der Poel (erik@vanderpoel.org)
Date: Sat Mar 05 2005 - 17:14:10 CST
Doug Ewell wrote:
> This brings up the topic of "Unicode abuse" in general. Conformance to
> the Unicode Standard (see DUTR #33, of which Asmus is a co-author)
> generally refers to support for and adherence to the "letter of the
> law," things like implementing normalization and casing correctly. It's
> not quite so easy to quantify adherence to the "spirit of the law," in
> terms of things like abusing math characters and compatibility
> characters, or using directional overrides where they don't harm
> anything and aren't invalid, but also aren't necessary or appropriate.
>
> This almost falls into the same category as spoofing, which is being
> addressed in a different UTR, but seems different somehow.
It's funny that you should mention that today. Why, just yesterday, I
wrote this new section:
This may be somewhat subjective, but to me, it seems unnecessary and
inappropriate to allow U+2102 DOUBLE-STRUCK CAPITAL C = the set of
complex numbers, in HTML links.
This is indeed different from spoofing, but if Nameprep continues to
allow this type of character in pre-mapped IDNs, we may well see the
proliferation of yet another type of "garbage" on the Web.
IDN spoofing is done using characters of Stringprep category AO, while
unnecessary and inappropriate IDN characters are of category MN:
    7.1 Categories of code points

    Each code point in a repertoire named by a profile of stringprep can
    be categorized by how it acts in the process described in earlier
    sections of this document:

    AO  Code points that can be in the output

    MN  Code points that cannot be in the output because they
        never appear as output from mapping or normalization

    D   Code points that cannot be in the output because they are
        disallowed in the prohibition step

    U   Unassigned code points
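
To make the AO/MN distinction concrete, here is a rough sketch in Python. It is not Nameprep itself: it approximates the mapping and normalization steps with NFKC plus case folding from the standard unicodedata module, which follows a newer Unicode version than the 3.2 tables Stringprep is based on, and it ignores categories D and U entirely. The function names are made up for illustration.

```python
import unicodedata


def approx_nameprep_map(text: str) -> str:
    """Approximate Nameprep's map + normalize steps (assumption: NFKC and
    case folding are a close enough stand-in for the real tables)."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Normalize again, since case folding can denormalize a string.
    return unicodedata.normalize("NFKC", folded)


def can_survive_mapping(ch: str) -> bool:
    """True if ch comes out of the approximate mapping unchanged, i.e. it
    behaves like category AO; False means it behaves like category MN."""
    return approx_nameprep_map(ch) == ch


if __name__ == "__main__":
    samples = [
        "\u2102",  # DOUBLE-STRUCK CAPITAL C
        "c",       # LATIN SMALL LETTER C
        "\u0430",  # CYRILLIC SMALL LETTER A, a typical spoofing candidate
    ]
    for ch in samples:
        label = "AO-like" if can_survive_mapping(ch) else "MN-like"
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {label}")
```

Under these assumptions, U+2102 is mapped away to a plain "c" and so behaves like MN, while the Cyrillic letter passes through unchanged and behaves like AO, which is what makes it usable for spoofing.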
Cheers,
Erik