Re: Unicode abuse

From: Erik van der Poel ([email protected])
Date: Sat Mar 05 2005 - 17:14:10 CST

Next message: Peter Constable: "RE: Unicode's Mandate"

Previous message: Jon Hanna: "RE: Unicode's Mandate"
In reply to: Doug Ewell: "Unicode abuse (was: Re: But E0000 Custom Language Tags Are Actually *Required* For Use By Unicode)"
Next in thread: Erik van der Poel: "Re: Unicode abuse"
Maybe reply: Erik van der Poel: "Re: Unicode abuse"

Doug Ewell wrote:
> This brings up the topic of "Unicode abuse" in general. Conformance to
> the Unicode Standard (see DUTR #33, of which Asmus is a co-author)
> generally refers to support for and adherence to the "letter of the
> law," things like implementing normalization and casing correctly. It's
> not quite so easy to quantify adherence to the "spirit of the law," in
> terms of things like abusing math characters and compatibility
> characters, or using directional overrides where they don't harm
> anything and aren't invalid, but also aren't necessary or appropriate.
>
> This almost falls into the same category as spoofing, which is being
> addressed in a different UTR, but seems different somehow.

It's funny that you should mention that today. Why, just yesterday, I
wrote this new section:

http://nameprep.org/#map-norm

This may be somewhat subjective, but to me it seems unnecessary and
inappropriate to allow U+2102 DOUBLE-STRUCK CAPITAL C (the symbol for
the set of complex numbers) in HTML links.
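
To make the mapping concrete, here is a minimal Python sketch of what happens to this character during pre-mapping. It assumes the standard library's `encodings.idna.nameprep` helper is an acceptable stand-in for a Nameprep (RFC 3491) implementation; the variable names are only illustrative.

```python
import unicodedata
from encodings.idna import nameprep  # stdlib Nameprep helper (assumed stand-in)

DOUBLE_STRUCK_C = "\u2102"

# Compatibility normalization alone already replaces the math symbol
# with a plain capital letter.
nfkc = unicodedata.normalize("NFKC", DOUBLE_STRUCK_C)

# Full Nameprep also case-folds, so the label ends up as a lowercase letter.
prepared = nameprep(DOUBLE_STRUCK_C)

print(unicodedata.name(DOUBLE_STRUCK_C), "->", repr(nfkc), "->", repr(prepared))
```

So a link written with U+2102 resolves exactly as if it had been written with an ordinary ASCII letter, which is why the character adds nothing but clutter.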

This is indeed different from spoofing, but if Nameprep continues to
allow this type of character in pre-mapped IDNs, we may well see the
proliferation of yet another type of "garbage" on the Web.

IDN spoofing is done using characters of Stringprep category AO, while
unnecessary and inappropriate IDN characters are of category MN. The
categories are defined in RFC 3454 (Stringprep):

7.1 Categories of code points

    Each code point in a repertoire named by a profile of stringprep can
    be categorized by how it acts in the process described in earlier
    sections of this document:

    AO  Code points that can be in the output

    MN  Code points that cannot be in the output because they
        never appear as output from mapping or normalization

    D   Code points that cannot be in the output because they are
        disallowed in the prohibition step

    U   Unassigned code points
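
The four categories can be approximated for single code points with Python's standard `stringprep` module. This is only a sketch under stated assumptions: the function name `category` is hypothetical, the prohibition tables are the ones I understand the Nameprep profile to list, and "MN" is approximated as "mapping plus NFKC changes this code point when it is processed on its own".

```python
import stringprep
from unicodedata import ucd_3_2_0  # Stringprep is defined against Unicode 3.2

# Prohibition tables assumed to be the ones the Nameprep profile lists.
_PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)

def category(ch: str) -> str:
    """Approximate the section 7.1 category of one code point (hypothetical helper)."""
    if stringprep.in_table_a1(ch):
        return "U"   # unassigned in Unicode 3.2
    if stringprep.in_table_b1(ch):
        return "MN"  # mapped to nothing
    mapped = ucd_3_2_0.normalize("NFKC", stringprep.map_table_b2(ch))
    if mapped != ch:
        return "MN"  # case folding or normalization replaces it
    if any(is_prohibited(ch) for is_prohibited in _PROHIBITED):
        return "D"   # survives mapping but is rejected afterwards
    return "AO"      # can appear in the output

for ch in ("\u2102", "\u0430", "\u202e", "\u0378"):
    print(f"U+{ord(ch):04X}", category(ch))
```

With this approximation, U+2102 comes out as MN, while a spoofing candidate such as U+0430 CYRILLIC SMALL LETTER A comes out as AO. A directional override such as U+202E lands in D.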

Cheers,

Erik


This archive was generated by hypermail 2.1.5 : Sat Mar 05 2005 - 17:14:57 CST