Re: Unicode abuse

From: Erik van der Poel (erik@vanderpoel.org)
Date: Sat Mar 05 2005 - 17:14:10 CST


    Doug Ewell wrote:
    > This brings up the topic of "Unicode abuse" in general. Conformance to
    > the Unicode Standard (see DUTR #33, of which Asmus is a co-author)
    > generally refers to support for and adherence to the "letter of the
    > law," things like implementing normalization and casing correctly. It's
    > not quite so easy to quantify adherence to the "spirit of the law," in
    > terms of things like abusing math characters and compatibility
    > characters, or using directional overrides where they don't harm
    > anything and aren't invalid, but also aren't necessary or appropriate.
    >
    > This almost falls into the same category as spoofing, which is being
    > addressed in a different UTR, but seems different somehow.

    It's funny that you should mention that today. Why, just yesterday, I
    wrote this new section:

    http://nameprep.org/#map-norm

    This may be somewhat subjective, but to me it seems unnecessary and
    inappropriate to allow U+2102 DOUBLE-STRUCK CAPITAL C (the set of
    complex numbers) in HTML links.

    This is indeed different from spoofing, but if Nameprep continues to
    allow this type of character in pre-mapped IDNs, we may well see the
    proliferation of yet another type of "garbage" on the Web.

    IDN spoofing is done using characters of Stringprep category AO, while
    unnecessary and inappropriate IDN characters are of category MN. The
    categories are defined in Stringprep (RFC 3454):

    7.1 Categories of code points

        Each code point in a repertoire named by a profile of stringprep can
        be categorized by how it acts in the process described in earlier
        sections of this document:

           AO  Code points that can be in the output

           MN  Code points that cannot be in the output because they
               never appear as output from mapping or normalization

           D   Code points that cannot be in the output because they are
               disallowed in the prohibition step

           U   Unassigned code points
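
    The AO/MN distinction above can be checked roughly with Python's
    standard stringprep and unicodedata modules. This is only a sketch:
    the helper names nameprep_map and survives_mapping are made up for
    illustration, and the code covers just the mapping and normalization
    steps, not the prohibition or unassigned checks, so "survives" here
    is necessary but not sufficient for category AO.

```python
import stringprep
import unicodedata


def nameprep_map(ch: str) -> str:
    """Approximate the Nameprep mapping + normalization steps for one character.

    Hypothetical helper for illustration; it skips the prohibition,
    bidi, and unassigned-code-point checks.
    """
    if stringprep.in_table_b1(ch):
        # Table B.1: characters that are "mapped to nothing"
        return ""
    # Table B.2: case folding intended for use with NFKC
    folded = stringprep.map_table_b2(ch)
    # Stringprep is defined against Unicode 3.2, so use that database
    return unicodedata.ucd_3_2_0.normalize("NFKC", folded)


def survives_mapping(ch: str) -> bool:
    """True if the character comes out of mapping/normalization unchanged.

    A character for which this is False can never appear in the output,
    which is what category MN describes.
    """
    return nameprep_map(ch) == ch


if __name__ == "__main__":
    for ch in ("\u2102", "c", "\u0430"):
        print(f"U+{ord(ch):04X} -> {nameprep_map(ch)!r}, survives: {survives_mapping(ch)}")
```

    Under these assumptions, U+2102 is turned into a plain "c" and so
    never reaches the output (MN), while a look-alike such as U+0430
    CYRILLIC SMALL LETTER A passes through unchanged, which is what
    makes it usable for spoofing.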

    Cheers,

    Erik


