Re: Unicode abuse

From: Rick McGowan (
Date: Wed Mar 09 2005 - 20:19:44 CST


    A number of mail messages from Mark Davis were not distributed by this
    server due to a configuration problem. Attached below is a copy of one such
    message.


    --- Below this line is a copy of the message.

    From: "Mark Davis" <>
    To: "Erik van der Poel" <>,
                    "Doug Ewell" <>
    Cc: "Unicode Mailing List" <>
    Subject: Re: Unicode abuse
    Date: Sun, 6 Mar 2005 10:58:22 -0800

    I don't view this as a problem, as long as user agents take the simple
    precaution of displaying IDNs in post-nameprep form, which they really want
    to do for other reasons.
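A minimal sketch of the post-nameprep display described above. It assumes Python's standard-library "idna" codec (IDNA 2003: RFC 3490 ToASCII/ToUnicode with Nameprep) as a stand-in for a user agent's IDN handling; `display_form` is a hypothetical helper, not any browser's API. Encoding a host name and decoding the result yields the Nameprep-mapped form, so a link typed with U+2102 is shown as a plain "c":

```python
# Sketch only: Python's built-in "idna" codec (IDNA 2003) approximates
# what a user agent would display. `display_form` is a hypothetical helper.


def display_form(host: str) -> str:
    """Return the post-nameprep (display) form of an IDN host name.

    Encoding with the "idna" codec runs ToASCII on each non-ASCII label,
    which applies Nameprep (case folding, NFKC, prohibition checks) before
    Punycode. Decoding the ACE result runs ToUnicode, giving the mapped
    Unicode form. All-ASCII names take a fast path and pass through as-is.
    """
    ace = host.encode("idna")   # e.g. "\u2102.com" -> b"c.com"
    return ace.decode("idna")


if __name__ == "__main__":
    for typed in ["\u2102.com", "B\u00dcCHER.example"]:
        print(f"{typed!r} is displayed as {display_form(typed)!r}")
```

Whatever form appears in the link, the user sees the Unicode rendering of the name that is actually looked up, so the double-struck C cannot present itself as anything other than "c.com".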


    ----- Original Message -----
    From: "Erik van der Poel" <>
    To: "Doug Ewell" <>
    Cc: "Unicode Mailing List" <>
    Sent: Saturday, March 05, 2005 15:14
    Subject: Re: Unicode abuse

    > Doug Ewell wrote:
    > > This brings up the topic of "Unicode abuse" in general.  Conformance to
    > > the Unicode Standard (see DUTR #33, of which Asmus is a co-author)
    > > generally refers to support for and adherence to the "letter of the
    > > law," things like implementing normalization and casing correctly.  It's
    > > not quite so easy to quantify adherence to the "spirit of the law," in
    > > terms of things like abusing math characters and compatibility
    > > characters, or using directional overrides where they don't harm
    > > anything and aren't invalid, but also aren't necessary or appropriate.
    > >
    > > This almost falls into the same category as spoofing, which is being
    > > addressed in a different UTR, but seems different somehow.
    > It's funny that you should mention that today. Why, just yesterday, I
    > wrote this new section:
    > This may be somewhat subjective, but to me, it seems unnecessary and
    > inappropriate to allow U+2102 DOUBLE-STRUCK CAPITAL C (= the set of
    > complex numbers) in HTML links.
    > This is indeed different from spoofing, but if Nameprep continues to
    > allow this type of character in pre-mapped IDNs, we may well see the
    > proliferation of yet another type of "garbage" on the Web.
    > IDN spoofing is done using characters of Stringprep category AO, while
    > unnecessary and inappropriate IDN characters are of category MN:
    > 7.1 Categories of code points
    >     Each code point in a repertoire named by a profile of stringprep can
    >     be categorized by how it acts in the process described in earlier
    >     sections of this document:
    >        AO      Code points that can be in the output
    >        MN      Code points that cannot be in the output because they
    >                never appear as output from mapping or normalization
    >        D       Code points that cannot be in the output because they are
    >                disallowed in the prohibition step
    >        U       Unassigned code points
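The categories quoted above can be approximated for Nameprep by pushing a single code point through the profile and seeing what comes out. This sketch assumes Python's `encodings.idna.nameprep` (which uses Unicode 3.2 data and treats unassigned code points as allowed) and the `stringprep` module; `categorize` is a hypothetical helper, and looking at one code point in isolation ignores context-dependent normalization:

```python
# Sketch only: approximates the RFC 3454 section 7.1 categories for Nameprep
# using Python's encodings.idna.nameprep (Unicode 3.2 data, AllowUnassigned).
# `categorize` is a hypothetical helper, not part of any standard API.
import stringprep
from encodings.idna import nameprep


def categorize(ch: str) -> str:
    """Classify one code point as AO, MN, D, or U under Nameprep.

    Approximation: the code point is examined alone, so composition with
    neighbouring characters is ignored, and a code point that maps to a
    prohibited one is reported as D rather than MN.
    """
    if stringprep.in_table_a1(ch):      # unassigned in Unicode 3.2
        return "U"
    try:
        out = nameprep(ch)
    except UnicodeError:                # failed a prohibition (or bidi) check
        return "D"
    # Unchanged by mapping + NFKC: it can appear in the output (AO).
    # Changed or mapped to nothing: it never appears in the output (MN).
    return "AO" if out == ch else "MN"


if __name__ == "__main__":
    for ch in ["c", "C", "\u2102", "\ufffd", "\u0378"]:
        print(f"U+{ord(ch):04X} -> {categorize(ch)}")
```

Under this approximation U+2102 lands in MN alongside uppercase ASCII letters: it never survives into Nameprep output and can only ever appear in the pre-mapped form of a link, whereas spoofing relies on AO characters that do survive.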
    > Cheers,
    > Erik

    This archive was generated by hypermail 2.1.5 : Wed Mar 09 2005 - 20:20:16 CST