Re: Unicode abuse

From: Doug Ewell
Date: Sun Mar 06 2005 - 21:02:49 CST

    Erik van der Poel <erik at vanderpoel dot org> wrote:

    > Maybe I should show a piece of HTML with an IDN (Internationalized
    > Domain Name):
    > <a href="http://www.payp&#1072;

    I'm aware of the Paypаl spoof.

    When I click on that link from your message, it tries to send me to
    http://www.payp&/#1072; (note the slash that was added after the
    ampersand). Of course, the URL was not found. (This is with Microsoft
    user agents, so YMMV.)

    > As you can see, it is possible to use HTML's numeric character
    > references inside domain names, and they work. Likewise, it would be
    > *possible* to use &#x2102; (double-struck C) even though that just
    > maps to regular small 'c' in Nameprep.

    So, are you saying that user agents should not interpret numeric
    entities when resolving domain names, or that they are doing so in the
    wrong order?
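
To make the ordering question concrete, here is a minimal Python sketch of one plausible pipeline: decode the numeric character reference first, then hand the resulting Unicode string to IDNA processing. It uses only the standard library (`html.unescape` and the built-in `idna` codec, which implements IDNA 2003). The host name `www.payp&#1072;l.example` and the helper `href_host_to_ascii` are made up for illustration; this is not a description of what any particular user agent actually does.

```python
import html


def href_host_to_ascii(raw_host: str) -> bytes:
    """Hypothetical helper: resolve character references, then apply IDNA ToASCII."""
    # Step 1: HTML layer. "&#1072;" becomes U+0430 CYRILLIC SMALL LETTER A.
    decoded = html.unescape(raw_host)
    # Step 2: IDNA layer. Non-ASCII labels are Nameprep'd and Punycode-encoded.
    return decoded.encode("idna")


raw = "www.payp&#1072;l.example"   # assumed example host, not a real domain
decoded = html.unescape(raw)
ascii_host = href_host_to_ascii(raw)

print(repr(decoded))   # the host as a Unicode string, with the Cyrillic letter
print(ascii_host)      # the ASCII form, with an "xn--" label in the middle
```

If the two steps were swapped, the IDNA layer would see the literal characters `&`, `#`, and `;` instead of the Cyrillic letter, which is one way a lookup like the failed one described above could come about.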

    > Nameprep itself does not control whether domain names are stored in
    > HTML. But the fact is that domain names *do* appear in HTML, and it is
    > *possible* to have unnecessary characters like double-struck C in
    > domain names in HTML. It may not be likely, but that's not my point.
    > My point is that it shouldn't even be possible. Why do we even want to
    > allow such garbage in HTML?

    You can store any old character you like in HTML. It would be very
    strange if this were not so.

    > And it's not HTML's fault, it's Nameprep's. If Nameprep had chosen to
    > filter double-struck C out *before* performing Unicode's Normalization
    > Form KC, we wouldn't have this "problem" (which, again, is not a huge
    > problem). Just kinda yucky. Highly subjective.

    Isn't that why "additional folding," like the mapping from ℂ to c in
    stringprep, exists? To remove some of the yuckiness? I'm completely
    lost now.
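
For reference, the mapping under discussion can be checked with a short Python sketch. It assumes the standard library's `unicodedata` and `stringprep` modules, where `stringprep.map_table_b2` implements table B.2 of RFC 3454 (case folding for use with NFKC, including the "additional folding" entries). It is only an illustration of the three behaviours being compared, not a full Nameprep implementation.

```python
import stringprep
import unicodedata

dsc = "\u2102"  # DOUBLE-STRUCK CAPITAL C

# Plain lowercasing leaves the character alone: it has no lowercase mapping.
lowered = dsc.lower()

# NFKC on its own yields an uppercase "C" (compatibility decomposition).
nfkc = unicodedata.normalize("NFKC", dsc)

# Table B.2 maps it straight to lowercase "c" in the mapping step.
folded = stringprep.map_table_b2(dsc)

print(repr(lowered), repr(nfkc), repr(folded))
```

Without the extra B.2 entry, a label would have to be case-folded, normalized, and case-folded again to reach the same result; the additional folding does that work in a single mapping step.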

    -Doug Ewell
     Fullerton, California
