Re: Unicode abuse

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Mar 06 2005 - 21:02:49 CST

Next message: Doug Ewell: "Re: Unicode abuse"

Previous message: Patrick Andries: "Re: Egyptian Transliteration Characters?"
In reply to: Erik van der Poel: "Re: Unicode abuse"
Next in thread: Doug Ewell: "Re: Unicode abuse"
Reply: Doug Ewell: "Re: Unicode abuse"

Erik van der Poel <erik at vanderpoel dot org> wrote:

> Maybe I should show a piece of HTML with an IDN (Internationalized
> Domain Name):
>
> <a href="http://www.paypаl.com/2

I'm aware of the Paypаl spoof.

When I click on that link from your message, it tries to send me to
http://www.payp&/#1072;l.com/ (note the slash that was added after the
ampersand). Of course, the URL was not found. (This is with Microsoft
user agents, so YMMV.)

> As you can see, it is possible to use HTML's numeric character
> references inside domain names, and they work. Likewise, it would be
> *possible* to use ℂ (double-struck C) even though that just
> maps to regular small 'c' in Nameprep.

So, are you saying that user agents should not interpret numeric
character references when resolving domain names, or that they are
doing so in the wrong order?
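
To make the ordering question concrete, here is a minimal Python sketch
of the "decode the character references first, then look at the host
name" order. It is only an illustration, not a description of what any
particular user agent does; the helper name decode_href is made up for
this example.

```python
import html
import unicodedata
from urllib.parse import urlsplit


def decode_href(raw_href: str) -> str:
    """Expand HTML character references in an attribute value (hypothetical helper)."""
    return html.unescape(raw_href)


# The href as it would appear in the HTML source, with a numeric
# character reference standing in for one letter of the host name.
raw = "http://www.payp&#1072;l.com/"

# Step 1: expand the reference. Step 2: pull out the host name.
host = urlsplit(decode_href(raw)).hostname

# List every non-ASCII character that ended up in the host name.
for ch in host:
    if ord(ch) > 0x7F:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run this way, the host name that reaches IDN processing already contains
the Cyrillic letter, so it is not the same string as the all-ASCII name
it resembles.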

> Nameprep itself does not control whether domain names are stored in
> HTML. But the fact is that domain names *do* appear in HTML, and it is
> *possible* to have unnecessary characters like double-struck C in
> domain names in HTML. It may not be likely, but that's not my point.
> My point is that it shouldn't even be possible. Why do we even want to
> allow such garbage in HTML?

You can store any old character you like in HTML. It would be very
strange if this were not so.

> And it's not HTML's fault, it's Nameprep's. If Nameprep had chosen to
> filter double-struck C out *before* performing Unicode's Normalization
> Form KC, we wouldn't have this "problem" (which, again, is not a huge
> problem). Just kinda yucky. Highly subjective.

Isn't that why "additional folding," like the mapping from ℂ to c in
stringprep, exists? To remove some of the yuckiness? I'm completely
lost now.
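
For reference, the folding in question can be approximated in a few
lines of Python. This is a rough sketch under a stated assumption: it
applies NFKC and then lowercases, which is a simplification of the
Nameprep mapping step rather than the exact stringprep tables, and the
function name fold_compat is made up for this example.

```python
import unicodedata


def fold_compat(label: str) -> str:
    """Approximate the Nameprep mapping: NFKC, then lowercase.

    Assumption: this simplifies the real stringprep tables and is only
    meant to show what happens to compatibility characters.
    """
    return unicodedata.normalize("NFKC", label).lower()


double_struck_c = "\u2102"
print(unicodedata.name(double_struck_c))  # DOUBLE-STRUCK CAPITAL C
print(fold_compat(double_struck_c))       # c
```

So a label written with U+2102 ends up as the same string as one written
with a plain 'c'.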

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Sun Mar 06 2005 - 21:10:00 CST