Re: Unicode abuse

From: Erik van der Poel (erik@vanderpoel.org)
Date: Sun Mar 06 2005 - 18:32:40 CST

Next message: Erik van der Poel: "Re: Unicode abuse"

Previous message: Doug Ewell: "Re: Unicode abuse"
In reply to: Doug Ewell: "Re: Unicode abuse"
Next in thread: Erik van der Poel: "Re: Unicode abuse"
Reply: Erik van der Poel: "Re: Unicode abuse"
Reply: Doug Ewell: "Re: Unicode abuse"
Reply: Mark Davis: "Re: Unicode abuse"

Doug Ewell wrote:
> Erik van der Poel <erik at vanderpoel dot org> wrote:
>
>>I would have to agree that this is not a huge problem, but it is a
>>pity that the current version of Nameprep allows domain names to be
>>stored in other formats (e.g. HTML) with various unnecessary
>>characters coming from hither and yon in this vast Unicode space.
>
> Nameprep is a process by which characters are normalized, case-folded,
> thrown away, and so forth. What control would it have over whether
> domain names are stored in HTML or any other format?

Hi Doug,

Maybe I should show a piece of HTML with an IDN (Internationalized
Domain Name):

<a href="http://www.payp&#1072;l.com/

This snippet of HTML is from:

http://secunia.com/multiple_browsers_idn_spoofing_test/

As you can see, it is possible to use HTML's numeric character
references inside domain names, and they work. Likewise, it would be
*possible* to use ℂ (double-struck C) even though that just maps
to regular small 'c' in Nameprep.
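
To make the two cases concrete, here is a minimal Python sketch. It assumes the numeric character reference in the snippet is &#1072; (CYRILLIC SMALL LETTER A), and it relies on the standard library's html.unescape and encodings.idna.nameprep (an RFC 3491 implementation). It is only an illustration, not what any particular browser does internally.

```python
import html
import unicodedata
from encodings.idna import nameprep

# The href as it might be written in HTML source (the reference is an assumption).
raw_href = "http://www.payp&#1072;l.com/"

# An HTML parser resolves the numeric character reference before the URL is used.
decoded_href = html.unescape(raw_href)
non_ascii = [ch for ch in decoded_href if ord(ch) > 0x7F]
print([unicodedata.name(ch) for ch in non_ascii])

# Nameprep folds double-struck C (U+2102) down to a plain small 'c' ...
print(repr(nameprep("\u2102")))

# ... but leaves the Cyrillic letter in place, so the label stays non-ASCII.
print(nameprep("payp\u0430l") == "payp\u0430l")
```

So the double-struck C is accepted and silently disappears into 'c', while the Cyrillic letter passes through unchanged.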

Nameprep itself does not control whether domain names are stored in
HTML. But the fact is that domain names *do* appear in HTML, and it is
*possible* to have unnecessary characters like double-struck C in domain
names in HTML. It may not be likely, but that's not my point. My point
is that it shouldn't even be possible. Why do we even want to allow such
garbage in HTML? And it's not HTML's fault, it's Nameprep's. If Nameprep
had chosen to filter double-struck C out *before* performing Unicode's
Normalization Form KC, we wouldn't have this "problem" (which, again, is
not a huge problem). Just kinda yucky. Highly subjective.
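
As a rough sketch of what "filter before NFKC" could look like, the Python below rejects any character that carries a compatibility decomposition instead of letting normalization fold it away. The function names compatibility_characters and strict_label are hypothetical, and the rule itself is an assumption made for illustration. It is not part of Nameprep or of any published profile.

```python
import unicodedata


def compatibility_characters(label):
    """Return the characters in label that have a compatibility decomposition.

    unicodedata.decomposition() prefixes compatibility mappings with a tag
    such as "<font>" or "<wide>"; canonical mappings carry no tag.
    """
    return [ch for ch in label if unicodedata.decomposition(ch).startswith("<")]


def strict_label(label):
    """Hypothetical pre-filter: refuse the label rather than normalize it."""
    offenders = compatibility_characters(label)
    if offenders:
        names = ", ".join(unicodedata.name(ch, "UNNAMED") for ch in offenders)
        raise ValueError("compatibility characters not allowed: " + names)
    return label


print(compatibility_characters("pay\u2102"))   # double-struck C is flagged
print(strict_label("paypal"))                  # plain ASCII passes through
```

Under this rule a label containing double-struck C would be an error in the HTML rather than a quiet synonym for 'c'.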

Erik


This archive was generated by hypermail 2.1.5 : Sun Mar 06 2005 - 18:34:17 CST