RE: Unicode in a URL

From: addison@inter-locale.com
Date: Fri Apr 27 2001 - 12:38:14 EDT


Hi Paul,

Except that the "+" in U+xxxx notation already has a URL meaning: in a
form-encoded query string it stands for a space...
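
To make the clash concrete, here is a minimal sketch using Python's standard urllib.parse module. The sample value "U+00E9" and the field names "char" and "safe" are made up for illustration only.

```python
from urllib.parse import parse_qs, quote, unquote_plus

# In a form-encoded query string, a literal "+" decodes to a space,
# so a U+xxxx label does not survive the round trip.
decoded = unquote_plus("U+00E9")
print(decoded)  # U 00E9

# To carry a real "+", it has to be percent-encoded as %2B.
escaped = quote("U+00E9", safe="")
print(escaped)  # U%2B00E9

# The same effect when a whole query string is parsed.
fields = parse_qs("char=U+00E9&safe=U%2B00E9")
print(fields)  # {'char': ['U 00E9'], 'safe': ['U+00E9']}
```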

It's also interesting to note that many web servers try to match the
decoded *bytes* to the bytes in the file names on the local file
system. IOW, you're right, the W3C "standard" is haphazardly implemented
and then often breaks at the next level. For one project last year I wrote
an Apache module to intercept the URL and decode it for Apache.
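
A rough sketch of the byte-matching problem, in Python rather than as an Apache module. It assumes a hypothetical file system that stores names in ISO-8859-1; the file name and the helper to_fs_bytes are invented for illustration and are not the module mentioned above.

```python
from urllib.parse import unquote_to_bytes

# The same character (e-acute) percent-encoded two different ways.
utf8_request = "/caf%C3%A9.html"   # UTF-8 bytes C3 A9
latin1_request = "/caf%E9.html"    # ISO-8859-1 byte E9

# Assumption: the file name on disk is stored as ISO-8859-1 bytes.
name_on_disk = "/caf\u00e9.html".encode("latin-1")

# A server that compares decoded bytes to on-disk bytes only finds the
# file when the browser happened to use the file system's encoding.
utf8_matches = unquote_to_bytes(utf8_request) == name_on_disk
latin1_matches = unquote_to_bytes(latin1_request) == name_on_disk
print(utf8_matches, latin1_matches)  # False True


def to_fs_bytes(path, fs_encoding="latin-1"):
    """Hypothetical helper: treat the URL as UTF-8, re-encode for the disk."""
    text = unquote_to_bytes(path).decode("utf-8")
    return text.encode(fs_encoding)


print(to_fs_bytes(utf8_request) == name_on_disk)  # True
```

An intercepting layer of this kind only works if it knows both the encoding the client used and the encoding of the file system, which is exactly the information the URL itself does not carry.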

One thing to consider is that, provided you control the website's content
absolutely, you can forbid non-ASCII filenames. Then only data (the
after-the-question-mark stuff) has to be interpreted and that data is
going to be interpreted by your own code instead of the web server's, so
you're in a much better position there.
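
One way this policy could look in application code, as a sketch only. The function name split_request, the sample URLs, and the choice of UTF-8 for the query data are assumptions made for illustration.

```python
from urllib.parse import parse_qs, unquote_to_bytes, urlsplit


def split_request(url):
    """Refuse non-ASCII paths, then decode the query data ourselves."""
    parts = urlsplit(url)

    # Policy: file names are ASCII only, so any byte above 0x7F in the
    # path (raw or percent-escaped) is rejected outright.
    if any(byte > 0x7F for byte in unquote_to_bytes(parts.path)):
        raise ValueError("non-ASCII path refused: " + parts.path)

    # The data after the question mark is decoded by our own code, with
    # an encoding we choose (UTF-8 here, by assumption).
    params = parse_qs(parts.query, encoding="utf-8", errors="strict")
    return parts.path, params


path, params = split_request("/search.cgi?q=caf%C3%A9")
print(path, params)
```

With errors="strict", query data that is not valid UTF-8 raises an exception instead of being silently replaced, so the application can decide how to respond.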

It's not friendly, but until implementers catch up (and they won't unless
customers call and complain), it is a reliable solution. ::sigh::

Addison

Addison P. Phillips
Globalization Architect
webMethods, Inc.


