Re: Unicode in a URL

From: Martin Duerst (duerst@w3.org)
Date: Thu Apr 26 2001 - 21:33:00 EDT


At 11:28 01/04/26 -0700, Markus Scherer wrote:
>Paul Deuter wrote:
> > I am wondering if there isn't a need for the Unicode Spec to also
> > dictate a way of encoding Unicode in an ASCII stream. Perhaps
>
>How many more ways do we need?
>
>To be 8-bit-friendly, we have UTF-8.
>To get everything into ASCII characters, we have UTF-7.
>W3C specifies to use %-encoded UTF-8 for URLs.

Unfortunately, there is more.

HTML/XML use &#ddddd; (ddddd is a decimal number) or
&#xhhhhh; (hhhhh is a hexadecimal number). Java has \u.... CSS has \hhhhh.
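
To make the differences concrete, here is a rough sketch in Python that
writes a single character in each of the conventions mentioned in this
thread. The function name escapes() and the dictionary keys are made up
for illustration; only urllib.parse.quote is a real library call. Note
the assumption that a Java-style \u escape is built from UTF-16 code
units, so a character beyond U+FFFF takes two escapes.

```python
from urllib.parse import quote


def escapes(ch: str) -> dict:
    """Write one character in each escape convention (illustrative helper)."""
    cp = ord(ch)
    # Split the character into UTF-16 code units for the Java-style form.
    utf16 = ch.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big")
             for i in range(0, len(utf16), 2)]
    return {
        # %-encoded UTF-8, as used in URLs
        "url": quote(ch, safe=""),
        # HTML/XML numeric character references
        "html_decimal": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        # Java: one \uXXXX per UTF-16 code unit
        "java": "".join(f"\\u{u:04X}" for u in units),
        # CSS: backslash plus hex digits; the trailing space ends the escape
        "css": f"\\{cp:X} ",
    }


if __name__ == "__main__":
    for name, text in escapes("\u00e9").items():
        print(name, text)
```

For U+00E9 this gives %C3%A9, &#233;, &#xE9;, \u00E9 and \E9 (plus a
space), five spellings of the same character.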

It would be very nice if there were only one convention everywhere,
but the circumstances (and their history) make that very difficult.

Another issue is that if you start combining things (e.g.
producing XML from Java or Perl, ...), it can be much less
confusing if each language has a different convention.
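
A small sketch of that layering, using Python as the host language
instead of Java or Perl (an assumption for the sake of a runnable
example; the function name resolve_layers() is hypothetical). The
\u escape is resolved by the host language when it reads the string
literal, while the &#x...; reference passes through untouched and is
only resolved later by the XML parser. Because the two notations look
different, it is easy to see which layer handles which escape.

```python
import xml.etree.ElementTree as ET


def resolve_layers():
    """Return the XML source and the text the XML parser extracts from it."""
    # Python resolves \u00E9 here; "&#xE9;" is just six ordinary characters.
    xml_source = "<p>\u00E9 and &#xE9;</p>"
    # The XML parser now resolves the numeric character reference.
    text = ET.fromstring(xml_source).text
    return xml_source, text


if __name__ == "__main__":
    source, text = resolve_layers()
    print(source)
    print(text)
```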

Regards, Martin.


