RE: Allow to type URL in Unicode

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Wed Jan 23 2002 - 13:37:17 EST


Hi Eric,

Domain names are a work in progress. Currently they are restricted to a subset of ASCII. An IETF working group is developing a solution to this, which happens to use Unicode.

As for the rest of a URL, browsers and webservers have long supported non-ASCII URLs (via %-encoding of the hex values of the underlying bytes/octets). Because the URL contains no character encoding information, the browser and server must have agreed in advance on an encoding, or the hex values will be misinterpreted by the server and (most likely) a 404 error returned.
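To make the mismatch concrete, here is a minimal Python sketch. The path "/café" and the choice of Latin-1 as the "other" encoding are only illustrative assumptions; the same thing happens with any pair of encodings that disagree on the bytes.

```python
from urllib.parse import quote, unquote

# Illustrative path; any non-ASCII text would do.
path = "/caf\u00e9"  # "/café"

# The same characters give different %-escapes depending on the encoding
# used to turn them into bytes before escaping.
utf8_url = quote(path, encoding="utf-8")
latin1_url = quote(path, encoding="latin-1")

# A server that assumes Latin-1 but receives the UTF-8 form decodes the
# two bytes C3 A9 as two separate characters and looks up the wrong name.
misread = unquote(utf8_url, encoding="latin-1")

print(utf8_url)    # /caf%C3%A9
print(latin1_url)  # /caf%E9
print(misread)     # /cafÃ©
```

Nothing in either escaped URL says which encoding produced it, which is why both sides have to agree in advance.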

The W3C I18N group has (strongly) recommended that browsers and servers always use the UTF-8 encoding of Unicode in URLs. Internet Explorer 5.0 and later automatically encode the characters you type into a URL, up to the "?", using UTF-8 (except in Japan, Korea, and Taiwan, where the option is turned off by default and users may turn it on). So if you type a non-ASCII string into IE5's URL entry box, the server will receive a percent-encoded UTF-8 representation of that string.
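As a rough sketch of that behavior (not IE's actual implementation), the following Python escapes the part of a path before the "?" as UTF-8 and leaves the query untouched. The helper name encode_path_utf8 and the sample path are hypothetical.

```python
from urllib.parse import quote

def encode_path_utf8(path_and_query: str) -> str:
    """Percent-encode the part before '?' as UTF-8; pass the rest through.

    Hypothetical helper for illustration only.
    """
    # partition() splits at the first '?'; sep and query are empty strings
    # when there is no query part.
    path, sep, query = path_and_query.partition("?")
    return quote(path, safe="/", encoding="utf-8") + sep + query

typed = "/\u6771\u4eac/page?q=\u6771\u4eac"  # "/東京/page?q=東京"
sent = encode_path_utf8(typed)
print(sent)  # /%E6%9D%B1%E4%BA%AC/page?q=東京
```

The query part is deliberately left alone here, which mirrors the point that the bits after the "?" are a separate problem.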

This still leaves some issues to do with normalization, path parsing, the query bits after the "?", and so on. But you can make things work in most cases with a little care.
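The normalization issue can be seen in a few lines of Python. This is only a sketch; the string "é" and the choice of NFC versus NFD are illustrative assumptions. Two strings that look identical on screen produce different UTF-8 bytes, so a server comparing the escaped forms byte for byte will treat them as different resources.

```python
import unicodedata
from urllib.parse import quote

# "é" as a single precomposed code point (NFC) and as "e" followed by a
# combining acute accent (NFD). Both render the same.
composed = unicodedata.normalize("NFC", "\u00e9")
decomposed = unicodedata.normalize("NFD", "\u00e9")

composed_url = quote(composed, encoding="utf-8")
decomposed_url = quote(decomposed, encoding="utf-8")

print(composed_url)    # %C3%A9
print(decomposed_url)  # e%CC%81
```

Normalizing to one form on both ends before escaping avoids this particular mismatch.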

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. | The Business Integration Company
432 Lakeside Drive, Sunnyvale, California, USA
+1 408.962.5487 (phone) +1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature.

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
Behalf Of Eric Lannaud
Sent: Wednesday, January 23, 2002 9:29 AM
To: Unicode List
Subject: Allow to type URL in Unicode

Hi,

Does anyone know if it is possible to type a URL
(http://xxxxxxx.xxxxx.xxxx/yyy/yyyy/yyy.yyy) in a browser not in ASCII
characters but in other characters (Unicode)?

Of course there are many implications: browser, domain name servers, web
server... Maybe a working group on this subject already exists?

Many thanks
Eric Lannaud


