In practice, however, people usually write pages containing forms in a
specific charset and expect the URL to come back in the same charset. The
browser will escape any non-ASCII characters, but using the bytes of the
page's charset (which could be UTF-8).
From: Mike Brown [mailto:email@example.com]
Sent: Friday, October 06, 2000 2:45 PM
To: Unicode List
Subject: RE: information request; using unicode in HTML form; urlencoded
> The last rule will clip Unicode character to an 8-bit
The HTML Recommendation and the IETF RFC for URIs both cover this. Anything
URL-encoded is supposed to be UTF-8 encoded first (see the URI RFC).
However, the HTML Recommendation's section on form data is a little vague
about encoding, especially if you are using a MIME message instead of
URL-encoding. Also, the major browsers will typically submit form data with
the same charset as the HTML document containing the form.
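
To make the difference concrete, here is a minimal Python sketch of the same
form value escaped both ways: UTF-8 first, as described above, and in the
charset of an ISO-8859-1 page, as a browser would typically send it. The
sample value "café" is just an illustration, not something from the original
question.

```python
from urllib.parse import quote, unquote

text = "café"  # hypothetical form field value

# UTF-8 encode first, then percent-escape each resulting byte.
utf8_form = quote(text, safe="", encoding="utf-8")

# What a browser would typically send if the page were ISO-8859-1.
latin1_form = quote(text, safe="", encoding="iso-8859-1")

print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9

# The receiver has to decode with the charset the sender actually used.
assert unquote(utf8_form, encoding="utf-8") == text
assert unquote(latin1_form, encoding="iso-8859-1") == text
```

The two escaped strings differ, and nothing in the URL itself says which
charset was used, which is why the encoding of the page matters.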
To encourage the browser to send URL-encoded UTF-8 form data, you should
make sure that the HTML document with the form is itself UTF-8 encoded, and
declares itself as such, usually via the appropriate <meta> element. Beyond
that there is still a risk that the user might override the encoding on
their end, but what can you do.
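
As a rough sketch of both halves (serving the form as UTF-8, and coping with
a user who overrides the encoding), something like the following Python could
work. The form action "/submit", the field name "q", and the fallback to
ISO-8859-1 are all assumptions for illustration; the fallback is only a
heuristic and can guess wrong.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical page: the <meta> element declares UTF-8, and the bytes
# actually served must be UTF-8 as well.
PAGE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Form</title>
</head>
<body>
<form action="/submit" method="get">
<input type="text" name="q">
<input type="submit">
</form>
</body>
</html>
"""


def render_form_page():
    """Return (Content-Type header value, body bytes), both UTF-8."""
    return "text/html; charset=utf-8", PAGE.encode("utf-8")


def decode_field(raw):
    """Decode one URL-encoded form value, preferring strict UTF-8."""
    # In form submissions "+" stands for a space.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the user overrode the encoding and the browser
        # sent ISO-8859-1 instead. This is a guess, not a guarantee.
        return data.decode("iso-8859-1")
```

For example, decode_field("caf%C3%A9") and decode_field("caf%E9") both come
back as "café", the second one only through the fallback.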