Re: HTML forms and UTF-8

From: Erik van der Poel
Date: Sun Nov 07 1999 - 19:41:45 EST

"François Yergeau" wrote:
> The traditional way that forms work is that the data is returned in the same
> encoding as the page containing the form. This kind of works when there is
> a single page in a single encoding (no transcoding proxy, for instance)
> handled by a single CGI script.

I'm just curious, but how often do people actually use transcoding
proxies? Do these proxies automatically update the HTTP and HTML META
charsets? HTTP charsets are very rare, but HTML META charsets are not so
rare.

> But it breaks in many cases and does not
> allow multilingual content (except when the page is in Unicode, of course).

Multilingual content is rare. Glen, do you need multilingual content?

> People usually deal with that by using a hidden field in the form (<INPUT
> TYPE="hidden">). You can put some text in there that will not be shown to
> the user but that will be returned as part of the form data. By looking at
> the bytes of that text, your CGI can determine its encoding (knowing the
> characters in advance) and that is also the encoding of the rest of the
> data.
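
To make that trick concrete, here is a rough sketch in Python of what the
CGI side might do. The field name "probe", the sentinel text and the list
of candidate charsets are all made up for illustration; a real form would
pick a sentinel whose bytes differ in every charset it expects to see.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical sentinel placed in <INPUT TYPE="hidden" NAME="probe">.
# Its byte sequence is different in each of the candidate charsets.
SENTINEL = "日本語"
# Assumption: the only charsets this site expects to receive.
CANDIDATES = ["utf-8", "shift_jis", "euc_jp"]


def parse_form(body):
    """Split a urlencoded form body into {field name: raw bytes}."""
    fields = {}
    for pair in body.split("&"):
        key, _, value = pair.partition("=")
        # '+' means space in form data; percent-escapes become raw bytes.
        raw = unquote_to_bytes(value.replace("+", " "))
        fields[unquote_to_bytes(key).decode("ascii")] = raw
    return fields


def detect_charset(raw_sentinel):
    """Return the candidate charset under which the bytes decode to SENTINEL."""
    for name in CANDIDATES:
        try:
            if raw_sentinel.decode(name) == SENTINEL:
                return name
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this charset
    return None  # unknown encoding; the caller has to decide what to do


# The probe field here holds the Shift_JIS bytes of the sentinel.
fields = parse_form("probe=%93%FA%96%7B%8C%EA&comment=hello+world")
charset = detect_charset(fields["probe"])
```

Once detect_charset() has named the charset, the remaining fields can be
decoded with it. If nothing matches, the script is back to guessing.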

In theory, if you can reliably label the charset of the HTML document
containing the form (via HTTP charset and HTML META charset), then the
form submission should be in that charset too. You can then simply
insert that charset label in the hidden input field too, and look at
that when the form submission arrives.
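
Something along these lines, as a sketch only. The field name
"form_charset", the allowed list and the fallback are my own assumptions,
not anything a browser or server defines. The label itself is plain ASCII,
so it can be read before the charset is known.

```python
from urllib.parse import unquote_to_bytes

# Assumption: the charsets this hypothetical site serves its form pages in.
ALLOWED = {"utf-8", "shift_jis", "big5", "iso-8859-1"}


def decode_form(body, label_field="form_charset", fallback="iso-8859-1"):
    """Decode a urlencoded form body using the charset named in a hidden field."""
    raw = {}
    for pair in body.split("&"):
        key, _, value = pair.partition("=")
        # Field names are assumed to be ASCII; values stay as raw bytes for now.
        raw[unquote_to_bytes(key).decode("ascii")] = unquote_to_bytes(
            value.replace("+", " ")
        )
    # Read the label, ignoring anything that is not ASCII.
    label = raw.pop(label_field, b"").decode("ascii", "ignore").strip().lower()
    # Never trust the label blindly: the client can send anything it likes.
    charset = label if label in ALLOWED else fallback
    return {name: value.decode(charset, "replace") for name, value in raw.items()}


form = decode_form("form_charset=Shift_JIS&title=%93%FA%96%7B%8C%EA")
```

Checking the label against a fixed list matters because the hidden field
comes back from the client and may have been altered or dropped.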

However, perhaps people have found that HTTP charset and HTML META
charset do not work with certain browser versions, and have therefore
come up with the hack mentioned above(?).

Some versions of Netscape do not have a useful default font for use with
documents in the Unicode-based charsets (utf-8, etc). Even if the user
has set a font for Unicode, it could be an ugly font. So it might be
better to send the form in a traditional charset (such as Shift_JIS,
Big5, etc), so that a more beautiful font is likely to be used on the
user's side. You can then convert the form submission to UTF-8 on the
server side.
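
The conversion itself is the easy part. A minimal sketch, assuming the
server already knows which charset the form page was sent in (to_utf8 is
just a name I made up):

```python
def to_utf8(raw, page_charset):
    """Re-encode submitted bytes from the page's charset into UTF-8.

    Decoding is strict, so bytes that are not valid in page_charset raise
    UnicodeDecodeError instead of being silently mangled.
    """
    return raw.decode(page_charset).encode("utf-8")


# Example: a value submitted from a page served as Shift_JIS.
stored = to_utf8("日本語".encode("shift_jis"), "shift_jis")
```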

This doesn't sound good, I know...

