RE: How-To handle i18n when you don't know charset?

From: Mike Brown
Date: Thu Jul 06 2000 - 16:15:48 EDT

An earlier poster wrote:
> Have all the pages generated include a META-CHARSET tag
> in the HTML Header. This will insure that the browser(s)
> submit form post data in the same encoding as the
> original html page.

and Chris Wendt wrote:
> Simplest is to use UTF-8 throughout and label your
> <FORM> page with it, you just need to block browsers
> below version 4 or code specially for them.

> Other browsers will return data in the charset of the <FORM>
> page and if you can set the charset of the <FORM> page you
> can also set this field to indicate the charset used to the
> CGI.

My experiments indicated that if the user did not have their browser set to
auto-select the encoding, or if they manually overrode the encoding
selection, the form data would be sent in whatever encoding they had chosen,
regardless of the charset declared in the <meta http-equiv="Content-Type"
...> element in the HTML document head. So I don't think it's good practice
to assume that form data will always be submitted in the same encoding as
the form itself. In != Out :)
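
To illustrate what not trusting the page charset can look like on the
receiving end, here is a minimal Python sketch. The function name
decode_field and the candidate list are hypothetical, and the ordering is an
assumption: strict encodings go first, because iso-8859-1 accepts any byte
sequence and would otherwise mask a mismatch.

```python
from urllib.parse import unquote_to_bytes

def decode_field(raw_value, candidates=("utf-8", "iso-8859-1")):
    """Decode one percent-encoded form value by trying charsets in order.

    Returns (text, charset_used). This is a guess, not a detection:
    the first candidate that decodes without error wins.
    """
    # In form encoding "+" means space; a literal plus arrives as %2B.
    data = unquote_to_bytes(raw_value.replace("+", " "))
    for charset in candidates:
        try:
            return data.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep what we can and flag the charset as unknown.
    return data.decode("utf-8", errors="replace"), None

print(decode_field("caf%C3%A9"))  # bytes are valid UTF-8
print(decode_field("caf%E9"))     # invalid UTF-8, falls back to iso-8859-1
```

A guess like this can still be wrong for multi-byte encodings that happen
to overlap, so it is a fallback rather than a substitute for knowing the
charset.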

Chris Wendt wrote:
> IE5 and later IE fill a field "_charset_" with the charset
> used for form submission, regardless of the initial value
> of this field.

Whoa, nice! I just tried it with

  <input type="hidden" name="_charset_">

and it works! Thanks!
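
On the receiving side, the value of that field can drive the decoding. A
rough Python sketch, assuming the body is an ordinary
application/x-www-form-urlencoded string; parse_form and the ALLOWED list
are hypothetical names, and browsers that leave the field empty fall back
to a default:

```python
from urllib.parse import parse_qsl

# Hypothetical allow-list; extend it with the charsets you expect to see.
ALLOWED = {"utf-8", "iso-8859-1", "windows-1252", "shift_jis", "euc-jp", "big5"}

def parse_form(body, default="utf-8"):
    """Parse a urlencoded body (ASCII str), honoring a _charset_ field."""
    # Pass 1: latin-1 maps every byte to one code point, so no byte is lost.
    pairs = parse_qsl(body, keep_blank_values=True, encoding="latin-1")
    declared = dict(pairs).get("_charset_", "").strip().lower()
    # Only trust names we recognize; anything else gets the default.
    charset = declared if declared in ALLOWED else default

    def redecode(s):
        # Pass 2: recover the original bytes, then decode them properly.
        return s.encode("latin-1").decode(charset, errors="replace")

    return {redecode(k): redecode(v) for k, v in pairs}

form = parse_form("name=caf%E9&_charset_=windows-1252")
print(form["name"])
```

The two-pass approach is needed because the charset is only known after
the body has been split into fields.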

> IE4 and IE5 will submit characters that do not fit into
> the charset used for form submission as HTML numeric
> character references (&#12345;)

I noticed this. It is an interesting workaround for a hole in the HTML specs
and RFC 2070, but it is also something that has to be specially decoded on
the receiving end.
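
The special decoding could look something like the following Python
sketch. It handles only decimal references of the &#12345; form mentioned
above; the name decode_ncrs is hypothetical, and out-of-range or surrogate
values are left untouched as a cautious assumption.

```python
import re

# Decimal numeric character references only, e.g. &#12345;
_NCR = re.compile(r"&#([0-9]{1,7});")

def decode_ncrs(text):
    """Replace decimal numeric character references with the characters."""
    def repl(match):
        code = int(match.group(1))
        # Leave NUL, surrogates, and values beyond Unicode as they are.
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    return _NCR.sub(repl, text)

print(decode_ncrs("caf&#233; &#26085;&#26412;"))
```

Note that this cannot tell a reference the browser generated from one the
user typed literally, so a user who really entered "&#233;" gets it
converted as well.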

Thanks again for this info.

   - Mike
Mike J. Brown, software engineer in Denver, Colorado, USA
