HTML forms and UTF-8

From: Glen Perkins (Glen.Perkins@nativeguide.com)
Date: Sun Nov 07 1999 - 16:14:51 EST


What is the best approach for getting data submitted by an HTML form into
Unicode (presumably UTF-8) encoding?

I know how to specify the encoding of a Web (.html) page, via either the
meta tag or directly in the HTTP header, so I can always use a meta tag to
make the form itself UTF-8 encoded. It just occurred to me, though, that I
don't know how to specify the encoding of the data returned by such a form,
or even if such a thing can be specified from the server side at all. Does
form data come back encoded in the encoding of the form page itself? Does it
come back in the default encoding of the client machine regardless of the
encoding of the form itself? Does it come back in an encoding that is
determined somehow by the user agent (client browser application),
independent of the default OS encoding or anything done by the server?
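
For concreteness, here is a minimal sketch of the part that is already
known: serving the form page itself as UTF-8, declared both in the HTTP
header and in a meta tag. The function name build_form_page and the field
name "name" are made up for illustration. The accept-charset attribute on
the form is defined in HTML 4 as a way to state which encodings the server
accepts; whether a given browser honors it is an open question here, so
treat it as a hint, not a guarantee.

```python
import html


def build_form_page(action_url: str) -> tuple[dict[str, str], bytes]:
    """Return (HTTP headers, body bytes) for a form page served as UTF-8.

    build_form_page is a hypothetical helper, not part of any library.
    """
    page = f"""<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Example form</title>
</head>
<body>
<form method="post" action="{html.escape(action_url, quote=True)}" accept-charset="UTF-8">
<input type="text" name="name">
<input type="submit" value="Send">
</form>
</body>
</html>
"""
    # Declare the encoding in the HTTP header as well as in the meta tag.
    headers = {"Content-Type": "text/html; charset=UTF-8"}
    # The bytes actually sent must match the declared encoding.
    return headers, page.encode("utf-8")
```

A CGI program would write the headers, a blank line, and then the body
bytes to standard output.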

I'd like to be able to roll out forms in any number of languages/scripts and
have the data returned to the same CGI program (mod_perl or whatever) in the
same encoding, UTF-8, or else determine the encoding of the returning data
and convert to UTF-8 immediately as the first step in the CGI/server-side
processing program.
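
The "convert to UTF-8 as the first step" idea could be sketched as follows.
This rests on two assumptions that are not established anywhere in this
message: that the browser returns the data in the encoding of the form page,
and that each localized form carries a hidden field (called page_charset
here, a made-up name) stating that encoding. The function names are also
hypothetical. If the field is missing or names an unknown encoding, the
sketch falls back to a default.

```python
import codecs
from urllib.parse import unquote_to_bytes

# Hypothetical hidden field that each localized form would include,
# e.g. <input type="hidden" name="page_charset" value="shift_jis">.
CHARSET_FIELD = b"page_charset"


def split_form_body(body: str) -> list[tuple[bytes, bytes]]:
    """Split an application/x-www-form-urlencoded body into raw byte pairs."""
    pairs = []
    for field in body.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        # '+' means space; %XX escapes are turned back into raw bytes
        # without assuming any character encoding yet.
        pairs.append((unquote_to_bytes(name.replace("+", " ")),
                      unquote_to_bytes(value.replace("+", " "))))
    return pairs


def form_to_utf8(body: str, default_charset: str = "utf-8") -> dict[str, bytes]:
    """Return form fields with every value re-encoded as UTF-8 bytes."""
    pairs = split_form_body(body)
    charset = default_charset
    for name, value in pairs:
        if name == CHARSET_FIELD:
            try:
                # Accept the declared charset only if Python knows it.
                charset = codecs.lookup(value.decode("ascii")).name
            except (LookupError, UnicodeDecodeError):
                pass  # keep the default for unknown or garbled names
    return {
        name.decode(charset, errors="replace"):
            value.decode(charset, errors="replace").encode("utf-8")
        for name, value in pairs
    }
```

For example, form_to_utf8("page_charset=iso-8859-1&city=M%FCnchen") yields a
"city" value that is the UTF-8 encoding of "München". Bytes that are invalid
in the declared encoding are replaced instead of raising an error, which
may or may not be the right policy for a given application.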

If I can specify the desired return encoding, how is it done?

If not, what determines the returned data's encoding and how do I detect it?

In short, what would be the best approach for Unicodifying returned HTML
form data regardless of which localized form it came back from?

Thanks,

Glen Perkins


