Re: How-To handle i18n when you don't know charset?

From: Antoine Leca
Date: Fri Jul 07 2000 - 07:11:22 EDT

Michael Kaplan wrote:
> > My experimentation indicated that if the user did not have their browser
> > set to auto-select encoding, or if they manually overrode the encoding
> > selection, the form data would be sent in whatever they had chosen,
> > regardless of what charset may be in the <meta http-equiv="Content-Type"
> > ...> in the HTML document head.
> My general feeling of people who specifically change settings so that the
> text was rendered properly and then they specifically changed it is as
> follows:

My own experience (and extensive practice, *UNFORTUNATELY*) is that
having to specify the encoding manually is a hack, there to work around
the initial oversight of authoring software that does not enforce
a uniform *and* practical encoding scheme (either "all should be Unicode",
or "the day you use something outside ASCII, it should be tagged").

The problem is worse in some cases (mainly Cyrillic), because several
charsets are equally in common use, mainly for historical reasons.
And Microsoft's behaviour in this area is not necessarily of help...
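
To make the Cyrillic case concrete, here is a minimal Python sketch. The sample
word and the list of charsets are assumptions chosen for illustration (they are
well-known Cyrillic encodings, not a list taken from this thread). It shows
that the same untagged bytes give a different reading under each charset, so
the receiver cannot recover the text without being told which one was used.

```python
# Illustrative only: the sample word and this charset list are assumptions.
CYRILLIC_CHARSETS = ["koi8_r", "cp1251", "iso8859_5", "cp866"]


def decode_all(raw: bytes) -> dict:
    """Decode the same bytes under each candidate charset."""
    return {
        name: raw.decode(name, errors="replace")
        for name in CYRILLIC_CHARSETS
    }


word = "Привет"
raw = word.encode("koi8_r")  # what an untagged KOI8-R page or form might send

readings = decode_all(raw)
for name, text in readings.items():
    # Only the koi8_r row reproduces the original word.
    print(f"{name:10} {text!r}")
```

Every row decodes without error, which is exactly the difficulty: nothing in
the bytes themselves says which reading is the intended one.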

Now, most of the time I run with "default" on. Sometimes, I need to change.
And when I change, I leave it in the changed position (Yes, I'm quite lazy),
unless there is a nuisance. So quite often, I am running in the "changed"
position.

> The GIGO (garbage in, garbage out) philosophy is the best way to go here,
> IMHO. How much more can you do other than provide a java applet that will
> have a big hand come out of the screen and slap them silly?

And *I* would be quite upset if, when I answer in French (using French
accents) in an application that only offers English as UI and asks for
e.g. my profession, the application:
- either refuses to handle my accented profession
- or, perhaps worse, misinterprets it because the server side insists on using
its own charset instead of whatever characters I really need.
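
Both failure modes can be sketched in a few lines of Python. This is only an
illustration under assumptions: the profession string, the choice of UTF-8 on
the client side, and the helper name server_decode are all hypothetical.

```python
def server_decode(raw: bytes, assumed_charset: str) -> str:
    """Decode submitted bytes the way a server with a fixed charset would."""
    return raw.decode(assumed_charset, errors="replace")


profession = "ingénieur"
sent = profession.encode("utf-8")  # assumption: the browser submitted UTF-8

# Server insists on ISO-8859-1: no error, but the accent is silently mangled.
as_latin1 = server_decode(sent, "iso-8859-1")
print(as_latin1)  # → ingÃ©nieur

# Server expects ASCII only: the accented letter is lost altogether.
as_ascii = server_decode(sent, "ascii")
print(as_ascii)

# Server uses the charset the data was actually sent in: text survives.
as_utf8 = server_decode(sent, "utf-8")
print(as_utf8)  # → ingénieur
```

The ISO-8859-1 case is arguably the worse one, since nothing signals that the
stored value is wrong.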

But this is what happens every day, because the (U.S.-based) programmer is
expecting everyone to use ASCII, of course. Here we cannot distinguish GIGO
from laziness or plain ignorance.

Now take the case of my friend M. Lebœuf, whose name includes a
character not easily available in common charsets, trying to answer such
a form included in an ISO-8859-1 HTML page... I am not sure he will
appreciate seeing his name treated as garbage...
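
For the record, a short Python sketch of why this name is awkward. The helper
try_encode is hypothetical, and the sketch makes no claim about what any
particular browser actually submits in this situation; it only shows which
charsets can represent the character at all.

```python
name = "Lebœuf"  # œ is U+0153


def try_encode(text: str, charset: str):
    """Return the encoded bytes, or None if the charset cannot represent text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


print(try_encode(name, "iso-8859-1"))  # → None (no code for œ in ISO-8859-1)
print(try_encode(name, "cp1252"))      # → b'Leb\x9cuf'
print(try_encode(name, "utf-8"))       # → b'Leb\xc5\x93uf'

# One possible fallback: a numeric character reference instead of raw bytes.
fallback = name.encode("iso-8859-1", errors="xmlcharrefreplace")
print(fallback)                        # → b'Leb&#339;uf'
```

Whatever the sender does, the receiving side has to know which of these forms
it got, or the name ends up as garbage.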

