RE: How-To handle i18n when you don't know charset?

From: Jonathan Rosenne
Date: Fri Jul 07 2000 - 12:44:01 EDT

Unfortunately, there are many Hebrew pages wrongly marked as 8859-1, and many more unmarked. So letting the user override the charset specification is necessary. I was told similar situations are known in Russia and Greece.
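
To illustrate the mislabelling problem, here is a minimal Python sketch. The helper name decode_page and its parameters are hypothetical, and ISO-8859-8 is only assumed as the real encoding of the page (windows-1255 is another common candidate). It shows that the declared charset yields garbage while a user-supplied override recovers the text.

```python
from typing import Optional

def decode_page(raw: bytes, declared: Optional[str],
                user_override: Optional[str] = None) -> str:
    """Decode page bytes; a user-chosen charset wins over the declared one.

    Hypothetical helper for illustration only.
    """
    # Fall back to ISO-8859-1 when nothing is declared (an assumption
    # that mirrors the common browser default of the time).
    charset = user_override or declared or "iso-8859-1"
    return raw.decode(charset, errors="replace")

hebrew = "\u05e9\u05dc\u05d5\u05dd"      # "shalom" in Hebrew letters
raw = hebrew.encode("iso-8859-8")        # the bytes actually served

# The page is wrongly marked as ISO-8859-1: Latin letters come out.
mislabeled = decode_page(raw, declared="iso-8859-1")

# The user overrides the charset: the Hebrew text is recovered.
fixed = decode_page(raw, declared="iso-8859-1", user_override="iso-8859-8")

print(repr(mislabeled))
print(fixed == hebrew)
```

Without an override there is no way to recover from the wrong label, because every byte is also a valid ISO-8859-1 character and so no decoding error is ever raised.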


> -----Original Message-----
> From: Antoine Leca
> Sent: Friday, July 07, 2000 2:06 PM
> To: Unicode List
> Subject: Re: How-To handle i18n when you don't know charset?
> Michael Kaplan wrote:
> >
> > > My experimentation indicated that if the user did not have their browser
> > > set to auto-select encoding, or if they manually overrode the encoding
> > > selection, the form data would be sent in whatever they had chosen,
> > > regardless of what charset may be in the <meta http-equiv="Content-Type"
> > > ...> in the HTML document head.
> >
> > My general feeling about people who specifically changed settings so that
> > the text was rendered properly, and then specifically changed them again,
> > is as follows:
> >
> My own experience (and extensive practice, *UNFORTUNATELY*) is that
> having to specify the encoding manually is a hack, there to work around
> the original oversight of authoring software that does not enforce
> a uniform *and* practical encoding scheme (either "all should be Unicode",
> or "the day you use something outside ASCII, it should be tagged").
> The problem is worse in some cases (mainly Cyrillic), because a number of
> charsets are equally in common use, mainly for historical reasons.
> And the behaviour of Microsoft in this area is not necessarily of help...
> Now, most of the time I run with "default" on. Sometimes, I need to change.
> And when I change, I leave it in the changed position (yes, I'm quite lazy),
> unless it becomes a nuisance. So quite often, I am running in the "changed"
> position...
> > The GIGO (garbage in, garbage out) philosophy is the best way to go here,
> > IMHO. How much more can you do other than provide a java applet that will
> > have a big hand come out of the screen and slap them silly?
> And *I* would be quite upset if, when I answer in French (using French
> accents) in an application that only offers English as UI and asks for
> e.g. my profession, the application:
> - either refuses to handle my accented profession
> - or, perhaps worse, misinterprets it because the server side insists on
>   using its own charset instead of whatever characters I really need.
> But this is what happens every day, because the (U.S.-based) programmer is
> expecting everyone to use ASCII, of course. Here we cannot distinguish GIGO
> from laziness or plain ignorance.
> Now take the case of my friend M. Lebœuf, whose name includes a
> character not easily available in common charsets, trying to answer such
> a form included in an ISO-8859-1 HTML page... I am not sure he will
> appreciate seeing his name treated as garbage...
> Antoine
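
The Lebœuf case quoted above can be checked directly. This is a minimal Python sketch, assuming the form handler encodes the submitted name in the charset of the page. The variable names are illustrative only. The letter œ (U+0153) has no code point in ISO-8859-1, so the name either fails to encode or is silently damaged, while UTF-8 round-trips it.

```python
name = "Leb\u0153uf"   # "Lebœuf"

# ISO-8859-1 has no slot for U+0153, so strict encoding fails.
try:
    name.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# A lossy fallback turns the letter into "?", i.e. garbage in the database.
lossy = name.encode("iso-8859-1", errors="replace")

# windows-1252, often confused with ISO-8859-1, does have the letter at 0x9C.
cp1252_bytes = name.encode("cp1252")

# UTF-8 carries the name unchanged.
utf8_bytes = name.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(fits_latin1)          # False
print(lossy)                # b'Leb?uf'
print(cp1252_bytes)         # b'Leb\x9cuf'
print(round_trip == name)   # True
```

The windows-1252 line is one reason mislabelled pages are so common: a page declared as ISO-8859-1 may in fact carry windows-1252 bytes.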
