RE: How-To handle i18n when you don't know charset?

From: Jonathan Rosenne
Date: Fri Jul 07 2000 - 12:44:01 EDT

Unfortunately, there are many Hebrew pages wrongly marked as 8859-1, and many more unmarked. So letting the user override the charset specification is necessary. I was told similar situations are known in Russia and Greece.
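
To make the override concrete, here is a minimal sketch in Python of the precedence I have in mind: an explicit user choice wins over the page's own label, which in turn wins over a default. The function name `decode_page` and its parameters are hypothetical, made up for illustration, and are not taken from any particular browser or library.

```python
from typing import Optional


def decode_page(raw: bytes,
                declared: Optional[str] = None,
                user_override: Optional[str] = None,
                fallback: str = "iso-8859-1") -> str:
    """Decode page bytes; the user's explicit choice beats the page's label."""
    # Precedence (an assumption of this sketch): user > declared label > fallback.
    charset = user_override or declared or fallback
    # errors="replace" keeps the page readable even if some bytes are invalid.
    return raw.decode(charset, errors="replace")


# A Hebrew page whose bytes are ISO-8859-8 but whose label says ISO-8859-1.
original = "שלום"
raw = original.encode("iso-8859-8")

mislabeled = decode_page(raw, declared="iso-8859-1")
corrected = decode_page(raw, declared="iso-8859-1", user_override="iso-8859-8")
```

With the wrong label alone the reader gets Latin-1 accented letters instead of Hebrew; with the override the original text comes back unchanged.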


> >
> My own experience (and extensive practice, *UNFORTUNATELY*) is that
> having to specify the encoding manually is a hack, there to work around
> the original oversight of authoring software that does not enforce
> a uniform *and* practical encoding scheme (either "all should be
> Unicode", or "the day you use something outside ASCII, it should be
> tagged"). The problem is worse in some cases (mainly Cyrillic), because
> several charsets are equally in common use, mainly for historical reasons.
> And Microsoft's behaviour in this area does not necessarily help...
> Now, most of the time I run with "default" on. Sometimes I need
> to change. And when I change, I leave it in the changed position (yes,
> I'm quite lazy) unless it becomes a nuisance. So quite often I am
> running in the "changed" position...
> > The GIGO (garbage in, garbage out) philosophy is the best way to go
> > here, IMHO. How much more can you do other than provide a Java applet
> > that will have a big hand come out of the screen and slap them silly?
> And *I* would be quite upset if, when I answer in French (using French
> accents) in an application that offers only English as its UI language
> and asks for, e.g., my profession, the application:
> - either refuses to handle my accented profession
> - or, perhaps worse, misinterprets it because the server side insists
>   on using its own charset instead of whatever characters I really need.
> But this is what happens every day, because the (U.S.-based) programmer
> expects everyone to use ASCII, of course. Here we cannot tell whether
> the GIGO comes from laziness or plain ignorance.
> Now take the case of my friend M. Lebœuf, whose name includes a
> character not readily available in common charsets, trying to fill in
> such a form included in an iso-8859-1 HTML page... I am not sure he will
> appreciate seeing his name treated as garbage...
> Antoine
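
The Lebœuf case quoted above is easy to reproduce. The following is only a sketch in Python, with a hypothetical helper `fits` that is not part of any form-handling library. It shows that "œ" (U+0153) has no code point in ISO-8859-1, and one possible fallback (a numeric character reference) that a server would then have to know how to undo.

```python
name = "Lebœuf"


def fits(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


latin1_ok = fits(name, "iso-8859-1")   # "œ" is missing from ISO-8859-1
utf8_ok = fits(name, "utf-8")          # any Unicode text fits in UTF-8

# One possible fallback: replace the unencodable character with a numeric
# character reference instead of dropping it or rejecting the input.
as_ncr = name.encode("iso-8859-1", errors="xmlcharrefreplace")
```

Here `latin1_ok` is False, `utf8_ok` is True, and `as_ncr` is the byte string `b"Leb&#339;uf"`. Whether the receiving side turns that reference back into "œ" is exactly the kind of thing that cannot be assumed.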

