RE: How-To handle i18n when you don't know charset?

From: Mike Brown (mbrown@corp.webb.net)
Date: Fri Jul 07 2000 - 19:40:45 EDT


11digitboy wrote, in a gratuitously quoted contribution:
> > > > Now you take the case of my friend M. LebÅ"uf,
> [...]
> Over here, his name looks like garbage.
> What is that? Ell ee bee something something you
> eff.

The message headers on his email included:

Content-Type: text/plain; charset=UTF-8

Apparently your mail reader ignored this header and showed you the bytes of
the message interpreted as ISO-8859-1, windows-1252, MacRoman, or some other
single-byte encoding. The "something something" is, at a lower level, the
2 bytes (0xC5 0x93) that, in UTF-8, encode the single character known in
Unicode as LATIN SMALL LIGATURE OE, or U+0153. It should look like "o"
joined to "e". In windows-1252 that character is the single byte 0x9C; in
MacRoman, 0xCF.
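
To make the mix-up concrete, here is a rough Python sketch of what a mail
reader does when it honors the declared charset versus when it ignores it.
The sample message, the function names, and the choice of windows-1252 as
the wrong guess are assumptions made up for illustration; this is not a
reconstruction of any particular mail program.

```python
from email import message_from_bytes

# Hypothetical message built for illustration only; the body is UTF-8.
RAW_MESSAGE = (
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "my friend M. Leb\u0153uf".encode("utf-8")
)


def read_correctly(raw: bytes) -> str:
    """Decode the body using the charset named in the Content-Type header."""
    msg = message_from_bytes(raw)
    charset = msg.get_content_charset() or "us-ascii"
    return msg.get_payload(decode=True).decode(charset)


def read_ignoring_header(raw: bytes, wrong_charset: str = "cp1252") -> str:
    """Decode the body as a single-byte charset, ignoring the header."""
    msg = message_from_bytes(raw)
    return msg.get_payload(decode=True).decode(wrong_charset, errors="replace")


if __name__ == "__main__":
    # U+0153 is two bytes in UTF-8 but one byte in the single-byte charsets.
    print("\u0153".encode("utf-8"))      # the two bytes 0xC5 0x93
    print("\u0153".encode("cp1252"))     # the single byte 0x9C
    print("\u0153".encode("mac_roman"))  # the single byte 0xCF
    print(read_correctly(RAW_MESSAGE))
    print(read_ignoring_header(RAW_MESSAGE))
```

Read as windows-1252, the byte 0xC5 becomes A-with-ring and 0x93 becomes a
left double quotation mark, which matches the two junk characters in the
quoted name above.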

Interestingly, your own message's headers included this humorous line:

X-Bloated-Content-Warning: Quotational content of 85% far exceeds
    recommended daily dosage

   - Mike
____________________________________________________________________
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources: http://www.skew.org/xml/


