RE: OT Encoding query

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Oct 22 2001 - 05:58:12 EDT


Edward Cherlin wrote:
> Can anyone tell me what encoding this is? It comes from an
> allegedly Chinese page at Wizards of the Coast,
> http://www.wizards.com/international/main.asp?x=welcome,3,zh&region=worldwide
> with the unfortunate heading
>
> charset=ISO-8859-1
>
> None of my browser character settings in either IE or Mozilla
> turn it into anything readable. I have tried Big 5, EUC-TW,
> GBK, GB2312, GB18030, and HZ.
>
> "¶Ó­·ÃÎÊ Íþ ÊÀ ÖÇ
> [...]

It looks like GB-2312 text that was misinterpreted as Windows CP-1252 and
then mislabeled as ISO-8859-1...

What seems to have happened is that all bytes in 0x80..0x9F (which are mostly
graphic characters in CP-1252 but control codes in ISO-8859-1) have been
best-fit converted to their nearest ISO-8859-1 equivalents.
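
To illustrate the diagnosis, here is a minimal Python sketch that undoes the
mislabeling for four characters of the quoted sample. The helper name
undo_mislabeling is made up for this example, and the sketch assumes the page
really was GB-2312. These four characters were picked because their bytes are
all above 0x9F, so the best-fit conversion would not have touched them.

```python
def undo_mislabeling(garbled, assumed="cp1252", actual="gb2312"):
    """Re-encode text with the wrongly assumed codec, then decode with the real one."""
    return garbled.encode(assumed).decode(actual)

# "·ÃÎÊ" from the quoted sample, written as escapes to survive any transport.
sample = "\u00b7\u00c3\u00ce\u00ca"

print(sample.encode("cp1252").hex())  # b7c3ceca
print(undo_mislabeling(sample))       # 访问 ("visit")
```

The same round trip will not work for any character whose bytes fell in
0x80..0x9F, because the best-fit step has already replaced those bytes.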

For instance, the initial 0x22 (") must originally have been 0x93 or 0x94 (“
or ”, the opening or closing double quotation mark). Paired with the following
0xB6 (¶), either byte forms a valid double-byte character, strictly speaking
in GBK, the Windows superset of GB-2312, since GB-2312 proper only uses lead
bytes 0xA1..0xF7.
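
The quote-mark case can be sketched in Python as follows. The BEST_FIT table
is an assumption that contains only the two mappings named above, not a real
conversion table, and the helper names are made up for this example. Python's
strict gb2312 codec only accepts lead bytes 0xA1..0xF7, so the sketch decodes
the candidate pairs with gbk, the Windows superset, and shows the gb2312
result next to it for comparison.

```python
# Assumed best-fit table: only the two entries mentioned in this message.
BEST_FIT = {0x93: 0x22, 0x94: 0x22}

def possible_originals(folded):
    """Return the CP-1252 bytes that the assumed table folds to `folded`."""
    return sorted(b for b, f in BEST_FIT.items() if f == folded)

def decode_pair(lead, trail, codec="gbk"):
    """Decode one double-byte sequence, or return None if the codec rejects it."""
    try:
        return bytes([lead, trail]).decode(codec)
    except UnicodeDecodeError:
        return None

# Try every lead byte that could have been folded into 0x22, followed by 0xB6.
for lead in possible_originals(0x22):
    print(hex(lead), decode_pair(lead, 0xB6), decode_pair(lead, 0xB6, "gb2312"))
```

Since two different lead bytes fold to the same 0x22, the conversion is lossy,
and choosing between the candidates takes knowledge of the language rather
than of the encoding.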

> "A knot! Oh, do let me help to undo it."
> Alice in Wonderland

Yeah, now you must hire the Mad Hatter to undo that DBCS mess.

_ Marco


