Brendan Murray commented:
> Ken Whistler claims that the IANA character set
> "ISO-8859-1-Windows-3.1-Latin-1" is identical to CP1252.
> To the best of my knowledge, this was true only up to Windows 3.11, when MS
> added extra characters to the C1 range. In fact, in Win 2 and 3.1, the only
> characters available in that range were 0x91 and 0x92 (left & right single
> quotation mark). However, with the recent addition of the Euro at 0x80, as
> well as the other additions as of Win 3.11, CP1252 has deviated to the
> extent that it's no longer what was registered.
The "ISO-8859-1-Windows-3.1-Latin-1" registration (MIBenum 2001) is identical
to the mapping that Markus Kuhn was citing for Code Page 1252 in
the C1 range. I.e., it is exactly what Brendan is citing for Win 3.11,
regardless of what HP called it in its standard.
The "ISO-8859-1-Windows-3.0-Latin-1" registration (MIBenum 2000) is
what Brendan is citing for "Win 2 and 3.1", i.e. only with 0x91 and
0x92 for left & right single quotation mark. That was also registered
based on the Hewlett-Packard standard.
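As a rough illustration of what is at stake for those two octets, here is a minimal Python sketch. It assumes the standard-library codecs "latin-1" and "cp1252" as stand-ins: neither is the IANA-registered charset itself, and "cp1252" reflects the current code page rather than the Windows 3.0 or 3.1 tables. The helper name decode_octet is made up for this example.

```python
# Decode the two quotation-mark octets under two standard-library codecs.
# Assumption: "cp1252" here is today's code page, not the registered charset.
def decode_octet(octet: int, codec: str) -> str:
    """Return the character that a single octet decodes to under codec."""
    return bytes([octet]).decode(codec)

for octet in (0x91, 0x92):
    for codec in ("latin-1", "cp1252"):
        ch = decode_octet(octet, codec)
        print(f"0x{octet:02X} in {codec}: U+{ord(ch):04X}")
```

Under "latin-1" the octets come back as C1 control code points; under "cp1252" they come back as the left and right single quotation marks.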
The addition of the Euro at 0x80 raises a separate but interesting
issue for the IANA charset registry. If adding the Euro at 0x80
invalidates the identity of a charset (which I agree it technically
does, since it changes how one octet of the charset is converted to
characters, thereby creating a new MIME entity), then not only does
CP1252 diverge (by one code point) from
ISO-8859-1-Windows-3.1-Latin-1, but each of the other Windows code
pages getting the Euro (probably all of them) also diverges from its
entry in the IANA charset registry in the same way.
Thus "windows-1250" in the IANA charset registry no longer matches
Windows Code Page 1250 with the EURO added at 0x80, and so on for
1251, 1253, 1254, ...
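One way to check where a given implementation actually puts the Euro is to scan each code page for it. The following is only a sketch: it assumes Python's bundled "cp125x" codecs, which follow the current vendor tables and say nothing about what the IANA registrations define. The name euro_octets is hypothetical.

```python
EURO = "\u20ac"

def euro_octets(codec: str) -> list:
    """List the octets in 0x80..0xFF that decode to the Euro sign."""
    found = []
    for octet in range(0x80, 0x100):
        # errors="replace" maps unassigned octets to U+FFFD instead of raising.
        ch = bytes([octet]).decode(codec, errors="replace")
        if ch == EURO:
            found.append(octet)
    return found

for codec in ("latin-1", "cp1250", "cp1251", "cp1252", "cp1253", "cp1254"):
    print(codec, [f"0x{o:02X}" for o in euro_octets(codec)])
```

Any code page that reports a Euro octet here has a mapping that a pre-Euro registration cannot describe, whichever octet the Euro landed on.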
What an unholy mess these MIME charsets are!