Re: Mapping between Windows codepage and ISO codepage

From: Erik van der Poel (erik@netscape.com)
Date: Wed Jul 01 1998 - 00:35:41 EDT


Victor Tse wrote:
>
> On Windows, there are cp1252, cp1250, cp1251, etc. On UNIX, there are
> 8859-1,9.
> I know that cp1252 corresponds to 8859-1. Are they exactly the same,
> code point by code point?

No, CP1252 is a superset of 8859-1's printable characters. In 1252, most
of the "C1" range (0x80-0x9F) holds graphic characters such as curly
quotes and dashes, while in 8859-1 that range holds only control
characters. Outside the C1 range, 1252 and 8859-1 are the same (as far
as I know).
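A quick way to check this, assuming a Python 3 interpreter whose built-in `cp1252` and `latin-1` codecs follow the Microsoft and ISO tables, is to decode every byte value with both codecs and compare. The function name `compare_cp1252_latin1` is just for illustration.

```python
def compare_cp1252_latin1():
    """Classify each byte 0x00-0xFF by how cp1252 and ISO 8859-1 decode it."""
    same, differ, undefined = [], [], []
    for b in range(256):
        raw = bytes([b])
        latin1_char = raw.decode("latin-1")  # every byte is assigned in latin-1
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves a few C1 positions unassigned
            undefined.append(b)
            continue
        if cp1252_char == latin1_char:
            same.append(b)
        else:
            differ.append(b)
    return same, differ, undefined


same, differ, undefined = compare_cp1252_latin1()
print(len(same), len(differ), [hex(b) for b in undefined])
print(all(0x80 <= b <= 0x9F for b in differ))
```

With CPython's codecs this should report 224 identical bytes, 27 differing bytes (all inside 0x80-0x9F), and five bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) that cp1252 leaves unassigned.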

> What about the others? Can you tell me how they relate?

1250 -- 8859-2
1253 -- 8859-7
1254 -- 8859-9

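and so on. These pairs cover the same languages, but not every pair is
identical outside the C1 range the way 1252 and 8859-1 are; 1250 and
8859-2, for example, put some letters at different byte values. Look at
the Unicode book or the Unicode Web site for the full mapping tables.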
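To see how closely each pair above lines up, here is a minimal sketch, assuming Python 3's built-in codecs (`cp1250`, `iso8859_2`, and so on) match the published tables. It lists the byte values outside 0x80-0x9F that the two code pages decode differently; `decode_byte` and `diff_outside_c1` are hypothetical helper names.

```python
from typing import List, Optional

C1 = range(0x80, 0xA0)  # the C1 control range, 0x80-0x9F


def decode_byte(b: int, codec: str) -> Optional[str]:
    """Decode one byte, returning None if the code page leaves it unassigned."""
    try:
        return bytes([b]).decode(codec)
    except UnicodeDecodeError:
        return None


def diff_outside_c1(windows_codec: str, iso_codec: str) -> List[int]:
    """Byte values outside 0x80-0x9F that the two code pages decode differently."""
    return [
        b
        for b in range(256)
        if b not in C1 and decode_byte(b, windows_codec) != decode_byte(b, iso_codec)
    ]


pairs = [("cp1250", "iso8859_2"), ("cp1253", "iso8859_7"), ("cp1254", "iso8859_9")]
for win, iso in pairs:
    diffs = diff_outside_c1(win, iso)
    print(win, iso, len(diffs), [hex(b) for b in diffs[:5]])
```

With Python's codecs, 1254 and 8859-9 should show no differences (the same relationship as 1252 and 8859-1), while 1250 and 8859-2 differ at several positions, e.g. 0xA9, which is Š in 8859-2 but © in 1250.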

> Does cp1251 correspond to 8859-5? I see that the encodings are very
> different between cp1251 and 8859-5.
> A conversion seems to be needed when a Windows (Russian) client talks to
> a UNIX (Russian) server.

Yup, they both cover Cyrillic, but they put the letters at different byte
values (capital A, for example, is 0xC0 in 1251 but 0xB0 in 8859-5), so
conversion is required.
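One way such a client/server gateway might convert, sketched here assuming Python 3's `cp1251` and `iso8859_5` codecs, is to decode to Unicode and re-encode. The function name `cp1251_to_iso8859_5` is hypothetical, and characters that exist only in 1251 need an explicit error policy.

```python
def cp1251_to_iso8859_5(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode Windows-1251 bytes as ISO 8859-5 by going through Unicode.

    Characters present in 1251 but absent from 8859-5 (for example the
    guillemets at 0xAB and 0xBB) raise UnicodeEncodeError unless a
    different `errors` policy such as "replace" is passed.
    """
    return data.decode("cp1251").encode("iso8859_5", errors)


windows_bytes = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251")  # "Привет"
unix_bytes = cp1251_to_iso8859_5(windows_bytes)
print(windows_bytes.hex(), unix_bytes.hex())
```

The same text yields different bytes in each code page, which is why a byte-for-byte pass-through between the two systems garbles Cyrillic.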

> A mapping table between UNIX locale identifiers (such as en_us) and the
> corresponding codepage used (such as 8859-1)
> would also be very helpful. Any pointers to literature that has this
> information?

Take a look at cmd/xfe/intl/*.lm in the Mozilla source:

ftp://ftp.mozilla.org/pub/mozilla/source/mozilla-19980603.tar.gz
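As a complement to those files, and assuming a modern Python 3 is available, the standard `locale` module carries an alias table (derived in part from X11's `locale.alias`) that expands short locale names into names with an explicit codeset. The helper `locale_codeset` is hypothetical; it only extracts the codeset from what `locale.normalize` returns, and Python's table may not match what a particular UNIX system uses.

```python
import locale
from typing import Optional


def locale_codeset(name: str) -> Optional[str]:
    """Return the codeset part of a normalized locale name, or None if absent.

    locale.normalize() expands aliases such as "en_us" using Python's
    built-in alias table; names it does not recognize come back unchanged.
    """
    full = locale.normalize(name)
    _, dot, rest = full.partition(".")
    if not dot:
        return None
    # Drop any "@modifier" suffix, e.g. "ISO8859-15@euro" -> "ISO8859-15"
    return rest.split("@", 1)[0]


for name in ["en_us", "de_de", "pl_pl"]:
    print(name, "->", locale_codeset(name))
```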

> Any insight into why Windows does not use the ISO charset standards and
> instead invents its own charsets?

Ha, you must be joking.

Erik



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT