Victor Tse wrote:
> On Windows, there are cp1252, cp1250, cp1251, etc. On UNIX, there are
> I know that cp1252 corresponds to 8859-1. Are they exactly the same
> code point by code point?
No, CP1252 is a superset of 8859-1. In 1252, the "C1" range (0x80-0x9F)
contains "graphic" (printable) characters, while in 8859-1 that range
holds only control characters. Outside the C1 range, 1252 and 8859-1
are the same (as far as I know).
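One way to check this yourself is to decode every byte value with both codecs and list where they disagree. This is only a minimal sketch using Python's built-in "cp1252" and "latin-1" codecs; the helper names are made up for illustration, and Python's tables may not match every vendor's version of the code page.

```python
def decode_byte(value, codec):
    """Return the character for one byte, or None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def differing_bytes(codec_a, codec_b):
    """List the byte values that the two codecs map to different characters."""
    return [b for b in range(256)
            if decode_byte(b, codec_a) != decode_byte(b, codec_b)]


diff = differing_bytes("cp1252", "latin-1")
# Any difference outside the C1 range would contradict the claim above.
outside_c1 = [b for b in diff if not 0x80 <= b <= 0x9F]

print("differing byte values:", [hex(b) for b in diff])
print("differences outside 0x80-0x9F:", outside_c1)
print("0x80 in cp1252:", repr(decode_byte(0x80, "cp1252")))
print("0x80 in latin-1:", repr(decode_byte(0x80, "latin-1")))
```

Note that Python's cp1252 table leaves a few C1 positions undefined, which the sketch reports as None instead of raising an error.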
> What about the other? Can you tell me their relationship?
1250 -- 8859-2 (Central European)
1253 -- 8859-7 (Greek)
1254 -- 8859-9 (Turkish)
and so on. For the full tables, look at the Unicode book or the Unicode
Web site.
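These pairs target the same scripts, but that does not by itself mean they match byte for byte outside C1 the way 1252 and 8859-1 do. A rough sketch for checking each pair in the 0xA0-0xFF range, assuming Python's built-in codec tables (the names PAIRS, decode_byte and diff_outside_c1 are just illustrative):

```python
# Windows code page / ISO 8859 pairs from the list above, as Python codec names.
PAIRS = [
    ("cp1250", "iso8859_2"),
    ("cp1253", "iso8859_7"),
    ("cp1254", "iso8859_9"),
]


def decode_byte(value, codec):
    """Return the character for one byte, or None if the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def diff_outside_c1(win_codec, iso_codec):
    """Byte values in 0xA0-0xFF that the two codecs decode differently."""
    return [b for b in range(0xA0, 0x100)
            if decode_byte(b, win_codec) != decode_byte(b, iso_codec)]


report = {pair: diff_outside_c1(*pair) for pair in PAIRS}

for (win_codec, iso_codec), diff in report.items():
    print(f"{win_codec} vs {iso_codec}: {len(diff)} differing positions above 0x9F")
```

A pair that reports zero differing positions behaves like 1252 and 8859-1; a pair that reports any needs a real conversion even for the non-C1 characters.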
> Does cp1251 correspond to 8859-5? I see that the encodings are very
> different between cp1251 and 8859-5.
> A conversion seems to be needed when a Windows (Russian) client talks to
> a UNIX (Russian) server.
Yup, they are both for Cyrillic, but they are different, and conversion
is needed.
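As an illustration of what such a conversion looks like, here is a small sketch that goes through Unicode using Python's built-in "cp1251" and "iso8859_5" codecs. The function name transcode and the sample word are assumptions for the example, not anything from a particular client or server.

```python
def transcode(data, src="cp1251", dst="iso8859_5"):
    """Convert bytes from one charset to another by way of Unicode.

    Raises UnicodeDecodeError or UnicodeEncodeError if a byte or character
    has no mapping in the source or target charset.
    """
    return data.decode(src).encode(dst)


# The Russian word "Privet" (hello), written with escapes to keep the source ASCII.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

windows_bytes = text.encode("cp1251")   # what a Windows (Russian) client might send
unix_bytes = transcode(windows_bytes)   # what an 8859-5 server would expect

print("cp1251:    ", windows_bytes.hex())
print("iso8859-5: ", unix_bytes.hex())
```

The same letters end up at different byte values in the two charsets, which is why passing the bytes through unchanged produces garbage on the other side.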
> A mapping table between UNIX locale identifiers (such as en_us) and the
> corresponding codepage used (such as 8859-1)
> will also be very helpful. Any pointer to literature that have those
Take a look at cmd/xfe/intl/*.lm in the Mozilla source:
> Any insight on why Windows does not use the ISO charset standards and
> invented its own charsets?
Ha, you must be joking.