Re: Mapping between Windows codepage and ISO codepage

From: R.C. Bakhuizen van den Brink (dziewon@xs4all.nl)
Date: Tue Jun 30 1998 - 23:54:34 EDT


> Victor Tse wrote:
>
>> On Windows, there are cp1252, cp1250, cp1251 and etc. On UNIX, there are
>> 8859-1,9.
>> I know that cp1252 corresponds to 8859-1. Are they exactly the same
>> code point by code point?

>No, CP1252 is a superset of 8859-1. In 1252, the "C1" range (0x80-9f)
>contains "graphic" characters, while 8859-1's C1 is only control
>characters. Other than the C1 range, 1252 and 8859-1 are the same (as
>far as I know).

CHARSET-NAME=ISO 8859-1 (Latin-1, Western Europe)
CHARSET-NAME-GERMAN=ISO 8859-1 (Lateinisch 1, Westeuropa)
CODEPAGE-NUMBER=819
EXPLANATION=Suited for (at least) Danish, Dutch, English, Faeroese,
EXPLANATION=Finnish, French, German, Icelandic, Irish, Italian,
EXPLANATION=Norwegian, Portuguese, Spanish and Swedish.
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F are unassigned
# Characters A0-FF are identical to the Unicode characters.
UNICODE-MAP=
# 0 1 2 3 4 5 6 7 8 9 A B C D E F
# ==============================================================================
A0: A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0: B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0: C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0: D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
E0: E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
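Since every A0-FF entry above simply equals its own byte value, decoding Latin-1 needs no lookup table at all. A minimal Python sketch, assuming Python 3 (the helper name latin1_decode is hypothetical, not from any library). Note that Python's own "latin-1" codec maps the unassigned 80-9F bytes to the C1 control code points U+0080-U+009F:

```python
def latin1_decode(data):
    # ISO 8859-1 puts byte value b at Unicode code point U+00bb,
    # so decoding is just chr() on each byte; no lookup table needed.
    # Bytes 80-9F come out as the C1 control code points U+0080-U+009F.
    return "".join(chr(b) for b in data)


if __name__ == "__main__":
    sample = bytes([0x48, 0xE9, 0xDF])  # 'H', e-acute, sharp s in Latin-1
    print([hex(ord(c)) for c in latin1_decode(sample)])
    # Agrees with Python's built-in codec for every possible byte value.
    every_byte = bytes(range(256))
    assert latin1_decode(every_byte) == every_byte.decode("latin-1")
```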

CHARSET-NAME=MS Windows Codepage 1252 (ANSI)
CHARSET-NAME-GERMAN=MS Windows Codeseite 1252 (ANSI)
CODEPAGE-NUMBER=1252
EXPLANATION=Same as ISO 8859-1, except for the quotes and other characters
EXPLANATION=added in the range 80-9F, which is unused in ISO 8859-1
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F contain quotes and other special characters
# Characters A0-FF are identical to the Unicode or ISO 8859-1 characters.
UNICODE-MAP=
# 0 1 2 3 4 5 6 7 8 9 A B C D E F
# ==============================================================================
80: * * 201A 192 201E 2026 2020 2021 2C6 2030 160 2039 152 * * *
90: * 2018 2019 201C 201D 2022 2013 2014 2DC 2122 161 203A 153 * * 178
A0: A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0: B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0: C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0: D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
E0: E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
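Erik's claim that 1252 and 8859-1 differ only in 80-9F can be checked mechanically. A sketch, assuming Python 3 and its bundled "cp1252" and "latin-1" codecs (which follow current mapping files, so 0x80 is the Euro sign there, unlike the table above); differing_bytes is a hypothetical helper:

```python
def differing_bytes(enc_a, enc_b):
    """Byte values that decode to different characters in enc_a and enc_b.
    A byte that one codec leaves undefined counts as different."""
    def decode_or_none(raw, enc):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            return None  # position is unassigned in this code page

    result = []
    for b in range(256):
        raw = bytes([b])
        if decode_or_none(raw, enc_a) != decode_or_none(raw, enc_b):
            result.append(b)
    return result


if __name__ == "__main__":
    diff = differing_bytes("cp1252", "latin-1")
    print("%02X-%02X (%d bytes)" % (diff[0], diff[-1], len(diff)))
```

Treating undefined positions (0x81, 0x8D, 0x8F, 0x90, 0x9D in cp1252) as "different" keeps the comparison honest: a converter must still decide what to do with them.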

>> What about the other? Can you tell me their relationship?

>1250 -- 8859-2
>1253 -- 8859-7
>1254 -- 8859-9

You must be kidding; there is a hell of a difference between 1250 and 8859-2!
Rows C0-FF agree, but 1250 shuffles many letters around in A0-BF and also
fills 80-9F. Probably the same goes for the others as well.
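One way to see the "shuffle" is to take each character of 8859-2 and ask where it lives in 1250. A sketch, assuming Python 3 and its bundled "iso8859_2" and "cp1250" codecs; relocated_chars is a hypothetical helper that only looks at the A0-FF range by default:

```python
def relocated_chars(enc_from, enc_to, lo=0xA0, hi=0xFF):
    """For each byte in [lo, hi] of enc_from, report the characters that
    sit at a different byte in enc_to (None if enc_to lacks them)."""
    moves = {}
    for b in range(lo, hi + 1):
        ch = bytes([b]).decode(enc_from)
        try:
            nb = ch.encode(enc_to)[0]
        except UnicodeEncodeError:
            nb = None  # character not representable in enc_to
        if nb != b:
            moves[ch] = (b, nb)
    return moves


if __name__ == "__main__":
    for ch, (old, new) in relocated_chars("iso8859_2", "cp1250").items():
        target = "missing" if new is None else "0x%02X" % new
        print("U+%04X: 0x%02X in 8859-2 -> %s in 1250" % (ord(ch), old, target))
```

Every moved character here comes from A0-BF, which is why a plain byte-for-byte copy between the two code pages garbles letters such as A-ogonek and S-caron while leaving most accented vowels intact.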

CHARSET-NAME=ISO 8859-2 (Latin-2, Eastern Europe)
CHARSET-NAME-GERMAN=ISO 8859-2 (Lateinisch 2, Osteuropa)
CODEPAGE-NUMBER=912
EXPLANATION=Suited for (at least) Albanian, Czech, Hungarian, Polish,
EXPLANATION=Romanian, (Serbo-)Croatian, Slovak and Slovene.
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F are unassigned
UNICODE-MAP=
# 0 1 2 3 4 5 6 7 8 9 A B C D E F
# ==============================================================================
A0: A0 104 2D8 141 A4 13D 15A A7 A8 160 15E 164 179 AD 17D 17B
B0: B0 105 2DB 142 B4 13E 15B 2C7 B8 161 15F 165 17A 2DD 17E 17C
C0: 154 C1 C2 102 C4 139 106 C7 10C C9 118 CB 11A CD CE 10E
D0: 110 143 147 D3 D4 150 D6 D7 158 16E DA 170 DC DD 162 DF
E0: 155 E1 E2 103 E4 13A 107 E7 10D E9 119 EB 11B ED EE 10F
F0: 111 144 148 F3 F4 151 F6 F7 159 16F FA 171 FC FD 163 2D9

CHARSET-NAME=MS Windows Codepage 1250 (Eastern Europe)
CHARSET-NAME-GERMAN=MS Windows Codeseite 1250 (Osteuropa)
CODEPAGE-NUMBER=1250
EXPLANATION=Looks like a modified version of ISO 8859-2
#
# Characters 20-7F are identical to ASCII (ISO 646)
UNICODE-MAP=
# 0 1 2 3 4 5 6 7 8 9 A B C D E F
# ==============================================================================
80: * * 201A * 201E 2026 2020 2021 * 2030 160 2039 15A 164 17D 179
90: * 2018 2019 201C 201D 2022 2013 2014 * 2122 161 203A 15B 165 17E 17A
A0: A0 2C7 2D8 141 A4 104 A6 A7 A8 A9 15E AB AC AD AE 17B
B0: B0 B1 2DB 142 B4 B5 B6 B7 B8 105 15F BB 13D 2DD 13E 17C
C0: 154 C1 C2 102 C4 139 106 C7 10C C9 118 CB 11A CD CE 10E
D0: 110 143 147 D3 D4 150 D6 D7 158 16E DA 170 DC DD 162 DF
E0: 155 E1 E2 103 E4 13A 107 E7 10D E9 119 EB 11B ED EE 10F
F0: 111 144 148 F3 F4 151 F6 F7 159 16F FA 171 FC FD 163 2D9
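The UNICODE-MAP rows above (a hex row label, a colon, then sixteen hex code points with "*" for unassigned) are easy to read by program. A minimal parser sketch, assuming exactly that row layout; parse_unicode_map is a hypothetical name, and the two A0 rows are copied from the 8859-2 and 1250 tables above:

```python
def parse_unicode_map(lines):
    """Parse rows like 'A0: A0 104 2D8 ...' into {byte: code point}.
    '*' marks an unassigned position and is left out of the result."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blanks, column-header comments, KEY=VALUE lines
        row, _, cells = line.partition(":")
        base = int(row, 16)
        for col, cell in enumerate(cells.split()):
            if cell != "*":
                table[base + col] = int(cell, 16)
    return table


ISO_8859_2_A0 = "A0: A0 104 2D8 141 A4 13D 15A A7 A8 160 15E 164 179 AD 17D 17B"
CP1250_A0 = "A0: A0 2C7 2D8 141 A4 104 A6 A7 A8 A9 15E AB AC AD AE 17B"

iso = parse_unicode_map([ISO_8859_2_A0])
win = parse_unicode_map([CP1250_A0])
diff = sorted(b for b in iso if iso[b] != win.get(b))
print(" ".join("%02X" % b for b in diff))
```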

>and so on. Look at the Unicode book or the Unicode Web site.

CHARSET-NAME=ISO 8859-5 (Latin/Cyrillic)
CHARSET-NAME-GERMAN=ISO 8859-5 (Lateinisch/Kyrillisch)
CODEPAGE-NUMBER=915
EXPLANATION=Suited for Bulgarian, Belarusian, English, Macedonian,
EXPLANATION=Russian, Serbo-Croatian and Ukrainian.
#
# Characters 20-7F are identical to ASCII (ISO 646)
# Characters 80-9F are unassigned
UNICODE-MAP=
# 0 1 2 3 4 5 6 7 8 9 A B C D E F
# ==============================================================================
A0: A0 401 402 403 404 405 406 407 408 409 40A 40B 40C AD 40E 40F
B0: 410 411 412 413 414 415 416 417 418 419 41A 41B 41C 41D 41E 41F
C0: 420 421 422 423 424 425 426 427 428 429 42A 42B 42C 42D 42E 42F
D0: 430 431 432 433 434 435 436 437 438 439 43A 43B 43C 43D 43E 43F
E0: 440 441 442 443 444 445 446 447 448 449 44A 44B 44C 44D 44E 44F
F0: 2116 451 452 453 454 455 456 457 458 459 45A 45B 45C A7 45E 45F

CHARSET-NAME=MS Windows Codepage 1251 (Cyrillic)
CHARSET-NAME-GERMAN=MS Windows Codeseite 1251 (Kyrillisch)
CODEPAGE-NUMBER=1251
#
# Characters 20-7F are identical to ASCII (ISO 646)
UNICODE-MAP=
# 0 1 2 3 4 5 6 7 8 9 A B C D E F
# ==============================================================================
80: 402 403 201A 453 201E 2026 2020 2021 * 2030 409 2039 40A 40C 40B 40F
90: 452 2018 2019 201C 201D 2022 2013 2014 * 2122 459 203A 45A 45C 45B 45F
A0: A0 40E 45E 408 A4 490 A6 A7 401 A9 404 AB AC AD AE 407
B0: B0 B1 406 456 491 B5 B6 B7 451 2116 454 BB 458 405 455 457
C0: 410 411 412 413 414 415 416 417 418 419 41A 41B 41C 41D 41E 41F
D0: 420 421 422 423 424 425 426 427 428 429 42A 42B 42C 42D 42E 42F
E0: 430 431 432 433 434 435 436 437 438 439 43A 43B 43C 43D 43E 43F
F0: 440 441 442 443 444 445 446 447 448 449 44A 44B 44C 44D 44E 44F
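Comparing the two Cyrillic tables, even the basic alphabet sits at different bytes (U+0410 is B0 in 8859-5 but C0 in 1251), so text has to be converted through Unicode rather than copied. A sketch, assuming Python 3 and its bundled "cp1251" and "iso8859_5" codecs; transcode is a hypothetical helper:

```python
def transcode(data, src, dst, errors="strict"):
    """Re-encode bytes from code page src to code page dst by way of Unicode.
    With errors="strict", a character missing from dst raises
    UnicodeEncodeError; errors="replace" substitutes '?' instead."""
    return data.decode(src).encode(dst, errors)


if __name__ == "__main__":
    word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Cyrillic "Privet"
    win = word.encode("cp1251")
    iso = transcode(win, "cp1251", "iso8859_5")
    print(win.hex(), "->", iso.hex())
```

The errors parameter matters because the mapping is not one-to-one: 1251 has characters such as U+0490 (byte A5) and the typographic quotes in 80-9F that 8859-5 simply lacks.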

>> Any insight on why Windows does not use the ISO charset standard and
>> invents its own charset?

>Ha, you must be joking.

>Erik



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT