Character encoding mappings and related files

  Date of last update: 1999-10-08

  Revision history:

    1. 1999-10-07: created for Unicode 3.0

    2. 1999-10-08: editorial changes

This file contains tables of links to the available mapping data files. For the most current information, refer to the Unicode FTP site for mapping data (ftp://ftp.unicode.org/Public/MAPPINGS/).
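As a rough illustration of how one of these mapping data files might be consumed, the Python sketch below turns a table into a dictionary. It assumes the layout commonly used by the 8-bit tables (a hexadecimal code, a hexadecimal Unicode value, then a "#" comment, separated by white space). The name parse_mapping and the inline sample are hypothetical, and individual files can deviate from this layout, so check the header of each file before relying on it.

```python
def parse_mapping(text):
    """Parse lines of the assumed form '0xXX <TAB> 0xXXXX <TAB> # comment'."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # code listed without a Unicode assignment
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


# Illustrative sample only; not copied from any of the listed files.
SAMPLE = (
    "# sample table\n"
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0xE9\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE\n"
    "0x81\t\t# UNDEFINED\n"
)

table = parse_mapping(SAMPLE)
# Bytes without a mapping become U+FFFD REPLACEMENT CHARACTER.
decoded = "".join(table.get(b, "\ufffd") for b in b"A\xe9\x81")
print(decoded)
```

Keeping the table as a plain dictionary makes the reverse (Unicode to legacy code) direction a one-line inversion, provided the mapping is one-to-one.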

1. ASCII based

1.1 Unicode, ISO, ISO/IEC, ...

See also: iso8859/readme.txt

Character encoding   | Mapping to Unicode/UTF-16      | Date of last update | Remark
---------------------+--------------------------------+---------------------+----------------------------------------------------------
UTF-16               | Identity                       |                     | UCS-2 extended to planes 0-16.
UTF-8                | Given by algorithm (normative) |                     | In Unicode, UTF-8 is limited to planes 0-16.
SCSU                 | Given by algorithm (UTR 6)     |                     | Standard Compression Scheme for Unicode
UTF-32               | Given by algorithm (UTR 19)    |                     | UCS-4 limited to planes 0-16.

ISO/IEC 646:1991-IRV | (By implicit algorithm)        |                     | 7-bit ASCII; US-ASCII.
ISO/IEC 646-SE/FI    |                                |                     |
ISO/IEC 646-DK/NO    |                                |                     |
ISO/IEC 646-DE       |                                |                     |
ISO/IEC 646-FR       |                                |                     |
ISO/IEC 646-IT       |                                |                     |
ISO/IEC 646-ES       |                                |                     |

ISO/IEC 6937:1994    |                                |                     | Combining characters are stored before the base character.

ISO/IEC 8859-1:1998  | iso8859/8859-1.txt             | 1999 July 27        | Latin-1
ISO/IEC 8859-2:1999  | iso8859/8859-2.txt             | 1999 July 27        | Latin-2
ISO/IEC 8859-3:1999  | iso8859/8859-3.txt             | 1999 July 27        | Latin-3
ISO/IEC 8859-4:1998  | iso8859/8859-4.txt             | 1999 July 27        | Latin-4
ISO/IEC 8859-5:1999  | iso8859/8859-5.txt             | 1999 July 27        | Latin/Cyrillic
ISO/IEC 8859-6:1999  | iso8859/8859-6.txt             | 1999 July 27        | Latin/Arabic
ISO/IEC 8859-7:1987  | iso8859/8859-7.txt             | 1999 July 27        | Latin/Greek
ISO/IEC 8859-8:1999  | iso8859/8859-8.txt             | 1999 July 27        | Latin/Hebrew
ISO/IEC 8859-9:1999  | iso8859/8859-9.txt             | 1999 July 27        | Latin-5
ISO/IEC 8859-10:1998 | iso8859/8859-10.txt            | 1999 July 27        | Latin-6
ISO/IEC 8859-11      |                                |                     | Latin/Thai
ISO/IEC 8859-12      |                                |                     | Unused 8859 part number
ISO/IEC 8859-13:1998 | iso8859/8859-13.txt            | 1999 July 27        | Latin-7
ISO/IEC 8859-14:1998 | iso8859/8859-14.txt            | 1999 July 27        | Latin-8
ISO/IEC 8859-15:1999 | iso8859/8859-15.txt            | 1999 July 27        | Latin-9 (Latin-1 replacement)
ISO/IEC 8859-16      |                                |                     | Latin-10
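The UTF-16, UTF-8 and UTF-32 entries in the table above are given by algorithm rather than by a data file, and all three cover planes 0-16 (code points up to U+10FFFF). As a minimal sketch of what that means for a character outside plane 0, the Python below computes the UTF-16 surrogate pair by hand and prints the UTF-8 and UTF-32 forms using Python's built-in codecs. The function name utf16_code_units and the sample code point U+10400 are chosen here purely for illustration.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        return [cp]  # plane 0: a single 16-bit unit, same as UCS-2
    v = cp - 0x10000  # 20 bits, split into two halves of 10 bits
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


cp = 0x10400  # a plane 1 code point, used only as an example
units = utf16_code_units(cp)
print([f"{u:04X}" for u in units])          # ['D801', 'DC00']
print(chr(cp).encode("utf-8").hex(" "))     # f0 90 90 80
print(chr(cp).encode("utf-32-be").hex(" ")) # 00 01 04 00
```

The largest code point, U+10FFFF, yields the pair DBFF DFFF, which is why UTF-16 cannot address anything beyond plane 16.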

 

 

 

 

1.2 Mac OS

See also: vendors/apple/readme.txt

Character encoding         | Mapping to Unicode/UTF-16       | Date of last update | Remark
---------------------------+---------------------------------+---------------------+-------------------------------
Mac OS Arabic              | vendors/apple/arabic.txt        | 1999-Sep-22         |
Mac OS Central European    | vendors/apple/centeuro.txt      | 1999-Sep-22         |
Mac OS Chinese Simplified  | vendors/apple/chinsimp.txt      | 1999-Sep-22         |
Mac OS Chinese Traditional | vendors/apple/chintrad.txt      | 1999-Sep-22         |
Mac OS Croatian            | vendors/apple/croatian.txt      | 1999-Sep-22         |
Mac OS Cyrillic            | vendors/apple/cyrillic.txt      | 1999-Sep-22         |
Mac OS Devanagari          | vendors/apple/devanaga.txt      | 1999-Sep-22         |
Mac OS Farsi               | vendors/apple/farsi.txt         | 1999-Sep-22         |
Mac OS Greek               | vendors/apple/greek.txt         | 1999-Sep-22         |
Mac OS Gujarati            | vendors/apple/gujarati.txt      | 1999-Sep-22         |
Mac OS Gurmukhi            | vendors/apple/gurmukhi.txt      | 1999-Sep-22         |
Mac OS Hebrew              | vendors/apple/hebrew.txt        | 1999-Sep-22         |
Mac OS Icelandic           | vendors/apple/iceland.txt       | 1999-Sep-22         |
Mac OS Japanese            | vendors/apple/japanese.txt      | 1999-Sep-22         |
Mac OS Korean              | vendors/apple/korean.txt        | 1999-Sep-22         |
Mac OS Roman               | vendors/apple/roman.txt         | 1999-Sep-22         |
Mac OS Romanian            | vendors/apple/romanian.txt      | 1999-Sep-22         |
Mac OS Thai                | vendors/apple/thai.txt          | 1999-Sep-22         |
Mac OS Turkish             | vendors/apple/turkish.txt       | 1999-Sep-22         |
Mac OS Ukrainian           | vendors/apple/ukraine.txt       | 1999-Sep-22         | See vendors/apple/cyrillic.txt

NEXTSTEP Encoding          | vendors/next/nextstep.txt       | 1999 September 23   |

CP 10007 MacCyrillic       | vendors/micsft/mac/cyrillic.txt | 04/24/96            | See vendors/apple/cyrillic.txt
CP 10006 MacGreek          | vendors/micsft/mac/greek.txt    | 04/24/96            | See vendors/apple/greek.txt
CP 10079 MacIcelandic      | vendors/micsft/mac/iceland.txt  | 04/24/96            | See vendors/apple/iceland.txt
CP 10029 MacLatin2         | vendors/micsft/mac/latin2.txt   | 04/24/96            | See vendors/apple/centeuro.txt
CP 10000 MacRoman          | vendors/micsft/mac/roman.txt    | 04/24/96            | See vendors/apple/roman.txt
CP 10081 MacTurkish        | vendors/micsft/mac/turkish.txt  | 04/24/96            | See vendors/apple/turkish.txt

1.3 Windows

Character encoding | Mapping to Unicode/UTF-16         | Date of last update | Remark
-------------------+-----------------------------------+---------------------+--------------------------------
CP 874             | vendors/micsft/windows/cp874.txt  | 02/28/98            | Latin/Thai
CP 932             | vendors/micsft/windows/cp932.txt  | 04/15/98            | MS Shift-JIS
CP 936             | vendors/micsft/windows/cp936.txt  | 04/15/98            | MS Chinese (Simpl.)
CP 949             | vendors/micsft/windows/cp949.txt  | 04/15/98            | MS Korean
CP 950             | vendors/micsft/windows/cp950.txt  | 04/15/98            | MS Big-5 (Trad. Chinese)
CP 1250            | vendors/micsft/windows/cp1250.txt | 04/15/98            |
CP 1251            | vendors/micsft/windows/cp1251.txt | 04/15/98            | Latin/Cyrillic
CP 1252            | vendors/micsft/windows/cp1252.txt | 04/15/98            | Extends ISO/IEC 8859-1 Latin-1
CP 1253            | vendors/micsft/windows/cp1253.txt | 04/15/98            | Latin/Greek
CP 1254            | vendors/micsft/windows/cp1254.txt | 04/15/98            |
CP 1255            | vendors/micsft/windows/cp1255.txt | 04/15/98            | Latin/Hebrew
CP 1256            | vendors/micsft/windows/cp1256.txt | 01/5/99             | Latin/Arabic
CP 1257            | vendors/micsft/windows/cp1257.txt | 04/15/98            |
CP 1258            | vendors/micsft/windows/cp1258.txt | 04/15/98            |
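The remark on CP 1252 in the table above (it extends ISO/IEC 8859-1) can be seen in the range 0x80-0x9F, where ISO/IEC 8859-1 has only control codes. The short Python sketch below decodes the same bytes both ways. It uses Python's built-in "cp1252" and "latin-1" codecs, on the assumption that they agree with the listed mapping files for these particular bytes.

```python
# 0x80 and 0x99 fall in the C1 control range of ISO/IEC 8859-1,
# while CP 1252 assigns printable characters there.
# 0xE9 is the same character in both encodings.
data = bytes([0x80, 0x99, 0xE9])

as_cp1252 = data.decode("cp1252")
as_latin1 = data.decode("latin-1")

print([f"U+{ord(c):04X}" for c in as_cp1252])  # ['U+20AC', 'U+2122', 'U+00E9']
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+0080', 'U+0099', 'U+00E9']
```

Text labeled as Latin-1 but actually produced on Windows is therefore often safer to decode with the CP 1252 table.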

 

1.4 DOS

See also the IBM README file (vendors/ibm/readme.txt) on encoding mappings.

Character encoding  | Mapping to Unicode/UTF-16   | Date of last update | Remark
--------------------+-----------------------------+---------------------+-------------------------------------
CP 437 Latin (US)   | vendors/micsft/pc/cp437.txt | 04/24/96            |
CP 737 Greek (A)    | vendors/micsft/pc/cp737.txt | 04/24/96            |
CP 775 BaltRim      | vendors/micsft/pc/cp775.txt | 04/24/96            |
CP 850 Latin (A)    | vendors/micsft/pc/cp850.txt | 04/24/96            |
CP 852 Latin (B)    | vendors/micsft/pc/cp852.txt | 04/24/96            |
CP 855 Cyrillic (A) | vendors/micsft/pc/cp855.txt | 04/24/96            |
CP 857 Turkish      | vendors/micsft/pc/cp857.txt | 04/24/96            |
CP 860 Portuguese   | vendors/micsft/pc/cp860.txt | 04/24/96            |
CP 861 Icelandic    | vendors/micsft/pc/cp861.txt | 04/24/96            |
CP 862 Hebrew       | vendors/micsft/pc/cp862.txt | 04/24/96            |
CP 863 Canada F     | vendors/micsft/pc/cp863.txt | 04/24/96            |
CP 864 Arabic       | vendors/micsft/pc/cp864.txt | 04/24/96            |
CP 865 Nordic       | vendors/micsft/pc/cp865.txt | 04/24/96            |
CP 866 Cyrillic (B) | vendors/micsft/pc/cp866.txt | 04/24/96            |
CP 869 Greek (B)    | vendors/micsft/pc/cp869.txt | 04/24/96            |
CP 874 Thai         | vendors/micsft/pc/cp874.txt | 04/15/98            | See vendors/micsft/windows/cp874.txt

1.5 Other ASCII-based

Non-ISO encodings used on Unix systems, by Adobe, on non-Microsoft PCs, in GSM/SMS, RDS, ...

Character encoding          | Mapping to Unicode/UTF-16 | Date of last update | Remark
----------------------------+---------------------------+---------------------+---------------------------------------------
ETSI 7-bit default alphabet |                           |                     | GSM/SMS (UCS-2 can also be used for GSM/SMS)

Adobe Standard Encoding     | vendors/adobe/stdenc.txt  | 30 March 1999       | See vendors/adobe/readme.txt

IBM CP 1006                 | vendors/misc/cp1006.txt   | 1999 July 27        | ASCII+Arabic
CP 856                      | vendors/misc/cp856.txt    | 1999 July 27        | ASCII+Hebrew
KOI8-R (RFC 1489)           | vendors/misc/koi8-r.txt   | 18 August 1999      | ASCII+Cyrillic

JIS X 0201 (1976)           | eastasia/jis/jis0201.txt  | 8 March 1994        |
Shift-JIS                   | eastasia/jis/shiftjis.txt | 8 March 1994        |
Johab                       | eastasia/ksc/johab.txt    | 08/16/99            |

 

2. EBCDIC based

See also: vendors/ibm/readme.txt.

Character encoding          | Mapping to Unicode/UTF-16        | Date of last update | Remark
----------------------------+----------------------------------+---------------------+---------------------------------------
UTF-EBCDIC                  | Given by algorithm (UTR 16)      |                     | Only for use where EBCDIC is required.

IBM EBCDIC CP 424 (Hebrew)  | vendors/misc/cp424.txt           | 1999 July 27        |

CP 037 IBM US Canada        | vendors/micsft/ebcdic/cp037.txt  | 04/24/96            |
CP 500 IBM International    | vendors/micsft/ebcdic/cp500.txt  | 04/24/96            |
CP 875 IBM Greek            | vendors/micsft/ebcdic/cp875.txt  | 04/24/96            |
CP 1026 IBM Latin-5 Turkish | vendors/micsft/ebcdic/cp1026.txt | 04/24/96            |

 

3. Others

East Asian without ASCII/EBCDIC, symbol, dingbat, corporate zone, character entities, cross-references, ...

(Character encoding)                            | (Mapping to Unicode/UTF-16) | Date of last update | Remark
------------------------------------------------+-----------------------------+---------------------+-----------------------------
IBM PC memory-mapped video graphics             | vendors/misc/ibmgraph.txt   | 1999 July 27        |

SGML character entities                         | vendors/misc/sgml.txt       | 25 July 1997        |

Adobe Symbol Encoding                           | vendors/adobe/symbol.txt    | 30 March 1999       | See vendors/adobe/readme.txt
Adobe Zapf Dingbats Encoding                    | vendors/adobe/zdingbat.txt  | 30 March 1999       |

Registry of Apple use of Unicode corporate-zone | vendors/apple/corpchar.txt  | 1999-Sep-22         | Registry, not a mapping
Mac OS Dingbats                                 | vendors/apple/dingbats.txt  | 1999-Sep-22         |
Mac OS Symbol                                   | vendors/apple/symbol.txt    | 1999-Sep-22         |

TCVN-NSCII Stack 1.0 HyperCard stack            | EASTASIA/TCVN/TCV-SEA.HQX   |                     | See eastasia/tcvn/readme.txt
Unicode Han Character Cross-Reference           | eastasia/cjkxref.txt        | 14 March 1994       |
Unihan database                                 | eastasia/unihan.txt         | 23 September 1996   |

Korean Hangul Encoding Conversion               | eastasia/ksc/hangul.txt     | Oct 04, 1995        |
KS C 5601                                       | eastasia/ksc/old5601.txt    | 6 December 1993     | Obsolete: for Unicode 1.1.
Unified Hangul (KS C 5601-1992)                 | eastasia/ksc/ksc5601.txt    | 07/24/95            | For Unicode 2.0 and later.
Unified Hangul (KS X 1001)                      | eastasia/ksc/ksx1001.txt    | 08/16/99            |

JIS X 0208 (1990)                               | eastasia/jis/jis0208.txt    | 8 March 1994        |
JIS X 0212 (1990)                               | eastasia/jis/jis0212.txt    | 8 March 1994        |

GB 12345-80                                     | eastasia/gb/gb12345.txt     | 6 December 1993     |
GB 2312-80                                      | eastasia/gb/gb2312.txt      | 6 December 1993     |

BIG5                                            | eastasia/other/big5.txt     | 11 February 1994    |
CNS 11643-1986                                  | eastasia/other/cns11643.txt | 21 October 1994     |