The Hitchhiker's Guide to Chinese Encodings
Thomas Emerson - Basis Technology Corporation
This paper presents an overview and analysis of the plethora of Chinese character encodings, describing their similarities and differences and how they map to various versions of Unicode. For example, how does Big 5 compare with Big 5+ and Microsoft CP950? What about the various extensions to Big 5? How do the HKSCS, Eten and HKUST EUDC extensions to Big 5 compare and map to Unicode? How does one round-trip each of these? And then there is CNS-11643...
Unfortunately, dealing with Simplified Chinese is no simpler: what is the relationship between GB 2312:80 and GB 12345:90 (the traditional analog to GB 2312), and how does GB 12345 compare with Big 5? For that matter, how does GB 2312:80 compare with GBK and Microsoft CP936? And how do all of these map to Unicode 2.1 and 3.0.1? What do all these mean for the poor programmer who has to try to deal with them?
At the end of this presentation, you will leave with a better understanding of how these encodings relate and how to deal with them when authoring Chinese-language applications.
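As a rough illustration of the round-trip question raised above, the following minimal sketch checks whether a string survives encoding to a legacy Chinese character set and decoding back to Unicode. It assumes Python's built-in codecs (`gb2312`, `gbk`, `big5`, `cp950`), whose mapping tables are one implementation's choices and are not necessarily the tables analyzed in the paper. The helper name `round_trips` is hypothetical.

```python
def round_trips(text: str, encoding: str) -> bool:
    """Return True if text encodes to `encoding` and decodes back unchanged.

    Assumption: `encoding` names a codec that ships with Python; an unknown
    codec name raises LookupError, which is deliberately not caught here.
    """
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The character is missing from the target set, or the bytes
        # did not decode back cleanly.
        return False


if __name__ == "__main__":
    # U+9AD4 is a traditional form; U+4E2D is shared by both scripts.
    for ch in ("\u9ad4", "\u4e2d"):
        for enc in ("gb2312", "gbk", "big5", "cp950"):
            print(f"U+{ord(ch):04X} {enc}: {round_trips(ch, enc)}")
```

A check like this only shows whether a given codec pair is lossless for the sample text. It says nothing about whether two vendors' tables agree on the same mapping, which is the harder problem the paper addresses.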
When the world wants to talk, it speaks Unicode
International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS).
GMS is pleased to be able to offer the International Unicode Conferences under an exclusive
license granted by the Unicode Consortium. All responsibility for conference finances and
operations is borne by GMS. The independent conference board serves solely at the pleasure
of GMS and is composed of volunteers active in Unicode and in international software
development. All inquiries regarding International Unicode Conferences should be addressed
Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.
10 December 2000, Webmaster