Conversion Between Hong Kong Supplementary Character Set (HKSCS) and Unicode
Linus Toshihiro Tanaka - Oracle Corporation
Two written forms of Chinese are well recognized in the computer industry: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan. Chinese is also one of the primary languages in three other places: Hong Kong, Macau, and Singapore. Hong Kong's written Chinese is normally treated as Traditional Chinese. However, more than 1,000 characters are used in Hong Kong and Mainland China but are not frequently used in Taiwan. Hong Kong's written Chinese therefore lies somewhere between Traditional Chinese and Simplified Chinese, though much closer to Traditional Chinese. In addition, a few thousand characters used in Hong Kong are not used, or not frequently used, in other Chinese-speaking countries and regions. Some of these Hong Kong-specific characters are not included even in Unicode 3.0.
To address these two issues, the Hong Kong government (now the Hong Kong S.A.R. government) defined the Government Common Character Set (GCCS), based on Taiwan's Big-5 encoded character set. GCCS added around 3,000 characters to Big-5. About half of them are included in China's GBK encoded character set, and thus also in Unicode 2.1. The remaining half were not included in Big-5, GBK, or Unicode 2.1. Some of these Hong Kong-specific characters have since been included in Unicode 3.0, but others are still missing from it.
In September 1999, the Hong Kong S.A.R. government defined the Hong Kong Supplementary Character Set (HKSCS), the successor to GCCS. Unlike GCCS, HKSCS defines a precise mapping between HKSCS and Unicode 2.1, and also between HKSCS and Unicode 3.0.
Oracle has implemented HKSCS in Oracle8i Release 3 (8.1.7). The implementation handles the mapping between HKSCS and Unicode 3.0, as well as the compatibility mapping between HKSCS and Unicode 2.1. Although HKSCS is very carefully defined by the Hong Kong S.A.R. government, a small number of implementation-dependent issues remain.
In this paper, I explain the specific issues that arise when implementing HKSCS and what Oracle has done to address them.
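To make the relationship between Big-5 and HKSCS concrete, the following minimal sketch uses the `big5` and `big5hkscs` codecs from Python's standard library. It is not Oracle's implementation, and Python's tables may follow a later HKSCS revision than the 1999 edition discussed here. The helper names `can_encode` and `classify` are hypothetical and exist only for this illustration.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text has a mapping in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def classify(ch: str) -> str:
    """Report which Big-5 family codec, if any, can represent ch."""
    if can_encode(ch, "big5"):
        return "big5"          # already covered by plain Big-5
    if can_encode(ch, "big5hkscs"):
        return "hkscs-only"    # needs the Hong Kong supplementary set
    return "unmapped"          # outside both character sets


# U+4E00 is a common character in plain Big-5; U+5497 and U+5605 are
# characters used in written Cantonese.
for ch in ("\u4e00", "\u5497", "\u5605"):
    print(f"U+{ord(ch):04X}", classify(ch))
```

Trying plain Big-5 first and falling back to the HKSCS codec mirrors the way HKSCS is layered on top of Big-5: a character reported as `hkscs-only` is one that a Big-5-only system cannot store or exchange.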
International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS).
13 December 2000, Webmaster