UnicodeIUC18
Unicode Standard Conference Board Past Conferences Call for Papers Sponsors Showcase
Registration Accommodation Travel Program Talks and Papers Next Conference
Abstract

Two New Coded Character Standards from China and their Implementation: HK SCS & GB 18030

Dirk Meyer - Adobe Systems, Inc.

Intended Audience: Manager, Software Engineer, Systems Analyst, Marketer
Session Level: Intermediate, Advanced

Purpose:

This presentation describes two Chinese "coded character sets" and the problems related to their implementation: the Hong Kong Supplementary Character Set (HK SCS), published in August 1999, and the Chinese national standard GB 18030-2000, published in March 2000.

Description:

The paper will provide detailed information about the history and the contents of both standards:

  • The HK SCS and its predecessor, the HK GCCS, will be described, as will the efforts of the Hong Kong Government to put the HK SCS in place. The HK SCS represents a significant improvement over the HK GCCS.
  • The Chinese national standard GB 18030-2000 will be described as a character standard that attempts a complete re-mapping of Unicode 3.0's character repertoire into a "legacy" code space. That code space is defined by the specification GBK, which is now being extended to form the standard GB 18030. To accomplish this, GB 18030 adds a four-byte encoding mechanism, which will also be described.
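
To make the byte structure concrete, here is a minimal Python sketch that splits a GB 18030 byte string into one-, two- and four-byte sequences. It relies only on the byte ranges of the encoding (single bytes up to 0x7F; a lead byte 0x81-0xFE followed by either a trail byte 0x40-0x7E or 0x80-0xFE, or by a digit byte 0x30-0x39 that signals a four-byte sequence). The function name `sequence_lengths` is made up for illustration and is not taken from the paper or from any library.

```python
def sequence_lengths(data: bytes) -> list:
    """Return the length (1, 2 or 4) of each GB 18030 sequence in data."""
    lengths = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            size = 1                      # ASCII-compatible single byte
        elif 0x81 <= lead <= 0xFE and i + 1 < len(data):
            second = data[i + 1]
            if 0x30 <= second <= 0x39:
                size = 4                  # digit in second position: four-byte form
            elif 0x40 <= second <= 0xFE and second != 0x7F:
                size = 2                  # GBK-compatible two-byte form
            else:
                raise ValueError(f"invalid second byte at offset {i + 1}")
        else:
            raise ValueError(f"invalid or truncated sequence at offset {i}")
        if i + size > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        if size == 4:
            third, fourth = data[i + 2], data[i + 3]
            if not (0x81 <= third <= 0xFE and 0x30 <= fourth <= 0x39):
                raise ValueError(f"invalid four-byte sequence at offset {i}")
        lengths.append(size)
        i += size
    return lengths


# "A", then a two-byte GBK character, then the four-byte sequence 81 30 81 30
print(sequence_lengths(b"A\xd6\xd0\x81\x30\x81\x30"))  # [1, 2, 4]
```

Because the second byte of a four-byte sequence is an ASCII digit, which never occurs as a GBK trail byte, the existing two-byte code space stays untouched.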

For developers of applications and fonts who intend to support the languages and scripts involved ("simplified" and "traditional" Chinese), the two standards pose specific challenges, especially when the working environment is partly or fully Unicode-based. In such an environment, complete support for the HK SCS, for example, can be achieved only through effective use of the Private Use Area. Fonts and applications that accomplish this will be introduced and described in the form of a case study (Adobe Acrobat 5.0 and the underlying "substitution fonts").
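
As a small illustration of what reliance on the Private Use Area means in practice, the following Python sketch lists the private-use code points in a string, so that an application could tell when text depends on a font with matching private assignments. It uses the standard `unicodedata` module and the general category `Co` (private use). The helper name `private_use_code_points` is hypothetical, and the sketch does not reproduce the actual HK SCS assignments, which are not given here.

```python
import unicodedata


def private_use_code_points(text: str) -> list:
    """Return the private-use code points in text, formatted as U+XXXX."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"   # "Co" = private-use character
    ]


# U+E000 is the first code point of the BMP Private Use Area; it stands in
# here for a character that has no regular Unicode assignment.
sample = "\u9999\ue000\u6e2f"
print(private_use_code_points(sample))  # ['U+E000']
```

Text that yields a non-empty list renders correctly only with fonts that agree on those private assignments, which is why the fonts are part of the case study.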

For GB 18030-2000, it is crucial that the mapping between the original encoding and Unicode be implemented efficiently. Questions about the coverage of GB 18030 that are currently open (and which may well have been resolved by the time of the presentation) will be mentioned. Again, Acrobat 5.0 and its font machinery will be used to illustrate one possible approach to handling these challenges.
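
One part of the GB 18030 mapping can be computed without a table: the four-byte sequences from 90 30 81 30 upward correspond linearly to the code points U+10000 through U+10FFFF. The Python sketch below shows that arithmetic in both directions. It is an illustration only, not the approach used in Acrobat; the function names are made up for this example, and the two-byte and BMP four-byte mappings, which require lookup tables, are left out.

```python
def supplementary_to_gb18030(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as four GB 18030 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary planes")
    linear = cp - 0x10000
    linear, d4 = divmod(linear, 10)    # fourth byte: 0x30..0x39 (10 values)
    linear, d3 = divmod(linear, 126)   # third byte:  0x81..0xFE (126 values)
    d1, d2 = divmod(linear, 10)        # second byte: 0x30..0x39 (10 values)
    return bytes([0x90 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def gb18030_to_supplementary(seq: bytes) -> int:
    """Decode a four-byte GB 18030 sequence starting at 90 30 81 30 or above."""
    b1, b2, b3, b4 = seq
    linear = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    cp = 0x10000 + linear
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("sequence does not map to a supplementary code point")
    return cp


print(supplementary_to_gb18030(0x10000).hex())   # 90308130
print(supplementary_to_gb18030(0x10FFFF).hex())  # e3329a35
print(hex(gb18030_to_supplementary(bytes.fromhex("90308130"))))  # 0x10000
```

Python's built-in `gb18030` codec can serve as a cross-check, for example by comparing `chr(0x10000).encode("gb18030")` with the result above.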

Conclusions:

Although they cover different character repertoires and are based on different encodings, the new Chinese standards are similar in two respects: both provide clearly defined conduits to Unicode, and both pose unique challenges when implemented in a Unicode environment.

With regard to the standard HK SCS, not all of its characters are available in Unicode 3.0. Support for this character set in Unicode-based applications can be achieved only with the help of the Private Use Area.

GB 18030-2000, on the other hand, presents a challenge in that it applies a four-byte encoding scheme to map the complete character repertoire of Unicode 3.0 into its own encoding space in order to remain compatible with pre-existing national Chinese standards, namely the specification GBK.

While these relatively new standards show a clear tendency to support Unicode as much as possible, it remains the task of applications and fonts to integrate the standards' character repertoires seamlessly into a Unicode-based environment.



International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

13 December 2000, Webmaster