GB18030 summary and issues

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Oct 13 2000 - 14:13:41 EDT


Dear Uni-encoders and -decoders,

Dirk Meyer from Adobe has put together an extensive summary of the Chinese GB 18030 encoding standard that was published on 2000-03-17. Ken Lunde and I assisted Dirk with reviews and comments.

The summary is on the web site of Ken's famous CJKV book "with the fish":
ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf

To summarize the summary, we now have an English text describing the new encoding in detail. There are a few apparent errors, typos, and inconsistencies in the Chinese standard text that need to be resolved.

For implementers, there is enough information in the summary to describe the encoding structure and to prepare an implementation.

What is still missing - aside from the resolution of the issues mentioned here - is a precise mapping table between Unicode and at least the one-byte and two-byte portions of GB 18030.
In theory, it should be almost the same as GBK, but to be sure, we need precise, complete, and machine-readable mappings.
Given the one-byte and two-byte portions and the description in the standard and in the summary, the four-byte portion can be derived with a little bit of Perl or similar.
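As a rough illustration of the kind of small script meant here, the following Python sketch covers only the byte arithmetic of the four-byte portion, not the mapping itself. It assumes the four-byte structure as it is usually described for GB 18030 (first and third bytes in 0x81..0xFE, second and fourth bytes in 0x30..0x39) and converts between a four-byte sequence and its position in that ordered space. The function names are made up for this example, and which Unicode code point belongs to which position still has to come from the standard and from the one-byte and two-byte tables.

```python
# Sketch only: linear position of a GB 18030 four-byte sequence.
# Assumed byte ranges (not stated in this message): bytes 1 and 3 are
# 0x81..0xFE (126 values), bytes 2 and 4 are 0x30..0x39 (10 values).
# Function names are hypothetical, chosen for this illustration.

LEAD_MIN, LEAD_MAX = 0x81, 0xFE    # bytes 1 and 3
DIGIT_MIN, DIGIT_MAX = 0x30, 0x39  # bytes 2 and 4
LEAD_COUNT = LEAD_MAX - LEAD_MIN + 1     # 126
DIGIT_COUNT = DIGIT_MAX - DIGIT_MIN + 1  # 10
TOTAL = LEAD_COUNT * DIGIT_COUNT * LEAD_COUNT * DIGIT_COUNT


def four_byte_to_linear(seq: bytes) -> int:
    """Return the zero-based position of a four-byte sequence."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (LEAD_MIN <= b1 <= LEAD_MAX and DIGIT_MIN <= b2 <= DIGIT_MAX
            and LEAD_MIN <= b3 <= LEAD_MAX and DIGIT_MIN <= b4 <= DIGIT_MAX):
        raise ValueError("not a well-formed four-byte sequence")
    index = b1 - LEAD_MIN
    index = index * DIGIT_COUNT + (b2 - DIGIT_MIN)
    index = index * LEAD_COUNT + (b3 - LEAD_MIN)
    index = index * DIGIT_COUNT + (b4 - DIGIT_MIN)
    return index


def linear_to_four_byte(index: int) -> bytes:
    """Inverse of four_byte_to_linear."""
    if not 0 <= index < TOTAL:
        raise ValueError("index outside the four-byte space")
    index, d4 = divmod(index, DIGIT_COUNT)
    index, d3 = divmod(index, LEAD_COUNT)
    d1, d2 = divmod(index, DIGIT_COUNT)
    return bytes([LEAD_MIN + d1, DIGIT_MIN + d2, LEAD_MIN + d3, DIGIT_MIN + d4])
```

With these helpers, the first four-byte sequence 0x81308130 has position 0, and incrementing the position steps through the sequences in byte order. A derivation script would pair such positions with the Unicode code points that the shorter portions leave uncovered, following the rules given in the standard and the summary.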

Anyone who needs to implement or know about GB 18030 should probably read this text.

Anyone who can contribute precise mapping tables or help resolve the open issues, please do so.

Best regards,

markus


