From: Christopher Fynn (cfynn@gmx.net)
Date: Fri Jan 07 2005 - 14:22:13 CST
Mike Ayers <mike.ayers@tumbleweed.com> wrote:
> Second, and more importantly, since GB18030 does not encode all of
> Unicode, it cannot be considered a Unicode encoding form.
While it isn't exactly a "Unicode encoding form", I thought that although
GB18030 is primarily a superset of GBK, it is also in effect a superset
of ISO 10646, in that it includes all characters in ISO 10646 (though at
different positions) and has more code positions than ISO 10646 and Unicode.
For instance, the document "IBM Simplified Chinese Graphic Character Set,
GB 18030 code: National Standard and DBCS-Host" (2001) says:
| 4.4 GB 18030
| GB 18030, PRC National Standard, contains all char-
| acters defined in ISO 10646-1, but they have totally
| different code assignment. In GB 18030, one-byte,
| two-byte and four-byte encoding systems are adopted.
| The total capability is over 1.5 millions of code posi-
| tions. Currently, GB 18030 contains more than 27 000
| Chinese characters which have been defined in the
| latest version of ISO 10646-1.
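
As a rough check on the "over 1.5 million" figure, here is a small Python
sketch. The byte ranges are the ones commonly given for GB 18030 and are an
assumption on my part, not taken from the IBM document; encoded_length is
just an illustrative helper built on Python's standard "gb18030" codec.

```python
# Byte ranges as commonly described for GB 18030 (assumed, not quoted
# from the standard or from the IBM document).
LEAD_BYTES = 0xFE - 0x81 + 1                             # 0x81..0xFE -> 126
TWO_BYTE_TRAILS = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1)  # 63 + 127 = 190
DIGIT_BYTES = 0x39 - 0x30 + 1                            # 0x30..0x39 -> 10

one_byte = 0x7F - 0x00 + 1                    # 0x00..0x7F
two_byte = LEAD_BYTES * TWO_BYTE_TRAILS       # lead byte + trail byte
four_byte = (LEAD_BYTES * DIGIT_BYTES) ** 2   # lead, digit, lead, digit
total = one_byte + two_byte + four_byte


def encoded_length(ch: str) -> int:
    """Number of bytes Python's gb18030 codec uses for one character."""
    return len(ch.encode("gb18030"))


if __name__ == "__main__":
    print(f"total code positions: {total}")
    # ASCII letter, a GBK hanzi, an Extension A hanzi, a Plane 2 hanzi
    for ch in ("A", "\u4e2d", "\u3400", "\U00020000"):
        print(f"U+{ord(ch):04X} -> {encoded_length(ch)} byte(s)")
```

Under those assumptions the total comes to 1,611,668 positions, nearly all
of them in the four-byte space.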
And Meyer's GB18030 Summary
<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>
says:
| The Significant properties of GB18030 are
| o It incorporates Unicode's Unihan Extension A completly.
| o It provides code-space for all used and unused code points of
| Unicode's Plane 0 (BMP)and it's 16 additional planes if these
| code points were not already included in GBK.
| Expressed differently: while being a code- and character
| compatible "superset" of GBK, at the same time intends to
| provide space for all remaining code points of Unicode.
| Thus it effectively provides a 1-to-1 relationship between
| parts of GB 18030 and Unicode's complete encoding space.
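
To illustrate the 1-to-1 relationship for the 16 additional planes, here is
a minimal Python sketch. It assumes the usual linear layout in which U+10000
maps to the four bytes 90 30 81 30 and each following code point takes the
next four-byte sequence; supplementary_to_gb18030 is a hypothetical helper
name of mine, and the BMP part (which needs a lookup table) is not covered.

```python
def supplementary_to_gb18030(cp: int) -> bytes:
    """Map a code point in U+10000..U+10FFFF to four GB 18030 bytes.

    Assumes the linear layout starting at 90 30 81 30 for U+10000.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    index = cp - 0x10000
    index, b4 = divmod(index, 10)    # fourth byte: 0x30..0x39
    index, b3 = divmod(index, 126)   # third byte:  0x81..0xFE
    b1, b2 = divmod(index, 10)       # second byte: 0x30..0x39
    return bytes((0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4))


if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x20000, 0x10FFFF):
        mine = supplementary_to_gb18030(cp)
        codec = chr(cp).encode("gb18030")   # Python's built-in codec
        print(f"U+{cp:06X} -> {mine.hex(' ')}  matches codec: {mine == codec}")
```

The last code point, U+10FFFF, comes out as E3 32 9A 35, so the additional
planes occupy only part of the four-byte space.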
...
- Chris