Re: ISO 10646 & GB18030 repertoire

From: Christopher Fynn (
Date: Fri Jan 07 2005 - 14:22:13 CST


    Mike Ayers <> wrote:

    > Second, and more importantly, since GB18030 does not encode all of
    > Unicode, it cannot be considered a Unicode encoding form.

    While it isn't exactly a "Unicode encoding form", I thought that,
    although GB18030 is primarily a superset of GBK, it is also in effect a
    superset of ISO 10646: it includes all characters in ISO 10646 (though
    at different positions) and has more code positions than ISO 10646 and
    Unicode.

    For instance the document "IBM Simplified Chinese Graphic Character Set,
    GB 18030 code: National Standard and DBCS-Host" (2001) says:

    | 4.4 GB 18030
    | GB 18030, PRC National Standard, contains all characters
    | defined in ISO 10646-1, but they have totally different
    | code assignment. In GB 18030, one-byte, two-byte and
    | four-byte encoding systems are adopted. The total
    | capability is over 1.5 millions of code positions.
    | Currently, GB 18030 contains more than 27 000 Chinese
    | characters which have been defined in the latest
    | version of ISO 10646-1.
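    The "over 1.5 millions of code positions" figure can be checked with a
    little arithmetic. The sketch below assumes the commonly documented byte
    ranges (single byte 0x00-0x7F; two-byte lead 0x81-0xFE with trail
    0x40-0x7E or 0x80-0xFE; four-byte 0x81-0xFE, 0x30-0x39, 0x81-0xFE,
    0x30-0x39) and counts capacity, not assigned characters.

```python
# Count the byte sequences permitted by GB 18030's one-, two- and four-byte
# forms (capacity only; many four-byte positions are unassigned).
def span(lo: int, hi: int) -> int:
    """Number of byte values in the inclusive range lo..hi."""
    return hi - lo + 1

one_byte = span(0x00, 0x7F)                                            # 128
two_byte = span(0x81, 0xFE) * (span(0x40, 0x7E) + span(0x80, 0xFE))    # 126 * 190
four_byte = (span(0x81, 0xFE) * span(0x30, 0x39)) ** 2                 # (126 * 10) ** 2

gb18030_total = one_byte + two_byte + four_byte
unicode_total = 0x110000   # code points U+0000..U+10FFFF

print(one_byte, two_byte, four_byte)   # 128 23940 1587600
print(gb18030_total, unicode_total)    # 1611668 1114112
```

    So the GB 18030 byte structure has roughly 1.6 million positions against
    Unicode's 1,114,112 code points, which is the sense in which it "has more
    code positions".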

    And Meyer's "GB18030 Summary" says:

    | The significant properties of GB18030 are
    | o It incorporates Unicode's Unihan Extension A completely.
    | o It provides code-space for all used and unused code points of
    | Unicode's Plane 0 (BMP) and its 16 additional planes if these
    | code points were not already included in GBK.
    | Expressed differently: while being a code- and character
    | compatible "superset" of GBK, [it] at the same time intends to
    | provide space for all remaining code points of Unicode.
    | Thus it effectively provides a 1-to-1 relationship between
    | parts of GB 18030 and Unicode's complete encoding space.
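    The 1-to-1 relationship is easiest to see for the 16 supplementary
    planes, which GB 18030 maps linearly onto four-byte sequences starting at
    0x90308130 (U+10000). The sketch below handles only U+10000..U+10FFFF;
    the BMP part of the mapping depends on GBK plus a range table that is not
    reproduced here. The function names are my own, for illustration.

```python
# Sketch: linear mapping between supplementary-plane code points and
# GB 18030 four-byte sequences (BMP mapping deliberately not covered).
def supplementary_to_gb18030(cp: int) -> bytes:
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are handled by this sketch")
    n = cp - 0x10000            # linear offset from U+10000
    n, b4 = divmod(n, 10)       # 4th byte: 0x30..0x39
    n, b3 = divmod(n, 126)      # 3rd byte: 0x81..0xFE
    b1, b2 = divmod(n, 10)      # 2nd byte: 0x30..0x39; quotient is 1st-byte offset
    return bytes((0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4))

def gb18030_to_supplementary(seq: bytes) -> int:
    b1, b2, b3, b4 = seq        # exactly four bytes expected
    n = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    return 0x10000 + n

for cp in (0x10000, 0x1F600, 0x20000, 0x10FFFF):
    seq = supplementary_to_gb18030(cp)
    assert seq == chr(cp).encode("gb18030")      # agrees with Python's codec
    assert gb18030_to_supplementary(seq) == cp   # and round-trips
    print(f"U+{cp:X} -> {seq.hex().upper()}")
```

    U+10FFFF lands on E3329A35, so the whole supplementary range fits inside
    the four-byte area with room to spare.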

    - Chris

    This archive was generated by hypermail 2.1.5 : Fri Jan 07 2005 - 14:27:42 CST