GB 18030 is a Chinese codepage standard that extends GB 2312-1980 and GBK (itself an extension of GB 2312-1980).
It is a multi-byte encoding that uses 1-byte, 2-byte, and 4-byte codes.
The 1-byte and 2-byte codes have the same assignments as in GBK.
There are about 1.6 million valid byte sequences.
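As a rough check on that figure, the count can be worked out from the byte ranges usually given for the structure of GB 18030. Those ranges are not stated in this text and are an assumption here: single bytes 0x00-0x7F; 2-byte codes with a lead byte 0x81-0xFE and a second byte 0x40-0x7E or 0x80-0xFE; 4-byte codes with first and third bytes 0x81-0xFE and second and fourth bytes 0x30-0x39. The variable names are illustrative only.

```python
# Byte ranges below are assumed from the usual description of GB 18030's
# structure; they are not taken from the text above.
single = len(range(0x00, 0x80))            # 1-byte codes: 0x00-0x7F
lead = len(range(0x81, 0xFF))              # lead/third bytes: 0x81-0xFE
trail2 = len(range(0x40, 0x7F)) + len(range(0x80, 0xFF))  # 2-byte second bytes
digit = len(range(0x30, 0x3A))             # 4-byte second/fourth bytes: 0x30-0x39

two_byte = lead * trail2                   # 126 * 190
four_byte = lead * digit * lead * digit    # 126 * 10 * 126 * 10
total = single + two_byte + four_byte

print(single, two_byte, four_byte, total)  # 128 23940 1587600 1611668
```

Under these assumptions the 4-byte codes account for almost all of the roughly 1.6 million sequences.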
The lead byte alone does not show whether a multi-byte sequence is 2 or 4 bytes long; the second byte must be examined as well. A second byte in the range 0x30-0x39 marks a 4-byte code.
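A minimal sketch of that length check follows. The function name gb18030_sequence_length is hypothetical, and the byte ranges are assumed from the usual description of the encoding (single bytes 0x00-0x7F, lead bytes 0x81-0xFE, second bytes 0x30-0x39 for 4-byte codes and 0x40-0x7E or 0x80-0xFE for 2-byte codes). It inspects only the first two bytes and does not validate the rest of the sequence.

```python
def gb18030_sequence_length(data: bytes, offset: int = 0) -> int:
    """Return 1, 2, or 4: the length of the sequence starting at offset.

    Hypothetical helper; byte ranges are assumptions stated in the lead-in.
    Only the lead byte and second byte are inspected.
    """
    lead = data[offset]
    if lead <= 0x7F:
        return 1  # single-byte code, no second byte needed
    if not 0x81 <= lead <= 0xFE:
        raise ValueError(f"invalid lead byte 0x{lead:02X}")
    if offset + 1 >= len(data):
        raise ValueError("truncated sequence: second byte missing")

    second = data[offset + 1]
    if 0x30 <= second <= 0x39:
        return 4  # digit-range second byte signals a 4-byte code
    if 0x40 <= second <= 0x7E or 0x80 <= second <= 0xFE:
        return 2
    raise ValueError(f"invalid second byte 0x{second:02X}")


print(gb18030_sequence_length(b"A"))                 # 1
print(gb18030_sequence_length(b"\xd6\xd0"))          # 2
print(gb18030_sequence_length(b"\x81\x30\x81\x30"))  # 4
```

The same lead byte 0x81 can begin either a 2-byte or a 4-byte code, which is why the decision is made on the second byte.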
The Chinese government has mandated that all applications released on or after 2001-Sep-01 support GB 18030.
The specification defines most character assignments by referring directly to a mapping between GB 18030 codes and Unicode. Some characters that GBK mapped to the Private Use Area (PUA) under Unicode 2.1 are assigned in Unicode 3.0, and their GB 18030 mappings use only the Unicode 3.0 code points.
In addition, GB 18030 defines roundtrip mappings for all 1.1 million Unicode code points, including unassigned and non-character ones but excluding single surrogates. This makes GB 18030 functionally very similar to a UTF (Unicode Transformation Format).
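The roundtrip property can be spot-checked with Python's built-in "gb18030" codec, used here only as a convenient stand-in; it may follow a particular revision of the standard, so this is an illustration rather than a conformance test. The helper name roundtrips and the sample code points are chosen for this sketch.

```python
def roundtrips(code_point: int) -> bool:
    """True if the code point survives encode/decode through gb18030.

    Hypothetical helper for illustration; returns False when the codec
    refuses to encode or decode the value.
    """
    ch = chr(code_point)
    try:
        return ch.encode("gb18030").decode("gb18030") == ch
    except UnicodeError:
        return False


samples = {
    0x41: "ASCII letter, 1-byte code",
    0x4E2D: "CJK ideograph, 2-byte code",
    0x10000: "first supplementary code point, 4-byte code",
    0x50000: "unassigned code point",
    0x10FFFF: "non-character, last code point",
}

for cp, label in samples.items():
    encoded_len = len(chr(cp).encode("gb18030"))
    print(f"U+{cp:04X} ({label}): {encoded_len} byte(s), roundtrip={roundtrips(cp)}")

# A single surrogate is excluded from the mapping, so the codec is
# expected to reject it.
print("U+D800 roundtrip:", roundtrips(0xD800))
```

Unassigned and non-character values still encode to 4-byte codes and decode back unchanged, which is the UTF-like behavior described above.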
An article with more details and with implementation suggestions is available on the developerWorks site.