Re: Re: ISO 10646 & GB18030 repertoire

From: Philippe VERDY (verdy_p@wanadoo.fr)
Date: Fri Jan 07 2005 - 15:39:08 CST


    > From: "Christopher Fynn"
    > > Second, and more importantly, since GB18030 does not encode all of
    > > Unicode, it cannot be considered a Unicode encoding form.
    >
    > | Thus it effectively provides a 1-to-1 relationship between
    > | parts of GB 18030 and Unicode's complete encoding space.

    True. In addition, the GB18030 standard requires that, if GB18030 is not used for internal processing, the internal encoding form must preserve roundtrip compatibility with GB18030 for all code positions assigned in GB18030.
    GB18030 is related to GB13000.1, which specifies the same repertoire as GB18030 but encodes it according to Unicode/ISO/IEC 10646.
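    For the part of the code space where the GB18030-to-Unicode relationship is purely arithmetic, the roundtrip can be sketched in a few lines of Python. This assumes the usual four-byte layout (b1 and b3 in 0x81-0xFE, b2 and b4 in 0x30-0x39) and that the supplementary planes U+10000..U+10FFFF start at GB18030 0x90308130. Four-byte mappings for the BMP are table-driven and are not covered. The function names are hypothetical, chosen for illustration only.

    ```python
    # Sketch only: linear (algorithmic) mapping between GB18030 four-byte
    # sequences and Unicode supplementary-plane code points.
    # Assumption: 0x90308130 <-> U+10000; BMP four-byte mappings use tables.

    SUPP_BASE = 189000  # linear index of the four-byte sequence 0x90308130


    def linear_to_bytes(idx: int) -> bytes:
        """Split a linear four-byte index into GB18030 bytes b1 b2 b3 b4."""
        idx, b4 = divmod(idx, 10)
        idx, b3 = divmod(idx, 126)
        b1, b2 = divmod(idx, 10)
        return bytes([0x81 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


    def bytes_to_linear(seq: bytes) -> int:
        """Inverse of linear_to_bytes; rejects malformed four-byte sequences."""
        if len(seq) != 4:
            raise ValueError("expected exactly four bytes")
        b1, b2, b3, b4 = seq
        if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
                and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
            raise ValueError("not a GB18030 four-byte sequence")
        return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


    def encode_supplementary(cp: int) -> bytes:
        """Map a supplementary-plane code point to its GB18030 four bytes."""
        if not 0x10000 <= cp <= 0x10FFFF:
            raise ValueError("code point outside the supplementary planes")
        return linear_to_bytes(SUPP_BASE + (cp - 0x10000))


    def decode_supplementary(seq: bytes) -> int:
        """Map GB18030 four bytes back to a supplementary-plane code point."""
        cp = bytes_to_linear(seq) - SUPP_BASE + 0x10000
        if not 0x10000 <= cp <= 0x10FFFF:
            raise ValueError("sequence has no supplementary-plane mapping")
        return cp


    # Roundtrip check on a few sample code points.
    for cp in (0x10000, 0x20000, 0x10FFFF):
        assert decode_supplementary(encode_supplementary(cp)) == cp
    print(encode_supplementary(0x10FFFF).hex())  # e3329a35
    ```

    Because the mapping is a bijection on this range, roundtrip holds by construction; the open questions raised here concern code positions outside it.
    
    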

    GB18030-2000 explicitly includes all GB13000.1 characters, but it does not specify how the remaining code positions of GB18030 will be allocated or mapped to ISO/IEC 10646 codepoints (or, equivalently, to GB13000.1 codepoints).
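    The size of those remaining positions can be estimated with simple arithmetic. The sketch below assumes the same four-byte layout as above, that the BMP four-byte range ends at 0x8431A439 (U+FFFF), and that U+10000 starts at 0x90308130; it counts the four-byte positions that this mapping leaves without any Unicode counterpart.

    ```python
    # Sketch only: compare the GB18030 four-byte space with the Unicode
    # code space. Assumed layout: b1, b3 in 0x81-0xFE; b2, b4 in 0x30-0x39.

    FOUR_BYTE_POSITIONS = 126 * 10 * 126 * 10   # all four-byte sequences
    BMP_FOUR_BYTE_END = 39419                   # linear index of 0x8431A439 (U+FFFF)
    SUPP_BASE = 189000                          # linear index of 0x90308130 (U+10000)
    SUPP_END = SUPP_BASE + (0x10FFFF - 0x10000) # linear index of U+10FFFF

    UNICODE_CODE_POINTS = 0x110000              # 17 planes x 65,536

    # Gap between the BMP range and the supplementary-plane range.
    unused_between = SUPP_BASE - (BMP_FOUR_BYTE_END + 1)
    # Positions above U+10FFFF, which no Unicode encoding form can reach.
    unused_above = FOUR_BYTE_POSITIONS - (SUPP_END + 1)

    print(FOUR_BYTE_POSITIONS)   # 1587600
    print(UNICODE_CODE_POINTS)   # 1114112
    print(unused_between)        # 149580
    print(unused_above)          # 350024

    # Python, like UTF-16, cannot represent a code point above U+10FFFF.
    try:
        chr(0x110000)
    except ValueError:
        print("no Unicode scalar value above U+10FFFF")
    ```

    Any character allocated in the upper range would have no ISO/IEC 10646 codepoint at all, which is the roundtrip conflict discussed below.
    
    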

    If China wants to allocate characters in GB18030 that have no roundtrip mapping to ISO/IEC 10646, this will also cause problems for its other normative standard, GB13000.1 (which is a subset of ISO/IEC 10646). The only ways out would be for ISO/IEC 10646 to be extended beyond the existing 17 standard planes (very unlikely, as this would also break all applications that depend on UTF-16 for internal processing, and those that have now adopted strict validity checking for the UTF-8 and UTF-32 encoding forms and schemes), or for GB18030 to remove or relax its current requirement of full roundtrip convertibility with internal processing units (including Unicode/ISO/IEC 10646 codepoints)...

    My view is that full roundtrip convertibility cannot be maintained over the whole code space, and will in practice be limited to the supported mandatory repertoire, which includes the CJK characters and the other characters that are part of the GB13000.1 subset. Within that subset, I don't know how China will handle the PUA code points it wishes to map for the newly proposed precomposed forms... Lots of trouble is to be expected here, and clarifications will need to be made by amending GB13000.1.



    This archive was generated by hypermail 2.1.5 : Fri Jan 07 2005 - 15:49:01 CST