ISO 10646 & GB18030 repertoire [was: Re: ISO 10646 compliance and EU law]

From: Christopher Fynn (cfynn@gmx.net)
Date: Thu Jan 06 2005 - 13:16:38 CST

    Philippe

    It appears that China does plan to add two groups of precomposed
    Tibetan characters to GB18030:

    Group A: 1536 characters for Modern Tibetan. These use GB18030 codes that
    map to the Private Use Area in the BMP (U+F300 to U+F8FF).

    Group B: 5664 additional characters for traditional texts (a few
    combinations are missing here, so this number could grow). These use
    GB18030 codes that map to Supplementary Private Use Area-A in plane 15
    (U+F0000 to U+F161F).
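    As a quick sanity check, the counts above follow directly from the
    ranges, and every code point in both ranges has the Unicode general
    category Co (Private Use). A minimal sketch using only Python's standard
    unicodedata module; the constant and function names are illustrative:

```python
import unicodedata

# Ranges quoted above, inclusive of both endpoints.
GROUP_A = range(0xF300, 0xF8FF + 1)    # Private Use Area in the BMP
GROUP_B = range(0xF0000, 0xF161F + 1)  # Supplementary Private Use Area-A (plane 15)


def all_private_use(codepoints) -> bool:
    """True if every code point has Unicode general category Co (Private Use)."""
    return all(unicodedata.category(chr(cp)) == "Co" for cp in codepoints)


print(len(GROUP_A), len(GROUP_B))  # 1536 5664
print(all_private_use(GROUP_A), all_private_use(GROUP_B))  # True True
```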

    All these precomposed characters can of course be represented by
    sequences of existing Unicode characters. To support text containing these
    characters properly on a Unicode-based system, you'd probably want to
    convert the GB18030 precomposed Tibetan characters to sequences of
    characters in the Tibetan block rather than simply mapping them to PUA
    characters.
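    The conversion suggested above amounts to a lookup from each PUA code
    point to a sequence of Tibetan-block characters, applied after ordinary
    GB18030 decoding. A minimal sketch, assuming such a table exists: the two
    PUA assignments below are HYPOTHETICAL placeholders, not the real GB18030
    Tibetan mappings, and the function names are illustrative.

```python
from typing import Mapping

# HYPOTHETICAL table: these PUA assignments are placeholders for illustration,
# NOT the actual code points used by the GB18030 Tibetan extension.
PUA_TO_TIBETAN = {
    0xF300: "\u0F40\u0FB2",  # KA + SUBJOINED RA
    0xF301: "\u0F66\u0F90",  # SA + SUBJOINED KA
}


def expand_precomposed(text: str, table: Mapping[int, str] = PUA_TO_TIBETAN) -> str:
    """Replace each mapped PUA character with its Tibetan-block sequence;
    all other characters pass through unchanged."""
    return "".join(table.get(ord(ch), ch) for ch in text)


def decode_gb18030_tibetan(data: bytes, table: Mapping[int, str] = PUA_TO_TIBETAN) -> str:
    """Decode GB18030 bytes with Python's built-in codec (which yields the
    PUA characters as-is), then expand the precomposed Tibetan characters."""
    return expand_precomposed(data.decode("gb18030"), table)


# U+0F0B is the Tibetan intersyllabic tsheg, which is left untouched.
print(expand_precomposed("\uF300\u0F0B\uF301") == "\u0F40\u0FB2\u0F0B\u0F66\u0F90")  # True
```

    A converter that needs round-tripping would also need the reverse
    direction, matching Tibetan sequences back to PUA codes; that is omitted
    here.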

    I suspect that support of at least the combinations in Group A will
    become a requirement for some levels of GB18030 compliance.

    There seems to be one defect: the charts I've seen appear to contain a
    precomposed character equivalent to the sequence U+0F68 U+0F7C
    U+0F7E. It appears they've assumed that U+0F00 can be used as the
    equivalent of that string. However, in Unicode U+0F00 is *not* equivalent
    to U+0F68 U+0F7C U+0F7E (U+0F00 has no decomposition). I think this
    means that there would be no round-trip compatibility for this combination.
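    The missing equivalence can be checked against the Unicode Character
    Database with Python's standard unicodedata module: U+0F00 has an empty
    decomposition mapping, and neither the canonical (NFD) nor the
    compatibility (NFKD) forms of the two strings match. A minimal sketch;
    the helper names are illustrative:

```python
import unicodedata

OM_SYLLABLE = "\u0F00"              # TIBETAN SYLLABLE OM
OM_SEQUENCE = "\u0F68\u0F7C\u0F7E"  # LETTER A + VOWEL SIGN O + SIGN RJES SU NGA RO


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """Compatibility equivalence compares NFKD forms instead."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


print(repr(unicodedata.decomposition(OM_SYLLABLE)))        # ''
print(canonically_equivalent(OM_SYLLABLE, OM_SEQUENCE))    # False
print(compatibility_equivalent(OM_SYLLABLE, OM_SEQUENCE))  # False
```

    Since no normalization form relates the two, Unicode software has no
    standard basis for treating them as the same text.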

    - Chris

    Philippe VERDY wrote:

    >>"Andrew C. West" wrote:
    >>
    >>>Kenneth Whistler at Sybase wrote:
    >>>
    >>>>Philippe Verdy wrote:
    >>>>(Since now the mapping between GB18030 and ISO/IEC 10646 is well defined and
    >>>>closed,
    >>>
    >>>False. Both GB18030 and ISO/IEC 10646 will be amended in the future,
    >>>and mappings will change, and neither has (in principle) a closed repertoire.
    >>>
    >>
    >>I don't get this. I understand that neither GB18030 nor ISO/IEC 10646 has a
    >>closed character repertoire, but (and I think this is the point that Philippe
    >>was trying to make) they do both have a closed code point repertoire, and there
    >>is a one-to-one mapping between all ISO/IEC 10646 code points in planes 0-16 and
    >>GB18030 code points, and this code point mapping will never change. I've
    >>implemented Unicode to/from GB18030 conversion using the widely available
    >>mapping tables (e.g. from ),
    >
    >
    > Thanks for correcting Kenneth's refutation. I thought I was very clear in speaking explicitly about the "mapping" (not the repertoire, which I know is not closed in either ISO/IEC 10646 or GB18030; I never said that).
    > So I know that the repertoires of both ISO/IEC 10646 and GB18030 will be amended, but the current statement in the GB18030 standard is that its mapping to ISO/IEC 10646 will remain closed and compatible with all future amendments of ISO/IEC 10646 (and so also with Unicode), much as Unicode keeps its repertoire and assignments synchronized. From my point of view, both Unicode and GB18030 now have a similar policy of staying synchronized with the base ISO/IEC 10646 character repertoire.
    > This effectively means that if China wants to standardize in GB18030 some precomposed characters that are not in ISO/IEC 10646, it can do so only within the PUA. GB18030 will remain fully compatible with ISO/IEC 10646 and Unicode, but will add a required mutual agreement about its PUA usage.
    >
    > And I don't think that GB18030 will be amended in such a way: it could only be a temporary solution until new assignments are added to the ISO/IEC 10646 repertoire and synchronized with Unicode, which still must obey its own policy on the stability of canonical and compatibility equivalence. This means that such precomposed characters, even if GB18030 allowed them to decompose into other existing characters already mapped to standard ISO/IEC 10646 code points, would not have any canonical or compatibility equivalence in Unicode. This does not affect conformance with ISO/IEC 10646, where such equivalences do not exist and are not defined. But it would be a real inconvenience if these precomposed characters were assigned standard code points outside the PUAs, because it would then become impossible to define a mutual agreement about such additional equivalences, which are permitted only within the PUAs.
    >
    > So in practice, the only extensions allowed for the GB18030 repertoire are within the PUAs, which already have a closed mapping to ISO/IEC 10646 (and Unicode) code points. All other extensions must first be approved and standardized in ISO/IEC 10646 before GB18030 can add new characters to its repertoire; the only alternative would be for China to break its existing policy of a closed mapping between its GB18030 encoding standard and ISO/IEC 10646 code points. This would be very bad news for developers who have to support GB18030 in their software, because it would require GB18030-specific solutions, with no way to map it safely to ISO/IEC 10646 and Unicode. It would be a new interoperability nightmare between GB18030-enabled software and Unicode/ISO/IEC 10646-enabled software, meaning that existing software that complies with Unicode or ISO/IEC 10646 would no longer be compatible with GB18030, the standard required in China.
    >
    > If Kenneth thinks otherwise, then he should explain why, because it would be a serious problem for those who think that their Unicode/ISO/IEC 10646 software will be compatible with the GB18030 standard required in China. I think it is extremely important that the mapping of codes between GB18030 and ISO/IEC 10646 stay closed, even if these codes are not yet all assigned to abstract characters. It is equally important that China avoid any attempt to extend its GB18030 repertoire without first requesting and obtaining approval for the additions in the ISO/IEC 10646 repertoire.
    >
    > It is the job of the ideographic working group and its rapporteur to ensure that such an event never occurs, by negotiating these amendments with China and with the ISO working group. Discussing them with the Unicode working group is not necessary, given that Unicode will follow all decisions made in ISO/IEC 10646. If Kenneth thinks otherwise, then he should explain that too: it would also be bad news for developers who believe today that their Unicode-enabled software will also be compatible with ISO/IEC 10646!
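    Andrew's point that the code point mapping is fixed is easiest to see for
    planes 1 to 16: GB18030 maps them arithmetically onto four-byte sequences,
    counting linearly from 0x90 0x30 0x81 0x30. The sketch below reproduces
    that formula and cross-checks it against Python's built-in gb18030 codec.
    It does not cover the BMP, whose mapping is table-driven, and the function
    names are illustrative. The test points include the ends of the plane-15
    PUA range used by Group B.

```python
def gb18030_supplementary(cp: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as a GB18030 four-byte sequence.

    Bytes 2 and 4 are the digits 0x30..0x39; byte 3 runs over 0x81..0xFE
    (126 values); the first supplementary code point is 0x90 0x30 0x81 0x30.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    n = cp - 0x10000
    n, b4 = divmod(n, 10)
    n, b3 = divmod(n, 126)
    b1, b2 = divmod(n, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


def gb18030_supplementary_decode(seq: bytes) -> int:
    """Inverse of gb18030_supplementary for a four-byte sequence."""
    b1, b2, b3, b4 = seq
    n = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    return 0x10000 + n


for cp in (0x10000, 0xF0000, 0xF161F, 0x10FFFF):
    seq = gb18030_supplementary(cp)
    # Cross-check against Python's built-in gb18030 codec.
    assert seq == chr(cp).encode("gb18030")
    assert gb18030_supplementary_decode(seq) == cp
    print(f"U+{cp:04X} -> {seq.hex().upper()}")
```

    Because the formula is a bijection between U+10000..U+10FFFF and a fixed
    block of four-byte codes, no later repertoire change can alter it.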



    This archive was generated by hypermail 2.1.5 : Thu Jan 06 2005 - 13:20:36 CST