Re: Re: ISO 10646 compliance and EU law

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 06 2005 - 14:08:55 CST

    Philippe,

    > Thanks for correcting this refutation by Kenneth.

    > So I know that both ISO/IEC 10646 and GB18030 repertoires will be
    > amended, but the current statements in the GB18030 standard is that
    > its mapping with ISO/IEC 10646 will remain closed

    Which is false.

    > and compatible with

    This probably will hold true.

    > all future amendments of ISO/IEC 10646
    > (and so, also with Unicode),
    > in way similar to the synchronization of the repertoire and assignments
    > used by Unicode. From my point of view, both Unicode and GB18030 have
    > now a similar policy to remain synchronized with the base ISO/IEC
    > 10646 character repertoire.
    > This effectively means that this statement implies If China wants to
    > standardize in GB18030 some precomposed character that are not in
    > ISO/IEC10646, this is possible only within the PUA.

    This is false. China can do what it wants in GB18030, and the decisions
    that they take will impact implementations, depending on how carefully
    they are synchronized or not.

    > GB18030 will remain fully compatible with ISO/IEC10646 and Unicode,
    > but will add a required mutual agreement about its PUA usage.

    False. As I will demonstrate below.

    > So in practice, the only extensions allowed for the GB18030
    > repertoire is within the PUAs,

    False.

    > which already have a closed mapping with ISO/IEC 10646 (and Unicode)
    > codepoints.

    False.

    > All other extensions must be first approved and standardized in
    > ISO/IEC 10646, before GB18030 can be extended with new characters
    > in its repertoire;

    False.

    > the only alternative would be that China breaks its existing policy
    > about its closed mapping between its GB18030 encoding standard and
    > ISO/10646 codepoints.

    It has done so in the past, and this will happen again in the future.

    > This would be very bad news for developers that have to support
    > GB18030 in their software, because this would mean specific solutions
    > to support GB18030, without the possibility to map it safely to
    > ISO/IEC 10646 and Unicode.

    At last, something indisputably true!

    > This would be a new nightmare for interoperability of
    > GB18030-enabled softwares and Unicode/ISO/IEC10646-enabled
    > softwares, which would mean that existing softwares that comply
    > to Unicode or ISO/IEC 10646 will no more be compatible with the
    > required GB18030 standard for China.

    Correct. Welcome to the wonderful world of GB18030 support for
    China.

    >
    > If Kenneth thinks otherwise, then he should explain why,
    > because it would be a serious problem for those that think
    > that their Unicode/ISO/IEC-10646 software will be compatible
    > with the required GB18030 standard for China.

    O.k.

    Example 1:

    GB 18030-2000 defines a CJK component at FE90 and maps that
    component to U+E854, because that component is not encoded
    in Unicode 3.0 or ISO/IEC 10646-1:2000.

    Because such PUA mappings for GB 18030-2000 have proven
    very problematical in implementations, the characters in
    question have been added to 10646 (under ballot currently
    in Amd 1 to ISO/IEC 10646:2003). This particular CJK component
    is to be encoded at U+9FBA.

    And this means that GB 18030 / Unicode mapping tables up
    to about March 31, 2005 will contain the mappings:

        FE90 <--> U+E854
    82359133 <--> U+9FBA

    After that time, they will contain the mappings:

        ???? <--> U+E854
        FE90 <--> U+9FBA
    82359133 <--> ???? (probably U+FFFD)
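
A minimal sketch of what Example 1 means for a converter, assuming hypothetical two-entry excerpts of the mapping table rather than the real tables. The names `TABLES`, `decode`, and `encode` are illustrative only, and the U+FFFD fallback follows the "probably U+FFFD" guess above; the new GB 18030 code for U+E854 is left unmapped because it is not known here.

```python
# Hypothetical excerpts of a GB 18030 <-> Unicode mapping table, keyed by
# table version. Only the entries quoted in Example 1 are included.
REPLACEMENT = 0xFFFD  # assumed fallback for bytes with no mapping

TABLES = {
    "before": {
        bytes.fromhex("FE90"): 0xE854,
        bytes.fromhex("82359133"): 0x9FBA,
    },
    "after": {
        bytes.fromhex("FE90"): 0x9FBA,
        # 82359133 no longer has a mapping in this excerpt;
        # the GB 18030 code for U+E854 is unknown, so it is omitted.
    },
}


def decode(gb_bytes, version):
    """Map one GB 18030 byte sequence to a code point for a table version."""
    return TABLES[version].get(gb_bytes, REPLACEMENT)


def encode(code_point, version):
    """Reverse lookup; returns None when the table version has no entry."""
    for gb_bytes, cp in TABLES[version].items():
        if cp == code_point:
            return gb_bytes
    return None


fe90 = bytes.fromhex("FE90")
print(f"U+{decode(fe90, 'before'):04X}")  # U+E854
print(f"U+{decode(fe90, 'after'):04X}")   # U+9FBA

# Text decoded with the old table cannot be re-encoded with the new excerpt.
print(encode(decode(fe90, "before"), "after"))  # None
```

The same two bytes yield different code points depending on the table version, so any stored data has to carry, or be assumed to have, a table version.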

    Example 2:

    China decides to add Tibetan BrdaRten syllables to GB 18030
    and map them to PUA characters in 10646.

    Well, guess what -- *all* PUA code points in 10646 already
    have defined mappings to GB 18030. That means that the addition
    of the Tibetan BrdaRten syllables and definition of mappings
    will *change* those mappings, and will require changes to the
    mapping tables. The only way to avoid that would be for
    any GB 18030 additions to be defined at specific code points
    currently labelled as empty in GB 18030 but mapped to 10646
    PUA code points. For instance:

    TIBETAN CHARACTER KA U ==> AAA1 <--> U+E000

    That wouldn't change the code point mapping, but... to actually
    support the standardization of such a set of syllables in
    GB 18030, the vendor mapping tables will have to introduce,
    instead, the one-to-many mappings to actually interpret the
    Tibetan syllables as what they are, instead of PUA code points,
    so you would end up with the following entry in the mapping
    tables:

        AAA1 <--> <U+0F40, U+0F74>
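
A minimal sketch of the one-to-many case in Example 2, assuming the hypothetical assignment of AAA1 used above. The table and function names are illustrative only. The point is that table values have to be sequences of code points, not single code points, once such an entry exists.

```python
# Two hypothetical views of the same GB 18030 code, as in Example 2.
# Values are tuples so that one code can map to several code points.
PUA_TABLE = {bytes.fromhex("AAA1"): (0xE000,)}
SYLLABLE_TABLE = {bytes.fromhex("AAA1"): (0x0F40, 0x0F74)}


def decode_sequence(gb_bytes, table):
    """Map one GB 18030 code to a string of one or more characters."""
    code_points = table.get(gb_bytes, (0xFFFD,))  # assumed fallback
    return "".join(chr(cp) for cp in code_points)


code = bytes.fromhex("AAA1")
as_pua = decode_sequence(code, PUA_TABLE)
as_syllable = decode_sequence(code, SYLLABLE_TABLE)

print([f"U+{ord(c):04X}" for c in as_pua])       # ['U+E000']
print([f"U+{ord(c):04X}" for c in as_syllable])  # ['U+0F40', 'U+0F74']
```

A converter built around a one-code-to-one-code-point table cannot represent the second entry without a structural change, which is a table change in all but name.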
        
    Both of these scenarios are either in the works right now, or
    will happen in the not-too-distant future.

    If you think the mapping tables will just stay pristine and
    unchanged forever, in the face of such changes, you are smoking
    something. The *REASON* for making such additions is either to
    enable or *force* vendors to change the tables.

    > I think it is extremely important that the mapping of codes
    > between GB18030 and ISO/IEC10646 stay closed, even if these
    > codes are still not all assigned to abstract characters.

    You can think that, but if you mean by "closed" that the mappings
    stay stable and need not be versioned as either or both of the
    standards change, then you are flat wrong. It won't happen that
    way.

    > It is equally important that China then avoids any attempt to
    > extend its GB18030 repertoire without first requesting and
    > getting approval in the ISO/IEC 10646 standard repertoire.

    It may be important, but China does not come to WG2 asking
    permission. They are a sovereign entity, and they change
    their own standards as they see fit.

    > This is the job of the Ideographic working group and rapporteur
    > to avoid that such event will never occur, by negotiating these
    > amendments with China and with ISO working group.

    The IRG and its rapporteur have no jurisdiction here. Sure,
    its members and anyone else can get involved in the discussions
    to try to minimize the potential for damaging changes. But
    you *will not* be able to prevent changes.

    --Ken


