Re: Re: ISO 10646 compliance and EU law

From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Jan 06 2005 - 17:36:01 CST

    I agree with Ken's statement, but would qualify one bit.

    > to about March 31, 2005 will contain the mappings:
    >
    > FE90 <--> U+E854
    > 82359133 <--> U+9FBA
    >
    > After that time, they will contain the mappings:
    >
    > ???? <--> U+E854
    > FE90 <--> U+9FBA
    > 82359133 <--> ???? (probably U+FFFD)

    The report at http://www.unicode.org/reports/tr22/ recommends mapping tables of
    the following form to handle that situation: the old entries become
    one-way mappings, which provides a more graceful transition.

         FE90 <-- U+E854
         FE90 <--> U+9FBA
     82359133 --> U+9FBA

    This does not detract from the point that Ken is making.
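
To make the directions concrete, here is a minimal Python sketch of how a converter might load a table like the one above. The tuple layout, the direction strings, and the names TRANSITION_TABLE and build_tables are illustrative assumptions, not the format defined in the report.

```python
# Each entry: (GB 18030 byte sequence as hex, direction, Unicode code point).
# Directions mirror the notation in the message (assumed encoding of it):
#   "<->" round trip, "<-" Unicode-to-GB only, "->" GB-to-Unicode only.
TRANSITION_TABLE = [
    ("FE90", "<-", 0xE854),
    ("FE90", "<->", 0x9FBA),
    ("82359133", "->", 0x9FBA),
]


def build_tables(entries):
    """Split direction-tagged entries into decode and encode lookups."""
    to_unicode = {}    # GB 18030 bytes -> Unicode code point
    from_unicode = {}  # Unicode code point -> GB 18030 bytes
    for gb_hex, direction, code_point in entries:
        gb = bytes.fromhex(gb_hex)
        if direction in ("<->", "->"):
            to_unicode[gb] = code_point
        if direction in ("<->", "<-"):
            from_unicode[code_point] = gb
    return to_unicode, from_unicode


to_unicode, from_unicode = build_tables(TRANSITION_TABLE)
```

With these entries, both FE90 and 82359133 decode to U+9FBA, and both U+E854 and U+9FBA encode to FE90, so data produced under the old mappings still converts in either direction.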

    Mark

    ----- Original Message -----
    From: "Kenneth Whistler" <kenw@sybase.com>
    To: <verdy_p@wanadoo.fr>
    Cc: <unicode@unicode.org>; <kenw@sybase.com>
    Sent: Thursday, January 06, 2005 12:08
    Subject: Re: Re: ISO 10646 compliance and EU law

    > Philippe,
    >
    > > Thanks for correcting this refutation by Kenneth.
    >
    > > So I know that both ISO/IEC 10646 and GB18030 repertoires will be
    > > amended, but the current statement in the GB18030 standard is that
    > > its mapping with ISO/IEC 10646 will remain closed
    >
    > Which is false.
    >
    > > and compatible with
    >
    > This probably will hold true.
    >
    > > all future amendments of ISO/IEC 10646
    > > (and so, also with Unicode),
    > > in a way similar to the synchronization of the repertoire and assignments
    > > used by Unicode. From my point of view, both Unicode and GB18030 have
    > > now a similar policy to remain synchronized with the base ISO/IEC
    > > 10646 character repertoire.
    > > This effectively means that if China wants to
    > > standardize in GB18030 some precomposed characters that are not in
    > > ISO/IEC10646, this is possible only within the PUA.
    >
    > This is false. China can do what it wants in GB18030, and the decisions
    > that they take will impact implementations, depending on how carefully
    > they are synchronized or not.
    >
    > > GB18030 will remain fully compatible with ISO/IEC10646 and Unicode,
    > > but will add a required mutual agreement about its PUA usage.
    >
    > False. As I will demonstrate below.
    >
    > > So in practice, the only extensions allowed for the GB18030
    > > repertoire are within the PUAs,
    >
    > False.
    >
    > > which already have a closed mapping with ISO/IEC 10646 (and Unicode)
    > > codepoints.
    >
    > False.
    >
    > > All other extensions must be first approved and standardized in
    > > ISO/IEC 10646, before GB18030 can be extended with new characters
    > > in its repertoire;
    >
    > False.
    >
    > > the only alternative would be that China breaks its existing policy
    > > about its closed mapping between its GB18030 encoding standard and
    > > ISO/10646 codepoints.
    >
    > It has done so in the past, and this will happen again in the future.
    >
    > > This would be very bad news for developers that have to support
    > > GB18030 in their software, because this would mean specific solutions
    > > to support GB18030, without the possibility to map it safely to
    > > ISO/IEC 10646 and Unicode.
    >
    > At last, something indisputably true!
    >
    > > This would be a new nightmare for interoperability of
    > > GB18030-enabled software and Unicode/ISO/IEC10646-enabled
    > > software, which would mean that existing software that complies
    > > with Unicode or ISO/IEC 10646 will no longer be compatible with the
    > > required GB18030 standard for China.
    >
    > Correct. Welcome to the wonderful world of GB18030 support for
    > China.
    >
    > >
    > > If Kenneth thinks otherwise, then he should explain why,
    > > because it would be a serious problem for those that think
    > > that their Unicode/ISO/IEC-10646 software will be compatible
    > > with the required GB18030 standard for China.
    >
    > O.k.
    >
    > Example 1:
    >
    > GB 18030-2000 defines a CJK component at FE90 and maps that
    > component to U+E854, because that component is not encoded
    > in Unicode 3.0 or ISO/IEC 10646-1:2000.
    >
    > Because such PUA mappings for GB 18030-2000 have proven
    > very problematical in implementations, the characters in
    > question have been added to 10646 (under ballot currently
    > in Amd 1 to ISO/IEC 10646:2003). This particular CJK component
    > is to be encoded at U+9FBA.
    >
    > And this means that GB 18030 / Unicode mapping tables up
    > to about March 31, 2005 will contain the mappings:
    >
    > FE90 <--> U+E854
    > 82359133 <--> U+9FBA
    >
    > After that time, they will contain the mappings:
    >
    > ???? <--> U+E854
    > FE90 <--> U+9FBA
    > 82359133 <--> ???? (probably U+FFFD)
    >
    > Example 2:
    >
    > China decides to add Tibetan BrdaRten syllables to GB 18030
    > and map them to PUA characters in 10646.
    >
    > Well, guess what -- *all* PUA code points in 10646 already
    > have defined mappings to GB 18030. That means that the addition
    > of the Tibetan BrdaRten syllables and definition of mappings
    > will *change* those mappings, and will require changes to the
    > mappings tables. The only way to avoid that would be for
    > any GB 18030 additions to be defined at specific code points
    > currently labelled as empty in GB 18030 but mapped to 10646
    > PUA code points. For instance:
    >
    > TIBETAN CHARACTER KA U ==> AAA1 <--> U+E000
    >
    > That wouldn't change the code point mapping, but... to actually
    > support the standardization of such a set of syllables in
    > GB 18030, the vendor mapping tables will have to introduce,
    > instead, the one-to-many mappings to actually interpret the
    > Tibetan syllables as what they are, instead of PUA code points,
    > so you would end up with the following entry in the mapping
    > tables:
    >
    > AAA1 <--> <U+0F40, U+0F74>
    >
    > Both of these scenarios are either in the works right now, or
    > will happen in the not-too-distant future.
    >
    > If you think the mapping tables will just stay pristine and
    > unchanged forever, in the face of such changes, you are smoking
    > something. The *REASON* for making such additions is either to
    > enable or *force* vendors to change the tables.
    >
    > > I think it is extremely important that the mapping of codes
    > > between GB18030 and ISO/IEC10646 stay closed, even if these
    > > codes are still not all assigned to abstract characters.
    >
    > You can think that, but if you mean by "closed" that the mappings
    > stay stable and need not be versioned as either or both of the
    > standards change, then you are flat wrong. It won't happen that
    > way.
    >
    > > It is equally important that China then avoids any attempt to
    > > extend its GB18030 repertoire without first requesting and
    > > getting approval in the ISO/IEC 10646 standard repertoire.
    >
    > It may be important, but China does not come to WG2 asking
    > permission. They are a sovereign entity, and they change
    > their own standards as they see fit.
    >
    > > This is the job of the Ideographic working group and rapporteur:
    > > to ensure that such an event never occurs, by negotiating these
    > > amendments with China and with the ISO working group.
    >
    > The IRG and its rapporteur have no jurisdiction here. Sure,
    > its members and anyone else can get involved in the discussions
    > to try to minimize the potential for damaging changes. But
    > you *will not* be able to prevent changes.
    >
    > --Ken
    >
    >
    >
