From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 06 2005 - 14:08:55 CST
Philippe,
> Thanks for correcting this refutation by Kenneth.
> So I know that both ISO/IEC 10646 and GB18030 repertoires will be
> amended, but the current statement in the GB18030 standard is that
> its mapping with ISO/IEC 10646 will remain closed
Which is false.
> and compatible with
This probably will hold true.
> all future amendments of ISO/IEC 10646
> (and so, also with Unicode),
> in a way similar to the synchronization of the repertoire and assignments
> used by Unicode. From my point of view, both Unicode and GB18030 have
> now a similar policy to remain synchronized with the base ISO/IEC
> 10646 character repertoire.
> This effectively means that if China wants to
> standardize in GB18030 some precomposed characters that are not in
> ISO/IEC 10646, this is possible only within the PUA.
This is false. China can do what it wants in GB18030, and the decisions
that they take will impact implementations, depending on how carefully
they are synchronized or not.
> GB18030 will remain fully compatible with ISO/IEC10646 and Unicode,
> but will add a required mutual agreement about its PUA usage.
False. As I will demonstrate below.
> So in practice, the only extensions allowed for the GB18030
> repertoire are within the PUAs,
False.
> which already have a closed mapping with ISO/IEC 10646 (and Unicode)
> codepoints.
False.
> All other extensions must be first approved and standardized in
> ISO/IEC 10646, before GB18030 can be extended with new characters
> in its repertoire;
False.
> the only alternative would be that China breaks its existing policy
> about its closed mapping between its GB18030 encoding standard and
> ISO/IEC 10646 codepoints.
It has done so in the past, and this will happen again in the future.
> This would be very bad news for developers that have to support
> GB18030 in their software, because this would mean specific solutions
> to support GB18030, without the possibility to map it safely to
> ISO/IEC 10646 and Unicode.
At last, something indisputably true!
> This would be a new nightmare for interoperability of
> GB18030-enabled software and Unicode/ISO/IEC 10646-enabled
> software, which would mean that existing software that complies
> with Unicode or ISO/IEC 10646 will no longer be compatible with the
> required GB18030 standard for China.
Correct. Welcome to the wonderful world of GB18030 support for
China.
>
> If Kenneth thinks otherwise, then he should explain why,
> because it would be a serious problem for those that think
> that their Unicode/ISO/IEC-10646 software will be compatible
> with the required GB18030 standard for China.
O.k.
Example 1:
GB 18030-2000 defines a CJK component at FE90 and maps that
component to U+E854, because that component is not encoded
in Unicode 3.0 or ISO/IEC 10646-1:2000.
Because such PUA mappings for GB 18030-2000 have proven
very problematical in implementations, the characters in
question have been added to 10646 (under ballot currently
in Amd 1 to ISO/IEC 10646:2003). This particular CJK component
is to be encoded at U+9FBA.
And this means that GB 18030 / Unicode mapping tables up
to about March 31, 2005 will contain the mappings:
FE90 <--> U+E854
82359133 <--> U+9FBA
After that time, they will contain the mappings:
???? <--> U+E854
FE90 <--> U+9FBA
82359133 <--> ???? (probably U+FFFD)
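To make the versioning problem in Example 1 concrete, here is a minimal sketch in Python. The names TABLE_BEFORE, TABLE_AFTER, to_unicode and changed_rows are hypothetical and exist only for this illustration; the two rows are the ones listed above, and None stands for an entry the example leaves open ("????"), including the row that will "probably" become U+FFFD.

```python
# Illustrative only: two snapshots of the same two mapping-table rows.
# Keys are GB 18030 byte sequences written as hex strings;
# values are Unicode code points. None marks an entry left open ("????").
TABLE_BEFORE = {"FE90": 0xE854, "82359133": 0x9FBA}
TABLE_AFTER = {"FE90": 0x9FBA, "82359133": None}


def to_unicode(gb_hex, table):
    """Return the mapped character, or None if the table has no mapping."""
    cp = table.get(gb_hex)
    return None if cp is None else chr(cp)


def changed_rows(old, new):
    """List every GB 18030 sequence whose mapping differs between versions."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    for gb_hex, (before, after) in changed_rows(TABLE_BEFORE, TABLE_AFTER).items():
        print(gb_hex, before, after)
```

The same bytes FE90 decode to different characters depending on which snapshot is used, so a converter has to know which version of the table applies to the data it is given.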
Example 2:
China decides to add Tibetan BrdaRten syllables to GB 18030
and map them to PUA characters in 10646.
Well, guess what -- *all* PUA code points in 10646 already
have defined mappings to GB 18030. That means that the addition
of the Tibetan BrdaRten syllables and definition of mappings
will *change* those mappings, and will require changes to the
mappings tables. The only way to avoid that would be for
any GB 18030 additions to be defined at specific code points
currently labelled as empty in GB 18030 but mapped to 10646
PUA code points. For instance:
TIBETAN CHARACTER KA U ==> AAA1 <--> U+E000
That wouldn't change the code point mapping, but... to actually
support the standardization of such a set of syllables in
GB 18030, the vendor mapping tables will have to introduce,
instead, the one-to-many mappings to actually interpret the
Tibetan syllables as what they are, instead of PUA code points,
so you would end up with the following entry in the mapping
tables:
AAA1 <--> <U+0F40, U+0F74>
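A rough sketch of what such a one-to-many entry means for a converter, again in Python. The row AAA1 is only the illustrative slot used above, not an actual assignment, and PUA_ROW, SEQ_ROW, decode, build_reverse and encode are hypothetical names. The point is that decoding becomes "one code to a sequence" and encoding needs a longest-match lookup over sequences.

```python
# Illustrative only: the same GB 18030 slot under the two interpretations.
# Keys are GB 18030 byte sequences as hex strings; values are tuples of
# Unicode code points, so a single key can map to a sequence.
PUA_ROW = {"AAA1": (0xE000,)}          # plain PUA mapping
SEQ_ROW = {"AAA1": (0x0F40, 0x0F74)}   # one-to-many mapping


def decode(gb_hex, table):
    """Map one GB 18030 code to its Unicode string (possibly several characters)."""
    return "".join(chr(cp) for cp in table[gb_hex])


def build_reverse(table):
    """Invert the table: Unicode string -> GB 18030 hex code."""
    return {"".join(chr(cp) for cp in cps): gb for gb, cps in table.items()}


def encode(text, reverse):
    """Greedy longest-match encoding of text using the reversed table."""
    max_len = max(len(key) for key in reverse)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += n
                break
        else:
            raise ValueError("no mapping for U+%04X" % ord(text[i]))
    return out
```

With SEQ_ROW, decode("AAA1", SEQ_ROW) yields the two-character string U+0F40 U+0F74, and encode of that string with build_reverse(SEQ_ROW) yields ["AAA1"]. A table built from PUA_ROW gives U+E000 for the same bytes, which is exactly the kind of table change described here.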
Both of these scenarios are either in the works right now, or
will happen in the not-too-distant future.
If you think the mapping tables will just stay pristine and
unchanged forever, in the face of such changes, you are smoking
something. The *REASON* for making such additions is either to
enable or *force* vendors to change the tables.
> I think it is extremely important that the mapping of codes
> between GB18030 and ISO/IEC10646 stay closed, even if these
> codes are still not all assigned to abstract characters.
You can think that, but if you mean by "closed" that the mappings
stay stable and need not be versioned as either or both of the
standards change, then you are flat wrong. It won't happen that
way.
> It is equally important that China then avoids any attempt to
> extend its GB18030 repertoire without first requesting and
> getting approval in the ISO/IEC 10646 standard repertoire.
It may be important, but China does not come to WG2 asking
permission. They are a sovereign entity, and they change
their own standards as they see fit.
> This is the job of the Ideographic working group and rapporteur
> to ensure that such an event never occurs, by negotiating these
> amendments with China and with the ISO working group.
The IRG and its rapporteur have no jurisdiction here. Sure,
its members and anyone else can get involved in the discussions
to try to minimize the potential for damaging changes. But
you *will not* be able to prevent changes.
--Ken