Re: GBK, HZ and EUC-TW

From: kline_s@cup.hp.com
Date: Wed Jan 10 2001 - 22:12:21 EST


> Lars Garshol wrote:
>
> * Tom Emerson
> |
> | As far as mapping tables go, the best one you'll find is the
> | Microsoft or ICU mapping tables. I personally have not seen an
> | official mapping table from GB 13000. As others have noted,
> | Microsoft has extended the "pure" GBK with Euro, and perhaps other
> | code points.
>
> Hmmm. Does this mean that it is best to support the Microsoft
> extensions, or that it is best not to do so? I guess we will be
> forced to support them sooner or later, and that we might as well do
> it now to save everyone some bother.

As others have already indirectly noted, the problem is that the Euro
is then "double-defined" within GBK, at code points GB 0x80 and GB 0xA2E3.
Consequently, round-trip conversion between GBK and the Unicode Euro
(U+20AC) is not possible without transforming one of these two GBK
values on the return trip.
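To make the round-trip problem concrete, here is a minimal sketch. It uses a hypothetical two-entry decoding table that contains only the two Euro byte sequences named above. This is not a real GBK or cp936 table. Because both sequences decode to U+20AC, the reverse (encoding) table can hold only one of them, so the other sequence does not survive a bytes -> Unicode -> bytes trip.

```python
# Hypothetical toy table: only the two Euro byte sequences discussed above.
# Real GBK/cp936 mapping tables contain tens of thousands of entries.
DECODE = {
    b"\x80": "\u20ac",      # single-byte Euro (Microsoft extension)
    b"\xa2\xe3": "\u20ac",  # double-byte Euro
}

# Build the reverse (encoding) table. Two keys map to the same character,
# so the later entry silently overwrites the earlier one.
ENCODE = {}
for seq, ch in DECODE.items():
    ENCODE[ch] = seq


def round_trips(seq: bytes) -> bool:
    """Return True if seq -> Unicode -> bytes gives back the original seq."""
    return ENCODE[DECODE[seq]] == seq


for seq in DECODE:
    print(seq.hex(), round_trips(seq))
```

With this insertion order, 0xA2E3 wins the reverse mapping and 0x80 fails the round trip. Building the table in the other order would simply swap which sequence is lost.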

The only alternative is to distinguish between the two forms of GBK
and support two separate conversions: one for cp936 and the other for
"pure" GBK.
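The two-converter alternative can be sketched with the same kind of hypothetical toy tables, again limited to the Euro. Each profile gets its own one-to-one table, so encoding and decoding within a profile are exact inverses. The profile names `cp936` and `gbk-pure` are illustrative assumptions. So is the choice that cp936 uses 0x80 and "pure" GBK uses 0xA2E3. Neither comes from a published mapping table.

```python
# Hypothetical per-profile tables, each one-to-one for the Euro.
# Assumption: cp936 uses the single byte 0x80 and "pure" GBK uses 0xA2E3.
PROFILES = {
    "cp936": {b"\x80": "\u20ac"},
    "gbk-pure": {b"\xa2\xe3": "\u20ac"},
}


def decode(seq: bytes, profile: str) -> str:
    """Decode one byte sequence using the named profile's table."""
    return PROFILES[profile][seq]


def encode(ch: str, profile: str) -> bytes:
    """Encode one character; inverting is safe because each table is one-to-one."""
    reverse = {c: s for s, c in PROFILES[profile].items()}
    return reverse[ch]


# Every sequence round-trips within its own profile.
for name, table in PROFILES.items():
    for seq in table:
        assert encode(decode(seq, name), name) == seq

print(encode("\u20ac", "cp936").hex(), encode("\u20ac", "gbk-pure").hex())
```

In this sketch, a byte sequence that belongs only to the other profile (for example 0xA2E3 under `cp936`) raises `KeyError`. A real converter would apply its own error handling instead.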

---

Out of curiosity, what does GB-18030 define for the Euro? Does it define both a single-width and a double-width form?

If so, does it include any reference to how interoperability should be handled in conversions with Unicode (or for that matter, any character set which defines a single code value for this character)?

(Lastly, throwing a lighted match onto gasoline...) If two forms are specified in GB-18030, should Unicode consider adding another code point in the fullwidth variant region to accommodate this?

- Sue



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT