Re: GBK, HZ and EUC-TW

From: Mark Davis
Date: Mon Jan 08 2001 - 11:49:40 EST

In specific cases you may use a single character conversion mapping instead
of two, but you should be very careful about doing so. See especially "1.2.1 Best-Fit


----- Original Message -----
From: "Lars Marius Garshol"
To: "Unicode List"
Sent: Monday, January 08, 2001 06:53
Subject: Re: GBK, HZ and EUC-TW

> * Tom Emerson
> |
> | Ken Lunde's "CJKV Information Processing" has a good description of
> | the evolution and interrelationships between the GB standards.
> Actually, I disagree with that. It has a description, but IMHO it
> leaves much to be desired. I can't understand why people keep
> praising this book. You can get the information you need from it, but
> in my experience doing so involves a lot of flipping back and forth,
> several rereadings and some guesswork at the end.
> | As far as mapping tables go, the best one you'll find is the
> | Microsoft or ICU mapping tables. I personally have not seen an
> | official mapping table from GB 13000. As others have noted,
> | Microsoft has extended the "pure" GBK with Euro, and perhaps other
> | code points.
> Hmmm. Does this mean that it is best to support the Microsoft
> extensions, or that it is best not to do so? I guess we will be
> forced to support them sooner or later, and that we might as well do
> it now to save everyone some bother.
> | GB 2312:80 is a proper subset of GBK, so you can map EUC-CN encoded
> | text to Unicode using a GBK mapping table. Be aware, though, that
> | going the other direction can be problematic: GBK contains code
> | points that do not exist within GB 2312:80, so not every character
> | will convert back.
> I was thinking of having a single X->Unicode converter for both GBK
> and EUC-CN. I am still uncertain as to whether that really is a good
> idea, though.
> --Lars M.
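
To make the asymmetry discussed above concrete, here is a minimal sketch in Python. It assumes Python's built-in "gbk" and "gb2312" codecs as stand-ins for the mapping tables mentioned in the thread (they are not the Microsoft or ICU tables, and may differ from them in details such as the Euro sign). The helper names decode_to_unicode and can_encode are hypothetical, chosen only for illustration.

```python
def decode_to_unicode(data: bytes) -> str:
    """Decode EUC-CN or GBK bytes with one shared table.

    Assumption: Python's "gbk" codec is used as the single
    X->Unicode converter, relying on GB 2312 being a subset of GBK.
    """
    return data.decode("gbk")


def can_encode(text: str, codec: str) -> bool:
    """Report whether every character in text exists in the target codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Text that exists in GB 2312: encode as EUC-CN, decode via the GBK table.
    simplified = "\u4e2d\u6587"
    euc_cn_bytes = simplified.encode("gb2312")
    assert decode_to_unicode(euc_cn_bytes) == simplified

    # U+9F8D is a traditional form that GBK covers but GB 2312 does not,
    # so the Unicode->EUC-CN direction needs an explicit check.
    gbk_only = "\u9f8d"
    print("encodable as GBK:   ", can_encode(gbk_only, "gbk"))
    print("encodable as EUC-CN:", can_encode(gbk_only, "gb2312"))
```

The shared decoder is safe only in the X->Unicode direction; when writing EUC-CN, a check like can_encode (or an explicit error handler) is one way to avoid silently producing bytes that a strict GB 2312 consumer cannot read.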
