Re: GBK, HZ and EUC-TW

From: Lars Marius Garshol (larsga@garshol.priv.no)
Date: Mon Jan 08 2001 - 10:15:46 EST


* Tom Emerson
|
| Ken Lunde's "CJKV Information Processing" has a good description of
| the evolution and interrelationships between the GB standards.

Actually, I disagree with that. It has a description, but IMHO it
leaves much to be desired. I can't understand why people keep
praising this book. You can get the information you need from it, but
in my experience doing so involves a lot of flipping back and forth,
several rereadings and some guesswork at the end.
 
| As far as mapping tables go, the best one you'll find is the
| Microsoft or ICU mapping tables. I personally have not seen an
| official mapping table from GB 13000. As others have noted,
| Microsoft has extended the "pure" GBK with Euro, and perhaps other
| code points.

Hmmm. Does this mean that it is best to support the Microsoft
extensions, or that it is best not to do so? I guess we will be
forced to support them sooner or later, and that we might as well do
it now to save everyone some bother.
 
| GB 2312-80 is a proper subset of GBK, so you can map EUC-CN encoded
| text to Unicode using a GBK mapping table. Be aware, though, that
| going the other direction can be problematic: GBK contains code
| points that do not exist in GB 2312-80, so converting from Unicode
| back to EUC-CN can fail for those characters.

I was thinking of having a single X->Unicode converter for both GBK
and EUC-CN. I am still uncertain as to whether that really is a good
idea, though.
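
For what it is worth, here is a minimal sketch of the single-converter
idea. It assumes Python's built-in "gbk" and "gb2312" codecs as
stand-ins for the mapping tables under discussion (they are not the
Microsoft or ICU tables themselves), and the function names are made
up for illustration.

```python
# Sketch only: Python's "gbk" and "gb2312" codecs stand in for real
# mapping tables; to_unicode and to_euc_cn are hypothetical names.

def to_unicode(data: bytes) -> str:
    """Decode either EUC-CN or GBK input with one GBK table.

    This works because GB 2312 is a subset of GBK, so every valid
    EUC-CN byte sequence decodes to the same character under GBK.
    """
    return data.decode("gbk")


def to_euc_cn(text: str) -> bytes:
    """Encode to EUC-CN, raising UnicodeEncodeError for characters
    that exist in GBK but not in GB 2312."""
    return text.encode("gb2312")


# EUC-CN input decodes identically through the shared GBK decoder.
euc_cn_bytes = "你好".encode("gb2312")
shared_result = to_unicode(euc_cn_bytes)
strict_result = euc_cn_bytes.decode("gb2312")

# The reverse direction is where care is needed: U+9AD4 is a
# traditional-form character present in GBK but absent from GB 2312.
gbk_only = "\u9ad4"
gbk_bytes = gbk_only.encode("gbk")  # succeeds
try:
    to_euc_cn(gbk_only)
    reverse_failed = False
except UnicodeEncodeError:
    reverse_failed = True
```

Decoding is safe to share, but the encoders probably need to stay
separate, so that output labelled EUC-CN never contains GBK-only
code points.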

--Lars M.


