Re: GBK, HZ and EUC-TW

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Fri Jan 05 2001 - 12:09:30 EST


On Fri, 5 Jan 2001, Lars Marius Garshol wrote:

> * Thomas Chan
> | One way to find GBK pages is to look for "GB2312" pages (aka EUC-CN)
> | with codepoints outside the EUC ranges. e.g., pages discussing ZHU
>
> Can I take this to mean that it is common practice to use GBK in pages
> and to label them as GB2312?

If they aren't unlabeled, or mislabeled as ISO 8859-1 or CP1252...
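
A minimal sketch of the check mentioned above (looking for bytes outside
the EUC ranges in a page labeled GB2312), in Python. The function name and
the sample string are only illustrative, and the ranges assumed here are
0xA1-0xFE for both bytes of an EUC-CN pair and 0x81-0xFE for a GBK lead
byte. It is a heuristic, not a full validator.

```python
def find_non_euc_cn_pairs(data: bytes):
    """Return (offset, raw_bytes) for each sequence outside the EUC-CN ranges.

    Assumption: every byte in 0x81-0xFE starts a two-byte sequence, as in GBK.
    """
    hits = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            i += 1                      # plain ASCII, skip
            continue
        if lead in (0x80, 0xFF) or i + 1 >= n:
            hits.append((i, data[i:i + 1]))   # stray or truncated byte
            i += 1
            continue
        trail = data[i + 1]
        in_euc = 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE
        if not in_euc:
            hits.append((i, data[i:i + 2]))   # GBK-only (or invalid) pair
        i += 2
    return hits


if __name__ == "__main__":
    # Illustrative sample: the last character is in GBK but not in GB2312.
    sample = "\u4e2d\u6587\u9555".encode("gbk")
    for offset, raw in find_non_euc_cn_pairs(sample):
        print(offset, raw.hex())
```

A page labeled GB2312 that produces any hits is a candidate GBK page (or is
simply broken), so the hits are worth inspecting by hand.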

> Given that the one is a subset of the other, it sounds as though my
> application really should use the GBK converter both for GBK pages and
> for GB2312 pages.

I'd compare the tables first; e.g., current versions of CP936 have a Euro
that snuck in there that isn't part of GBK. You might want to keep them
separate anyway for other reasons.
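
One way to do such a comparison, sketched in Python. This uses Python's
bundled "gb2312" and "gbk" codecs as stand-ins, which is an assumption: the
tables in your own converter (or in a given CP936 version) may differ, so
treat the result as a starting point only. The function name is
illustrative.

```python
def differing_pairs():
    """List two-byte sequences that GB2312 decodes differently from GBK.

    Returns tuples of (raw_bytes, gb2312_char, gbk_char_or_None).
    """
    diffs = []
    for lead in range(0xA1, 0xFF):
        for trail in range(0xA1, 0xFF):
            pair = bytes([lead, trail])
            try:
                narrow = pair.decode("gb2312")
            except UnicodeDecodeError:
                continue                # unassigned in GB2312, nothing to compare
            try:
                wide = pair.decode("gbk")
            except UnicodeDecodeError:
                wide = None             # assigned in GB2312 but not in GBK
            if narrow != wide:
                diffs.append((pair, narrow, wide))
    return diffs


if __name__ == "__main__":
    for raw, narrow, wide in differing_pairs():
        print(raw.hex(), repr(narrow), repr(wide))
```

If the list comes back empty for your tables, using the GBK converter for
GB2312 pages loses nothing; any entries it does print are the cases to
decide on.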
 

> I now have four test pages for GBK, a tiny one for HZ and none at all
> for EUC-TW. Unless someone knows of something I suppose I will have
> to make test pages myself with some conversion tool.

You can get HZ encoded pages from http://www.cnd.org/HZ/Classics/ . You
might have to hunt around for ones that include rows of English text for
testing purposes.
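
Failing that, a small HZ test page with rows of English text can be
generated. A rough sketch using Python's "hz" codec; the function name and
the sample lines are made up for illustration, and the input has to stay
within GB2312 plus ASCII for the encoder to accept it.

```python
def make_hz_test_page(lines):
    """Join the given lines and encode them as HZ (7-bit, ~{ ... ~} escapes)."""
    text = "\n".join(lines) + "\n"
    return text.encode("hz")


if __name__ == "__main__":
    # Illustrative content: Chinese rows mixed with plain English rows.
    page = make_hz_test_page([
        "\u4e2d\u6587 test line",
        "A row of English text only",
        "\u6d4b\u8bd5",
    ])
    with open("hz-test.txt", "wb") as out:   # hypothetical output file name
        out.write(page)
```

Decoding the result with the converter under test and comparing against
the original lines gives a simple round-trip check.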

I don't know where to get EUC-TW encoded data.

Thomas Chan
tc31@cornell.edu


