On Fri, 5 Jan 2001, Lars Marius Garshol wrote:
> * Thomas Chan
> | One way to find GBK pages is to look for "GB2312" pages (aka EUC-CN)
> | with codepoints outside the EUC ranges. e.g., pages discussing ZHU
> Can I take this to mean that it is common practice to use GBK in pages
> and to label them as GB2312?
If they aren't unlabeled, or mislabeled as ISO 8859-1 or CP1252...
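A minimal sketch of the check described above: scan the raw bytes of a page labeled "GB2312" and report whether any two-byte sequence is a valid GBK pair that falls outside the EUC-CN range. The function name `has_gbk_only_bytes` is hypothetical, and the byte ranges used (EUC-CN lead 0xA1-0xF7 and trail 0xA1-0xFE; GBK lead 0x81-0xFE and trail 0x40-0xFE except 0x7F) are the commonly cited ones, stated here as assumptions rather than taken from this thread.

```python
def has_gbk_only_bytes(data: bytes) -> bool:
    """Return True if data contains a GBK double-byte pair outside EUC-CN.

    Assumed ranges (not from the original message):
      EUC-CN: lead 0xA1-0xF7, trail 0xA1-0xFE
      GBK:    lead 0x81-0xFE, trail 0x40-0xFE except 0x7F
    """
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            i += 1  # plain ASCII byte
            continue
        if 0x81 <= lead <= 0xFE and i + 1 < n:
            trail = data[i + 1]
            if 0x40 <= trail <= 0xFE and trail != 0x7F:
                in_euc_cn = 0xA1 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE
                if not in_euc_cn:
                    return True  # valid GBK pair, impossible in EUC-CN
                i += 2
                continue
        i += 1  # stray or invalid byte: skip it and keep scanning
    return False


if __name__ == "__main__":
    # U+9555 is an illustrative character chosen here: it is in GBK
    # but not in GB2312.
    print(has_gbk_only_bytes("\u9555".encode("gbk")))
    print(has_gbk_only_bytes("\u4e2d\u6587".encode("gb2312")))
```

A page that is unlabeled or mislabeled as ISO 8859-1 or CP1252 would need the same byte-level scan, since the label cannot be trusted in either case.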
> Given that the one is a subset of the other, it sounds as though my
> application really should use the GBK converter both for GBK pages and
> for GB2312 pages.
I'd compare the tables first, e.g., current versions of CP936 have a Euro
sign (at 0x80) that snuck in there that isn't part of GBK. You might want
to separate them anyway for other reasons.
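One way to do such a table comparison, sketched with Python's built-in `gb2312` and `gbk` codecs: decode every two-byte pair that GB2312 assigns and list the pairs where the GBK codec gives a different result. The function name `compare_gb2312_gbk` is hypothetical. The result reflects Python's mapping tables, which may not match the tables of another converter. In CPython, `cp936` is registered as an alias of `gbk`, so this approach would not surface the CP936 Euro difference.

```python
def compare_gb2312_gbk():
    """List (bytes, gb2312_char, gbk_char) where the two codecs disagree.

    Only pairs that the gb2312 codec can decode are examined; gbk_char is
    None when the gbk codec rejects the pair.
    """
    diffs = []
    for lead in range(0xA1, 0xF8):        # EUC-CN lead bytes 0xA1-0xF7
        for trail in range(0xA1, 0xFF):   # EUC-CN trail bytes 0xA1-0xFE
            pair = bytes([lead, trail])
            try:
                narrow = pair.decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned in this codec's GB2312 table
            try:
                wide = pair.decode("gbk")
            except UnicodeDecodeError:
                wide = None
            if narrow != wide:
                diffs.append((pair, narrow, wide))
    return diffs


if __name__ == "__main__":
    for pair, narrow, wide in compare_gb2312_gbk():
        shown = "undecodable" if wide is None else f"U+{ord(wide):04X}"
        print(pair.hex(), f"U+{ord(narrow):04X}", shown)
```

If the list is empty for a given pair of tables, using the GBK converter for GB2312 pages loses nothing; any entries it does print are the cases worth looking at before merging the two.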
> I now have four test pages for GBK, a tiny one for HZ and none at all
> for EUC-TW. Unless someone knows of something I suppose I will have
> to make test pages myself with some conversion tool.
You can get HZ encoded pages from http://www.cnd.org/HZ/Classics/ . You
might have to hunt around for ones that include rows of English text for
I don't know where to get EUC-TW encoded data.
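If no suitable pages turn up, small test pages can also be generated by conversion. The following is a rough sketch using Python's built-in `hz` and `gbk` codecs; the helper name `make_test_page` and the sample text are illustrative only. Python's standard library does not ship an EUC-TW codec, so EUC-TW data would need a different conversion tool.

```python
def make_test_page(text: str, encoding: str) -> bytes:
    """Wrap text in a minimal HTML page and encode it with the given codec."""
    body = f"<html><body><p>{text}</p></body></html>"
    return body.encode(encoding)


if __name__ == "__main__":
    sample = "\u4e2d\u6587 test"  # two hanzi followed by English text
    for enc in ("hz", "gbk"):
        page = make_test_page(sample, enc)
        # Round-trip through the same codec to confirm the page is decodable.
        assert sample in page.decode(enc)
        print(enc, page)
```

HZ output is 7-bit only, so mixing Chinese and English text in the sample exercises the switches into and out of GB mode.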