Re: GBK, HZ and EUC-TW

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Fri Jan 05 2001 - 08:16:45 EST


On Fri, 5 Jan 2001 Jukka.Korpela@hut.fi wrote:

> On Fri, 5 Jan 2001, Lars Marius Garshol wrote:
>
> > If anyone has
> > a GBK-to-Unicode mapping table or knows of example web pages or
> > text documents in any of these encodings, I would be happy to hear
> > about them.
>
> Unless I'm missing something as usual, GBK is a Microsoft extension
> to GB 2312, so the resources at
[snip]
> I tried to find some sample pages in GBK encoding at
[snip]

GBK isn't Microsoft-specific. The problem is that there's no encoding tag
for it, so many people just tag their pages "GB2312" or the like, which is
only a subset of GBK.

One way to find GBK pages is to look for pages tagged "GB2312" (aka EUC-CN)
that contain codepoints outside the EUC ranges, e.g., pages discussing ZHU
Rongji (premier of China) that, for one reason or another, write one of the
three characters of his name with a form that was discarded during the
simplification process (but is still used in "traditional Chinese" locales)
and is thus not in GB2312. I've found such pages before, but don't have a
URL at hand.
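
A rough sketch of that search heuristic in Python, for illustration only.
It assumes the commonly cited byte layout (EUC-CN double-byte characters
use 0xA1-0xFE for both bytes; GBK allows lead bytes 0x81-0xFE and trail
bytes 0x40-0xFE except 0x7F). The function name find_gbk_only_pairs is
made up for this example, and the scan does not validate the page in any
other way.

```python
def find_gbk_only_pairs(data: bytes):
    """Return (offset, lead, trail) for byte pairs valid in GBK but
    outside the EUC-CN (GB2312) double-byte range."""
    hits = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII byte: single-byte character, move on.
            i += 1
            continue
        if i + 1 >= n:
            # Truncated double-byte sequence at end of input.
            break
        trail = data[i + 1]
        # Assumed ranges: EUC-CN uses 0xA1-0xFE for both bytes.
        in_euc = 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE
        # Assumed ranges: GBK lead 0x81-0xFE, trail 0x40-0xFE minus 0x7F.
        in_gbk = (0x81 <= lead <= 0xFE
                  and 0x40 <= trail <= 0xFE
                  and trail != 0x7F)
        if in_gbk and not in_euc:
            hits.append((i, lead, trail))
        # Consume two bytes for a plausible pair, otherwise resync by one.
        i += 2 if in_gbk else 1
    return hits


if __name__ == "__main__":
    # The traditional-form spelling from this message, encoded as GBK.
    sample = "\u6731\u9394\u57fa".encode("gbk")
    for offset, lead, trail in find_gbk_only_pairs(sample):
        print(f"offset {offset}: {lead:02X} {trail:02X} is outside EUC-CN")
```

A page tagged "GB2312" for which this returns a non-empty list is a
candidate GBK page; an empty list says nothing either way.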

I did put together a demonstration of this, at:
http://deall.ohio-state.edu/grads/chan.200/cjkv/zhurongji_name_gbk.html
(page encoding is not explicitly tagged), which should read:

  ZHU Rong ji
  --------------------
  U+6731 U+9394 U+57FA (second character is not possible in GB2312; needs
                        GBK)

  U+6731 U+7194 U+57FA (normal form in GB2312)

  U+6731 U+9555 U+57FA (for reference, just to show off what else GBK can
                        do, over GB2312)

Please check to make sure U+9394 and U+9555 display correctly, as behavior
varies among software that officially handles only GB2312.
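
To check the three spellings above without relying on a browser, something
like the following Python sketch could be used. It assumes Python's
built-in "gb2312" and "gbk" codecs are a fair stand-in for the two
character sets; the helper names encodable and report are invented for
this example.

```python
# The three spellings listed above, written as Unicode escapes.
SPELLINGS = {
    "U+9394 (traditional form)": "\u6731\u9394\u57fa",
    "U+7194 (normal form in GB2312)": "\u6731\u7194\u57fa",
    "U+9555 (GBK reference form)": "\u6731\u9555\u57fa",
}


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def report(spellings=SPELLINGS, codecs=("gb2312", "gbk")):
    """Map each spelling label to {codec name: encodable?}."""
    return {
        label: {codec: encodable(text, codec) for codec in codecs}
        for label, text in spellings.items()
    }


if __name__ == "__main__":
    for label, result in report().items():
        print(label, result)
```

This only tells you what the codecs accept, not whether a given font or
browser actually displays the characters.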

Thomas Chan
tc31@cornell.edu


