Questions and answers about Unicode vs Chinese Standards

From: Steven Kang (CZKBC@CUNYVM.CUNY.EDU)
Date: Tue Mar 28 1995 - 14:58:07 EST

Next message: Steven Kang: "Unicode vs GB 2312"
Previous message: Steven Kang: "Questions about taking part in the forum"

To experts:

I posted my first few questions about Unicode vs ISO 10646-1 and got very
satisfactory answers. My thanks go to Gary, Tom, and Heinz.

Now I have some further questions about Unicode vs Chinese standards.
Again, thank you in advance for any answers.

1. Unicode is carefully designed to be as consistent as possible
   with existing coding standards. How consistent is it
   with the mainland Chinese character coding standard GB 2312? Is it
   possible to convert between the two standards by simply adding or
   subtracting an offset, or does the conversion have to be done by looking
   up a conversion table?

If the ordering of the characters in GB 2312 has been changed in Unicode,
how many characters have been changed?

2. Is the Chinese character coding in Unicode mainly based on GB 2312 or
on other Chinese character standards in Taiwan, Hong Kong, etc.?

3. What are the Chinese character coding standards in other regions of Asia
   that use Chinese characters (like Taiwan, Hong Kong, and Singapore)? How
   consistent is Unicode with these standards as far as Chinese characters
   are concerned? Can conversion between Unicode and these codes be done
   easily?

Steven
=======================================================================
Answers:
1.
Return-Path: <@CUNYVM.CUNY.EDU:uu1014!trial.raf.com!maverick@UU9.PSI.COM>
Date: Mon, 27 Mar 95 10:07:54 PST
From: maverick@trial.raf.com (Tom Fruchterman)
To: CZKBC@CUNYVM.CUNY.EDU
Subject: Re: Unicode vs Chinese coding standards

Dear Steven,

I think you should pick up a copy of "Understanding Japanese
Information Processing" by Ken Lunde. Although it concentrates
on Japanese, it also discusses the other standards. Of course, all
the answers are also in "The Unicode Standard", Vol. 2. Here are brief
answers to your questions.

1, 2, 3. Unicode is not primarily based on any one of the standards,
because these standards are far more complex than Latin character
standards. You can't just take one of the standards, add an offset,
and tack the missing characters on the end. There are programs that do
the conversion for you; one is called tcs and is from AT&T. There are
also tables at ftp.unicode.org.
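The conversion tables mentioned above are plain text files. As a minimal sketch, assuming a layout of one mapping per line (a source code and a Unicode code point, both in hexadecimal, separated by whitespace, with `#` starting a comment), a table can be loaded into a pair of dictionaries for lookup in either direction. The layout and the function name `load_mapping` are assumptions for illustration; check the header of the actual file before relying on it.

```python
import io
from typing import Dict, TextIO, Tuple


def load_mapping(stream: TextIO) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Read 'source_hex unicode_hex [# comment]' lines into two dicts.

    The line layout is an assumption for illustration; adjust the parsing
    to match the header of the real table file.
    """
    to_unicode: Dict[int, int] = {}
    from_unicode: Dict[int, int] = {}
    for line in stream:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        src_hex, uni_hex = data.split()[:2]
        src, uni = int(src_hex, 16), int(uni_hex, 16)  # int() accepts the 0x prefix
        to_unicode[src] = uni
        # Keep the first mapping if two source codes share a code point.
        from_unicode.setdefault(uni, src)
    return to_unicode, from_unicode


# Two sample lines, using GB 2312 row/cell codes in the 0x2121-style form.
sample = io.StringIO(
    "# sample mapping\n"
    "0x2121\t0x3000\t# IDEOGRAPHIC SPACE\n"
    "0x3021\t0x554A\t# first level-1 hanzi\n"
)
gb_to_u, u_to_gb = load_mapping(sample)
print(hex(gb_to_u[0x3021]))  # 0x554a
```

Building both directions at load time keeps each conversion a single dictionary lookup.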

The subtle politics of it I don't quite understand, but what you
won't read in these books is that the Unicode standard is biased
towards Chinese. I can't give you the technical explanation of it, but
there is a Japanese way of constructing standards, teaching the kids
and so on, and there is a Chinese way. They had to pick one and the
Japanese gripe about it.

Tom Fruchterman
RAF Technology

======================================================================
2.
Return-Path: <@CUNYVM.CUNY.EDU:sandstro@CAC.WASHINGTON.EDU>
Date: Mon, 27 Mar 1995 10:41:22 -0800 (PST)
From: Bob Sandstrom <sandstro@cac.washington.edu>
To: Steven Kang <CZKBC@CUNYVM.CUNY.EDU>
Subject: Re: Unicode vs Chinese coding standards

On Sat, 25 Mar 1995 unicode@Unicode.ORG wrote:

> 1. Unicode is carefully designed to be as consistent as possible
> with existing coding standards. How consistent is it
> with the mainland Chinese character coding standard GB 2312? Is it
> possible to convert between the two standards by simply adding or
> subtracting an offset, or does the conversion have to be done by looking
> up a conversion table?

Conversion has to be done by looking equivalents up in a conversion table.

There is also the question of what to do with Unicode code points that
have no GB 2312 equivalent.
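Python's standard `gb2312` codec is one example of table-driven conversion; it works on EUC-CN, the usual 8-bit form of GB 2312. The sketch below uses it to convert Unicode text to EUC-CN bytes and back, and shows three policies for characters with no GB 2312 equivalent: fail, substitute `?`, or keep the code point as an XML character reference. The helper names `to_gb2312` and `from_gb2312` are hypothetical, and which policy is right depends on the application.

```python
def to_gb2312(text: str, errors: str = "strict") -> bytes:
    """Encode text as EUC-CN; `errors` selects the policy for unmappable characters."""
    return text.encode("gb2312", errors=errors)


def from_gb2312(data: bytes) -> str:
    """Decode EUC-CN bytes back to Unicode text."""
    return data.decode("gb2312")


simplified = "中国"  # both characters are in GB 2312
assert from_gb2312(to_gb2312(simplified)) == simplified

mixed = "中国한"  # U+D55C is a Hangul syllable, absent from GB 2312
try:
    to_gb2312(mixed)
except UnicodeEncodeError as exc:
    print("no GB 2312 equivalent for", repr(exc.object[exc.start:exc.end]))

print(to_gb2312(mixed, errors="replace"))            # unmappable character becomes b"?"
print(to_gb2312(mixed, errors="xmlcharrefreplace"))  # keeps the code point as &#54620;
```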

> If the ordering of the characters in GB 2312 has been changed in Unicode,
> how many characters have been changed?

The GB 2312 ordering is, roughly:
punctuation, numerals, Latin, Kana, Greek, Cyrillic, Chinese phonetic
symbols, box-drawing lines, 3755 post-1960 Chinese characters in pinyin
order, and 3008 post-1960 Chinese characters in radical/stroke order.
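The layout above sits on GB 2312's grid of 94 rows by 94 cells. As a sketch, the row and cell of a character can be recovered from its EUC-CN bytes by subtracting 0xA0 from each byte, and the row number alone tells you which block it belongs to. The block labels and the helper names `row_cell` and `block` are informal choices for illustration; the arithmetic at the end checks the 3755 and 3008 counts against the row boundaries.

```python
from typing import Tuple

CELLS_PER_ROW = 94


def row_cell(euc: bytes) -> Tuple[int, int]:
    """Return the (row, cell) of a two-byte EUC-CN code; both run from 1 to 94."""
    if len(euc) != 2 or not all(0xA1 <= b <= 0xFE for b in euc):
        raise ValueError("not a two-byte EUC-CN code")
    return euc[0] - 0xA0, euc[1] - 0xA0


def block(row: int) -> str:
    """Informal name of the GB 2312 block that a row belongs to."""
    if 1 <= row <= 9:
        return ["punctuation/symbols", "numerals", "Latin", "Hiragana", "Katakana",
                "Greek", "Cyrillic", "Chinese phonetic", "lines"][row - 1]
    if 16 <= row <= 55:
        return "level 1 hanzi (pinyin order)"
    if 56 <= row <= 87:
        return "level 2 hanzi (radical/stroke order)"
    return "unassigned"


# Level 1 fills rows 16-54 plus the first 89 cells of row 55; level 2 fills rows 56-87.
LEVEL1 = (55 - 16) * CELLS_PER_ROW + 89
LEVEL2 = (87 - 56 + 1) * CELLS_PER_ROW
print(LEVEL1, LEVEL2)                    # 3755 3008
print(row_cell(b"\xb0\xa1"), block(16))  # (16, 1) level 1 hanzi (pinyin order)
```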

Unicode covers a much richer set of scripts. Han characters in Unicode
are ordered by radical and stroke count only, not by a mixture of
radical/stroke order and pinyin order, so the pinyin-ordered block of
GB 2312 does not keep its order when mapped to Unicode.
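A quick way to see why a single offset cannot work is to decode the first few level-1 characters, which GB 2312 arranges by pinyin, and look at the gaps between their Unicode code points. This sketch uses Python's built-in `gb2312` codec; the helper name `unicode_gaps` is hypothetical.

```python
def unicode_gaps(euc_codes):
    """Decode consecutive EUC-CN codes and return code points and their differences."""
    points = [ord(bytes(code).decode("gb2312")) for code in euc_codes]
    return points, [b - a for a, b in zip(points, points[1:])]


# Row 16, cells 1-5: the first level-1 hanzi in GB 2312 (pinyin order).
first_row16 = [(0xB0, 0xA0 + cell) for cell in range(1, 6)]
points, gaps = unicode_gaps(first_row16)
print([hex(p) for p in points])
print(gaps)  # irregular, and some are negative: no fixed offset maps one order onto the other
```

Since neighbouring GB 2312 codes can land far apart (and out of order) in Unicode, the mapping has to be stored as a table.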

> 2. Is the Chinese character coding in Unicode mainly based on GB 2312 or
> on other Chinese character standards in Taiwan, Hong Kong, etc.?

The Chinese character coding in Unicode is referred to as "Han"
character coding, because it includes those Japanese and Korean characters
that come straight out of the Chinese writing tradition, as well as
Chinese characters. The Han character coding in Unicode is not mainly
based on GB 2312; rather, it is based on a large number of character
standards in Mainland China, Japan, Korea, Taiwan, Hong Kong, and North
America, including GB 2312.
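Because the Han repertoire is unified, one Unicode code point can stand for a character that appears in several national standards. As a small illustration with Python's standard codecs (the codec names are Python's, chosen here to stand for the Chinese, Taiwanese, Japanese, and Korean standards), U+4E2D (中) round-trips through each of them:

```python
HAN_CHAR = "\u4e2d"  # 中, "middle"
CODECS = ["gb2312", "big5", "shift_jis", "euc_kr"]  # China, Taiwan, Japan, Korea

for codec in CODECS:
    encoded = HAN_CHAR.encode(codec)
    # The bytes depend on the codec, but decoding always gives back the same code point.
    assert encoded.decode(codec) == HAN_CHAR
    print(f"{codec:10} {encoded.hex()} -> U+{ord(HAN_CHAR):04X}")
```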

> 3. What are the Chinese character coding standards in other regions of Asia
> that use Chinese characters (like Taiwan, Hong Kong, and Singapore)? How
> consistent is Unicode with these standards as far as Chinese characters
> are concerned? Can conversion between Unicode and these codes be done
> easily?

Taiwan uses a mixture of character coding standards. Almost everybody uses
the family of several conflicting standards known as BIG5, but the
government advocates another standard known as CNS 11643, and certain
agencies that have to deal with people's names and birthplaces use CCCII.
GB 2312 can also be found. Some Taiwan computer researchers are working on
other standards.

Hong Kong uses a mixture of everything that Taiwan uses, plus
everything that Mainland China uses. That is, you'll find conflicting
versions of BIG5, and you'll find GB 2312.

Singapore is like Hong Kong, but with a heavier emphasis on
Mainland Chinese usage, and a lighter emphasis on Taiwan usage.
That is, you'll find mostly GB 2312.

Unicode is consistent with the standards listed immediately below.
GB 2312-80
GB 12345-90
GB 8454-89, CNS 11643 (1st plane)
CNS 11643 (2nd plane)
Japanese standards
Korean standards
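One concrete reading of "consistent" is that every character in the source standard has its own Unicode code point, so text can make a round trip without loss. A sketch of that check for GB 2312, relying on Python's `gb2312` codec rather than on any official test suite (the function name is hypothetical):

```python
def gb2312_round_trip_failures():
    """Decode every assigned GB 2312 code, re-encode it, and collect mismatches."""
    seen = {}  # Unicode character -> first EUC-CN code that produced it
    failures = []
    for lead in range(0xA1, 0xF8):       # rows 1-87
        for trail in range(0xA1, 0xFF):  # cells 1-94
            code = bytes((lead, trail))
            try:
                char = code.decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned position
            # A duplicate target or a different re-encoding would break the round trip.
            if char in seen or char.encode("gb2312") != code:
                failures.append(code)
            seen.setdefault(char, code)
    return len(seen), failures


count, failures = gb2312_round_trip_failures()
print(count, "characters checked,", len(failures), "round-trip failures")
```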

Unicode and a subset of BIG5 covering 13,053 Chinese
characters can be reconciled quite well.
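The figure of 13,053 can be checked against the layout of Big5 itself. Assuming the commonly cited ranges, 0xA440-0xC67E for frequently used characters and 0xC940-0xF9D5 for less frequently used ones, with trail bytes 0x40-0x7E and 0xA1-0xFE (157 per lead byte), a short count reproduces it. The helper `count_big5` is hypothetical.

```python
# Big5 trail bytes: 0x40-0x7E and 0xA1-0xFE, i.e. 63 + 94 = 157 per lead byte.
TRAILS = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))


def count_big5(first: int, last: int) -> int:
    """Count valid two-byte Big5 codes between `first` and `last`, inclusive."""
    total = 0
    for lead in range(first >> 8, (last >> 8) + 1):
        for trail in TRAILS:
            if first <= (lead << 8 | trail) <= last:
                total += 1
    return total


frequent = count_big5(0xA440, 0xC67E)
less_frequent = count_big5(0xC940, 0xF9D5)
print(frequent, less_frequent, frequent + less_frequent)  # 5401 7652 13053
```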

> Steven

Bob Sandstrom
U of Washington
=================================================================
3.
Return-Path: <@CUNYVM.CUNY.EDU:capteo@PARACEL.COM>
Date: Mon, 27 Mar 95 11:02:34 PST
From: capteo@paracel.com (Eric M. Olafson)
To: czkbc@cunyvm.cuny.edu
Subject: uni-bounce

As you appear to be bouncing questions all over the UNICODE forum,
it might be a good idea to post the answers to your questions as well.
The line "thanks goes to Gary, Tom, and Heinz" is very nice, but you
appear to be operating as input-only.

Do you have any access to the World-Wide Web (via an HTML browser such
as MOSAIC)? If you do, you can get quite a bit of info at
http://www.unicode.org. They even have Far East mapping tables for
Chinese, Japanese, and Korean, I think.

If you need a browser, you can ftp one from ftp.ncsa.uiuc.edu.
Look for MOSAIC; it's free, though commercial browsers such as Netscape
are better ($39 a copy).

Mail me at capteo@paracel.com if I can be of any assistance.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:32 EDT