Re: Unicode CJK Language Myth

From: David Goldsmith (
Date: Wed May 08 1996 - 20:16:23 EDT

>A misunderstanding about Unicode Han Unification seems to be prevalent in
>certain circles. For example, a recent IAB character set document contained
>the following quote, which implies that Unicode is not usable for CJK without
>additional language information.

I believe the reason that this misunderstanding persists is that Unicode,
Inc. has not been aggressive about disseminating its viewpoint on this
issue. Even the official web page covers this issue in a neutral way, not
making the points Mark made in his message.

To address this issue, I think the Consortium should write a white paper
which discusses Han unification in a detailed way, covering all the
points Mark has mentioned, and with extensive illustrations, showing:

1. The actual extent of the differences between Japanese and Chinese
variants of unified Han characters,

2. The stylistic variations present within the Chinese and Japanese font
industries for the same characters,

3. Sample passages of Japanese and Chinese rendered with "correct" and
"incorrect" fonts,

4. Statements from Chinese and Japanese typographic and Han character
experts supporting the Consortium's position, to debunk the notion that
Han unification is a plot (or ignorant mistake) on the part of Western
software companies.

This white paper should be distributed extensively and put on the
Consortium's WWW site, and sent directly to key people in the appropriate
standards organizations (such as the W3 consortium and the IETF). It
should also be translated into Japanese and distributed in Japan.
Opponents of Unicode have already written papers attacking Han
unification and distributed them; the Consortium should be at least as
aggressive. Maybe it needs to hire a lobbyist ;-).

On another topic, I think part of the reason it has taken the Internet
community so long to consider using Unicode is that the Consortium has
not been aggressive about pushing Unicode on the Internet. It's all been
done as volunteer efforts by disconnected people. Progress is starting to
occur, but it's taken a while, and people are still leery about
supporting Unicode in Internet products. The adoption of Unicode by Java
has helped more than any other recent event (never mind that the current
Java distribution truncates it all to 8859-1...). The Internet standards
process is driven by implementations; with the dearth of Internet
software that uses Unicode, the community has been reluctant to accept it.

The fact that Unicode is starting to make headway in the Internet
standards process is fine, but what will really drive it forward is to
have mainstream mailers, browsers, and servers that support Unicode and
take advantage of it.

This is my 2 cents, anyway.

David Goldsmith
Senior Scientist
Taligent, Inc.
10355 N. DeAnza Blvd.
Cupertino, CA 95014-2233

