Re: Unicode CJK Language Myth

From: Martin J Duerst (
Date: Thu May 16 1996 - 05:46:11 EDT

Ken'ichi Handa wrote:

> writes:
>> Not exactly. If you know only Japanese, then your system is
>> with very high probability set up so that it uses a Japanese font
>> ...
>Do you mean that one should setup his environment for his locale? How
>a novice user can setup appropriate fonts when he borrows a terminal
>of his friend?

Well, usually you indeed set up your environment to meet your needs,
which are in general represented by your "locale". Of course, you may
prefer another setup, e.g. English menu texts instead of localized
ones.

The problem of setup is a problem of the user interface. In some
cases, this has to be done in cryptic configuration files, and this
will be difficult for a novice to do. However, it is also possible
to have a nice Control Panel (as on the Mac) or some other
device, to be able to change settings on the fly. Ideally, somebody
should be able to choose a "locale" from a menu, and all the
menu texts, message texts, and so on, of the applications should
immediately change.
Locale changes in general should not affect the data one works on,
just the presentation (i.e. number presentation, date presentation).
They should also affect the priority of font choices for raw text,
not just for CJK, but also for other things such as Latin languages.
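
To make the point about font priority concrete, here is a minimal
sketch in Python. The locale tags, the font names, and the coverage
table are hypothetical and only serve as an illustration; a real
system would query its installed fonts instead.

```python
# Hypothetical font names and locale tags, for illustration only.
FONT_PRIORITY = {
    "ja": ["JapaneseMincho", "ChineseSong"],
    "zh": ["ChineseSong", "JapaneseMincho"],
}
DEFAULT_PRIORITY = ["JapaneseMincho", "ChineseSong"]


def pick_font(locale, char, coverage):
    """Return the first font in the locale's priority list that covers char.

    coverage maps a font name to the set of characters it can display.
    Returns None if no font in the list covers the character.
    """
    for font in FONT_PRIORITY.get(locale, DEFAULT_PRIORITY):
        if char in coverage.get(font, set()):
            return font
    return None
```

The text itself stays untouched: the same code point is looked up in
both cases, and only the order in which fonts are tried depends on
the locale.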

So your novice friend, who does not understand the Japanese
menus of your mail reader, will just select a Chinese configuration
from a menu.

>All your discussions are based on bilingual environment, not on
>multilingual environment. Switching Japanese and Chinese font just
>for reading a plain text? What happens when a Japanese and a Chinese
>communicate with each other in Japanese and Chinese mixed text?

If you look at how mixed Japanese and Chinese is treated in print,
you will realize that in cases where there is a "major" and a "minor"
language (e.g. an article about China in a Japanese magazine, where
most of the text is in Japanese, and only some names or terms are
Chinese) the characters of the minor language are written with
glyph shapes used in the major language. In other cases, such
as dictionaries, where there is about a 1 to 1 mixture of languages,
these are distinguished not only by using different glyph shapes,
but also by using different fonts, usually with different weights.

I do not know exactly what kind of mixture you are thinking
about, but even if you have one sentence in Japanese and one
in Chinese, it is not difficult to write a little piece of code that
detects the languages and displays them with appropriate
glyphs, if this is really what you want. For single characters
or words e.g. of Chinese inside a Japanese sentence, I would
not suggest changing the glyph style anyway, for typographic reasons.
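
As a rough sketch of such a little piece of code, again in Python:
it assumes that a sentence containing kana is Japanese and that a
sentence with Han characters but no kana is Chinese. This heuristic
and the function names are my own assumptions for illustration; a
Japanese sentence written only in kanji would be misclassified, and
only the basic CJK Unified Ideographs block is checked.

```python
import re

# Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF).
KANA = re.compile(r"[\u3040-\u30ff]")
# Basic CJK Unified Ideographs block (U+4E00..U+9FFF).
HAN = re.compile(r"[\u4e00-\u9fff]")
# A sentence: a run of non-terminators plus an optional terminator.
SENTENCE = re.compile(r"[^。！？.!?]+[。！？.!?]?")


def guess_language(sentence):
    """Guess "ja", "zh", or "other" for a single sentence."""
    if KANA.search(sentence):
        return "ja"
    if HAN.search(sentence):
        return "zh"
    return "other"


def tag_sentences(text):
    """Split text into sentences and pair each with a guessed language."""
    return [(s, guess_language(s)) for s in SENTENCE.findall(text) if s.strip()]
```

A display routine could then choose Japanese or Chinese glyphs per
sentence from the returned tags, while single Chinese words inside a
Japanese sentence keep the Japanese glyph style.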

>And, your statements contains many "will", "high probability", and
>"most probably". There are surely many cases that some of these
>assumptions are false.

The assumptions may be false in some cases. But there is no real
excuse for software bugs and bad implementations.

>> - The difference between the various glyph shapes in this
>> case is one of the largest in Unicode and one of
>> those where the glyphs are most difficult to associate
>> for an uninformed reader. Most differences are much
>> smaller.
>So, anyway, you also admit that Unicode has unified some characters
>which may cause difficulty in readability.

Given a combination of rather rare circumstances, namely:

- Out of context, single character
- Font not matched to reader's preferences
- Reader only familiar with Japanese

this may indeed happen. But the chances that readability is
endangered by other factors, which have nothing to do with
Unicode, are many orders of magnitude larger. These factors
are mainly:

- No appropriate font installed, so that display is not possible.
- Character not known to reader.

The character we are discussing here is about the only example
that might cause real difficulties in the above rare circumstances.
If anybody thinks there are others, please tell us.
The other cases where unification is frequently criticised,
such as the "grass" radical or the "bone" radical, do not cause
any difficulties even for single characters for an average
Japanese or Chinese reader.

Indeed, many readers will easily identify a lot of differences
that are larger than those unified in Unicode. One example
is the (Chinese) simplified form of the radicals for "door" and
"to speak". These and other abbreviations were routinely
used in Japanese scientific publications at the time when camera-ready
manuscripts for conferences still had to be written by hand
(which goes well into the 1980s). Similarly, in Hong Kong,
where traditional characters are used as in Taiwan, one often
comes across simplified forms e.g. of the thread radical in
advertisements and the like.

So one might even take the position (which I don't) that
unification in Unicode should have gone further!

Regards, Martin.
