Re: Unicode CJK Language Myth

From: Martin J Duerst
Date: Thu May 16 1996 - 15:10:51 EDT

Ken'ichi Handa wrote:

> writes:
>> Ken'ichi Handa wrote:
>>> Do you mean that one should setup his environment for his locale? How
>>> a novice user can setup appropriate fonts when he borrows a terminal
>>> of his friend?
>> Well, usually you indeed set up your environment to meet your needs,
>> which are in general represented by your "locale". Of course, you may
>> prefer another setup, e.g. English menu texts instead of localized
>> ones.
>Actually, I set no locale, using the default one which is English.

So do I. But then neither of us is a novice, and you were speaking
about a novice user above.

>> The problem of setup is a problem of the user interface. In some
>If we have to change some setup just to read a character in the
>correct glyph, it's a mistake of some design. If that is because of
>the character set being used, it's a mistake of design of the character

If we select some setup anyway, why not just have it display our
preferred glyphs, too? I know quite a few people who change all
the colors on their screen, select their preferred fonts, and so on.
The idea that software has to come in one single configuration
that should be used by all users is too sad to be true.

>> They should also affect the priority of font choices for raw, not just
>> for CJK, but also for other things such as Latin languages.
>Yes, but the priority should be changed only among the correct fonts,
>just to select a style. The difference between Chinese and Japanese
>variations for `choku' and many other unified CJK characters in Unicode
>are beyond style, one is correct and the other is incorrect, which is
>correct based on the intention of the writer.

For most characters, the boundary between what is to be considered
correct and what incorrect is very unclear. To a trained eye, and on
close examination, some shapes in some fonts look clearly ugly, even
if hardly anybody notices. On the other hand, fancy fonts distort
glyph shapes in ways that would never be accepted by a primary
school teacher; nevertheless, we cannot call these shapes
wrong. They might be perfectly in sync with the whole font, be nicely
readable, and transmit exactly the feeling that the
designer had in mind.
Anyway, school is a particular case. Most children learn exactly
one glyph variant in school, although the government and the
books for teachers allow considerable variation. This makes the work
of the teachers easier, and helps children to firmly acquire writing.
Many people then think that the way they learned it is the only
correct way, and that there must be one correct way and nothing
else, but typographic reality is (fortunately) quite different.

With this, I do not mean to say that the Japanese government, or
any other relevant document, mentions the "Chinese" variant
of "choku" as correct, but any font designer who wanted
to design a new font, maybe with some special Chinese touch,
could use such a variation.

>> If you look at how mixed Japanese and Chinese is treated in print,
>> you will realize that in cases where there is a "major" and a "minor"
>> language (e.g. an article about China in a Japanese magazine, where
>> most of the text is in Japanese, and only some names or terms are
>> Chinese) the characters of the minor language are written with
>> glyph shapes used in the major language. In other cases, such
>Yes, there surely exist many printed matter which fails to use the
>correct shape. But, why should we (or Unicode) follow this kind of

You might call it a failure, but it is not. It is accepted typographic
practice, and like most such practices, it has very good reasons.
The same is done in Europe: you would never use a typical
French font for a single word in an English or German text.
Unless it reflects structure, as in a dictionary, a
multifont/multiglyph hodgepodge looks very bad and is not easily
readable.
Unicode is not following a failure, it is helping to do the right thing!

>> as dictionaries, where there is about a 1 to 1 mixture of languages,
>> these are distinguished not only by using different glyph shapes,
>> but also by using different fonts, usually with different weights.
>Yes. Correct glyph comes first, preferred font (style or weight) comes
>next. Unicode fails to send information of variations of CORRECT glyph.

If you have to mark up the text in a dictionary anyway to guarantee
structure and readability, and this markup will also guarantee the
display of what you call the correct glyph, why do you worry so much?

>Why do you keep using the word "appropriate" and "different"? The
>current problem is "correct" or "incorrect"? And I believe correct
>glyph is what everyone want.

See above. Correctness is a very relative term. Also, writers rarely
care that much about "correct" glyphs, otherwise most people
would write much more carefully and clearly. A case in point is
radical 162 (shinnyou in Japanese), which has led to endless
discussions in anti-Unicode circles because Han unification
unified variants with one and two dots. Many if not most people,
in daily usage, will write it all connected, with zero dots, so to speak.

What people care about is the content of what they write, and
cases where they have to care about how they write are
mostly discussions about glyph shapes and typography and
such, i.e. discussions not with and by using characters, but
about them. Taking such kinds of discussions into account
when designing a character set would be deadly, because
then you would have to add meta-level after meta-level,
with no end.

So what a writer wants to convey is "choku", and if the computer
at the other end can display it in a form that the reader
there can read, the writer will be happy with it.
Whether the reader is a Japanese with a usual Japanese setup,
a Chinese who prefers to read Japanese texts with the
glyphs (s)he is used to, or by rare chance a Japanese
who sees the wrong glyph but might not even notice
it in context, is not something the writer usually cares about.

>> For single characters
>> or words e.g. of Chinese inside a Japanese sentence, I would
>> not suggest to change glyph style anyway for typographic reasons.
>I repeat, it's not the difference of style, just correct or incorrect.
>If one write the left glyph in the examination of Japanese elementary
>school, he failed to pass the examination.

Japanese elementary school, as said above, does not reflect
typographic reality and possibility. If you take Japanese elementary
school as a standard, there is much more wrong in today's
Japanese printed material than the single character in 20'000
we are discussing here.

>>      *                   *
>>*************       *************
>>      *                   *
>>  *********           *********
>>  *       *           *       *
>>  *********           *********
>>  *       * versus    *       *
>>  *********           *********
>>  *       *         * *       *
>>  *********         * *********
>>  *       *         *
>>**************      **************
>> The character we are discussing here is about the only example
>> that might cause real difficulties in the above rare circumstances.
>If it is known, why don't fix this bug?

It's not a bug. The Chinese, as far as I know, are familiar with the
right variant, although they use the left one more often.
For them, it would be very strange to have separate codepoints.
And most Japanese won't see the left form anyway, or will recognize
it immediately if they happen to see it in context.

>> The other cases where unification is frequently criticised,
>> such as the "grass" radical or the "bone" radical, do not cause
>> any difficulties even for single characters for an average
>> Japanese or Chinese.
>I admit that most Japanese can understand a character of the radical
>displayed in Chinese font. But "can understand" and "displayed in
>CORRECT glyph" the different thing. Perhaps, most Europeans can read
>a text in which all 'l' letters are shown in '|' (vertical bar), but
>with unpleasant sense.

Most Europeans, if they are not computer scientists, have never
seen such a vertical bar. They will not notice it is supposed
to be anything other than an "l". Most Europeans, for the
long time they were using typewriters, were used to having
exactly the same shape for "1" and "l", and for "0" and "O".

Regards, Martin.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT