Re: Unicode CJK Language Myth

From: Martin J Duerst
Date: Wed May 15 1996 - 12:54:28 EDT

Ken'ichi HANDA writes:

> writes:
>>> I was unaware that both `choku' in "choku-setsu (in Kanji)" and
>>> `zhi' in "yi-zhi (in Simplified Hanzi)" derived from the same
>>> character (U+76F4 in Traditional Hanzi) until I saw the word "choku-setsu"
>>> written in a Chinese-first Unicode font because their glyphs are quite
>>> different. A Japanese who doesn't know Chinese could fail to identify
>>> the character if it is written in a Chinese font.
>> The important measure of legibility is legibility in context --
>> not identification of isolated glyphs out of context.
>You ignore the fact that Chinese characters are ideograms, not
>phonograms, so each character has some meaning by itself. I
>(Japanese) have a chance to get a mail from some American friends as
>follows in English (but `choku' is represented by Unicode character):
> "What's the meaning of `choku', how to pronounce it?"
>If `choku' is displayed by a correct Japanese font, I can give an
>answer to him, but if `choku' is displayed by a Chinese font, I may
>fail. Since the mail is in English, there's no way to find that
>`choku' should be displayed in Japanese font. Unicode by itself seems
>to be of no use in multilingual text (at least, for CJK characters).

Not exactly. If you know only Japanese, then your system is
very probably set up to use a Japanese font for that character.
If you see the character in a Chinese font, then most probably
you also know Chinese. And if you know both Japanese and Chinese
to some extent, you will have noticed that the Japanese and the
Chinese forms are actually the same character, so you will again
have no problem.

Should it happen that you know only Chinese, you will have
installed a Chinese font, will recognize the character,
and will be able to tell your friend what the character means.
This takes advantage of the nature of ideograms, with the
caveat that the meaning may differ to a smaller or larger
degree between the languages, depending on the character.
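
To make the point about system setup concrete, here is a minimal
Python sketch. It only illustrates the argument: the font names and
the language-to-font table are invented placeholders, not real
products or any real system's configuration. What it shows is that
the character is one code point (U+76F4) whatever the language,
and that the glyph the reader sees follows from the language the
reader's system is set up for.

```python
import unicodedata

# U+76F4 is the unified code point discussed in this thread.
CHOKU = "\u76f4"

# Hypothetical language-to-font table. The font names are placeholders
# made up for this sketch; they are not real fonts.
FONT_BY_LANGUAGE = {
    "ja": "ExampleMincho-JP",
    "zh-Hans": "ExampleSong-SC",
    "zh-Hant": "ExampleMing-TC",
}


def pick_font(user_language: str, default: str = "ExampleFallback") -> str:
    """Return the font a system configured for user_language would use."""
    return FONT_BY_LANGUAGE.get(user_language, default)


def describe(char: str, user_language: str) -> str:
    """Show the (language-independent) code point and the chosen font."""
    return f"U+{ord(char):04X} {unicodedata.name(char)} -> {pick_font(user_language)}"


if __name__ == "__main__":
    # Same code point in both cases; only the font choice differs.
    print(describe(CHOKU, "ja"))
    print(describe(CHOKU, "zh-Hans"))
```

The text of the mail itself carries no language information in
this sketch; the choice is made entirely on the reader's side,
which is the situation described above.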

In addition, if you know some of the history of characters,
you will (as a Chinese) know that the Japanese form is
very close to the original in seal script, whereas the
Chinese form is a later invention. As a Japanese, you
will know the character "shin" (used in combinations
such as massugu, makkaka, and so on), where the original
form underwent almost the same simplification as
the other character did in Chinese.

In more general terms:

- The difference between the various glyph shapes in this
        case is one of the largest in Unicode and one of
        those where the glyphs are most difficult to associate
        for an uninformed reader. Most differences are much
        smaller.

- Readability counts mostly in context, but there is no
        real problem with readability out of context. Anyway,
        I guess there are many characters that are difficult
        to explain (let alone to give the correct reading in
        Japanese) if they appear alone, even on paper.

- It is in general very difficult to know whether two glyph
        shapes belong to the same character (in the sense of
        having the same historical origin and meaning) or not.
        One always finds out new things about CJK characters;
        one never knows them all. All Unicode can do here, and
        it does a remarkably good job of it, is to avoid producing
        more errors than traditional means (i.e. ink on paper)
        would.

Regards, Martin.
