Re: Unicode CJK Language Myth

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Fri May 17 1996 - 14:45:27 EDT

Next message: MHSackett@aol.com: "Remove from list"
Previous message: Misha Wolf: "Java steps beyond the first 256 Unciode chars"
Maybe in reply to: Mark Davis: "Unicode CJK Language Myth"
Next in thread: Kenichi Handa: "Re: Unicode CJK Language Myth"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Ken'ichi Handa answered me:

>I agree. And, I've never claimed that all unifications done by
>Unicode are incorrect. Most of them are ok, I think.

Very nice to hear that from you. In general, most of the
people in Japan with whom I have discussed Unicode say
they dislike or hate it. Apart from those cases where
this is due to misunderstandings, it usually turns out that
they dislike just one small feature or decision, and that
they are fine with the rest. What most of them don't realize
is how remarkable that is: given all the different
requirements from all over the world, and in particular all
the different ways of viewing and thinking about Kanji,
there are very few points in Unicode that any single person
is unhappy about.

>>>> as dictionaries, where there is about a 1 to 1 mixture of languages,
>>>> these are distinguished not only by using different glyph shapes,
>>>> but also by using different fonts, usually with different weights.
>>>
>>> Yes. Correct glyph comes first, preferred font (style or weight) comes
>>> next. Unicode fails to send information of variations of CORRECT glyph.
>
>> If you have to mark up the text in a dictionary anyway to guarantee
>> structure and readability, and this will guarantee the display of what
>> you call correct glyph, why do you worry that much?
>
>Sorry, I can't follow your logic. Could you explain it in another way?

Okay, I'll start again. If you look at what we might call "multilingual
typography", then you find mainly two cases, namely:

- The case where a few words from one language are incorporated
        in the text of another language. In this case, there is no font
        change; glyph differences that may exist between the two
        languages are eliminated in favor of the glyphs of the base language.

- The case where there is an abrupt change between different languages,
        such as in a dictionary, in a text to learn a language, or in a
        scientific paper that uses another language only in examples.
        In these cases, there is not just a glyph change, but also a
        font change to make the differences obvious.

For structural and typographic reasons, texts where there are changing
glyph shapes without font changes are virtually non-existent.

So it is fair to conclude that a system such as Unicode, which leaves
glyph differences to be resolved by higher-level information such
as font information, is a very reasonable solution for multilingual
text processing and typography.
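
To make this concrete, here is a minimal sketch of how a renderer might
resolve the glyph for "choku" (U+76F4) from markup rather than from the
codepoint. The Run type, the font_for_run helper, and the font names are
all hypothetical and only serve as an illustration; no existing system
is being described.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical font names, used only for illustration.
FONT_FOR_LANGUAGE = {
    "ja": "ExampleMincho-JP",
    "zh": "ExampleSong-CN",
}
DEFAULT_FONT = "ExampleMincho-JP"


@dataclass
class Run:
    text: str
    lang: Optional[str] = None  # language tag from markup; None if untagged


def font_for_run(run: Run, base_lang: str) -> str:
    """Choose a font from higher-level information, not from the codepoint."""
    lang = run.lang or base_lang  # untagged runs follow the base language
    return FONT_FOR_LANGUAGE.get(lang, DEFAULT_FONT)


CHOKU = "\u76f4"  # the same codepoint in both runs

# A Japanese document that quotes one Chinese example, as in a dictionary.
runs = [Run(CHOKU), Run(CHOKU, lang="zh")]
fonts = [font_for_run(r, base_lang="ja") for r in runs]
```

The two runs carry identical character data; only the markup differs,
and that alone decides which font (and therefore which glyph) is used.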

>> So what a writer wants to read is "choku", and if the computer
>> at the other end can display it in a form that the reader
>> at the other end can read, the writer will be happy with it.
>> Whether this is a Japanese with a usual Japanese setup,
>> a Chinese that prefers to read Japanese texts with the
>> glyphs (s)he is used to, or by rare chance a Japanese
>> that sees the wrong glyph but might not even notice
>> it in context, is not something the writer usually cares about.
>
>No. Most writers want that a character he enters and sees on his
>display is displayed/printed with one of acceptable variants of glyphs
>on a readers display/printer.

We are saying the same thing, if we assume that the software
behaves reasonably and was not implemented by beginners.

>> Japanese elementary school, as said above, does not reflect
>> typographic reality and possibility. If you take Japanese elementary
>> school as a standard, there is much more wrong in today's
>> Japanese printed material than the single character in 20'000
>> we are discussing here.
>
>Could you give some example?

Just open your eyes, and have a look at the advertisements and logos
around you. If you are more interested, have a look at some books
on Japanese logo design and modern typography. I have some such
books here, but giving you the ISBNs won't help you, as they
are somewhat outdated (late 80s) and won't be on sale in Japan
anymore.

>> It's not a bug. Chinese, as far as I know, are familiar with the
>> right variant, although they use the left one more often.
>> For them, it would be very strange to have separate codepoints.
>> And most Japanese won't see the left form anyway, or recognize
>> it immediately if they happen to see it in context.
>
>Hmm, you come to the key point of unification problem. Please ask any
>Japanese if he think it's a bug of something (not knowing exactly
>which of display handler, font, input method, or the character set
>itself) or not if he sees Chinese glyph when he enters `choku' by some
>Japanese input method?

I have mentioned "guessing" techniques and setup scenarios to get
the best glyph shapes before. If a system is not able to conclude that
the user most probably wants to see a Japanese glyph when using
a Japanese input method, please don't blame it on the character set.
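
As a rough sketch of such a "guessing" technique, a system could walk
through the hints it has, from most to least specific. The function name,
its parameters, and the order of the hints are assumptions of mine for
illustration, not a description of any existing implementation.

```python
def guess_glyph_language(markup_lang=None, input_method_lang=None,
                         user_locale=None, default="ja"):
    """Guess which language's glyph shapes the user most probably wants.

    Hints are tried from most to least specific: explicit markup first,
    then the language of the active input method, then the user's locale.
    """
    for hint in (markup_lang, input_method_lang, user_locale):
        if hint:
            # Reduce tags such as "ja_JP" or "zh-TW" to the bare language.
            return hint.replace("_", "-").split("-")[0].lower()
    return default


# A user typing `choku' with a Japanese input method on a Chinese-locale
# system: the input method is the stronger hint.
guessed = guess_glyph_language(input_method_lang="ja", user_locale="zh_CN")
```

With such a chain in place, the Chinese glyph appears only when the
hints actually point to Chinese, without any change to the character set.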

>I claimed in many places (but perhaps not in this mailing list) that
>unification of two characters in two different cultures has potential
>difficulty especially in the case of ideograms. If culture A want
>characters X and Y be unified but not with Z, and culture B want X and
>Z be unified but not with Y, what kind of unification is good? I
>think the character X (Y and Z also) for culture A and X for culture B
>are different characters even if they have exactly the same glyph.

This is indeed a potential source of trouble. Some cases with
this structure do exist, but in those cases the differences
between the shapes X, Y, and Z are so large that each has
its own codepoint. Among the shapes actually unified in Unicode,
I don't know of any case where X and Y are interchangeable
(and therefore unified) in Japanese while Z has a different
meaning, and where at the same time X and Z are interchangeable
in Chinese while Y has a different meaning.

The case we are discussing does not have this structure. Z is not
a different character in Japanese, and it certainly does not
require a separate codepoint in Japanese, because the glyph Z
can only be seen either as a (very rare) variant of the standard
glyph (and thus suitable for unification) or as an error (and in
that case there is no need for it to have a separate codepoint; if
we were to encode all the errors made in the history of kanji
writing, even UCS-4 might not be enough).

Also, there is no possibility of misunderstanding. If a Japanese
sees the Chinese glyph for "choku" without context, and does
not recognize it, there is absolutely no danger of confusing it
with something else. The only thing (s)he can say is
"sorry, I don't know". That will happen for the majority of
the kanji characters in Unicode anyway. Therefore, a good
system (not yet available, as far as I know) will have something
like balloon help (or whatever it is called on the various systems),
which will tell you anything you ever wanted to know about
any character you are interested in. With this, your friend
will not even have to ask you about the meaning
of such a kanji anymore.
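
A very small sketch of what such balloon help could return is given
below. It uses Python's standard unicodedata module for the character
name; the READINGS table and the balloon_help function are hypothetical
and contain a single illustrative entry, where a real system would
need a full character database.

```python
import unicodedata

# Illustrative data only: one entry, keyed by character.
READINGS = {
    "\u76f4": {"ja_on": "choku", "zh_pinyin": "zhi2"},
}


def balloon_help(ch: str) -> str:
    """Return a short description of a single character."""
    name = unicodedata.name(ch, "<no name available>")
    lines = [f"U+{ord(ch):04X} {name}"]
    for key, value in sorted(READINGS.get(ch, {}).items()):
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


help_text = balloon_help("\u76f4")
```

The description depends only on the codepoint, so it is the same
whichever glyph variant happens to be on the screen.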

Regards, Martin.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT