Re: Unicode, Cure-all or Kill-all?

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Sun Aug 11 1996 - 12:28:48 EDT


Timothy Hwang wrote:

>Dear Werner,
>
>Werner LEMBERG wrote:

>> Christian Wittern <cwittern@central.conline.de>, a former employee of the
>> IRIZ in Kyoto, writes in the June 1995 release of the Electronic
>> Bodhidharma:
>>
>> "...there are many cases where CCCII has more than one code point for the
>> same character. When encountering such multidefined characters, the user
>> has to decide which code point to use. Since these codepoints have
>> different semantics, this is a quite impossible task for most input
>> operators...The relationship of orthographic characters and variant forms
>> is very complex and can not be expressed in a fixed, one-dimensional,
>> hard-wired codetable...
>>
>> ...the character glyphs are neither well defined nor consistent..."

>
>Regarding "more than one code point for the same character": this is
>a misunderstanding. There are some Chinese characters that have the "same"
>shape -- they look like the same character, but in fact they are NOT.
>Example: the character Tai2 (a triangle on top of a square). This can be
>the simplified form of Tai(wan), a variant of Typhoon, AND also an
>orthographic form in Sir. You see, the meaning of such a character is very
>important. When the Chinese Character Analysis Group (CCAG) did the
>compilation, they had several top-notch Chinese scholars review the
>characters. The key point I want to make is this: to Western eyes,
>the shape of a given Chinese character decides everything. However, in
>reality, that is not always right. There are many situations where a
>variant of one character is the orthographic form of another, and vice
>versa. I don't think Mr. Christian Wittern understands this point.

I know that Christian, and many others, understand this point quite
well. It comes up in a very similar form from time to time in the
discussion about unification, but it can also be analyzed in the context
of a single language and typographic tradition.

Assume I show you the character Tai2 (a triangle on top of a square),
alone. If you can tell me whether this is Taiwan, Typhoon, or Sir,
I will accept that we can use three separate codepoints. But I am
sure you can't. It's similar to showing an English speaker a single
letter and asking how it is pronounced: without the word around it,
she won't know. On paper, all three meanings of the character are
written identically, and only the reader derives the meaning from
the context. There are functions where the computer can do much more
than paper can, but "meaning" is definitely not one of them at present,
and if it ever becomes one in the future, the computer, like the human
reader, will not need any distinction at the character level to
determine the overall meaning.
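
As a rough illustration of the point that the meaning lives in the
surrounding text rather than in the code point, here is a minimal Python
sketch. It assumes simplified-Chinese text, where Tai2 is the single
code point U+53F0 in all three words. The three-word LEXICON and the
function sense_of_tai are hypothetical, made up for this example; real
word segmentation is far more involved.

```python
import unicodedata

TAI = "\u53f0"  # Tai2: one code point, whatever the sense

# Hypothetical mini-lexicon (an assumption for illustration, not a real
# dictionary): words containing Tai2 and the sense it carries in each.
LEXICON = {
    "台湾": "Taiwan",
    "台风": "typhoon",
    "兄台": "Sir (polite address)",
}


def sense_of_tai(text: str, index: int) -> str:
    """Guess the sense of Tai2 at text[index] from its neighbouring characters."""
    if text[index] != TAI:
        raise ValueError("no Tai2 at that position")
    for word, sense in LEXICON.items():
        # Align the lexicon word so that its Tai2 sits at `index`.
        start = index - word.index(TAI)
        if start >= 0 and text[start:start + len(word)] == word:
            return sense
    # The character alone carries no information about which sense is meant.
    return "unknown (no disambiguating context)"


print(unicodedata.name(TAI))
for sample in ["台湾", "台风", "兄台", "台"]:
    i = sample.index(TAI)
    print(sample, hex(ord(sample[i])), "->", sense_of_tai(sample, i))
```

Every sample prints the same code point, 0x53f0; only the neighbouring
characters select the sense, and the isolated character comes back as
unknown, just as it would for a human reader.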

In addition, while many people know that some characters, such as
Tai2, have different origins, there are other characters whose
different origins almost nobody knows about.

Regards, Martin.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT