Re: Unicode & Han

From: Edward Cherlin (cherlin@snowcrest.net)
Date: Sat Aug 10 1996 - 14:43:25 EDT


>Dear Michael,
>
>I would like to be corrected if you can provide me more latest
>information I don't know of. May I ask, as far as you know, which
>version of Unicode, 2-byte or UTF-16 or whatever, is been used or under
>implementation by any computer company?

There are numerous implementations of Unicode in operating systems
(Microsoft, Apple, IBM, and others) and in programming tools and libraries
(C language standard, IBM APL2, and others) and in applications (word
processing, E-mail, Web browsers, database, and others). See
http://www.unicode.org. In each case that I have seen, Unicode data are
represented in the standard two-byte form for data interchange (Unicode
text files, for example) and also in one or more other formats for data
compression, transmission by 7-bit E-mail, and so on. These encodings are
not separate implementations. They are representations of some subset of
the same set of code points, defined as Unicode, which in turn is defined
as Plane 0 (the Basic Multilingual Plane) of ISO 10646.
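
To make the point about representations concrete, here is a minimal sketch
in Python. The language, the sample string, and the choice of UTF-16
(big-endian), UTF-7, and UTF-8 as the example formats are my own
assumptions for illustration, not anything a particular vendor is known to
use. It shows one sequence of code points carried in three byte formats,
each of which decodes back to the same text.

```python
# Illustrative sketch: one sequence of code points, several byte formats.
# The sample text and the three encodings are assumptions chosen for the demo.
text = "A\u4e2d"  # LATIN CAPITAL LETTER A followed by the Han character U+4E2D

code_points = [ord(ch) for ch in text]

two_byte = text.encode("utf-16-be")  # two bytes per code point for this text
seven_bit = text.encode("utf-7")     # ASCII-only bytes, safe for 7-bit mail
compact = text.encode("utf-8")       # variable-length byte form

for label, data in (("two-byte", two_byte),
                    ("7-bit", seven_bit),
                    ("utf-8", compact)):
    print(label, data)

# Every format decodes back to the same code points.
round_trips = [
    two_byte.decode("utf-16-be"),
    seven_bit.decode("utf-7"),
    compact.decode("utf-8"),
]
same_code_points = all(
    [ord(ch) for ch in decoded] == code_points for decoded in round_trips
)
print("same code points:", same_code_points)
```

The byte strings differ, but the code points do not, which is the sense in
which these are encodings rather than separate implementations.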

>As far as my statement on Microsoft looking for other coding scheme, if
>you can read Chinese computer news, you will know I am LIVE RIGHT. The
>captain is abandoning the ship. Why? Because the coding structure and
>the implementation of Unicode are DEADLY WRONG.

Citations or quotations, please. (Some of us can read Chinese computer
publications, but we have to know which ones you mean.) We can't tell what
you are talking about. We all know that Microsoft is using a variety of
coding schemes for character sets, and further encoding schemes for fonts
(since Unicode, by design, does not encode variant glyphs and ligatures).
This by itself does not mean that Microsoft is abandoning Unicode. We all
know that Microsoft is making its own plans for dealing with characters not
included in the Unicode repertoire. That means nothing either. We are all
doing that. What is Microsoft doing that indicates a desire to replace,
rather than supplement, Unicode?

>For example, can anyone
>tell me what is the definition of a Character? And what is a glyph?

Unicode defines a character pragmatically. Any code point in any national
or international character code set standard, and any distinctive character
in any very widely used pi font (such as Zapf Dingbats and the Computer
Modern math fonts) is a character. If one of the standards is in error,
Unicode does not presume to correct it. It includes the erroneous
characters in order to maintain complete character set compatibility, and
then notes that their use in new text is deprecated. A glyph is any visual
representation of any character. Glyphs are defined in fonts, not in
character code sets.

>In
>version 1.0 of Unicode book, code 337B ~ 337F, can these be called as
>characters? If so, I can give you many more examples in Chinese.

These are the condensed two-character Japanese reign names and the
condensed four-character form for kabushiki kaisha (corporation). They are
in Unicode because they are in some Japanese standard. There are many more
examples in Japanese that are not in Unicode because they are not in any
standard. Likewise for your Chinese examples.
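
Anyone can inspect these five code points directly. The following is a
small sketch using Python's standard unicodedata module (my choice of tool,
not something from this thread). It assumes the compatibility (NFKC)
normalization form is an acceptable way to expand each condensed form into
its component characters. It looks up each code point's name and counts
how many ordinary characters it expands to.

```python
import unicodedata

# Illustrative sketch: look up the code points 337B through 337F and expand
# each condensed form with compatibility normalization (NFKC).
expansions = {}
for cp in range(0x337B, 0x3380):
    ch = chr(cp)
    name = unicodedata.name(ch)
    expanded = unicodedata.normalize("NFKC", ch)
    expansions[cp] = expanded
    print(f"U+{cp:04X} {name}: expands to {len(expanded)} characters")
```

Each one is a single coded character that stands for a sequence of two or
four ordinary characters, kept for compatibility with an existing standard.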

>And
>then, why the Japanese emperor's names can be coded, but not the
>Chinese? In Chinese history, there were more than 500 emperors, some had
>more than one name.

As far as I know, the Japanese put them in a standard, and the Chinese didn't.

>Why the Western Chess symbols were coded as
>characters?, but not the Chinese Mar-jhon? Cultural superiority?

Unicode has gone to great lengths to avoid cultural bias. Chess is in
because it is available in character sets used world-wide. I don't know why
the Ma chiao tiles (sorry, different dialect) are not in. Have they been
proposed? I am sure they are not in any standard, and I have never seen
them in a commercial font (I have looked).

>I think
>the root of the problem is that the Unicoder DOES NOT understand what is
>a character. And this is the deadly vital problem. And in my opinion,
>until the Unicoders start to respect different culture and language,
>they won't be able to do the coding right.
>
>Smiles,
>Timothy Huang

I get annoyed when people tell me I don't respect their culture. I am a
Buddhist, and I read Chinese. I have been to monasteries in Singapore and
Malaysia, though not long enough to learn Hakka. I also listen to Chinese
music, play wei-chi, and cook Sichuan style. I have played the part of
Pigsy in The Journey to the West (Monkey). Chinese culture (along with
several others) is *my* culture. Many others in the Unicode movement have
as good or better credentials for appreciating and respecting various
aspects of Chinese culture.

I am well aware that many of the characters I need are not in Unicode.
However, I have taken the trouble to find out what Unicode proposes to do
about that problem. The answer is ISO 10646. Unicode is meant to include
every character in every script

*** that is in an existing standard, or is in _widespread_ modern use. ***

ISO 10646 is meant to include every character in every language ever.

It is estimated that there are more than 200,000 characters that someone
might want in a character set, but fewer than 250,000. Unicode cannot
handle every possible requirement, but the upward migration path is clearly
defined. ISO 10646 will include every character in every published Chinese
text from the last 4,000 years, as defined by some group of scholars who
will go over all the dictionaries, all the dynastic histories, all the
classics, and perhaps even all the cast bronzes and inscribed oracle bones.
Actually, I exaggerate. They will leave out a few neologisms from the last
decade or two.

So far, all the complaints you have set before us are nitpicks and niggles.
You cite a handful of characters here and there, but you do not tell us
what principles we are supposed to be violating. You are blaming Unicode
for errors made by the standards bodies of China and Taiwan. Not guilty,
your honor.

Edward Cherlin      Helping Newbies to become "knowbies"   Point Top 5%
Vice President      http://www.newbie.net/Mentors/Cherlin  of Web sites
NewbieNet, Inc.     Everything should be made as simple as possible,
cherlin@newbie.net  __but no simpler__.  Albert Einstein


