Re: Unicode-32 = Godot?

From: Timothy Huang (timd_huang@mail.formac.com.tw)
Date: Fri Aug 23 1996 - 22:26:06 EDT


Hello, Dan

>I think that encoding is a separate issue from character set design.
>It seems to me that CCCII/EACC's real reason for using 3 bytes was
>"we need more than 16 bits of info", not "we need to go thru 7 bit lines".

Wrong! Encoding is absolutely related to the character set. The number
of characters determines how big the coding space must be, and therefore
how many bytes each code point needs. The relationships between
characters, such as orthographic variants, multiple readings (polyphony),
and radical grouping, then determine how the coding space should be
structured. Existing standards, for both software and hardware, are
considered after that. These were some of the considerations behind the
implementation of CCCII/EACC.
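
The link between repertoire size and bytes per code point can be made
concrete with a small sketch. The function name and the sample sizes
below are illustrative assumptions, not figures taken from CCCII/EACC or
Unicode; the only outside fact used is that a byte has 256 values, a
7-bit byte has 128, and an ISO 2022 style graphic byte has 94.

```python
def bytes_per_codepoint(repertoire_size: int, values_per_byte: int = 256) -> int:
    """Return the smallest byte count whose coding space can hold the repertoire.

    Hypothetical helper for illustration: it only compares sizes and says
    nothing about how the coding space should be structured.
    """
    if repertoire_size < 1 or values_per_byte < 2:
        raise ValueError("need at least one character and two values per byte")
    count = 1
    capacity = values_per_byte
    # Each extra byte multiplies the coding space by values_per_byte.
    while capacity < repertoire_size:
        capacity *= values_per_byte
        count += 1
    return count


if __name__ == "__main__":
    for size, values in [(65536, 256), (65537, 256), (65536, 128), (8837, 94)]:
        print(size, "characters,", values, "values per byte ->",
              bytes_per_codepoint(size, values), "bytes")
```

Once a repertoire outgrows 16 bits, or once the usable values per byte
shrink, a third byte becomes unavoidable, whichever of the two reasons
came first.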

As I understand Unicode and CCCII/EACC, they weighed these factors quite
differently. For example, when the Unicode 1.0 draft was released, the
character sequence was just the mathematical union of four local
standards. Pure mathematics may look very "beautiful" to some people,
but in the ideographic world it does not work at all. People here are
used to ordering by pronunciation, radical, stroke count, or some
combination of them.

>Unicode can happily go thru 7 bit lines if you encode it in 3 bytes.

If you want to, you can certainly re-encode any standard into any number
of bytes. However, if a character code is designed right, no such extra
step is needed. Every extra gear in the machine adds another chance for
problems. Think about it: with so many data formats, we have all already
had plenty of trouble with conversions. If the Internet had no character
standard of some sort, how well could we communicate (provided that both
of us are using English)? I personally can NOT read any plain text file
that comes from the DOS/Windows world. I use only a Mac.
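
For reference, the kind of re-encoding under discussion (a 16-bit code
carried in three 7-bit bytes) might look like the sketch below. The bit
layout and the function names are assumptions made up for illustration;
this is not UTF-7 or any other published transfer format. It also
ignores the fact that a real 7-bit line would need the control-character
range kept out of the output.

```python
def pack_7bit(code: int) -> bytes:
    """Split a 16-bit code into three bytes that each fit in 7 bits.

    Hypothetical layout: top 2 bits, middle 7 bits, low 7 bits.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return bytes([(code >> 14) & 0x03, (code >> 7) & 0x7F, code & 0x7F])


def unpack_7bit(triple: bytes) -> int:
    """Reverse pack_7bit, rejecting input that is not three 7-bit bytes."""
    if len(triple) != 3 or any(b > 0x7F for b in triple):
        raise ValueError("expected exactly three 7-bit bytes")
    return (triple[0] << 14) | (triple[1] << 7) | triple[2]


if __name__ == "__main__":
    packed = pack_7bit(0x4E2D)
    print(list(packed), hex(unpack_7bit(packed)))
```

Both ends of the line have to agree on this packing step, which is the
"extra gear" at issue here.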

>I suspect you'll never be happy with Unicode, but that's ok, because systems
>that support Unicode will be good at letting you use whatever character
>set you like. You'll simply benefit from the fact that Unicode, by
>serving as a lingua franca, makes supporting multiple character sets easy.

Wrong again. Actually, I respect Unicode more than the Unicode people
think. I was the very first person to translate the very early Unicode
document into Chinese. Did any of the Unicode members think about
that? If Unicode wants to be understood by non-English-speaking
people, the consortium must take responsibility for having it translated
into other languages. Also, from what I have observed, the people who
worked on the "mathematical union" spent a great deal of time and effort.
This should make the Taiwan CNS and III people feel very ashamed, if they
know what shame means. The Yanks actually did a much better job of
compiling the Chinese ideographs than they (the Chinese) did.

I really hate to say this, but now I will let it out: up to now, the
Chinese people the Unicode consortium has dealt with are not the right
people. They are just a bunch of politicians with a very bad record of
past screw-ups (Big-5, GB, TCA, CNS-11643, etc.).

I have almost come to the point of saying: "Experts from abroad, we need
your help to solve the Chinese character coding problem, because the
Chinese people are incompetent and can NOT really do this for
themselves." But please understand, I say "I have almost come to". The
time for me to make such a statement is not here yet. There is still
some hope.

As for supporting multiple character sets, this can easily be done, and
as a matter of fact it has been done in many cases. Unicode can be, and
is, treated as one of many existing codes.

Smiles,
Timothy Huang


