Re: "Giga Character Set": Nothing but noise

From: Jon Babcock
Date: Sun Oct 15 2000

"Carl W. Brown" <> wrote:

> If you were to start all over again with no interest in
> compatibility with existing code pages, you could drop the preformed
> characters.

Since I've commented more than once on this mailing list over the past
few years about the possibility of using a set of fewer than 2000 or so
characters to represent all Chinese graphs, I'll be brief this time.

Such a system was developed nearly fifty years ago by Peter
A. Boodberg, at the Department of Oriental Languages at the University
of California, Berkeley. His work was based directly on a study of
Chinese sources, especially the Shuowenjiezi Dictionary. I was
fortunate to be able to study under Professor Boodberg during his last
couple years at Berkeley, shortly before his death in 1972. I've
rewritten some of his ideas and placed them on my web site
under the name of CHA (Chinese Hemigram Annotation). And because it
is difficult to find his original writings on this subject, I intend
to host a few of Boodberg's key 'cedules' soon.

When I first heard about Unicode (probably in late 1991), I naively
assumed that it would employ some version of the Boodberg approach,
i.e., the use of a 'small' subset of Chinese from which the entirety
is composed. But, as has been stated many times on this list, the
preferred approach was to base the Unicode Han repertoire on lists of
precomposed hanzi/hanja/kanji that were actually in use in computers
and, for the most part, were sanctioned by national governments. This
was natural given the fact that the details (and here the details mean
everything) of a system such as the one Dr. Boodberg envisioned were
probably not available to the Unicode people, nor were they in use by
any national, commercial, or even academic body. In other words, such
an approach would have had to be developed by what came to be known as
the Unicode Consortium itself.

Although it will be difficult, I believe that within the decade the composition
of the Chinese script will be recognized and well-understood, and the
option to treat each of the tens of thousands of Chinese graphs,
including new ones but excluding of course the 300 or so unsegmentable
wen, as a digraph that can be decomposed into hemigrams will be made
available, perhaps even in Unicode.

In the meantime, vis-a-vis Unicode and the Han repertoire, it's a case
of 'get over it'. I had to.


Jon Babcock <>

