Re: Unicode 2.0 Unihan standard

From: Kai-hsu Tai
Date: Tue Jun 10 1997 - 17:28:21 EDT

> Does anyone know about or have an opinion on the comprehensiveness of the
> Unicode 2.0 Unihan standard? What I mean is, are there characters which
> you would like to use, either in your work or personally, that aren't in
> that standard? I, for example, would like to know what the statistics are
> for _Shuowen_ or _Guangyun_ characters in that standard. And what about
> dialect characters and unusual surnames etc. There's nothing like trying
> to type Chinese and not being able to find the words. Talk about writer's
> block! :-)

In addition to the ideological whining and Unihan bashing from the
mainstream C/J/K users we hear so often....

Prof. Robert Cheng of the University of Hawai`i has a few hundred
non-Big-5 (thus many of them are most likely non-Unicode) characters for
the Taiwanese language Holooe (a.k.a. Southern Fujian or Southern Min),
which he collected and organized from different sources. He has contacted
me multiple times asking how to get these characters encoded in a good

Dr. Paul McLean of Toronto also has quite a few non-Big-5 characters he
collected during the compilation of the new translation of the Taiwanese
Hakka New Testament (Bible Society, Taipei, 1993).

As you can see, since I try my best to avoid messing with any Han
characters, I am not sure how many of these characters there are, nor am I
sure exactly how many of them are non-Unicode. However, I'd be glad to
help people get in contact with them if anyone is interested. I will
probably also help establish some channels between them and the mainstream
Han character people in Taiwan during my trip back this summer. (I guess
it'll take a miracle for these Han characters to get onto the BMP.)

On a less relevant note, the user communities of both Taiwanese Holooe and
Taiwanese Hakkafa (a.k.a. Hakka) heavily employ mixtures of Latin
characters with diacritics and Han characters. This is impossible in any
common character set other than Unicode without using higher-level protocols.
Now they are fighting in the ugly mudpit of Microsoft Word, using font
markings as character set tags. Visionaries in the communities are looking
very closely at Unicode. See my paper at for more information.
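
To make the mixed-script point concrete, here is a minimal sketch in
Python. The sample phrase and the helper name can_encode are my own
illustrative assumptions, not taken from either community's materials.
It only checks whether a string mixing Han characters with accented
Latin letters fits into a single encoding with no font or markup tricks.

```python
# Illustrative sketch: the sample text and helper name are assumptions.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text fits in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

HAN = "台灣"        # Han characters, both present in Big5
LATIN = "Tâi-oân"   # Latin letters with diacritics, present in Latin-1
MIXED = f"{HAN} {LATIN}"

def survey(encodings=("big5", "latin-1", "utf-8", "utf-16")):
    """Map each encoding name to whether it can hold the mixed string."""
    return {enc: can_encode(MIXED, enc) for enc in encodings}

if __name__ == "__main__":
    for enc, ok in survey().items():
        print(f"{enc:8s} {'ok' if ok else 'cannot encode mixed text'}")
```

Each legacy character set handles one half of the text but not both,
which is why users fall back on font switching; the Unicode encodings
hold the whole string as plain text.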

Khai-su Te

