Hong Kong Characters / WAS: [long] Use of Unicode in AbiWord

From: Dirk Meyer (dmeyer@adobe.com)
Date: Thu Mar 25 1999 - 20:36:45 EST

[Back from the conference in Boston, where I had no email access, I found
out that something interesting was hidden behind this strange subject
:-) .]


Rick and Chris,

You definitely deserve some support here (and given the bandwidth this
thread has already consumed, I want to keep my remarks short):

Rick's observations and statements regarding the Hong Kong-specific
characters are correct. GCCS is a reality in Hong Kong. The Government of
the Hong Kong SAR, for example, made support for its character set a
condition for buying products from font vendors. Monotype Hong Kong,
Dynalab, and Arphic Technology have products in place that support the
HKGCCS, parts of it, or a different coded character set, sometimes an
OS-specific one.

In a nutshell, when it comes to GCCS, we are looking at 1471 characters not
yet in Unicode, some of which are not even in CJK Extension 1.1, whose 6000+
characters will become part of Unicode 3.0. [I may have misunderstood John
Jenkins during the conference, but some of them might not even be in the
SuperCJK set.]

Last year, I did some research on this topic, which resulted in an article
in Multilingual Computing & Technology (#19 Volume 9 Issue 3)
<http://www.multilingual.com/>, and in the character collection
Adobe-CNS1-1 (used in CID technology and adding 3309 characters), which
supports a superset of the characters contained in various Hong Kong
specific collections. All the material I used is listed there. If you have
specific questions, I am happy to share my information or to forward a PDF
of that article.

To add a few more sources to the information Chris has given so far
[sometimes hard to find, but not at all private]:

1) <http://www.juxian.com.hk>, a publisher that distributes Yeong Tze-loi's
"Big character dictionary of standardized Chinese input methods", which is
very useful and includes the complete GCCS (ISBN: 962-436-287-4).

2) <http://www.cmex.org.tw/service/cmex/project/htm>: this website provides
information about Big-5, including bitmap font files. The information is
official, or at least as official as a semi-official standard like Big-5
allows.

3) <http://www.dynalab.com>: the website of this Taiwanese font vendor
includes a lot of information about both its own Hong Kong character
collection and the official GCCS. Documents containing the collections can
be viewed and printed using Dynalab's proprietary Dynadoc technology.

Everything else has already been mentioned in the thread.

So there is no reason for doubt, Rick.




While we can discuss which characters "deserve" immediate support in
Unicode, we will never be in a position to judge whether a character is
"more or less important", "valid", "historic-only", or whatever [insert
your favorite label].
To be clear, the latter has fortunately never been the Unicode approach,
and I am convinced that it never will be; this was made very clear during
this year's conference. I am glad about that.

>-----Original Message-----

At 6:38 PM -0800 3/23/99, Chris Pratley wrote:
>Sure. Check out http://www.info.gov.hk/gccs/ and related sites. For an
>example of a site that needs these chars, check out the Mingpao newspaper:
>I'm not sure if the list of chars is publicly available from the HK Gov't,
>but it should be. They are also working on a new version that adds an
>additional 1000 characters, most of which are already in Unicode, but a few
>which are not. You can download the package - I believe it has a character
>picker tool for the new characters. I have seen the list before, and the
>cross-check results with Unihan (incl Ext A) are believable IMHO.
>As for the Taiwan Gov't, I don't know if their info is on the web. I do know
>that their database cannot be supported in Unicode yet (even with Ext A),
>which is why there is an effort to add additional chars in Ext B and later.
>Obviously the SuperCJK effort from the Chinese Government and other similar,
>recent governmental efforts to get all known Chinese characters encoded is
>driven by a desire to preserve culture, and to represent things on computers
>like classical literature which may use ancient forms no longer in use. Like
>you, until last year I assumed that the chars not in unihan now were not
>used in modern times. However, that assumption is proving wrong especially
>in the case of Hong Kong, whose characters were apparently left out of the
>national standards of both the PRC and Taiwan, and therefore didn't make it
>into Unicode when Unihan was created based on super-setting existing
>national standards. Fortunately the governments now seem to be aware of the
>importance of cataloguing and encoding these characters, so progress is
>finally being made.


Dirk Meyer
CJKV Type Development
-----------------------------------------------
Adobe Systems Inc.             dmeyer@adobe.com
345 Park Avenue M/S W8-322     voice: 408-536-4387
San Jose, CA 95110-2704        fax: 408-537-4008
