Re: kMandarin and kCantonese in Unihan

From: Andrew C. West
Date: Tue Oct 07 2003 - 10:04:28 CST

On Tue, 7 Oct 2003 21:42:09 +0800, Anthony Fok wrote:
> What is a good place for discussions on these issues? And which
> personnel and which sources are involved with esp. the CJK-Ext-A
> kCantonese data? It would be nice to talk with the original people to
> find out how these errors crept in, e.g. errors of the original source?
> Systematic errors due to mistakes in conversion from e.g. Jyutping to
> Yale? Inappropriate use of "Fanqie"? Other human errors? etc. so
> that we can find good ways to correct these mistakes.

The latest draft version of the Unihan database (Unihan-4.0.1d1.txt) is
currently subject to public review.

This forum is a suitable place for discussing the Unihan database, but to
ensure that your errata are taken note of, you should report them using the
Unicode reporting form by October 27.

The failings of the Unihan database have been the subject of much discussion in
the past, especially the kMandarin field, which got rather mangled in Unicode
3.1. Happily, the 4.0.1d1 version of Unihan fixes most of the kMandarin
problems, although the quality of many of the provided Mandarin readings still
leaves much to be desired. (The Mandarin readings really need to be completely
overhauled, based on a single authoritative source such as _Hanyu Da Zidian_
... but that's just my personal opinion.)

> Furthermore, is there something like CVS web or changelogs to see the
> history of modifications of Unihan? (when, by whom, and why, from what
> source, etc.) What other fixes have been done to Unihan.txt since
> 19 June 2003?

There is no public CVS repository, but the various incarnations of the Unihan
database may be downloaded from the "Official Unicode Online Data" site.

I suppose there won't be another release of Unihan until after the public review
period ends at the end of this month.
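
In the absence of a change log, one way to see what changed in a given field
between two downloaded versions is to compare them directly. The following is
only a rough sketch, assuming the usual Unihan layout of one tab-separated
"U+XXXX<TAB>field<TAB>value" entry per line with "#" comment lines. The
function names and the sample readings are made up for illustration and are not
real Unihan data.

```python
def read_field(lines, field):
    """Collect {code point: value} for one field from Unihan-style lines."""
    values = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip anything that is not code point / field / value
        code_point, name, value = parts
        if name == field:
            values[code_point] = value
    return values


def diff_field(old_lines, new_lines, field):
    """List (code point, old value, new value) wherever the field differs.

    A value of None means the entry is absent from that version.
    """
    old = read_field(old_lines, field)
    new = read_field(new_lines, field)
    # Sort numerically so that e.g. U+20000 comes after U+9FA5.
    code_points = sorted(old.keys() | new.keys(), key=lambda cp: int(cp[2:], 16))
    return [
        (cp, old.get(cp), new.get(cp))
        for cp in code_points
        if old.get(cp) != new.get(cp)
    ]


# Invented sample entries standing in for two versions of the file.
OLD_SAMPLE = [
    "# old version\n",
    "U+3400\tkCantonese\told-reading\n",
    "U+3401\tkCantonese\tsame-reading\n",
]
NEW_SAMPLE = [
    "# new version\n",
    "U+3400\tkCantonese\tnew-reading\n",
    "U+3401\tkCantonese\tsame-reading\n",
    "U+3402\tkCantonese\tadded-reading\n",
]

for cp, before, after in diff_field(OLD_SAMPLE, NEW_SAMPLE, "kCantonese"):
    print(cp, before, "->", after)
```

For real files you would pass two open file objects in place of the sample
lists. This only tells you what changed, of course, not when, by whom, or why.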


