kMandarin and kCantonese in Unihan

From: Anthony Fok
Date: Tue Oct 07 2003 - 07:42:09 CST

Re: Errors Chinese pronunciations in Unihan

In Unihan-4.0.1d1b.txt:

U+4C5B kMandarin XU4M

The trailing "M" is extraneous. I do not know about the actual
pronunciation of the U+4C5B character, however. :-)
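
A stray trailing character like this could be flagged mechanically. The
following is only a minimal sketch, not an official validator: the
function name malformed_readings is hypothetical, and the assumed shape of
a reading (uppercase letters, optionally ":" for an umlaut, then exactly
one tone digit from 1 to 5) is my assumption about the file's convention.

```python
import re

# Assumption: a well-formed reading is uppercase letters (":" allowed for
# an umlaut) followed by exactly one tone digit 1-5 and nothing else.
READING = re.compile(r"[A-Z:]+[1-5]")


def malformed_readings(line, field="kMandarin"):
    """Return the readings on one Unihan line that do not match READING.

    Lines for other fields, comments, and blank lines yield an empty list.
    """
    parts = line.split(None, 2)  # code point, field name, value
    if len(parts) < 3 or parts[1] != field:
        return []
    return [r for r in parts[2].split() if not READING.fullmatch(r)]


print(malformed_readings("U+4C5B kMandarin XU4M"))
```

Running this over every line of the file would list candidates for manual
review; it cannot say what the correct reading is.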

The Cantonese pronunciations of characters in CJK Extension A seem
problematic. There seems to be a _consistent_ (?) mix-up of "AA" and
"A" (long "a" and short "a"). There also seems to be an _occasional_
(?) mix-up of "J" and "Y" (probably due to the confusion between Yale
and Jyutping romanization?).

For example, if U+3400's kDefinition claims that it is the same as
U+4E18, then it should be pronounced as "YAU1", not "JAAU1". (I have no
idea about the "KAAU1" reading.)

U+3558 shows another error. It is listed as "CHAM1 SAM1". Here, only
CHAM1 is incorrect; it should be listed as "CHAAM1 SAM1" instead. SAM1
here means Ginseng. Hmm... speaking of which, its more conventional
forms (U+53C2, U+53C3, U+53C4) are missing the "SAM1" pronunciation as
well as the corresponding "Ginseng" definition!

On the other hand, some "J"s are correct, e.g. "JUNG3" for U+343A.

Some kCantonese pronunciations are joined together. For instance, the
following grep command yields:

$ grep 'kCantonese.*[0-9][A-Z]' Unihan-4.0.1d1b.txt
U+36D3 kCantonese CHI1HEI1 DOU1
U+36DB kCantonese SAAN1DZAAN3
U+3851 kCantonese HAU1DZIU2
U+3997 kCantonese GAAM1GAAM3 KAAM4 NAAP1
U+3BA7 kCantonese WU1WAAT1
U+3C04 kCantonese JIN1DZIN3
U+3C7E kCantonese GOI1HOI1
U+3C80 kCantonese DAAI2 JAAN1DZEUN1 SAAN4
U+3C8E kCantonese DAAU1 LAAU4 SYU1JYU4
U+3CD9 kCantonese GYUN1JYUN5
U+3DD1 kCantonese JAAN1 JIN1 SEUNG1NIM6
U+3E62 kCantonese GA1GO1
U+3F39 kCantonese HONG1HONG1
U+4003 kCantonese DEUI1SEUI1 TEUI4
U+4050 kCantonese JING1JING3
U+4053 kCantonese JUNG1GAI3
U+4167 kCantonese JAAM1JAAM3 JIM3
U+4185 kCantonese CHI4 JI1DAIK1
U+423E kCantonese SAU1SOK3 SE3
U+441F kCantonese HONG6 NGAAU1GONG2
U+4492 kCantonese JAAU4 JIU5 SEUI1WAAI2 TIU4
U+44D6 kCantonese KEA1WU4 KUNG4
U+4543 kCantonese JAAM1JAAM3
U+4CC9 kCantonese DUNG1DAM1 DUNG6
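
The joined readings above could be split mechanically as a first step
toward a fix. This is a rough sketch under the assumption that every
reading is a run of uppercase letters ending in one tone digit; the
function name split_joined is hypothetical. It only separates the
syllables and does not judge whether each one is correct.

```python
import re

JOINED = re.compile(r"[0-9][A-Z]")     # tone digit directly followed by a letter
SYLLABLE = re.compile(r"[A-Z]+[0-9]")  # letters followed by one tone digit


def split_joined(value):
    """Split run-together readings in a kCantonese value into separate ones.

    A token is split only if it decomposes cleanly into syllables;
    anything else is passed through unchanged for manual review.
    """
    result = []
    for token in value.split():
        pieces = SYLLABLE.findall(token) if JOINED.search(token) else []
        if pieces and "".join(pieces) == token:
            result.extend(pieces)
        else:
            result.append(token)
    return result


print(split_joined("DAAI2 JAAN1DZEUN1 SAAN4"))
```

Some of the split results would still need a human check, for example
where both halves are identical (HONG1HONG1) or where a syllable looks
wrong on its own.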

I also caught the following error by chance:

U+4C8E kCantonese NEOYU5

What is a good place to discuss these issues? And which people and
which sources are involved with the kCantonese data, especially for CJK
Extension A? It would be nice to talk with the original contributors to
find out how these errors crept in: errors in the original source?
Systematic errors due to mistakes in conversion from e.g. Jyutping to
Yale? Inappropriate use of "Fanqie"? Other human errors? Knowing that
would help us find a good way to correct these mistakes.

Furthermore, is there something like CVS web or changelogs to see the
history of modifications of Unihan? (when, by whom, and why, from what
source, etc.) What other fixes have been done to Unihan.txt since
19 June 2003?

Many thanks!

Anthony Fok

Anthony Fok Tung-Ling
ThizLinux Laboratory
Debian Chinese Project
Come visit Our Lady of Victory Camp! 
