Re: Errors in Unihan?

From: John Jenkins
Date: Tue Nov 14 2000 - 12:42:24 EST

On Tuesday, November 14, 2000, at 08:24 AM, Pierpaolo Bernardi wrote:

> In the Unihan.txt database, in the kMandarin field there are entries
> with duplicate pronunciations. For example:
> U+4E21 kMandarin 1 LIANG3 2 LIANG3 3 LIANG4
> U+4E4E kMandarin 1 HU1 HU2 2 HU1
> U+4E86 kMandarin 1 LIAO3 2 LE LIAO3
> Is there a reason for these duplicates? If this is the case, the
> format of this field should be documented better in the header. If
> these duplications are errors, I can supply a list of them.

That would be very helpful, yes.

> Also, what's the meaning of the isolated numbers?

The value of the field was obtained from dictionaries. When a dictionary lists more than one meaning for a character, it is not uncommon for one pronunciation to belong to one meaning and another pronunciation to another. The isolated numbers are the dictionary's sense numbers: each one marks which pronunciations go with which numbered meaning.
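To illustrate how such a value reads, here is a minimal sketch that groups readings under their sense numbers. It assumes, based only on the examples quoted above, that the value is a run of whitespace-separated tokens in which a purely numeric token starts a new dictionary sense; `sense_groups` is a hypothetical helper, not part of any Unihan tooling.

```python
from typing import Dict, List


def sense_groups(value: str) -> Dict[int, List[str]]:
    """Group a kMandarin value into {sense_number: [readings]}.

    Assumption: purely numeric tokens are sense numbers; every other
    token is a reading. Readings before any number go under sense 0.
    """
    groups: Dict[int, List[str]] = {}
    current = 0
    for token in value.split():
        if token.isdigit():
            # A bare number opens a new dictionary sense.
            current = int(token)
            groups.setdefault(current, [])
        else:
            # A reading belongs to the most recently opened sense.
            groups.setdefault(current, []).append(token)
    return groups


print(sense_groups("1 LIAO3 2 LE LIAO3"))
# {1: ['LIAO3'], 2: ['LE', 'LIAO3']}
```

Grouped this way, the entries in the original report stop looking like accidental duplicates: LIAO3 recurs because it is listed under two different senses.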

Since the database doesn't preserve the link between specific definitions and pronunciations, the isolated numbers should be removed along with the duplicates.
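A cleanup along those lines could look like the following sketch. It assumes the same token layout as the examples above (bare numbers are sense numbers) and a `code field value` line layout that splits on whitespace. Keeping each reading in order of first appearance is an arbitrary choice made for this illustration; the actual database might order readings differently. `clean_kmandarin` and `clean_line` are hypothetical names.

```python
def clean_kmandarin(value: str) -> str:
    """Drop isolated sense numbers and duplicate readings.

    Assumption: purely numeric tokens are sense numbers. Readings are
    kept in order of first appearance (an illustrative choice).
    """
    readings = (t for t in value.split() if not t.isdigit())
    # dict.fromkeys removes duplicates while preserving insertion order.
    return " ".join(dict.fromkeys(readings))


def clean_line(line: str) -> str:
    """Clean one 'U+XXXX kMandarin value' line.

    Assumption: fields are whitespace-separated and the output
    uses tabs as separators.
    """
    code, field, value = line.split(None, 2)
    return "\t".join((code, field, clean_kmandarin(value)))


print(clean_kmandarin("1 LIAO3 2 LE LIAO3"))
# LIAO3 LE
```

Applied to the three reported entries, this yields `LIANG3 LIANG4`, `HU1 HU2`, and `LIAO3 LE`.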

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT