kKangXi and kIRGKangXi fields in Unihan

From: Leonardo Boiko <leoboiko_at_gmail.com>
Date: Wed, 23 May 2012 16:57:39 -0300


As you know, the Unihan database has two fields listing indexes for
the Kāngxī Zìdiǎn dictionary, kKangXi and kIRGKangXi (where IRG is the
Ideographic Rapporteur Group). If I’m counting correctly, 106
characters have a value only in kKangXi, while 49300 have a value only
in kIRGKangXi. The remaining characters usually have the same value in
both fields, but the two values differ in 252 cases.
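
A count like the one above could be reproduced with a short script.
What follows is only a minimal sketch, not the Emacs macros actually
used. It assumes the usual tab-separated Unihan line layout (code
point, field name, value; lines starting with "#" are comments). The
function names and the sample lines are invented for illustration and
are not real Unihan data.

```python
from collections import defaultdict

FIELDS = ("kKangXi", "kIRGKangXi")

def parse_unihan(lines):
    """Collect the kKangXi / kIRGKangXi value of each code point.

    Assumes lines of the form 'U+XXXX<TAB>field<TAB>value'.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not a 3-column record
        codepoint, field, value = parts
        if field in FIELDS:
            table[codepoint][field] = value
    return table

def classify(table):
    """Count the four cases and list the conflicting code points."""
    counts = {"only_kKangXi": 0, "only_kIRGKangXi": 0, "same": 0, "differ": 0}
    conflicts = []
    for codepoint, fields in sorted(table.items()):
        kangxi = fields.get("kKangXi")
        irg = fields.get("kIRGKangXi")
        if irg is None:
            counts["only_kKangXi"] += 1
        elif kangxi is None:
            counts["only_kIRGKangXi"] += 1
        elif kangxi == irg:
            counts["same"] += 1
        else:
            counts["differ"] += 1
            conflicts.append((codepoint, kangxi, irg))
    return counts, conflicts

# Invented sample records, for illustration only.
sample = [
    "# comment line",
    "U+3400\tkKangXi\t0078.010",
    "U+3400\tkIRGKangXi\t0078.010",
    "U+3401\tkKangXi\t0078.101",
    "U+3401\tkIRGKangXi\t0078.100",
    "U+3402\tkIRGKangXi\t0078.101",
    "U+3403\tkKangXi\t0080.010",
]
counts, conflicts = classify(parse_unihan(sample))
print(counts)
print(conflicts)
```

On real data, the lines of the Unihan text files that carry the two
fields would be passed in place of the sample list. Which files those
are depends on the Unihan release.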

Earlier[1], someone asked which field is correct when there’s a
conflict. John H. Jenkins replied that “whichever one has the correct
data is the correct one. :-)”, and invited help in finding errors.

Well, I wanted to help, but I can’t read Chinese properly, so I have
trouble validating the characters in the Kāngxī (I can recognize them
visually, but without understanding the definitions I might mistake
one for a Z-variant or something). However, after a few Emacs macros, I
came up with this simple HTML form to help check which one is correct:


The first link lists conflicting pairs where at least one of the
indexes claims the character is actually present in the Kāngxī, while
the other lists the remaining “virtual” indexes. Each pair is listed
with links to the relevant Kāngxī pages (courtesy of the online
edition[2]), and a link to Unihan. Once the form is submitted, it
makes a list of the entries chosen as correct by the user. The
results are shown in plain text, and it should be simple to compare
several tries for double-checking.
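
The split between the two lists could be sketched roughly as below.
This is an assumption about how the grouping works, not the code behind
the form. It relies on the convention described in UAX #38 that a
Kāngxī index has the shape pppp.ccv, where a final digit of 0 means
the character is really in the dictionary and 1 means a "virtual"
position. The function names and the sample pairs are hypothetical.

```python
def is_virtual(index):
    """True if the index ends in 1, i.e. marks a virtual position.

    Assumes the 'pppp.ccv' index format described in UAX #38.
    """
    return index.strip().endswith("1")

def split_conflicts(conflicts):
    """Split (codepoint, kKangXi, kIRGKangXi) conflicts into two lists.

    'real' holds pairs where at least one index claims the character
    is present in the dictionary; 'virtual' holds the remaining pairs.
    """
    real, virtual = [], []
    for codepoint, kangxi, irg in conflicts:
        if is_virtual(kangxi) and is_virtual(irg):
            virtual.append((codepoint, kangxi, irg))
        else:
            real.append((codepoint, kangxi, irg))
    return real, virtual

# Invented conflicting pairs, for illustration only.
sample_conflicts = [
    ("U+3401", "0078.101", "0078.100"),
    ("U+3402", "0079.011", "0079.021"),
]
real, virtual = split_conflicts(sample_conflicts)
print(real)
print(virtual)
```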

I don’t know if there’s interest in such a thing at the moment, but if
so, there you go. All values apply to Unihan data downloaded a week
ago or so.

Leonardo Boiko
[1] http://unicode.org/mail-arch/unicode-ml/y2007-m03/0014.html
[2] http://www.kangxizidian.com/
Received on Wed May 23 2012 - 14:59:27 CDT
