L2/09-175

Report on Problems in Unihan Charts

Date/Time: Thu Mar 19 12:51:43 CST 2009
Contact: mamandel@ldc.upenn.edu
Name: Mark A. Mandel
Report Type: Error Report
Opt Subject: unlabeled and misplaced data in Unihan charts

Problem area: the "Chinese compounds" tables of the Unihan data charts, for those characters that have such a table.

Problem #1: There are no titles on the columns.

The first column in each row shows the compound. The second, when not empty, is a transcription, presumably of Mandarin Chinese. The third column, when not empty, may contain transcriptions that are not Mandarin, or sometimes something else. The fourth contains a gloss or definition.

I have not been able to find descriptions of these data columns on the website, and it should not be necessary to look for them: the columns should have headers, like the other tables in these charts.

Example:

U+4e00 一 http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4e00&useutf8=true

1: 八紘一宇
2: ba hong2 yi4 yu3
3: baat3 wang4 yat1 yu5
4: to unite the whole world under one sovereign by force of arms

Problem #2: Some rows of these charts describe compounds that do not contain the character in question. Where I have seen this problem, the gloss contains the character, and that is probably how these data were incorrectly placed in these charts.

Example:

U+4e2d 中 http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4e2d&useutf8=true

1: 四書
2: Si4shu
3: sei3 syu1
4: the Four Books (the Great Learning, the Doctrine of the Mean, the Analects, and the Book of Mencius 大學, 中庸, 論語, 孟子)
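A check for this kind of misplacement could be sketched as follows. This is only an illustration: I do not know how the charts are actually generated, the function name `misplaced_rows` is hypothetical, and the four-field row layout is assumed from the columns described above. It flags any row whose first column does not contain the character the page is about.

```python
def misplaced_rows(char, rows):
    """Return the rows whose compound (first column) does not contain char.

    Each row is a (compound, column2, column3, gloss) tuple; this layout is
    an assumption based on the four columns described in this report.
    """
    return [row for row in rows if char not in row[0]]


# Row reported above on the page for U+4E2D.
sishu = (
    "四書",
    "Si4shu",
    "sei3 syu1",
    "the Four Books (the Great Learning, the Doctrine of the Mean, "
    "the Analects, and the Book of Mencius 大學, 中庸, 論語, 孟子)",
)
# Row reported above on the page for U+4E00.
bahong = (
    "八紘一宇",
    "ba hong2 yi4 yu3",
    "baat3 wang4 yat1 yu5",
    "to unite the whole world under one sovereign by force of arms",
)

print(misplaced_rows("中", [sishu]))   # the 四書 row is flagged
print(misplaced_rows("一", [bahong]))  # nothing is flagged
```

Matching on the first column only, rather than on the whole row, is the point of the sketch: the 四書 row contains 中 in its gloss but not in the compound itself.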

(See also next.)

Problem #3:

Also in the page for U+4e2d 中, there is an evident error, perhaps a conflation of parts of two or more rows:

U+4e2d:
1: 包公 包拯
2: Baogong Baozheng3
3: zzz悲從中來
4: bei cong2 zhong lai2

Problem #4:

Several rows contain non-hanzi characters in the first column. Again from U+4e2d 中:

1: 外?中乾 (2nd character = question mark)
2: wai4qiang2-zhonggan
3: [blank]
4: to put up a bold front; a paper tiger; a bold front; an outward show

1: 在﹍之中 (2nd character looks like dashed underline, three short horiz. strokes in a row at bottom of line)
2: zai4 zhi1 zhong1
3: [blank]
4: amid; among

These last may be due to my browser. I am using Firefox 3.0.7 under Windows XP SP3 with, well, truckloads of fonts installed; but if you cannot reproduce this problem, please tell me what to look for on my system.
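One way to tell a data problem from a display problem would be to test the first-column strings directly, independent of any browser or font. The following is a minimal sketch under stated assumptions: the names `HAN_RANGES`, `is_hanzi` and `non_hanzi` are hypothetical, and the code point ranges are an approximation of "hanzi" chosen for illustration, not a definition taken from the Unihan documentation.

```python
# Approximate code point ranges for Han ideographs (an assumption of this
# sketch; a fuller check would consult the Unicode Script property).
HAN_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2FFFF),  # Supplementary Ideographic Plane
)


def is_hanzi(ch):
    """Return True if the single character ch falls in one of HAN_RANGES."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in HAN_RANGES)


def non_hanzi(compound):
    """Return the characters of compound that are not in HAN_RANGES."""
    return [ch for ch in compound if not is_hanzi(ch)]


# First-column strings as they appear in this report.
print(non_hanzi("外?中乾"))    # flags the question mark
print(non_hanzi("在﹍之中"))    # flags the dashed-underline character
print(non_hanzi("八紘一宇"))    # flags nothing
```

If the stored strings are flagged by a check like this, the stray characters are in the data and not an artifact of my system. Note that the sketch also flags spaces, so a first column such as the one quoted under Problem #3 would be reported as well.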

Sincerely,
Mark A. Mandel