Errors in Unihan data : simplified/traditional variants

From: Koxinga (koxinga@wanadoo.fr)
Date: Sat Oct 30 2010 - 21:42:20 CDT


Hello,

I recently looked up the traditional-simplified relationships in the
Unihan database (Unihan_Variants.txt).

I knew it had mistakes and I wanted to help correct some of them, but
the first thing that stood out and surprised me was the large number of
line pairs like:

U+346F kSimplifiedVariant U+3454
U+346F kTraditionalVariant U+3454

which should be (if I didn't mix them up):

U+3454 kTraditionalVariant U+346F
U+346F kSimplifiedVariant U+3454

My quickly written parsing program counted 1154 such pairs, where the head
character is the same as on the line above. They always seem to be in
the order "kTraditionalVariant" then "kSimplifiedVariant", so they can
probably be corrected automatically. It looks like an obvious mistake, and
the correction should be easy. I can help with that; I am just waiting
to see whether this is the right place to report problems in Unihan. I also
considered http://www.unicode.org/reporting.html; would that be better?
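
A check like the one described above could be sketched roughly as follows.
This is not the original parsing program, only a minimal illustration. It
assumes each data line has the form "code point, field name, value(s)"
separated by whitespace or tabs, and that comment lines start with "#".
The function names find_suspect_pairs and proposed_fix are made up for
this example, and the proposed fix simply follows the correction
suggested above, which may itself need checking.

```python
from collections import defaultdict


def find_suspect_pairs(lines):
    """Return (source, target) pairs where one source code point lists the
    same target as both kSimplifiedVariant and kTraditionalVariant."""
    simplified = defaultdict(set)
    traditional = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed format)
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # not a "source field value" line
        source, field, values = fields
        targets = set(values.split())
        if field == "kSimplifiedVariant":
            simplified[source] |= targets
        elif field == "kTraditionalVariant":
            traditional[source] |= targets

    suspects = []
    for source in sorted(simplified):
        shared = simplified[source] & traditional.get(source, set())
        for target in sorted(shared):
            if target != source:  # ignore self-references, as a precaution
                suspects.append((source, target))
    return suspects


def proposed_fix(source, target):
    """Build the two lines suggested in this message as the correction:
    the target gets the traditional link, the source keeps the simplified one."""
    return [
        f"{target} kTraditionalVariant {source}",
        f"{source} kSimplifiedVariant {target}",
    ]


sample = [
    "U+346F kSimplifiedVariant U+3454",
    "U+346F kTraditionalVariant U+3454",
]
print(find_suspect_pairs(sample))  # [('U+346F', 'U+3454')]
```

To run it on the real file, the lines of Unihan_Variants.txt can be passed
in directly, for example by opening the file with encoding="utf-8" and
handing the file object to find_suspect_pairs.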

I have a lot of other questions and comments on these
simplified/traditional relationships, but I guess they will have to wait
until this problem is resolved, as they would make this email too long.

Regards,

Koxinga


