Hanzi trad-simp folding and z-variants

From: John D. Burger <john_at_mitre.org>
Date: Thu, 6 Jun 2013 15:54:02 -0400

Hi there -

I'm working on an information retrieval application for a collection of Chinese documents, which appear to use a mix of traditional and simplified characters. My intuition is that it makes sense to fold traditional characters to their simplified forms at both indexing and query time (whenever the mapping is unambiguous), but I'd be interested in opinions about this.
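To make the "unambiguous only" idea concrete, here is a minimal sketch in Python. It assumes the mapping comes from the kSimplifiedVariant field of the Unihan database, read as tab-separated lines of the form "U+XXXX<TAB>field<TAB>value" with space-separated values. The helper names (parse_code_point, build_fold_table, fold) and the three sample lines are illustrative only; real use would read the variants file from Unihan.zip and verify the entries there.

```python
def parse_code_point(token):
    """Turn 'U+6F22' (optionally followed by a '<source' suffix) into a character."""
    return chr(int(token.split("<", 1)[0][2:], 16))


def build_fold_table(lines, field="kSimplifiedVariant"):
    """Map a character to its variant only when exactly one distinct target is listed."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, name, value = line.split("\t", 2)
        if name != field:
            continue
        src = parse_code_point(source)
        targets = {parse_code_point(tok) for tok in value.split()}
        if len(targets) == 1:
            (target,) = targets
            if target != src:
                table[ord(src)] = target
        # Characters with several listed targets are left unfolded.
    return table


# Illustrative sample lines (assumed, not read from the real file).
SAMPLE = [
    "# sample data",
    "U+6F22\tkSimplifiedVariant\tU+6C49",
    "U+5B78\tkSimplifiedVariant\tU+5B66",
    "U+4E7E\tkSimplifiedVariant\tU+4E7E U+5E72",
]

FOLD = build_fold_table(SAMPLE)


def fold(text):
    """Apply the same folding to documents at index time and to queries."""
    return text.translate(FOLD)
```

With this sample table, fold("漢學") gives "汉学", while a character with more than one listed target is passed through unchanged. Applying the same function to both documents and queries is what keeps the two sides of the index consistent.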

Second, I just noticed the kZVariant field in the Unihan.zip file. It seems to me that it makes sense to fold characters that are z-variants of one another together as well, correct?
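One way to fold z-variants, sketched in Python under stated assumptions: treat each kZVariant entry as a link between two characters, group linked characters into classes with a small union-find, and fold every member of a class to its lowest code point. The choice of representative is arbitrary, the function names are hypothetical, and the two sample lines are illustrative rather than read from the real file. Whether every listed pair is actually safe to fold is exactly the open question here.

```python
def cp(token):
    """Turn 'U+8AAA' (optionally followed by a '<source' suffix) into a character."""
    return chr(int(token.split("<", 1)[0][2:], 16))


def parse_pairs(lines, field="kZVariant"):
    """Yield (character, variant) pairs from tab-separated Unihan-style lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, name, value = line.split("\t", 2)
        if name != field:
            continue
        for token in value.split():
            yield cp(source), cp(token)


def build_class_table(pairs):
    """Group linked characters and map each one to the lowest code point in its group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            low, high = sorted((ra, rb))
            parent[high] = low  # the smaller code point stays the root
    return {ord(ch): find(ch) for ch in list(parent) if find(ch) != ch}


# Illustrative sample lines (assumed, not read from the real file).
Z_SAMPLE = [
    "U+8AAA\tkZVariant\tU+8AAC",
    "U+8AAC\tkZVariant\tU+8AAA",
]

Z_FOLD = build_class_table(parse_pairs(Z_SAMPLE))
```

A string is then folded with text.translate(Z_FOLD). Using classes rather than one-way lookups means the result does not depend on which direction a pair happens to be listed in.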

Thanks for any information you care to provide.

- John Burger
 MITRE
Received on Thu Jun 06 2013 - 14:59:40 CDT
