Re: Hanzi trad-simp folding and z-variants

From: Stephan Stiller <stephan.stiller_at_gmail.com>
Date: Sat, 08 Jun 2013 01:02:13 -0700

> http://www.unicode.org/reports/tr38/ gives a good summary of the
> possibilities.
Which possibilities, and where in the report?

> Trying to "fold" from one locale to another, which is what folding
> from traditional to simplified would be, is not a good idea. Best
> practice is to bear in mind the locale being used and do information
> retrieval on a locale-by-locale basis.
What do you mean?

Put simply: Either you don't let someone search a TW database with
simplified characters, or you internally convert either the search terms
or the searched documents for the duration of the search – or some
combination of these options. It is not at all obvious to me which of
these is fastest in a big data context. There's gotta be research about this.

Stephan
Received on Sat Jun 08 2013 - 03:05:40 CDT
