I know this question is a bit off-topic for Unicode, but this group
seems very aware of the latest in text processing.
Parsing CJK text into meaningful tokens (word equivalents) seems to be
a daunting problem because the scripts do not mark word boundaries.
Are there any techniques, tools, or algorithms (free or licensable)
that do a good job of parsing "words" out of a CJK string?
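To make the difficulty concrete, here is one common baseline, dictionary-based forward maximum matching: scan left to right and at each position take the longest string found in a word list, falling back to a single character. This is only a minimal sketch, assuming a toy dictionary; the function name `forward_max_match` and the word list are hypothetical, and real segmenters use far larger lexicons plus statistical disambiguation.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right longest-match segmentation (illustrative sketch)."""
    # Longest dictionary entry bounds how far ahead we look; never less than 1.
    max_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until we hit a
        # dictionary word; a single character is always accepted as a fallback.
        for j in range(min(len(text), i + max_len), i, -1):
            candidate = text[i:j]
            if candidate in dictionary or j == i + 1:
                tokens.append(candidate)
                i = j
                break
    return tokens


if __name__ == "__main__":
    words = {"研究", "研究生", "生命", "命", "起源"}
    print(forward_max_match("研究生命起源", words))
    # → ['研究生', '命', '起源']  (the intended reading is 研究 / 生命 / 起源)
```

The example shows why greedy matching alone is not enough: "研究生命起源" ("study the origin of life") is mis-split because "研究生" (graduate student) is also a dictionary word, so some form of ambiguity resolution is needed on top of the lexicon.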
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT