> The parsing of CJK text to find meaning tokens (word equivalents) seems
> to be a daunting problem due to lack of word boundaries. Are there any
> techniques, tools or algorithms (free or licensable) that do a good job
> of parsing "words" out of a CJK string?
Verity's partners in China (Sino-Software) and Japan (NEC and Omron) develop
tokenizers which break text into smaller elements (words/phrases/tokens)
for Verity's text search and retrieval products.
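
To make "breaking text into smaller elements" concrete, here is a minimal sketch of one common dictionary-based approach, forward maximum matching. This is not a description of how the Sino-Software, NEC, or Omron tokenizers work; the function name `segment`, the `lexicon` parameter, and the tiny sample word list are all hypothetical and chosen only for illustration. Production tokenizers typically add statistical models and much larger lexicons.

```python
def segment(text, lexicon, max_len=None):
    """Split text into tokens by greedy forward maximum matching.

    `lexicon` is an assumed set of known words. At each position the
    longest lexicon entry that matches is taken; if nothing matches,
    a single character is emitted as a fallback token.
    """
    if max_len is None:
        # Longest word in the lexicon bounds the candidate window.
        max_len = max((len(word) for word in lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical sample lexicon, for illustration only.
    sample_lexicon = {"北京", "大学", "北京大学", "学生"}
    print(segment("北京大学学生", sample_lexicon))
```

The greedy choice is the known weakness of this technique: it always prefers the longest match, so it can pick the wrong boundary when a shorter word followed by another word is the intended reading.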
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT