Re: CJK Parsing Techniques

From: Roland Wang (
Date: Tue Jul 01 1997 - 17:26:29 EDT

> The parsing of CJK text to find meaning tokens (word equivalents) seems
> to be a daunting problem due to lack of word boundaries. Are there any
> techniques, tools or algorithms (free or licensable) that do a good job
> of parsing "words" out of a CJK string?

Verity's partners in China (Sino-Software) and Japan (NEC and Omron) develop
tokenizers which break text into smaller elements (words/phrases/tokens)
for Verity's text search and retrieval products.
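For readers wondering what such a tokenizer does, one widely described dictionary-based baseline is greedy forward maximum matching. At each position you take the longest dictionary word that starts there, and you fall back to a single character when nothing matches. The sketch below is illustrative only. The post does not say how the Sino-Software, NEC or Omron tokenizers work, and the function name `max_match`, the toy dictionary and the `max_len` limit are hypothetical.

```python
from __future__ import annotations


def max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching (illustrative sketch, not a vendor algorithm).

    At each position, try the longest candidate up to max_len characters and
    shrink it until it is found in the dictionary; a single character is
    always accepted, so every position makes progress.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Longest candidate first, never running past the end of the text.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical toy dictionary; real systems use lexicons with many thousands of entries.
    words = {"中文", "分词", "中文分词", "技术", "北京"}
    print(max_match("中文分词技术", words))  # ['中文分词', '技术']
    print(max_match("我爱北京", words))      # ['我', '爱', '北京']
```

Greedy matching is simple but can missegment ambiguous strings, which is one reason commercial tokenizers tend to add statistical or grammatical disambiguation on top of the dictionary lookup.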

_Roland Wang
