CJK Simple Algorithm

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Feb 10 1997 - 19:33:31 EST


Murray asked:

> Does anyone have a simple algorithm to
> distinguish between Chinese and Japanese text?

The human algorithm is approximately:

if HangulPresent ==> Korean
elif KanaPresent ==> Japanese
elif SimplifiedCharactersPresent ==> Chinese (simplified)
elif ChineseCharacterNotInJapanesePresent ==> Chinese (traditional)
else SearchForWordsIRecognizeInText.

No doubt a number of people have implemented a simple table- and
range-based version of the first four steps, which ought to give a
heuristic that is 99+% accurate on a single line of text.
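
The first four steps might be sketched in Python roughly as follows.
This is only an illustration under stated assumptions, not any
particular existing implementation. The function name
guess_cjk_language and the two tiny character tables are hypothetical
samples picked for the example; a usable version would need far larger
tables. The Hangul and kana checks use Unicode code point ranges.

```python
# Hypothetical sample tables, for illustration only. Real tables would be
# much larger and derived from the relevant character-set standards.
SIMPLIFIED_ONLY = set("们这说时为东车书语汉电")
TRADITIONAL_NOT_JAPANESE = set("們這麼嗎沒說")

# Hangul Jamo, Hangul Compatibility Jamo, Hangul Syllables.
HANGUL_RANGES = ((0x1100, 0x11FF), (0x3130, 0x318F), (0xAC00, 0xD7A3))
# Hiragana letters and Katakana letters (shared punctuation excluded).
KANA_RANGES = ((0x3041, 0x3096), (0x30A1, 0x30FA))


def _in_ranges(ch, ranges):
    """Return True if the character's code point falls in any range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def guess_cjk_language(text):
    """Apply the first four steps in order; fall through if none match."""
    if any(_in_ranges(ch, HANGUL_RANGES) for ch in text):
        return "Korean"
    if any(_in_ranges(ch, KANA_RANGES) for ch in text):
        return "Japanese"
    if any(ch in SIMPLIFIED_ONLY for ch in text):
        return "Chinese (simplified)"
    if any(ch in TRADITIONAL_NOT_JAPANESE for ch in text):
        return "Chinese (traditional)"
    # Step five (recognizing actual words) is not attempted here.
    return "undetermined"


if __name__ == "__main__":
    for line in ("한국어", "これは日本語です", "这是中文", "這是中文", "東京"):
        print(line, "->", guess_cjk_language(line))
```

The order of the tests matters: Japanese text also contains Chinese
characters, so kana must be checked before either Chinese table. A line
made up only of characters shared by all three languages falls through
to "undetermined", which corresponds to the fifth step above.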

--Ken


