CJK Simple Algorithm

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Feb 10 1997 - 19:33:31 EST

Murray asked:

> Does anyone have a simple algorithm to
> distinguish between Chinese and Japanese text?

The human algorithm is approximately:

if HangulPresent ==> Korean
elif KanaPresent ==> Japanese
elif SimplifiedCharactersPresent ==> Chinese (simplified)
elif ChineseCharacterNotInJapanesePresent ==> Chinese (traditional)
else SearchForWordsIRecognizeInText.

No doubt a number of people have done a table- and range-based simple
version of the first four steps that ought to provide a 99+% accurate
heuristic based on a single line of text.
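
A minimal sketch of such a table- and range-based version might look like the following. The function and constant names are hypothetical. The Hangul and kana checks use Unicode block ranges. The two character tables for the third and fourth steps are not supplied here and are assumed to be provided by the caller as sets; they default to empty, in which case those steps never fire. The fifth step (recognizing words) is out of scope and is represented by returning None.

```python
# Illustrative sketch only; names are hypothetical, not from the original post.
# Steps 1-2 use Unicode block ranges; steps 3-4 rely on caller-supplied tables.

# Hangul Jamo, Hangul Compatibility Jamo, Hangul Syllables
HANGUL_RANGES = [(0x1100, 0x11FF), (0x3130, 0x318F), (0xAC00, 0xD7A3)]
# Hiragana and Katakana blocks
KANA_RANGES = [(0x3040, 0x309F), (0x30A0, 0x30FF)]


def in_ranges(ch, ranges):
    """Return True if the code point of ch falls in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def guess_cjk_language(text, simplified_only=frozenset(),
                       not_in_japanese=frozenset()):
    """Apply the first four steps in order; return None if none applies.

    simplified_only: assumed table of characters used only in simplified
        Chinese.
    not_in_japanese: assumed table of Chinese characters not used in
        Japanese.
    """
    if any(in_ranges(ch, HANGUL_RANGES) for ch in text):
        return "Korean"
    if any(in_ranges(ch, KANA_RANGES) for ch in text):
        return "Japanese"
    if any(ch in simplified_only for ch in text):
        return "Chinese (simplified)"
    if any(ch in not_in_japanese for ch in text):
        return "Chinese (traditional)"
    return None  # undecided: fall back to recognizing words in the text


if __name__ == "__main__":
    print(guess_cjk_language("한국어"))
    print(guess_cjk_language("日本語のテキスト"))
    print(guess_cjk_language("汉语", simplified_only={"汉"}))
```

The order of the checks matters: Japanese text normally mixes kana with Chinese characters, so the kana test has to run before either character table is consulted. The accuracy of the third and fourth steps depends entirely on the quality of the tables passed in.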

--Ken
