extracting words

From: Brahim Mouhdi (brahim.mouhdi@cmg.nl)
Date: Mon Jan 29 2001 - 04:27:52 EST

Hello all,

I'm writing a C program called Blacklist. Its purpose is to accept
a (Unicode) string, extract the words from it, hash each word
with a hashing algorithm, and check whether the word is in the blacklist.

This is all very straightforward, but the problem is extracting the
words from the string.
How do I determine what a word is in Japanese or Korean or whatever other
language? (A space?)
I think somebody must have had this problem and solved it, or maybe my
approach to the problem is wrong.
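
For reference, the extract/hash/lookup pipeline described above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the string is taken to be a null-terminated wchar_t array,
words are assumed to be delimited by whitespace (the very assumption in
question, which fails for text written without spaces between words, such
as Japanese), 32-bit FNV-1a stands in for the unspecified hashing
algorithm, and the blacklist is assumed to be a plain array of hashes. The
names hash_word, is_blacklisted and count_blacklisted are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* 32-bit FNV-1a over the four low-order bytes of each wide character,
   low byte first. A stand-in for whatever hash Blacklist really uses. */
static uint32_t hash_word(const wchar_t *w, size_t len)
{
    uint32_t h = 2166136261u;               /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        uint32_t c = (uint32_t)w[i];
        for (int b = 0; b < 4; b++) {
            h ^= (c >> (8 * b)) & 0xffu;
            h *= 16777619u;                 /* FNV prime */
        }
    }
    return h;
}

/* Linear scan of an (assumed) array of blacklisted word hashes. */
static int is_blacklisted(uint32_t h, const uint32_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == h)
            return 1;
    return 0;
}

/* Split text on whitespace, hash each word, and count the words whose
   hash appears in the blacklist. Whitespace splitting is the assumption
   here; it does not find words in unsegmented scripts. */
static size_t count_blacklisted(const wchar_t *text,
                                const uint32_t *list, size_t n)
{
    size_t hits = 0;
    const wchar_t *p = text;

    while (*p) {
        while (*p && iswspace((wint_t)*p))  /* skip delimiters */
            p++;
        const wchar_t *start = p;
        while (*p && !iswspace((wint_t)*p)) /* scan one word */
            p++;
        if (p > start &&
            is_blacklisted(hash_word(start, (size_t)(p - start)), list, n))
            hits++;
    }
    return hits;
}
```

With a one-entry blacklist holding hash_word(L"spam", 4), calling
count_blacklisted(L"spam spam", list, 1) counts both words. Only the inner
scanning loops would need to change if a different rule for finding word
boundaries were plugged in.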

I hope somebody can give me some good pointers, directions or suggestions.

Thanks for your time,

Brahim Mouhdi

