RE: extracting words

From: jarkko.hietaniemi@nokia.com
Date: Mon Feb 12 2001 - 11:21:35 EST


> > - line break (wrapping lines on the screen)
> > - word break (for selection)
> > - word/root extraction (for search)
>
> I recognize that the second and third case are really
> difficult to handle.

Root extraction is decidedly non-trivial and highly language-specific,
even more so than word breaking: it is a messy linguistic problem
rather than a clean algorithmic one.
To start with, the choice of the term "extraction" shows that one has not
understood the problem in all its g(l)ory: a more appropriate term would be
"finding", or maybe, "reducing" the root.
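
To illustrate why "extraction" undersells the problem, here is a minimal
sketch in Python. The suffix list and the function name naive_root are
made up for this example, and the rules are a toy for English only, not a
real stemmer. It treats the root as something that can simply be cut out
of the word, which works only in the easiest cases.

```python
# Hypothetical toy rule list (English only); not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_root(word):
    """Strip the first matching suffix, keeping at least three letters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


print(naive_root("walking"))  # walk    (the easy case)
print(naive_root("running"))  # runn    (the root is "run": needs reducing)
print(naive_root("went"))     # went    (the root is "go": nothing to cut)
print(naive_root("talossa"))  # talossa (Finnish "in the house", root "talo")
```

Only the first word comes out right. The others need the root to be found
or reduced with knowledge of the language in question, and no amount of
cutting substrings will supply that.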

Also, I would add

- "syllablization" (is that a word?) as a third problem (for breaking words
more nicely into lines). It would rank in difficulty somewhere between word
breaking and root extraction.

> But for word wrapping I assume line
> breaking is sufficient. But when I don't have spaces to use
> for wrapping and/or don't know whether the actual text part
> uses spaces at all (what about exotic languages like Ogham or
> Anglo-saxon?) then how can I go to implement word wrapping?
> Simply do it character by character?
 


