RE: extracting words

From: jarkko.hietaniemi@nokia.com
Date: Mon Feb 12 2001 - 11:21:35 EST

Next message: J M Sykes: "Re: The normalization form of the result of a dyadic operation."
Previous message: Jungshik Shin: "Re: [OT] RE: FW: extracting words"
Maybe in reply to: Brahim Mouhdi: "extracting words"
Next in thread: Mark Leisher: "RE: extracting words"

> > - line break (wrapping lines on the screen)
> > - word break (for selection)
> > - word/root extraction (for search)
>
> I recognize that the second and third case are really
> difficult to handle.

Root extraction is decidedly non-trivial and a highly language-specific
problem, even more so than word breaking: it is a messy linguistic problem
rather than a clean algorithmic one.
To start with, the choice of the term "extraction" shows that one has not
understood the problem in all its g(l)ory: a more appropriate term would be
"finding", or maybe, "reducing" the root.
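
A toy sketch may make the point concrete. Everything in it is made up for
illustration (the tiny word table, the suffix list, and both function names),
and English is used only for convenience. It assumes nothing about how a real
morphological analyzer works; it only shows that the root is not always a
substring that can be cut out of the word.

```python
# Hypothetical toy table: irregular forms whose root is NOT a substring
# of the word, so no amount of cutting can "extract" it.
IRREGULAR_ROOTS = {"ran": "run", "mice": "mouse", "went": "go"}


def naive_extract(word):
    """'Extraction': assume the root is a prefix and strip a known suffix."""
    for suffix in ("ing", "ed", "s"):
        # Require a few leading characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def reduce_to_root(word):
    """'Reduction': consult the irregular table first, then fall back."""
    return IRREGULAR_ROOTS.get(word, naive_extract(word))
```

Here naive_extract("walked") gives "walk", but naive_extract("mice") can only
return "mice" unchanged, while reduce_to_root("mice") gives "mouse". Even the
fallback is shaky: naive_extract("running") yields "runn", not "run".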

Also, I would add

- "syllablization" (is that a word?) as a third problem (for breaking words
more nicely into lines); it would rank in difficulty somewhere between word
breaking and root extraction.

> But for word wrapping I assume line
> breaking is sufficient. But when I don't have spaces to use
> for wrapping and/or don't know whether the actual text part
> uses spaces at all (what about exotic languages like Ogham or
> Anglo-saxon?) then how can I go to implement word wrapping?
> Simply do it character by character?


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT