Re: search ignoring diacritics

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Mon May 21 2001 - 17:16:00 EDT


    Peter> - normalise both data and search string - delete / ignore all
    Peter> characters with general category Mn

That's the way we've been doing it for a long time now. Normalizing a very
large corpus can be expensive, but if you have the disk space to keep the
normalized copy, it is a one-time cost.
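
For anyone who wants to try the approach quoted above, here is a minimal
sketch in Python using the standard unicodedata module. It assumes NFD as the
normalization form (the thread does not say which form is meant), and the
function names strip_marks and find_ignoring_diacritics are made up for
illustration. It is not the code we run here.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text (NFD, an assumption) and drop every character
    whose general category is Mn (nonspacing mark)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def find_ignoring_diacritics(needle: str, haystack: str) -> int:
    """Return the offset of needle in the mark-stripped haystack, or -1.
    The offset refers to the stripped text, not the original string."""
    return strip_marks(haystack).find(strip_marks(needle))


# The one-time cost: strip the corpus once and keep the result around.
corpus = ["My résumé is attached.", "Cafe\u0301 opens at nine."]
stripped_corpus = [strip_marks(line) for line in corpus]

# Each query then only has to strip the (short) search string.
query = strip_marks("café")
hits = [i for i, line in enumerate(stripped_corpus) if query in line]
```

The comparison is still case-sensitive, and characters that have no canonical
decomposition (for example the Polish l with stroke) pass through unchanged,
so this only covers diacritics that are encoded as, or decompose to,
combining marks.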
-----------------------------------------------------------------------------
Mark Leisher                    Times are bad. Children no longer obey
Computing Research Lab          their parents, and everyone is writing
New Mexico State University     a book.
Box 30001, Dept. 3CRL                       -- Marcus Tullius Cicero
Las Cruces, NM 88003


