Re: Unicode Searching

From: Mark Davis (marked@best.com)
Date: Thu Apr 29 1999 - 10:51:32 EDT


You can break searching down into the following steps:

1. Handling Unicode itself, at least at a binary level, as opposed to
byte-streams.
2. Handling Unicode canonical equivalence: e.g. identifying a precomposed
accented letter with its base letter plus combining mark (see Ch 3, TR15)
3. Handling character (e.g. grapheme) and word boundaries, so that you don't
match across them. (see Ch 5)
4. Handling locale conventions, e.g. "ae" ~ "" (see Ch 5, TR10)
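
As a rough illustration of steps 1 to 3, here is a minimal Python sketch
using only the standard library. The function name find_canonical is
hypothetical and not from any product or library mentioned in this thread.
It assumes the input is UTF-8 unless told otherwise, and it approximates
the boundary check of step 3 by only rejecting matches that are followed
by a combining mark; a full implementation would need real grapheme and
word segmentation. Step 4 needs locale-aware collation data, which the
standard library does not provide, so it is not shown.

```python
import unicodedata


def find_canonical(haystack, needle, encoding="utf-8"):
    """Return match offsets of needle, measured in the NFD-normalized text."""
    # Step 1: work on code points, not raw bytes.
    text = haystack.decode(encoding)
    # Step 2: bring both strings to the same normalization form (NFD here),
    # so canonically equivalent spellings compare equal.
    text = unicodedata.normalize("NFD", text)
    pattern = unicodedata.normalize("NFD", needle)
    hits = []
    if not pattern:
        return hits
    start = text.find(pattern)
    while start != -1:
        end = start + len(pattern)
        # Step 3 (approximation): a match must not stop in the middle of a
        # base letter + combining mark sequence. combining() returns the
        # canonical combining class, which is 0 for non-combining characters.
        if end == len(text) or unicodedata.combining(text[end]) == 0:
            hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


# One decomposed and one precomposed spelling of the same word.
data = "cafe\u0301 and caf\u00e9".encode("utf-8")
accented = find_canonical(data, "caf\u00e9")   # finds both spellings
plain = find_canonical(data, "cafe")           # rejected: would split e + accent
```

Note that the offsets refer to the normalized text, so mapping them back to
positions in the original file would take extra bookkeeping.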

Depending on the OS, many of these steps should be handled for you. Laura
Werner did a nice paper on this at the last Unicode conference; you should
look at that as well.

Mark

Randy Hughes wrote:

> I have written a Searching application for Windows. I am interested in
> adding Unicode searching capability to it. Can someone give me a brief
> list of issues to consider, or point me to a good starting point for adding
> this capability. If you need to see the product it can be downloaded from
> my website listed below. It will currently handle only single-byte, and I
> am trying to figure out how to get it to Double-Byte.
>
> Thanks
> Randy Hughes
> Jr Computing
> http://www.jrcomputing.com

--
business: mark.davis@us.ibm.com, mark@unicode.org
personal: mark@macchiato.com, http://www.macchiato.com
--


