Re: Unicode Searching

From: Mark Davis (
Date: Thu Apr 29 1999 - 10:51:32 EDT

You can break searching down into the following steps:

1. Handling Unicode itself, at least at a binary level, as opposed to
2. Handling Unicode canonical equivalence: e.g. identifying a precomposed
accented letter with "a" followed by a combining accent (see Ch 3, TR15)
3. Handling character (e.g. grapheme) and word boundaries, so that you don't
match across them. (see Ch 5)
4. Handling locale conventions, e.g. "ae" ~ "ä" in German (see Ch 5, TR10)
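
As a rough illustration of steps 2 and 3, here is a minimal Python sketch. It
is not taken from any particular OS or library search API: the function name
canonical_find is hypothetical, and it assumes that normalizing both strings
to NFD with the standard unicodedata module is an acceptable way to get
canonical equivalence. The boundary check is a simplification that only
refuses matches ending in the middle of a combining sequence; it does not
implement full grapheme or word boundaries, and it ignores locale
conventions (step 4) entirely.

```python
import unicodedata


def canonical_find(haystack: str, needle: str) -> int:
    """Find needle in haystack under canonical equivalence.

    Returns an index into the NFD-normalized haystack, or -1 if not found.
    Hypothetical helper for illustration only.
    """
    # Step 2: put both strings into the same normalization form (NFD),
    # so precomposed and decomposed spellings compare equal.
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)

    start = 0
    while True:
        i = h.find(n, start)
        if i == -1:
            return -1
        end = i + len(n)
        # Step 3 (simplified): reject a match that stops just before a
        # combining mark, since that would split a user-perceived character.
        if end < len(h) and unicodedata.combining(h[end]) != 0:
            start = i + 1
            continue
        return i
```

With this sketch, a precomposed "caf\u00e9" is found by the decomposed query
"cafe\u0301", while the plain query "cafe" does not match inside
"cafe\u0301" because the match would cut off the combining accent. Note that
the returned index refers to the normalized text, so mapping it back to the
original string is left out here.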

Depending on the OS, many of these steps should be handled for you. Laura
Werner did a nice paper on this at the last Unicode conference; you should
look at that also.


Randy Hughes wrote:

> I have written a Searching application for Windows. I am interested in
> adding Unicode searching capability to it. Can someone give me a brief
> list of issues to consider, or point me to a good starting point for adding
> this capability. If you need to see the product it can be downloaded from
> my website listed below. It will currently handle only single-byte, and I
> am trying to figure out how to get it to Double-Byte.
> Thanks
> Randy Hughes
> Jr Computing

