Re: Unicode Searching

From: Mark Davis (marked@best.com)
Date: Thu Apr 29 1999 - 10:51:32 EDT

Next message: Addison Phillips: "RE: Unicode Searching"
Previous message: Alfinito, Charles: "Basic question or maybe not"
Maybe in reply to: Randy Hughes: "Unicode Searching"
Next in thread: Addison Phillips: "RE: Unicode Searching"

You can break searching down into the following steps:

1. Handling Unicode itself, at least at a binary level, as opposed to
byte-streams.
2. Handling Unicode canonical equivalence, e.g. treating ä and a¨ (a followed
by a combining diaeresis) as identical (see Ch 3, TR15)
3. Handling character (e.g. grapheme) and word boundaries, so that you don't
match across them. (see Ch 5)
4. Handling locale conventions, e.g. "ae" ~ "ä" (see Ch 5, TR10)
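
As a rough illustration of steps 2 and 3, here is a minimal Python sketch. It
assumes that normalizing both strings to NFD with the standard unicodedata
module is an acceptable way to get canonical equivalence, and it uses a very
crude stand-in for boundary checking: a match is rejected if it is immediately
followed by a combining mark. The function name normalized_find is hypothetical,
and the returned index refers to the normalized text, not the original. Step 4
(locale conventions) is not covered.

```python
import unicodedata


def normalized_find(haystack: str, needle: str) -> int:
    """Find needle in haystack under canonical equivalence.

    Returns an index into the NFD-normalized haystack, or -1 if not found.
    (Hypothetical helper for illustration only.)
    """
    # Step 2: bring both strings to the same normalization form (NFD).
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)

    start = 0
    while True:
        i = h.find(n, start)
        if i < 0:
            return -1
        end = i + len(n)
        # Step 3 (approximation): do not accept a match that stops in the
        # middle of a base + combining-mark sequence.
        if end < len(h) and unicodedata.combining(h[end]) != 0:
            start = i + 1
            continue
        return i


precomposed = "B\u00e4r"    # "Bär" with a single precomposed character
decomposed = "Ba\u0308r"    # "Bär" as a + combining diaeresis

print(decomposed.find("\u00e4"))                # plain search misses the match
print(normalized_find(decomposed, "\u00e4"))    # equivalence-aware search
print(normalized_find(decomposed, "a"))         # bare "a" must not match inside ä
```

A real implementation would use proper grapheme and word boundary rules rather
than the single combining-class test shown here.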

Depending on the OS, many of these steps should be handled for you. Laura
Werner did a nice paper on this at the last Unicode conference; you should
look at that also.

Mark

Randy Hughes wrote:

> I have written a Searching application for Windows. I am interested in
> adding Unicode searching capability to it. Can someone give me a brief
> list of issues to consider, or point me to a good starting point for adding
> this capability. If you need to see the product it can be downloaded from
> my website listed below. It will currently handle only single-byte, and I
> am trying to figure out how to get it to Double-Byte.
>
> Thanks
> Randy Hughes
> Jr Computing
> http://www.jrcomputing.com

--
business: mark.davis@us.ibm.com, mark@unicode.org
personal: mark@macchiato.com, http://www.macchiato.com
--

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT