Re: Regular expressions in Unicode (Was: Ethiopic text)

From: Kolbjørn Aambø (k.h.aambo@ub.uio.no)
Date: Fri Mar 13 1998 - 07:59:23 EST


Would not something like:

Aa:::::::,Bb,Cc:,Dd,Ee:,Ff,Gg,Hh,I:i,Jj,Kk,Ll,Mm,Nn:,
Oo:::::,Pp,Qq,Rr,Ss,Tt,Uu:,Vv,Ww,Xx,Yy:,Zz.

be appropriate for English searching?

Then you would find Ångström by searching for Angstrom.
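
A minimal Python sketch of how such a relation could drive a search. The
group members in RELATION are illustrative assumptions, not the exact
relation given above, and build_fold_table and matches are hypothetical
helper names. Every character in a group folds to the first character of
that group, and two strings match when their folded forms are equal.

```python
import unicodedata

# Illustrative relation: comma-separated groups of characters that should
# compare equal. The accented members are assumptions for this sketch.
RELATION = (
    "Aa\u00c5\u00e5\u00c4\u00e4\u00c0\u00e0\u00c1\u00e1,"  # A a Å å Ä ä À à Á á
    "Gg,Mm,Nn,"
    "Oo\u00d6\u00f6\u00d8\u00f8\u00d3\u00f3,"              # O o Ö ö Ø ø Ó ó
    "Rr,Ss,Tt"
)


def build_fold_table(relation: str) -> dict[int, str]:
    """Map every character of a group to the group's first character."""
    table: dict[int, str] = {}
    for group in unicodedata.normalize("NFC", relation).split(","):
        representative = group[0]
        for ch in group:
            table[ord(ch)] = representative
    return table


FOLD = build_fold_table(RELATION)


def matches(query: str, candidate: str) -> bool:
    """True when both strings fold to the same sequence of representatives."""
    q = unicodedata.normalize("NFC", query).translate(FOLD)
    c = unicodedata.normalize("NFC", candidate).translate(FOLD)
    return q == c


print(matches("Angstrom", "\u00c5ngstr\u00f6m"))  # Angstrom vs Ångström
```

Characters that appear in no group are left unchanged, so they only match
themselves.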

One small problem, though: I cannot match KVÆRNER by searching for
KVAERNER using the above relation, because a single character (Æ) would
have to match two (AE). Any suggestions?
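
One possible approach, sketched in Python under the assumption that the
relation is extended with one-to-many expansions: each listed character is
rewritten to a multi-character string before comparison. The EXPANSIONS
table and the search_key helper are hypothetical and cover only Æ here.

```python
# Assumed expansion table: one character expands to several.
EXPANSIONS = {
    "\u00c6": "AE",  # Æ
    "\u00e6": "ae",  # æ
}


def search_key(text: str) -> str:
    """Expand listed characters, then fold case, to get a comparison key."""
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in text)
    return expanded.casefold()


# KVÆRNER and KVAERNER produce the same key.
print(search_key("KV\u00c6RNER") == search_key("KVAERNER"))
```

The cost of this design is that matching is no longer character by
character, so it fits whole-string or substring comparison on keys better
than a bracketed character class.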

By the way, I have seen this way of expressing relations among characters
in several other people's work.

Peter Westlake <peter@harlequin.co.uk> wrote:
:
>Now, if I want to find a word beginning with A in a list of
>scientific words used in English, then I would hope to find
>"Ångström". But if I were searching for names beginning with
>A in the Danish telephone directory, it would be a mistake to
>find "Ångström". So I need to say what I mean. If I want to
>match A-F in English, I need a short way of saying whether to
>include accents and case and of saying that I mean English.
>Something like [A-F::u,a,uk] where u means upper case, a means
>any accent, uk is from a standard list of codes. The range is
>interpreted in the context of the UK collating sequence. To
>omit Ångströms, I would ask for ^[A::u,a,dk]* meaning "a string
>beginning with a letter that matches A in Danish". In this context,
>"Danish" and "English" can be seen as equivalence relations that
>partition the character set into equivalence classes. Kolbjørn
>gave an example of such a relation.
>
:
:
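
The idea in the quoted text, that "English" and "Danish" are different
equivalence relations over the same characters, can be sketched in Python
as follows. This does not implement the [A::u,a,dk] syntax. The DISTINCT
table, the locale codes as dictionary keys, and the fold and begins_with
helpers are assumptions made for illustration, not real collation data.

```python
import unicodedata

# Letters that a locale treats as separate from their unaccented base.
# Assumption: a tiny hand-written table standing in for real locale data.
DISTINCT = {
    "uk": set(),
    "dk": {"\u00c5", "\u00c6", "\u00d8"},  # Å, Æ, Ø
}


def fold(ch: str, locale: str) -> str:
    """Return the representative of ch's equivalence class in this locale."""
    upper = unicodedata.normalize("NFC", ch).upper()
    if upper in DISTINCT[locale]:
        return upper  # this locale keeps the letter as its own class
    decomposed = unicodedata.normalize("NFD", upper)
    # Drop combining marks so accented letters join their base letter.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def begins_with(word: str, letter: str, locale: str) -> bool:
    """True when the first letter of word is equivalent to letter."""
    word = unicodedata.normalize("NFC", word)
    if not word:
        return False
    return fold(word[0], locale) == fold(letter, locale)


name = "\u00c5ngstr\u00f6m"  # Ångström
print(begins_with(name, "A", "uk"))  # English relation
print(begins_with(name, "A", "dk"))  # Danish relation
```

Under these assumptions the same word begins with A in the English
relation but not in the Danish one, which is the distinction drawn above.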


