Re: regular expressions

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 30 1997 - 14:16:04 EST


Tony Harminc writes:

>
> > 2. Has anyone given any serious thought to extensions of said Unixoid regular
> > expression syntax to handle non-English alphabets used as "ranges" for pattern
> > matching?
>
> Presumably any such extensions would need to include concepts of sort
> order if they are going to handle ranges. In other words the whole
> Unicode character properties database is not a sufficient resource;
> some sort standard (preferably the nascent ISO 14651) tables are also
> needed. Doesn't make for a small grep (or whatever).
>

I believe that the issue of how to specify the syntax for Unicode
character ranges for a regular expression syntax is independent of
any sorting/collation issues.

Specification of a match against, for instance, any Devanagari character,
by some such syntax as [\u0901-\u0970] does not imply any particular
approach to how Devanagari data would be collated and sorted. So the
complexities implied in ISO 14651 do not really impinge on the size or
efficiency of a Unicode-based implementation of grep.
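To make the point concrete, here is a minimal sketch in Python. The helper names `devanagari_runs` and `in_range` are hypothetical and only illustrate the idea. A bracket range like `[\u0901-\u0970]` is tested by comparing raw code point values, so deciding a match needs no collation table at all:

```python
import re

# Character class over raw code points U+0901..U+0970 (the range cited above).
# Membership is decided by numeric code point comparison, not by any sort order.
DEVANAGARI = re.compile("[\u0901-\u0970]+")


def devanagari_runs(text: str) -> list[str]:
    """Return maximal runs of characters whose code points lie in U+0901..U+0970."""
    return DEVANAGARI.findall(text)


def in_range(ch: str, lo: int = 0x0901, hi: int = 0x0970) -> bool:
    """The same membership test without regex: a plain integer comparison."""
    return lo <= ord(ch) <= hi
```

For example, `devanagari_runs("grep \u0939\u093f\u0928\u094d\u0926\u0940 text")` returns the single Devanagari word. No part of the match depends on how that word would collate against other Devanagari strings.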

--Ken


