Re: regular expressions

From: Mark Davis (
Date: Thu Jan 30 1997 - 14:42:01 EST

unicode@Unicode.ORG wrote:
> Mark> We did have to expand the normal syntax; the direction we chose was
> Mark> to allow positive or negative matches against unicode character
> Mark> properties within a character range list. (I think we also
> Mark> considered adding block names.)
> Indeed. I made up the "\p" and "\P" constructs which, followed by a list of
> character property numbers, expands into an appropriate character class.
> For example, the RE construct (assuming 7 == decimal digits and 15 == currency
> symbols) "\p7,15" would expand into a character class containing all decimal
> digits and currency symbols, and "\P7,15" would expand to every character
> *except* the decimal digits and currency symbols. Their appearance within
> *another* character class means they are unioned with the other characters
> contained in the class.
> A bit awkward syntactically, but it provides very fine control over matching.
> Character classes can be specified in the usual RE manner containing ranges
> specified as the UCS2 codes themselves or in Java form, \uXXXX.
> -----------------------------------------------------------------------------
> Mark Leisher "A designer knows he has achieved perfection
> Computing Research Lab not when there is nothing left to add, but
> New Mexico State University when there is nothing left to take away."
> Box 30001, Dept. 3CRL -- Antoine de Saint-Exupéry
> Las Cruces, NM 88003

I don't remember the precise syntax offhand, but functionally it was
something like:

range := [exp(,exp)*{!exp(,exp)*}]
exp := unichar1-unichar2 // all characters within the range
exp := @category // e.g. Pe (closing punctuation) or Sc (currency symbols)
exp := @majorCategory // e.g. P (any punctuation) or S (any symbols)

! means "except for"

So you could say

[@L!A] meaning all letters except for A
[A-C,a-c,@Sc!$] meaning the letters A-C and a-c, plus currency symbols, except for $
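
To make the semantics concrete, here is a minimal Python sketch of how such a
range could be evaluated. It is not the original implementation. The names
compile_range and parse_items are made up for illustration, single characters
are accepted as items because the examples above use them, and the category
lookup relies on Python's unicodedata module, which reports the general
categories of whatever Unicode version Python ships with.

```python
import unicodedata


def parse_items(text):
    """Turn a list such as 'A-C,a-c,@Sc' into predicates over one character."""
    preds = []
    for item in text.split(","):
        if item.startswith("@") and len(item) > 1:
            # '@L' matches every category starting with 'L' (a major category);
            # '@Sc' matches only the category 'Sc'.
            prefix = item[1:]
            preds.append(lambda ch, p=prefix: unicodedata.category(ch).startswith(p))
        elif len(item) == 3 and item[1] == "-":
            # 'x-y' matches all characters from x to y inclusive, by code point.
            lo, hi = item[0], item[2]
            preds.append(lambda ch, lo=lo, hi=hi: lo <= ch <= hi)
        elif len(item) == 1:
            # Assumption: a bare character matches itself, as in '!A' and '!$'.
            preds.append(lambda ch, c=item: ch == c)
        else:
            raise ValueError(f"unrecognized item: {item!r}")
    return preds


def compile_range(spec):
    """Compile '[include-list!exclude-list]' into a matcher for one character."""
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("a range must be enclosed in [ ]")
    # Everything before the first '!' is included; everything after is excluded.
    include_text, _, exclude_text = spec[1:-1].partition("!")
    include = parse_items(include_text)
    exclude = parse_items(exclude_text) if exclude_text else []

    def matches(ch):
        return any(p(ch) for p in include) and not any(p(ch) for p in exclude)

    return matches


letters_except_a = compile_range("[@L!A]")
mixed = compile_range("[A-C,a-c,@Sc!$]")
print(letters_except_a("B"), letters_except_a("A"))
print(mixed("b"), mixed("d"), mixed("$"))
```

Because the sketch splits on ',' and '!' directly, it has no way to name those
two characters as literals; a real syntax would need an escape mechanism.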

Interesting other options would be blocknames, and choice of whether to
decompose when matching.
