Re: regular expressions

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Wed Jan 29 1997 - 18:48:30 EST


    Mark> We did have to expand the normal syntax; the direction we chose was
    Mark> to allow positive or negative matches against unicode character
    Mark> properties within a character range list. (I think we also
    Mark> considered adding block names.)

Indeed. I made up the "\p" and "\P" constructs, each of which, followed by a
list of character property numbers, expands into an appropriate character class.

For example, the RE construct (assuming 7 == decimal digits and 15 == currency
symbols) "\p7,15" would expand into a character class containing all decimal
digits and currency symbols, and "\P7,15" would expand to every character
*except* the decimal digits and currency symbols. When either construct
appears within *another* character class, the characters it selects are
unioned with the other characters contained in that class.
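
To make the expansion concrete, here is a minimal sketch in Python. It is not
the code we actually use: the property table (7 -> general category "Nd",
15 -> "Sc") and the function names expand_property_list and translate are
assumptions made up for this illustration, and the scanner ignores corner
cases such as a literal "]" placed first in a class.

```python
import re
import unicodedata

# Hypothetical property numbers mapped to Unicode general categories.
# The numbers 7 and 15 follow the example in the text; the table is assumed.
PROPERTY_TABLE = {7: "Nd", 15: "Sc"}

_PROP = re.compile(r"\\([pP])(\d+(?:,\d+)*)")


def expand_property_list(numbers, negate=False):
    """Return the body of a character class (no brackets) over U+0000..U+FFFF."""
    wanted = {PROPERTY_TABLE[n] for n in numbers}
    ranges = []
    start = prev = None
    for cp in range(0x10000):
        hit = (unicodedata.category(chr(cp)) in wanted) != negate
        if hit:
            if start is None:
                start = cp
            prev = cp
        elif start is not None:
            ranges.append((start, prev))
            start = None
    if start is not None:
        ranges.append((start, prev))
    # Emit every code point as \uXXXX so no character needs extra escaping.
    return "".join(
        f"\\u{lo:04x}" if lo == hi else f"\\u{lo:04x}-\\u{hi:04x}"
        for lo, hi in ranges
    )


def translate(pattern):
    """Rewrite \\pN,M and \\PN,M into ordinary character classes."""
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        m = _PROP.match(pattern, i)
        if m:
            numbers = [int(n) for n in m.group(2).split(",")]
            body = expand_property_list(numbers, negate=(m.group(1) == "P"))
            # Inside an existing class the expansion is unioned with it;
            # outside, it becomes a class of its own.
            out.append(body if in_class else f"[{body}]")
            i = m.end()
            continue
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # copy other escapes untouched
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        out.append(ch)
        i += 1
    return "".join(out)


digits_or_currency = re.compile(translate(r"\p7,15+"))
neither = re.compile(translate(r"\P7,15"))
x_or_digit = re.compile(translate(r"[x\p7]+"))
```

With these assumptions, digits_or_currency matches a string such as "$42" in
full, neither matches "a" but not "5", and x_or_digit matches "x1x" because
the digits are unioned with the "x" already in the class.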

A bit awkward syntactically, but it provides very fine control over matching.
Character classes can be specified in the usual RE manner containing ranges
specified as the UCS2 codes themselves or in Java form, \uXXXX.
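
As a small illustration of the two ways of writing a range, the sketch below
uses Python's re module, whose \uXXXX escape happens to have the same shape as
the Java form. The choice of ASCII digits plus Arabic-Indic digits is only an
example, not something taken from our implementation.

```python
import re

# Ranges written in \uXXXX form: ASCII digits and Arabic-Indic digits.
escaped = re.compile(r"[\u0030-\u0039\u0660-\u0669]+")

# The same class with the characters themselves as the range endpoints.
# (The string is not raw, so Python turns \u0660 and \u0669 into the
# actual characters before the regex compiler sees them.)
literal = re.compile("[0-9\u0660-\u0669]+")

sample = "12\u0663"  # "1", "2", ARABIC-INDIC DIGIT THREE
both_match = bool(escaped.fullmatch(sample)) and bool(literal.fullmatch(sample))
```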
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"A designer knows he has achieved perfection
not when there is nothing left to add, but
when there is nothing left to take away."
    -- Antoine de Saint-Exupéry


