Re: regular expressions

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Thu Jan 30 1997 - 13:19:56 EST


    Geoffrey> May I suggest that using POSIX style equivalence classes would
    Geoffrey> be a better syntax? Eg. [[:alnum:]] specifies all alpha-numeric
    Geoffrey> characters. [^[:alnum:]] specifies anything but alpha-numeric
    Geoffrey> characters. [^[:alnum:][:space:]] specifies anything but
    Geoffrey> alpha-numeric and whitespace characters.

    Geoffrey> From what I can glean from my BSD man page, it is even POSIX
    Geoffrey> compliant to add new classes such as non-spacing characters or
    Geoffrey> the different blocks. Of course providing something like
    Geoffrey> [[:greek:]] leaves open for debate whether it should be the
    Geoffrey> U+0370 - U+03FF block or if it should also include the other bits
    Geoffrey> of Greek scattered around.
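
To make the [[:greek:]] ambiguity concrete, here is a minimal Python sketch
comparing two candidate definitions. Both function names are hypothetical,
and the character-name test is only an illustrative heuristic assumed for
this example, not an official definition of "Greek".

```python
import unicodedata

# Candidate 1: membership in the U+0370..U+03FF block only.
GREEK_BLOCK = range(0x0370, 0x0400)

def in_greek_block(ch: str) -> bool:
    """True if ch lies inside the U+0370..U+03FF block."""
    return ord(ch) in GREEK_BLOCK

# Candidate 2 (assumed heuristic): any character whose Unicode name starts
# with "GREEK", which also picks up Greek characters outside that block.
def is_greek_by_name(ch: str) -> bool:
    """True if the character's Unicode name begins with 'GREEK'."""
    return unicodedata.name(ch, "").startswith("GREEK")

if __name__ == "__main__":
    for ch in ["\u03b1", "\u1f00", "\u03e2", "a"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}: "
              f"block={in_greek_block(ch)} name={is_greek_by_name(ch)}")
```

The two definitions disagree in both directions: U+1F00 (Greek Extended) has
a GREEK name but lies outside the block, while U+03E2 lies inside the block
but is named as a Coptic letter.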

I already have the basic POSIX equivalence classes, but was under the
impression they were simply codifying existing practice and didn't know there
was room for additions. Looks like there's gonna be a bit of debate over
naming and constituents of additional equivalence classes :-)

I don't want to spend a lot of time guessing which combinations of Unicode
character type properties are going to be wanted just to generate equivalence
class names. Besides, the business of naming inevitably provokes argument.
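
As a rough illustration of selecting characters by property combination
rather than by class name, the Python sketch below builds sets of Unicode
general categories and matches against them. This is not the pair of
constructs mentioned in this message (they are not described here); the
helper names and the category groupings are assumptions for the example.

```python
import unicodedata

# Building blocks: sets of Unicode general-category codes.
LETTERS = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})
DECIMAL_DIGITS = frozenset({"Nd"})
NON_SPACING_MARKS = frozenset({"Mn"})

# A combination is just a set union; no new class name has to be agreed on.
ALNUM_LIKE = LETTERS | DECIMAL_DIGITS

def in_categories(ch: str, categories: frozenset) -> bool:
    """True if ch's general category is one of the given categories."""
    return unicodedata.category(ch) in categories

def find_all(text: str, categories: frozenset, negate: bool = False) -> list:
    """Characters of text inside the set, or outside it when negate is True."""
    return [ch for ch in text if in_categories(ch, categories) != negate]

if __name__ == "__main__":
    text = "a1 \u0301\u03b2"  # letter, digit, space, combining acute, beta
    print(find_all(text, ALNUM_LIKE))
    print(find_all(text, NON_SPACING_MARKS))
    print(find_all(text, ALNUM_LIKE | NON_SPACING_MARKS, negate=True))
```

With this approach the matcher only needs the property tables, and arguments
over what to call a given combination can be put off.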

So, I'll stick with two non-standard, but simple and flexible constructs to
give us the matching resolution we need until strict conformance to some set
of conventions is dictated by circumstances.
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher                   "A designer knows he has achieved perfection
Computing Research Lab          not when there is nothing left to add, but
New Mexico State University     when there is nothing left to take away."
Box 30001, Dept. 3CRL                      -- Antoine de Saint-Exupéry
Las Cruces, NM 88003


