Re: regular expressions

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Thu Jan 30 1997 - 18:13:19 EST


    Rick> While I've had replies about proprietary or "not released"
    Rick> technology, it sounds like Mark Leisher at UNM may have a solution;
    Rick> and it would be advantageous in its free availability.

Aaaagh!! It is bad enough that many people do not know that New Mexico is
part of the United States (I refer you to the book "One Of Our 50 Is Missing"
published by New Mexico magazine), but to lump NMSU (the good school) with UNM
(the bad school 354 klicks North of the good school) is mildly offensive :-)

My apologies, but nearly 100 years of inter-school competition has inculcated
the students of both schools with a good-natured, knee-jerk rivalry.

    Rick> I think sort order a la 14651 might be overkill here. What I want
    Rick> in a regex language/syntax is to be able to specify not so much
    Rick> ranges per se, as ranges in reference to something. (The default
    Rick> regex used by "ed", "grep" and other tools implicitly has an
    Rick> "alphabet" of the ASCII range, in that order.) I interpret the
    Rick> spirit more like "A-Za-z" means "the alphabet, upper and lower
    Rick> case". The extension I'd like is to be able to specify "the
    Rick> alphabet" I'm concerned with, in some specified order, and then use
    Rick> the regex short-hand to point out ranges within it. So my
    Rick> "alphabet" would need to be defined somewhere (maybe in the
    Rick> environment) and its start/end points delineated also.

I understand the desire for some multi-level symbolic reference capability
(syntactic sugar) in regular expressions, but I can't think of any particular
situations where an ordering other than the binary patterns of the Unicode
characters themselves would be really useful.
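
For concreteness, the kind of alphabet-relative range Rick describes might
be sketched roughly as follows. This is only an illustration under stated
assumptions: the helper name expand_range and the SPANISH alphabet string
are hypothetical, not part of any existing regex library, and the range is
simply expanded into an explicit character class before the regex is
compiled.

```python
import re

# Hypothetical user-defined alphabet, listed in the order the user cares about.
# Note that "ñ" sits between "n" and "o" here, unlike in code point order.
SPANISH = "abcdefghijklmnñopqrstuvwxyz"


def expand_range(alphabet, start, end):
    """Build a character class covering start..end in the alphabet's own order.

    This is an assumed helper for illustration: it enumerates the members
    explicitly instead of relying on the engine's code point ranges.
    """
    i = alphabet.index(start)
    j = alphabet.index(end)
    if i > j:
        raise ValueError("range start comes after range end in this alphabet")
    members = alphabet[i:j + 1]
    return "[" + "".join(re.escape(c) for c in members) + "]"


# "m" through "o" relative to SPANISH includes "ñ" ...
alphabet_class = expand_range(SPANISH, "m", "o")
in_alphabet_range = re.fullmatch(alphabet_class, "ñ") is not None

# ... while the ordinary code point range m-o does not.
in_codepoint_range = re.fullmatch("[m-o]", "ñ") is not None
```

Here in_alphabet_range is true and in_codepoint_range is false, which is the
difference between a range "in reference to something" and a range over the
binary values of the characters.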

In fact, defining a regular expression usually describes an ordering. For
example:

  Exercise 7-100:

    Write a regular expression to find the secondary (conjoined/subjoined)
    Kannada consonants in an akshara (akshara: consonant+consonant+...+vowel
    "syllable" group).

  Answer to Exercise 7-100:

   "[^\uC95-\uCB9][\uC95-\uCB9]([\uC95-\uCB9]+)[^\uC95-\uCB9]"
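
As a rough check, the answer above can be transcribed into Python's re
syntax. This is a minimal sketch, assuming the range in the answer is meant
as U+0C95 (KA) through U+0CB9 (HA); the names CONS, PATTERN and
secondary_consonants are made up for the illustration. It transcribes the
pattern as written and does not try to handle the virama (U+0CCD) that
appears between consonants in encoded Unicode text.

```python
import re

# Kannada consonants KA..HA, assumed to be the range U+0C95..U+0CB9.
CONS = "\u0C95-\u0CB9"

# non-consonant, primary consonant, (one or more secondary consonants),
# non-consonant: the same four pieces as the answer to Exercise 7-100.
PATTERN = re.compile(f"[^{CONS}][{CONS}]([{CONS}]+)[^{CONS}]")


def secondary_consonants(text):
    """Return the secondary consonants of the first match, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


# KA followed by TA and RA, bracketed by non-consonants.
sample = "a\u0C95\u0CA4\u0CB0a"
found = secondary_consonants(sample)

# A lone consonant has no secondary consonants, so nothing matches.
lone = secondary_consonants("a\u0C95a")
```

With this input, found holds the two characters U+0CA4 and U+0CB0, and lone
is None. Because the pattern requires a non-consonant on each side, a
consonant group at the very start or end of the text is not matched.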

Perhaps I am missing your point. Maybe a specific example would help clarify.

    Rick> I do hope that the Unix people, whoever they are, are listening and
    Rick> we don't end up with too many different syntaxes for doing basically
    Rick> the same thing.

Most of our recent and pending Unix-based contracts require POSIX compliance
whenever possible AND reasonable (our way out of locales), and anything else
needed to fulfill the contract requirements. I suspect others in similar
funding situations are getting similar requirements.

Well, I've enjoyed the brief respite from homework :-) Back to the salt mines
with me.
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"A designer knows he has achieved perfection not when there is nothing
left to add, but when there is nothing left to take away."
    -- Antoine de Saint-Exupéry


