RE: FYI: Regex paper for UTC

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Oct 13 2007 - 14:36:53 CDT

  • Next message: Hans Aberg: "Re: FYI: Regex paper for UTC"

    Hans Aberg [mailto:haberg@math.su.se]wrote:
    > That operation is not very useful, because the language complement of
    > say a single character c is the set of all other strings. So if one
    > is finding the longest string in the language from a point on, and
    > the string isn't c, all will be eaten.

    No, such operation is typically used in association with a "&&" operator
    that restricts the set of matchable strings. They are used also for matching
    left and right contexts without including these contexts in the returned
    match.

    But as you said, "all will be eaten" ONLY IF "the string is not c", so the
    effect of negation is NOT producing the whole set of possible texts.

    Even the negation of the empty string is not the whole set of possible
    texts, but ONLY the set of NON-EMPTY strings. It has a very well-defined
    meaning in set algebras.

    The fact that the result set of an negated RE is infinite is not a problem
    conceptually: in fact, even the RE "r*" is also matching an infinite set,
    and despite this, it is productive, because we can build an engine to match
    it using a finite number of tests (this number of tests and branches to
    implement is also bounded, and only depends on the length of the regular
    expression).

    Really I should try to convince you by making a reusable and simplified
    version of my regexp matcher that supports negations of any regexps
    (including those with capturing groups and left or right contextual matches)



    This archive was generated by hypermail 2.1.5 : Sat Oct 13 2007 - 14:38:13 CDT