Re: FYI: Regex paper for UTC

From: Hans Aberg (
Date: Sat Oct 13 2007 - 17:10:30 CDT

  • Next message: Addison Phillips: "Re: Use of interum PUA encodings for 85 letters"

    On 13 Oct 2007, at 23:21, Philippe Verdy wrote:

    >> The problem is that any string starting with c will also be in the
    >> language complement and matched.
    > No, it won't. A string containing a c (at any position, and in any
    > non-null
    > number of positions), contains a match for the regexp "c", so it is
    > not part
    > of the complement.

    This is not the language complement. The language set operators are
    just the normal set operators inside the set C*, where C is the
    character set. See for example, Aho, Sethi & Ullman, "Compilers" (the
    "dragon book") on REs. And even if you exclude all strings containing
    "c" as a substring, you get strings of arbitrary length, which is not
    what I want.

    But the ordinary language set operators can be used in the way I
    described in another post.

    > The expression "a -- b" matches "a" only if it does not contain
    > "b". Both
    > "a" and "b" are tried in parallel (trying to match "b" is performed
    > repeatedly on every position), and as soon as a match on "b" is
    > found, it
    > eliminates the ongoing candidate match on "a".

    So in particular, one will have that if L(a\b) = L(a\b).

    > The "--" and "~" operators are not operators acting on a single
    > instance of
    > the language, they are set operations acting on ALL possible matches
    > simultaneously, i.e. in the domain of sets of strings, not on the
    > domain of
    > isolated strings.

    So you don't need this, I think. Essentially, what you are describing
    is a new closure operator x´, with the definition perhaps L(x') =
    {y ∈ C*| y is a substring of a string in L(x) }. (To be a "closure
    operator", it should satisfy L(x'') = L('x).) Then your complement is
    ~(x'), where L(~x) := C* \ L(x). But I am not sure of the usability
    of this x' operator.

       Hans Åberg

    This archive was generated by hypermail 2.1.5 : Sat Oct 13 2007 - 17:11:44 CDT