Re: FYI: Regex paper for UTC

From: Hans Aberg (haberg@math.su.se)
Date: Sat Oct 13 2007 - 17:10:30 CDT

Next message: Addison Phillips: "Re: Use of interum PUA encodings for 85 letters"

Previous message: Philippe Verdy: "RE: FYI: Regex paper for UTC"
In reply to: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Next in thread: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Reply: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Reply: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

On 13 Oct 2007, at 23:21, Philippe Verdy wrote:

>> The problem is that any string starting with c will also be in the
>> language complement and matched.
>
> No, it won't. A string containing a c (at any position, and in any
> non-null
> number of positions), contains a match for the regexp "c", so it is
> not part
> of the complement.

This is not the language complement. The language set operators are
just the normal set operators inside the set C*, where C is the
character set. See for example, Aho, Sethi & Ullman, "Compilers" (the
"dragon book") on REs. And even if you exclude all strings containing
"c" as a substring, you get strings of arbitrary length, which is not
what I want.

But the ordinary language set operators can be used in the way I
described in another post.

> The expression "a -- b" matches "a" only if it does not contain
> "b". Both
> "a" and "b" are tried in parallel (trying to match "b" is performed
> repeatedly on every position), and as soon as a match on "b" is
> found, it
> eliminates the ongoing candidate match on "a".

So in particular, one will have that if L(a\b) = L(a\b).

> The "--" and "~" operators are not operators acting on a single
> instance of
> the language, they are set operations acting on ALL possible matches
> simultaneously, i.e. in the domain of sets of strings, not on the
> domain of
> isolated strings.

So you don't need this, I think. Essentially, what you are describing
is a new closure operator x´, with the definition perhaps L(x') =
{y ∈ C*| y is a substring of a string in L(x) }. (To be a "closure
operator", it should satisfy L(x'') = L('x).) Then your complement is
~(x'), where L(~x) := C* \ L(x). But I am not sure of the usability
of this x' operator.

Hans Åberg

Next message: Addison Phillips: "Re: Use of interum PUA encodings for 85 letters"
Previous message: Philippe Verdy: "RE: FYI: Regex paper for UTC"
In reply to: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Next in thread: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Reply: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Reply: Philippe Verdy: "RE: FYI: Regex paper for UTC"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

This archive was generated by hypermail 2.1.5 : Sat Oct 13 2007 - 17:11:44 CDT