Re: FYI: Regex paper for UTC

From: Hans Aberg (haberg@math.su.se)
Date: Sat Oct 13 2007 - 16:11:03 CDT

    On 13 Oct 2007, at 21:36, Philippe Verdy wrote:

    > ...such operation is typically used in association with [an] operator
    > that restricts the set of matchable strings.

    You might simply add operators that correspond to the set operations
    on the languages, and let users work out how to use them. L(x|y)
    is already the union of L(x) and L(y). And if, for a natural number
    k, L(x^k) = L(x)^k, then one can extract all strings of length k to
    complement against (see below).
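
    As a rough illustration of the two identities above, here is a
    minimal Python sketch that models a language as a finite set of
    strings. The helper names (union, power) and the sample languages
    are assumptions made for this example, not part of any regex
    library.

```python
# Sketch only: a "language" is modelled as a finite Python set of strings.
# The names union/power and the sample sets are hypothetical.

def union(lang_x, lang_y):
    """L(x|y): the union of L(x) and L(y)."""
    return lang_x | lang_y

def power(lang, k):
    """L(x)^k: all concatenations of k strings taken from lang."""
    result = {""}  # L^0 contains only the empty string
    for _ in range(k):
        result = {u + v for u in result for v in lang}
    return result

lang_x = {"a", "bc"}
lang_y = {"d"}

x_or_y = union(lang_x, lang_y)       # models L(x|y)
x_squared = power(lang_x, 2)         # models L(x^2) = L(x)^2

# If "." matches any single character of a two-letter alphabet,
# then .^2 is exactly the set of strings of length 2.
length_two = power({"a", "b"}, 2)
```

    Real regular languages are usually infinite, so this only shows the
    algebra on small finite cases.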

    So let L(x&y) be the intersection of L(x) and L(y), L(~x) the set
    complement of L(x), L(x \ y) = L(x) \ L(y) (set difference).

    Then, if "." matches all characters, all strings of length 2 that do
    not match "ab" are given by .^2 \ "ab" (or .^2 & ~"ab"). If U is the
    set of legal Unicode strings, and U_k the subset of strings of
    Unicode length k (which might be different from the string length),
    then all Unicode strings of Unicode length 2 that do not match "ab"
    are given by U_2 \ "ab".
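
    The equivalence of .^2 \ "ab" and .^2 & ~"ab" can be checked on a
    small case. The sketch below assumes a three-letter alphabet as a
    stand-in for "all characters", and it bounds the complement by a
    maximum string length, since the true complement is infinite. All
    names in it are hypothetical.

```python
from itertools import product

# Assumption: a tiny alphabet standing in for "all characters".
ALPHABET = "abc"

def all_strings(max_len):
    """Every string over ALPHABET of length 0..max_len (a bounded universe)."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product(ALPHABET, repeat=n)}

def complement(lang, max_len):
    """Bounded model of L(~x): the universe up to max_len, minus lang."""
    return all_strings(max_len) - lang

# L(.^2): all strings of length exactly 2.
dot_squared = {"".join(p) for p in product(ALPHABET, repeat=2)}
lang_ab = {"ab"}                                   # L("ab")

by_difference = dot_squared - lang_ab              # .^2 \ "ab"
by_complement = dot_squared & complement(lang_ab, 2)   # .^2 & ~"ab"
```

    Both expressions give the same eight strings here, but the
    difference form never needs a universe to complement against.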

    Mostly, one would use the set difference \ operator, rather than the
    complement ~.

       Hans Åberg


