Re: FYI: Regex paper for UTC

From: Hans Aberg (
Date: Sat Oct 13 2007 - 13:50:05 CDT

  • Next message: Philippe Verdy: "RE: Use of interum PUA encodings for 85 letters"

    On 13 Oct 2007, at 20:07, Philippe Verdy wrote:

    >> If the RE r generates the language L(r), and ~r is the complement,
    >> what is the definition of L(~r)? Specifically:
    >> If L(ε) is the empty string, what is L(~ε)?
    > It's the set of non-empty texts. So given the alphabet A (which is not
    > necessarily a single character, but could be any localized set of
    > collation
    > element) matched by the "." class, it generates L(".+"), in other
    > words:
    > RE "~ε" = RE ".* -- ε"
    > = RE ".+"

    >> What is the definition of L(~(r*))?
    > For me it's simply:
    > L(".*") - L("r*") = L(".* -- r*")
    > where L(".*") is the set of all texts (including the empty text)
    > based on
    > alphabet A, and where "--" is the difference operator in regexp
    > syntax.

    It seems me you are using the language complement (possibly cut down
    by the language of legal sequences).

    That operation is not very useful, because the language complement of
    say a single character c is the set of all other strings. So if one
    is finding the longest string in the language from a point on, and
    the string isn't c, all will be eaten.

    But Mark Davis said just that such stuff isn't considered, but merely
    set operations on character classes, which is pretty straightforward.

       Hans Åberg

    This archive was generated by hypermail 2.1.5 : Sat Oct 13 2007 - 13:51:18 CDT