RE: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 09 2007 - 23:30:54 CDT

  • Next message: Hans Aberg: "Re: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"

    Hans Aberg [mailto:haberg@math.su.se] wrote:
    > On 10 Oct 2007, at 01:11, Philippe Verdy wrote:
    > > Anyway, you're still not replying to the question: which is the
    > > expected
    > > ordering for the returned matches?
    >
    > If you know what language the grammar produces, then you know the
    > language complement. And the DFA construct shows that if the grammar
    > is RE (Chomsky hierarchy type 3), there exists an RE grammar that
    > produces the complement language.

    I know all that (in fact I had already said the same thing before: the
    language depends on the alphabet that "." recognizes partly, with the
    exception of newlines that may be outside of the subset of the alphabet
    matched by ".").

    But a DFA or NFA approach absolutely does not matter at all semantically. It
    absolutely does not influence the language recognized, and so cannot
    influence the negation of this same language. Don't mix the implementation
    (NFA/DFA...) with the language that the implementation is supposed to
    recognize.

    > > I absolutely don't care if you use a DFA or not.
    >
    > So once you have decided what the abstract symbols you use in your RE-
    > like algebra should mean grammar-wise (hopefully producing an RE
    > grammar), or what DFA they should produce, you can use your favorite
    > method to compute an RE grammar that produces the language complement.

    Once again, the DFA argument does not matter at all. It absolutely does not
    influence the language or its algebra. But you are still missing the point:
    the order of matches DOES matter in all practical uses: whatever a regexp
    will match, according to its algebra, this will be according to it only a
    unordered set, that the client application will process in an ordered
    fashion; just consider the two typical cases:

    - of a text editor performs a regexp search to move the cursor position or
    the current selection to the next or previous match: it can't use all the
    possible matches at the same time. This means that it will take a decision
    about the first match that satisfies the regexp. No DFA involved here.

    - of a automated search-&-replaceoperation: it is critical for the order to
    be consistent, because it will not replace all matches at the same time
    (don't forget that within the set of matches, there may be a lot that
    overlap within the set of possible matched). Here again there's anarbitrary
    decision, but to make the process predictable, you need an ordering of
    matches where the first of them will be the first considered for
    substitution. No DFA involved here, it does not matter, theonly thing that
    matters is the ordering of matches, not the set of matches returned by the
    regexp.

    > I do not know why this complement operation is important, though.

    Because it appears in contextual searches. Ofcourse you may implement a
    regexp engine that supports NO negated regexp searches (and no negated
    character classes), but it will be impossible to match some items, using a
    finite regexp string. And even if you don't want to support negated
    searches, the question of ordering still remains, and becomes critical when
    you want to support capturing groups (due to conflicting "interest" between
    the multiple capturing groups present in the regexp). And even if you don't
    support capturing groups, the question of order still remains with regexps
    that terminate by a "*" or "+" repeater (but then only a small part of the
    order matters).



    This archive was generated by hypermail 2.1.5 : Tue Oct 09 2007 - 23:35:11 CDT