RE: FYI: Regex paper for UTC

From: Philippe Verdy (
Date: Wed Oct 10 2007 - 20:39:07 CDT

  • Next message: James Kass: "Re: New FAQ page"

    No problem for me too: it effectively gives a strong semantic to the NOT

    Except that I use the "rewrite" operations exactly in the reverse direction,
    the result of these rewrites is that there may be NOT operators everywhere
    in the final regexp, but only ONE binary operator, i.e. the || operator for
    alternation (it eliminates all && and all -- from the regexp):

    x -- y ==> ^(^x || y)
    x && y ==> ^(^x || ^y)
    ^^x ==> x

    (only three rewrite rules needed, instead of 10, only the last above is
    common between the two sets of rewrites).

    This gives identical results (because the expressions are equivalent), but
    uses a single transition rule in the matcher, where each transition can be
    either positive or negative, but part of the deterministic state.

    All the parentheses (except capturing groups) are eliminated as part of the
    construction of the transition graph, so this is no complication. The
    remaining NOT operations all become the negation of the transition
    condition. So the transition graph can be flattened in a single layer with
    only positive conditions.

    This is simpler (for me) to implement than supporting the three junction
    types (&&, ||, --) in junction nodes (because it would require three
    junction types in nodes, in addition to the complexity of
    epsilon-transitions needed for capturing groups), and allows all transitions
    to be performed much more simply by using only two sets: the union of all
    positive input transitions and the union of all negative input transitions.

    > -----Message d'origine-----
    > De : [] De la
    > part de Mike
    > Envoyé : mercredi 10 octobre 2007 21:51
    > À : Unicode
    > Objet : Re: FYI: Regex paper for UTC
    > > Andy and I put together a paper of recommendations for the UTC at
    > >
    > These recommendations look good to me.

    This archive was generated by hypermail 2.1.5 : Wed Oct 10 2007 - 20:43:09 CDT