Re: Re: Regular Expressions and Canonical Equivalence

From: Stephen E Slevinski Jr <slevin_at_signpuddle.net>
Date: Thu, 14 May 2015 12:25:06 -0500

On 5/14/15 5:58 AM, Philippe Verdy wrote:
> Yes it is problematic: (ab)* is not the same as (a|b)* as this
> requires matching pairs of letters "ab" in that order in the first
> expression, but random strings of "a" and "b" i nthe second one (so
> the second matches *more* input samples.
>
> Even if you consider canonical equivalences (where the relative order
> of "ab" does not matter for example because they have distinct
> non-zero canonical) this does not mean that "a" alone will match in
> the first expression "(ab*)", even though it MUST match in "(a|b)*".
>
> So the solution is just elegant to simplify the first level of
> analysis of "(ab)*" by using "(a|b)*" instead. But then you need to
> perform a second pass on the match to make sure it is containing only
> complete sequences "ab" in that order (or any other order if they are
> all combining with a non-zero combining class) and no unpaired "a" or "b".

If you always want to find "a" and "b" in a pair without regard to the
order, how about the regex:
((ab)|(ba))*

∼Steve
Received on Thu May 14 2015 - 12:25:34 CDT

This archive was generated by hypermail 2.2.0 : Thu May 14 2015 - 12:25:36 CDT