Re: Regular Expressions and Canonical Equivalence

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sat, 16 May 2015 00:31:53 +0200

2015-05-15 23:57 GMT+02:00 Richard Wordingham <
richard.wordingham_at_ntlworld.com>:

> On Fri, 15 May 2015 22:09:13 +0200
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
> > 2015-05-15 9:10 GMT+02:00 Richard Wordingham <
> > richard.wordingham_at_ntlworld.com>:
>
> > This is because you don't understand the issue !
>
> > > Now, a program to check whether a trace matching
> > > (\u0323|\u0302)* matches (\u0323\u0302)* is very simple. It just
> > > counts the number of times \u0323 occurs and the number of times
> > > \u0302 occurs, and returns whether they are equal.
>
> > This is wrong. \u0323\u0323\u0302\u0302 and \u0323\u0302\u0323\u0302 would
> > pass your counting test (which does not work in a FSA) but they are
> > NOT canonically equivalent because the identical combining characters
> > are blocking each other (so arbitrary ordering is not possible).
>
> TUS7.0: D108 Reorderable pair:
> Two adjacent characters A and B in a coded character sequence
> <A, B> are a Reorderable Pair if and only if ccc(A) > ccc(B) > 0.
>
> Now, ccc(U+0302) = 230 > 220 = ccc(U+0323) > 0, so (U+0302, U+0323) is
> a reorderable pair.
>

I do NOT contest that U+0323 and U+0302 can reorder. What I contest is the
counting test: U+0323 blocks another occurrence of U+0323 because it has the
**same** combining class.
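
The claims in this exchange can be checked mechanically. What follows is a
minimal sketch, assuming Python's standard unicodedata module; the helper
names canonically_equivalent and counts_match are hypothetical and appear
nowhere in the thread. It prints the canonical combining class of each mark,
applies the counting test quoted above, and compares the NFD forms of the two
sequences under discussion.

```python
import unicodedata

DOT_BELOW = "\u0323"   # COMBINING DOT BELOW
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def counts_match(s: str) -> bool:
    """The counting test from the quoted message: equal numbers of each mark."""
    return s.count(DOT_BELOW) == s.count(CIRCUMFLEX)


# The two sequences discussed in the thread.
grouped = DOT_BELOW * 2 + CIRCUMFLEX * 2      # \u0323\u0323\u0302\u0302
alternating = (DOT_BELOW + CIRCUMFLEX) * 2    # \u0323\u0302\u0323\u0302

for mark in (DOT_BELOW, CIRCUMFLEX):
    print(f"U+{ord(mark):04X} ccc={unicodedata.combining(mark)}")

print("counting test:", counts_match(grouped), counts_match(alternating))
print("same NFD form:", canonically_equivalent(grouped, alternating))
```

Canonical ordering keeps marks of equal combining class in their original
relative order and sorts adjacent marks whose nonzero classes differ, so the
NFD comparison is the reference against which the counting test can be judged.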
Received on Fri May 15 2015 - 17:33:13 CDT
