Re: Regexes, Canonical Equivalence and Backtracking of Input

From: Richard Wordingham <>
Date: Mon, 18 May 2015 22:14:11 +0100

On Mon, 18 May 2015 22:56:47 +0200
Philippe Verdy <> wrote:

> Isn't it possible for your basic substitution to transform \u0F73
> into a character class [\u0F71\u0F72\u0F73] that the regexp considers
> as a single entity to check?
> In that case, backtracking for matching \u0F73*\u0F72 is simpler:
> [\u0F71\u0F72\u0F73]*\u0F72, as it requires backtracking over only
> one character class (instead of one character).

I'm still waiting for your explanation of how your scheme for European
diacritics (as used in SE Asia) would work. This thread is intended for
the idea of using the regex to decide which character to take as the
next character from the input trace. In the other thread, I'm still not
sure whether you're working with traces or strings.
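
For anyone following along, here is a minimal Python sketch of the
canonical-equivalence facts behind the pattern \u0F73*\u0F72. It uses
only the standard unicodedata module. The helper names (nfd,
canonically_equivalent) are my own for illustration and are not part of
any regex engine. It shows only that the decomposed pieces of U+0F73 do
not stay adjacent in the normalized input, which is why the choice of
"next character" is not simply the next code point.

```python
import unicodedata

AA = "\u0F71"  # TIBETAN VOWEL SIGN AA
I = "\u0F72"   # TIBETAN VOWEL SIGN I
II = "\u0F73"  # TIBETAN VOWEL SIGN II, canonically decomposes to AA + I


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    exactly when their NFD forms are identical."""
    return nfd(a) == nfd(b)


# A string matched literally by \u0F73*\u0F72 ...
literal = II + II + I

# ... and its NFD form. Canonical reordering sorts the marks by
# combining class, so the AA and I that came from each U+0F73 are
# no longer next to each other.
normalized = nfd(literal)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in normalized])
    print(canonically_equivalent(literal, AA + AA + I + I + I))
```

A matcher working on the normalized input therefore has to pair each
U+0F71 with a later, non-adjacent U+0F72 to account for one U+0F73.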

Received on Mon May 18 2015 - 16:15:11 CDT
