Re: Pure Regular Expression Engines and Literal Clusters

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Thu, 10 Oct 2019 22:54:35 +0100

On Tue, 8 Oct 2019 15:25:34 +0100
Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:

> An example UTS#18 gives for matching a literal cluster can be
> simplified to, in its notation:
>
> [c \q{ch}]
>
> This is interpreted as 'match against "ch" if possible, otherwise
> against "c"'. Thus the strings "ca" and "cha" would both match the
> expression
>
> [c \q{ch}]a
>
> while "chh" but not "ch" would match against
>
> [c \q{ch}]h
>
> Or have I got this wrong?

After comparing this with the Perl behaviour of /(?:ch|c)/
and /(?:ch|c)h/, I've come to the conclusion that I've got the
interpretation wrong. The former may match "ch" or "c", so the
latter matches "ch" (by backtracking to "c") as well as "chh". I
conclude that the only special meaning of \q is to indicate a
preference for the sequence of two characters; if the engine yields
all matches, it has no meaning.
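
The comparison can be reproduced with any backtracking engine. The
sketch below uses Python's re module rather than Perl, on the
assumption that ordered alternation behaves the same way in both for
this pattern. The helper names fullmatch and all_match_ends are
hypothetical and exist only for illustration; all_match_ends is a
stand-in for an engine that "yields all matches", not a real API.

```python
import re

CLUSTER = r"(?:ch|c)"  # ordered alternation: try "ch" first, then "c"


def fullmatch(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# Preference: the first alternative that succeeds is the one reported.
preferred = re.match(CLUSTER, "cha").group(0)

# Backtracking: with a following "h", the engine falls back from "ch" to "c",
# so "ch" matches as well as "chh".
ch_matches = fullmatch(CLUSTER + "h", "ch")
chh_matches = fullmatch(CLUSTER + "h", "chh")


def all_match_ends(alternatives, text):
    """Hypothetical 'all matches' view: every prefix length matched by some alternative."""
    return {len(a) for a in alternatives if text.startswith(a)}


print(preferred, ch_matches, chh_matches)  # ch True True
# The order of alternatives does not change the set of all matches.
print(all_match_ends(["ch", "c"], "cha") == all_match_ends(["c", "ch"], "cha"))  # True
```

The first line of output shows the preference for "ch" and that both
"ch" and "chh" match once backtracking is allowed; the second shows
that the ordering carries no information when every match is
enumerated.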

This greatly simplifies matters.

Richard.
Received on Thu Oct 10 2019 - 16:55:23 CDT