Re: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)

From: Andy Heninger (andy.heninger@gmail.com)
Date: Sun Oct 07 2007 - 20:27:47 CDT

Next message: Javier SOLA: "Re: Khmer, display of consonant shifters"

Previous message: Eric Muller: "Khmer, display of consonant shifters"
In reply to: Mark Davis: "Re: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"
Next in thread: Philippe Verdy: "RE: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"
Reply: Philippe Verdy: "RE: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

On 10/5/07, Mark Davis <mark.davis@icu-project.org> wrote:
>
> Going back to my doc, and tweaking it, I think what we are saying is that
>
> [a-z \q{ch} \q{rr}]
>
> is equivalent to
>
> ( ch | rr | [a-z] )
>
> in matching, so however the latter works, the former should work.

How an expression like ( ch | rr | [a-z] ) works is in fact different
between the Perl family of regular expression implementations and the POSIX
family

Consider the expression
(ch | hz | a-f)+
and the text
chz

POSIX matching would match the entire string.
Perl (and Java, ICU, etc.) would match only the ch.

Do we care? I don't know. Maybe not, maybe the situation doesn't come up
often enough to worry about. Ignoring the differences has the practical
advantage that each type of implementation could actually do an
implementation within the structure of what they already have.

-- Andy

I think complement is the only really tricky problem.
>
> Mark
>
> On 10/5/07, Andy Heninger <andy.heninger@gmail.com> wrote:
> >
> >
> >
> > On 10/4/07, Mike <mike-list@pobox.com > wrote:
> > >
> > > > With strings in sets at all, separately from the question of how to
> > > do
> > > > set negation, I'm not sure how matching should work. Which choice
> > > is
> > > > selected if more than one is possible? Should backtracking try
> > > > additional choices if the first one doesn't lead to an overall
> > > match?
> > > > If sets don't have an implied ordering, do we need to require a
> > > POSIX
> > > > style longest match, which could be slow?
> > >
> > > In a set, I keep track of the strings by their length, so the longest
> > > match is always found. I don't think you want to backtrack and try a
> > > shorter string since the longer match is supposed to be treated as a
> > > unit....
> > >
> > > > Should the set [^xyz\q{ch}] match the 'c' in "ch" ?
> > >
> > > I don't think so; since the \q{ch} matches "ch", the negated set does
> > > not match at the first position.
> >
> >
> > The choices you have made seem reasonable to me.
> >
> > But what would implementations with a DFA (non-backtracking)
> > implementation do? It would be very difficult for them to not take a
> > shorter string from a set if that led to an overall longer match. Would it
> > be OK - still useful- if the UTS left what happens unspecified?
> >
> > -- Andy
> >
> >
> > > I'm half inclined to move strings, or literal clusters, into section
> > > 3,
> > > > then move the entire section 3 of UTS-18 into a separate document
> > > for
> > > > interesting, but not fully worked out, ideas.
> > >
> > > This seems like a good idea.
> > >
> > > Mike
> > >
> >
> >
>
>
> --
> Mark

Next message: Javier SOLA: "Re: Khmer, display of consonant shifters"
Previous message: Eric Muller: "Khmer, display of consonant shifters"
In reply to: Mark Davis: "Re: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"
Next in thread: Philippe Verdy: "RE: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"
Reply: Philippe Verdy: "RE: Proposal for matching negated sets (was Re: New Public Review Issue: Proposed Update UTS #18)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

This archive was generated by hypermail 2.1.5 : Sun Oct 07 2007 - 20:32:09 CDT