Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Ilya Zakharevich <nospam-abuse_at_ilyaz.org>
Date: Wed, 23 Apr 2014 16:41:15 -0700

On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote:
> > a parsing is good if it satisfies all conditions below:
> >
> > 0) Some delimiters in the string are marked as “non-matching”; the rest
> > is broken into disjoint “matched” pairs;
> >
> > MATCH) A “matched” pair consists of an open-delimiter and matching close-
> > delimiter (in this order in the string).
> >
> > NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
> > positioned as Open1 Open2 Close1 Close2 in the string order).
> >
> > MINLEN) “Inside” a “matched” pair, every delimiter which could match elements
> > of the pair but is marked as “non-matching” must nest inside
> > some deeper-nested “matched” pair.
> >
> >(I hope that the meaning of the word “inside” in MINLEN is clear.)
> >
> > GREED) Given any close-delimiter marked as “non-matching”, its
> > pre-context does not contain any open-delimiter which could
> > match it.
> >
> > Here pre-context of a position is a concatenation of substrings of the
> > initial string:
> > • Take the most deeply nested “matched pair” containing the position
> > (if none, the whole string);
> > • take the part of the string inside this pair AND before the position;
> > • remove all “matched” pairs completely contained inside this substring
> > together with what they enclose.
>
> This is a very nice formal definition. I'm surprised that your "GREED"
> statement needs such a complex auxiliary concept (pre-context).
>
> Can you explain why, if you make "pre-context" simply the part of the
> whole string that precedes the unmatched close-delimiter, the words
> "which could match it" are insufficient?

Aha, this means that my description is INCOMPLETE: you got the wrong
impression of what “match” means! Everywhere, this word means exactly
the same as in the MATCH rule: that the Unicode codepoints match
according to their Unicode properties.

This is a non-recursive definition. All rules are independent. Without
the complicated notion of pre-context, matching [] in

  ( [ ) ]

would be an acceptable match.
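
To make the pre-context construction concrete, here is a minimal Python
sketch of it, together with the GREED test in two variants: one using the
pre-context, one using the plain prefix of the string. The names PAIRS,
pre_context and greed_ok are hypothetical, and the two-entry bracket table
is an assumption standing in for the real Unicode properties. "Matched"
pairs are given as (open_index, close_index) tuples and are assumed to be
properly nested already.

```python
# Hypothetical bracket table; real code would consult Unicode properties.
PAIRS = {"(": ")", "[": "]"}


def pre_context(s, pairs, pos):
    """Return the pre-context of index `pos` as a string.

    `pairs` lists the "matched" pairs as (open_index, close_index);
    they are assumed to satisfy the NEST rule.
    """
    # Most deeply nested pair containing pos: among nested pairs that
    # contain pos, it is the one with the largest open index.
    enclosing = [p for p in pairs if p[0] < pos < p[1]]
    start = max(enclosing)[0] + 1 if enclosing else 0
    # Remove every matched pair lying entirely inside s[start:pos],
    # together with everything it encloses.
    removed = set()
    for o, c in pairs:
        if start <= o and c < pos:
            removed.update(range(o, c + 1))
    return "".join(s[i] for i in range(start, pos) if i not in removed)


def greed_ok(s, pairs, use_pre_context=True):
    """Check GREED for every close-delimiter marked "non-matching"."""
    matched = {i for p in pairs for i in p}
    for pos, ch in enumerate(s):
        if ch in PAIRS.values() and pos not in matched:
            ctx = pre_context(s, pairs, pos) if use_pre_context else s[:pos]
            # "Could match" is purely a property-level test here.
            if any(PAIRS.get(c) == ch for c in ctx):
                return False
    return True


s = "([)]"
parsing = [(0, 2)]                    # ( ) matched, [ and ] non-matching
print(repr(pre_context(s, parsing, 3)))               # ''
print(greed_ok(s, parsing))                           # True
print(greed_ok(s, parsing, use_pre_context=False))    # False
```

Under this reading of the rules, the pre-context of the trailing ] is
empty, because the matched ( ) is removed together with the [ it encloses.
The plain prefix still contains that [, so the simplified GREED test
rejects this parsing while the pre-context version accepts it.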

Thanks for your corrections,
Ilya
Received on Wed Apr 23 2014 - 18:43:01 CDT
