Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Asmus Freytag <>
Date: Wed, 23 Apr 2014 18:15:44 -0700

On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
> On Wed, Apr 23, 2014 at 09:21:04AM -0700, Asmus Freytag wrote:
>>> a parsing is good if it satisfies all conditions below:
>>> 0) Some delimiters in the string are marked as “non-matching”; the rest
>>> is broken into disjoint “matched” pairs;
>>> MATCH) A “matched” pair consists of an open-delimiter and matching close-
>>> delimiter (in this order in the string).
>>> NEST) “Matched” pairs are properly nested (meaning that 2 pairs cannot be
>>> positioned as Open1 Open2 Close1 Close2 in the string order).
>>> MINLEN) “Inside” a “matched” pair, every delimiter which could match elements
>>> of the pair but is marked as “non-matching” must nest inside
>>> some deeper-nested “matched” pair.
>>> (I hope that the meaning of the word “inside” in MINLEN is clear.)
>>> GREED) Given any close-delimiter marked as “non-matching”, its
>>> pre-context does not contain any open-delimiter which could
>>> match it.
>>> Here pre-context of a position is a concatenation of substrings of the
>>> initial string:
>>> • Take the most deeply nested “matched pair” containing the position
>>> (if none, the whole string);
>>> • take the part of the string inside this pair AND before the position;
>>> • remove all “matched” pairs completely contained inside this substring
>>> together with what they enclose.
>> This is a very nice formal definition. I'm surprised that your "GREED"
>> statement needs such a complex auxiliary concept (pre-context).
>> Can you explain why, if you make "pre-context" simply the part of the
>> whole string that precedes the unmatched close-delimiter, the words
>> "which could match it" are insufficient?
> Aha, this means that my description is INCOMPLETE: you got the wrong
> impression of what “match” means! Everywhere, this word means exactly
> the same as in the MATCH rule: that the Unicode codepoints match
> according to their Unicode properties.
> This is a non-recursive definition. All rules are independent.

That explains why you repeat most of the other constraints in your

> Without the
> complicated notion of pre-context, matching [] in
> ( [ ) ]
> would be an acceptable match.
> Thanks for your corrections,
> Ilya
For a static definition, would it have been simpler to break the
definition into two: say, a "tentative parsing" (all conditions but
greed) and a "selected parsing", which then could be defined as the
parsing that starts closest to the left?

(I don't have the time as I write this to work out whether that's the
right condition, as I am about to board a ride; take it just as a
trigger for thinking about what a split definition might achieve.)
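
To make the "tentative parsing" half of such a split concrete, here is a
minimal Python sketch that enumerates every set of pairs satisfying
conditions 0, MATCH, NEST and MINLEN, and deliberately leaves GREED out.
It is only one literal reading of the quoted rules, not anything taken
from UAX #9 itself. The function names, the three bracket types, and the
brute-force enumeration are assumptions made for illustration.

```python
from itertools import combinations

# Assumed delimiter inventory for the illustration (not the Unicode bracket list).
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


def candidate_pairs(s):
    """All (i, j) with i < j where s[i] is an opener and s[j] its closer (MATCH)."""
    return [
        (i, j)
        for i, a in enumerate(s)
        if a in OPEN_TO_CLOSE
        for j in range(i + 1, len(s))
        if s[j] == OPEN_TO_CLOSE[a]
    ]


def is_tentative(s, pairs):
    """Check conditions 0, NEST and MINLEN for a collection of MATCH-valid pairs."""
    used = [k for p in pairs for k in p]
    if len(used) != len(set(used)):      # condition 0: pairs must be disjoint
        return False
    for (a, b), (c, d) in combinations(sorted(pairs), 2):
        if a < c < b < d:                # NEST: Open1 Open2 Close1 Close2 is forbidden
            return False
    matched = set(used)
    for i, j in pairs:                   # MINLEN
        for k in range(i + 1, j):
            # Only unmatched delimiters of the pair's own kind are constrained.
            if k in matched or s[k] not in (s[i], s[j]):
                continue
            # Such a delimiter must sit inside a deeper-nested matched pair.
            if not any(i < c < k < d < j for c, d in pairs):
                return False
    return True


def tentative_parsings(s):
    """Brute force: every subset of candidate pairs that passes is_tentative."""
    cands = candidate_pairs(s)
    return [
        set(pairs)
        for r in range(len(cands) + 1)
        for pairs in combinations(cands, r)
        if is_tentative(s, pairs)
    ]


print(tentative_parsings("([)]"))   # [set(), {(0, 2)}, {(1, 3)}]
print(tentative_parsings("(()"))    # [set(), {(1, 2)}]
```

For "([)]" (the example above with the spaces dropped) this yields three
tentative parsings: nothing matched, () matched, and [] matched. Whatever
plays the role of GREED, or of a "selected parsing" rule, has to choose
among these.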


Unicode mailing list
Received on Wed Apr 23 2014 - 20:17:16 CDT
