Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Asmus Freytag <>
Date: Tue, 22 Apr 2014 09:06:27 -0700

On 4/22/2014 2:19 AM, Ilya Zakharevich wrote:
> I think the crucial problem is with
> 1( 2[ 3( 4] 5) 5b] 6)
> I have two possible interpretations: one matches 2 with 5b, another
> leaves 2 unmatched.


If you read UAX#9, the algorithm works by pushing openers on a stack.
On finding a closer, it goes down the stack and attempts to locate a
match. On finding a match, it discards any enclosed openers; on not
finding a match, it discards the closer.

(discard = ignore for further matching, don't treat as bracket any longer).

So, when we reach 4] we have

    1( 2[ 3(

on the stack. The match is with 2[ and 3( is ignored. 1( remains and can
be matched later to 5).

Ultimately 5b] and 6) are ignored.
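
The walk-through above can be sketched in a few lines of Python. This is
only an illustration of the stack procedure as described in this message,
not the normative text of UAX#9: the PAIRS table is a hypothetical
stand-in for the Bidi_Paired_Bracket property, the labels are the ones
from Ilya's example, and canonical equivalence is left out.

```python
# Hypothetical two-entry stand-in for the Bidi_Paired_Bracket property.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())

def resolve_with_stack(tokens):
    """tokens is a list of (label, bracket_char) tuples.

    Returns (pairs, ignored): pairs holds (opener_label, closer_label)
    tuples in the order they are resolved; ignored holds the labels of
    brackets discarded along the way.
    """
    stack = []    # openers still available for matching
    pairs = []
    ignored = []
    for label, ch in tokens:
        if ch in PAIRS:
            stack.append((label, ch))
        elif ch in CLOSERS:
            # Go down the stack from the top, looking for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if PAIRS[stack[depth][1]] == ch:
                    pairs.append((stack[depth][0], label))
                    # Openers enclosed by the new pair are discarded.
                    ignored.extend(lbl for lbl, _ in stack[depth + 1:])
                    del stack[depth:]
                    break
            else:
                # No opener matched: discard the closer.
                ignored.append(label)
    return pairs, ignored

EXAMPLE = [("1", "("), ("2", "["), ("3", "("), ("4", "]"),
           ("5", ")"), ("5b", "]"), ("6", ")")]
pairs, ignored = resolve_with_stack(EXAMPLE)
```

For the example sequence this gives pairs 2[ with 4] and 1( with 5), and
ignores 3(, 5b] and 6).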

I believe that your scheme does not match the PBA in that it assumes
that brackets are hierarchical and attempts to preserve the best
hierarchy, whereas the PBA assumes that pairs that are closer together
are more likely to be correct matches. (For non-mathematical texts,
hierarchies are not the norm, and they are shallow at best.)

What the PBA actually does can now be put into a definition plus a rule,
neither of which uses "stack" or other implementation details, such as
"variables" or "lists".

D A bracket pair is a pair of characters, an opening paired bracket and
   a closing paired bracket, within the same isolating run sequence,
   such that the Bidi_Paired_Bracket property value of the former
   character or its canonical equivalent equals the latter character or
   its canonical equivalent.

R Characters are resolved into resolved bracket pairs as follows:
   Starting at the beginning of the text, when a closing bracket is
   encountered, find the nearest preceding opening character that is
   not part of a resolved pair, and not ignored for pair resolution,
   and that can form a bracket pair. If one exists, resolve the pair,
   and mark any enclosed brackets of any kind as ignored. Otherwise,
   if no pair can be resolved, mark the closing bracket as ignored.
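
To show that rule R is definite without a stack, here is a minimal Python
sketch that follows its wording directly: each bracket carries a state,
and every closer scans backwards for the nearest eligible opener. The
PAIRS table is a hypothetical stand-in for the Bidi_Paired_Bracket
property, canonical equivalence is omitted, and the whole string is
treated as one isolating run sequence. I also assume that enclosed
brackets which already belong to a resolved pair keep that pair, so only
unresolved ones are marked as ignored.

```python
# Hypothetical two-entry stand-in for the Bidi_Paired_Bracket property.
PAIRS = {"(": ")", "[": "]"}
CLOSERS = set(PAIRS.values())

FREE, PAIRED, IGNORED = "free", "paired", "ignored"

def resolve_rule_r(text):
    """Return (pairs, ignored) for one run of text.

    pairs is a list of (opener_index, closer_index) in resolution order;
    ignored is a sorted list of indices of brackets marked as ignored.
    """
    # Every bracket starts out neither resolved nor ignored.
    state = {i: FREE for i, ch in enumerate(text)
             if ch in PAIRS or ch in CLOSERS}
    pairs = []
    for i, ch in enumerate(text):
        if ch not in CLOSERS:
            continue
        # Nearest preceding opener that is still free and can pair with ch.
        opener = next((j for j in range(i - 1, -1, -1)
                       if state.get(j) == FREE and PAIRS.get(text[j]) == ch),
                      None)
        if opener is None:
            state[i] = IGNORED          # no pair can be resolved
            continue
        state[opener] = state[i] = PAIRED
        pairs.append((opener, i))
        # Assumption: only still-unresolved enclosed brackets are ignored.
        for k in range(opener + 1, i):
            if state.get(k) == FREE:
                state[k] = IGNORED
    ignored = sorted(k for k, s in state.items() if s == IGNORED)
    return pairs, ignored

# Ilya's sequence 1( 2[ 3( 4] 5) 5b] 6) without the labels.
result = resolve_rule_r("([(])])")
```

On that input the sketch resolves the pairs at indices (1, 3) and (0, 4)
and marks indices 2, 5 and 6 as ignored, which is the same outcome as the
stack description.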

What this shows is that the text in BD16 of UAX#9 tries to cover both a
definition and a rule, which is what makes it so difficult to follow.

I think what should be proposed is such a breakdown into a smaller
definition that speaks to the matching of properties (modulo canonical
equivalence) separate from the strategy for resolving actual pairs,
which is better stated as a rule.

The rule does not need to use implementation language to be definite.

A "resolved" bracket pair is simply the actual pair resolved by rule "R"
and the rest of the PBA acts on "resolved" pairs.


Unicode mailing list
Received on Tue Apr 22 2014 - 11:07:33 CDT