Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Tue, 22 Apr 2014 19:02:00 +0300

> Date: Mon, 21 Apr 2014 23:25:05 -0700
> From: Asmus Freytag <asmusf_at_ix.netcom.com>
> Cc: verdy_p_at_wanadoo.fr, ken_at_unicode.org, Eli Zaretskii <eliz_at_gnu.org>,
> James Clark <jjc_at_jclark.com>,
> unicode Unicode Discussion <unicode_at_unicode.org>
>
> > And I think I can even invent an example which I cannot parse using
> > your definition:
> >
> > 1( 2[ 3( 4] 5) 6)
> >
> > Is looking-at-1 forcing match of 3-and-5? Or what?
>
>
> Let's see what the text gives (before we improve it further).
>
> 1. - 1( or 3( could match 5) or 6) , 2[ could only match 4]
>
> a. - we have only one isolating run, so this is a no-op
> b. - all opening characters follow their putative closing characters, so
> this is a no-op
> d. - at location 5 is the earliest opportunity to match a pair
> (before we get to 5 we don't have an opening and closing)
> c. - we could match 1( or 3( but we use 3, because it spans less text
> e. , f. - can probably combine these, but 4] is now inside a resolved
> pair and is ignored.
>
> Now, when we reach 6) we have another pair, and per d, it's the earliest
> possible moment
> we can resolve it, so we match 1( and 6).

But that's wrong, isn't it?  If I follow the algorithm in BD16 (which
is really our only reference at this point), I get this:

    input   result
    1(      push 1)
    2[      push 2]
    3(      push 3)
    4]      produce the pair 2[ 4] and pop through and including 2]
    5)      produce the pair 1( 5) and pop the entire stack
    6)      nothing (remains unmatched)

The reference implementation (after I managed to understand how to
invoke it for this case) agrees with me.
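
The trace above can be reproduced with a minimal Python sketch of the
stack procedure described in BD16.  This is only an illustration, not
the reference implementation: the names PAIRS and bd16_pairs are
hypothetical, only two bracket types are handled, and canonical
equivalents and any stack-depth limit are left out.

```python
# Illustrative sketch of the BD16 stack procedure (assumed simplification:
# two ASCII bracket types, no canonical equivalents, no stack limit).
PAIRS = {"(": ")", "[": "]"}    # opening -> closing, stand-in for Bidi_Paired_Bracket
CLOSERS = set(PAIRS.values())


def bd16_pairs(text):
    """Return (open_index, close_index) pairs, sorted by opening position."""
    stack = []   # entries: (expected closing bracket, position of the opener)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            # Opening bracket: push its closing counterpart and its position.
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            # Closing bracket: search from the top of the stack for a match.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    pairs.append((stack[depth][1], pos))
                    del stack[depth:]   # pop through and including the match
                    break
            # If nothing matched, the stack is left untouched.
    return sorted(pairs)


# The example from this thread, 1( 2[ 3( 4] 5) 6), with 0-based positions.
print(bd16_pairs("([(]))"))
```

With 0-based positions, the pairs (0, 4) and (1, 3) correspond to
1( 5) and 2[ 4] in the trace, and the final ")" stays unmatched.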

This once again underlines the problem with the original "definition"
in BD16, which does not lend itself to a useful and yet intuitive
notion of what is "right".

> Eli's definition starts
>
> A bracket pair is a pair of an opening paired bracket and a closing
> paired bracket characters within the same isolating run sequence,
> such that the Bidi_Paired_Bracket property value of the former
> character or its canonical equivalent equals the latter character or
> its canonical equivalent, ....
>
> and continues:
>
> ....and all the opening and closing bracket
> characters in between these two are balanced.
>
> That continuation we found out was incorrect, so we would need to fix it.

Indeed.

> Here's an attempt:
>
> ... subject to the following conditions:
>
>
> a. a match is attempted at the left-most closing bracket character
> unmatched at this point
> b. the closest earlier matching opening bracket, that is unmatched
> at this point is used to form the pair
> c. any unmatched bracket character enclosed in a pair is ignored
> for further matching
> d. matching ends when no more pairs can be formed

I agree, but let me try to say the same more concisely:

   A bracket pair is a pair of an opening paired bracket and a closing
   paired bracket characters within the same isolating run sequence,
   such that the Bidi_Paired_Bracket property value of the former
   character or its canonical equivalent equals the latter character
   or its canonical equivalent, and provided that a closing bracket is
   matched to the closest match candidate, disregarding any candidates
   that either already have a closer match, or are enclosed in a
   matched pair of two other bracket characters.
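
As a sanity check on the wording, conditions a. through d. can be
sketched directly in Python, without an explicit stack.  This is a
sketch under stated assumptions, not a proposed implementation: the
names PAIRS and closest_candidate_pairs are hypothetical, only two
bracket types are handled, and canonical equivalents are ignored.

```python
# Illustrative sketch of the "closest candidate" wording (conditions a-d).
PAIRS = {"(": ")", "[": "]"}    # opening -> closing, stand-in for Bidi_Paired_Bracket
CLOSERS = set(PAIRS.values())


def closest_candidate_pairs(text):
    """Return (open_index, close_index) pairs, sorted by opening position."""
    used = set()   # positions already matched, or enclosed in a matched pair
    pairs = []
    # (a) closing brackets are tried left to right.
    for close_pos, ch in enumerate(text):
        if ch not in CLOSERS:
            continue
        # (b) walk backwards to the closest earlier opener that is still
        #     available and whose paired bracket is this closer.
        for open_pos in range(close_pos - 1, -1, -1):
            if open_pos not in used and PAIRS.get(text[open_pos]) == ch:
                pairs.append((open_pos, close_pos))
                # (c) the pair and everything it encloses is disregarded
                #     for all further matching.
                used.update(range(open_pos, close_pos + 1))
                break
    # (d) after the last closer has been tried, no more pairs can be formed.
    return sorted(pairs)


print(closest_candidate_pairs("([(]))"))
```

On the example 1( 2[ 3( 4] 5) 6) this yields the same two pairs as the
BD16 trace earlier in this message: 2[ 4] and 1( 5), with 6) left
unmatched.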
Received on Tue Apr 22 2014 - 11:03:33 CDT