Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Thu, 24 Apr 2014 00:28:50 -0700

On 4/23/2014 7:37 PM, Philippe Verdy wrote:
> Thanks for the clear reply; now I know that my example in a prior
> message would work appropriately with the UBA:
>
> This is an [«] ARABIC EXAMPLE [»] for demonstration only.
>
> Because:
> - the opening guillemet is not stripped out of the context stack when
> the first closing bracket is matched with the first opening bracket,
This is _*incorrect*_; see the text in bold in the definition
copied below.
The second bullet in item 3, under the second second-level bullet ("If a
closing paired bracket is found") of the third top-level bullet of BD16,
clearly says that all elements above the matched element are popped
together with it.
> - later the closing guillemet matches the opening guillemet remaining
> on the stack,
No, this is _*incorrect*_, because the stack has already been popped.

The problem with the "stack" in this algorithm is that it isn't a stack.
A stack is a data structure that only lets you manipulate the top
element. This data structure is simply a list: elements are appended to
it as opening brackets are found, it is scanned from the tail for a
match, and when a match is found, the tail is trimmed.

Item "4" performs the iteration that scans the tail. After one or more
iterations, item "3" no longer operates on what would have been the
"top" element of a "stack", but on an element deep in the tail of the
list. "Popping" the items is then equivalent to dropping the whole tail.
(Unlike your interpretation, which would remove individual elements,
this language clearly refers to multiple elements.)
> even if the second opening bracket was pushed on top of it: the pair of
> guillemets is matched, the opening guillemet is dropped from the
> stack, but the second bracket on top of it remains there and can now
> also match the following closing bracket.
>
> So bracket pairs can effectively overlap non-hierarchically.
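
The difference between the two readings of "pop" can be made concrete at the moment the first "]" of the example sentence is inspected. This is a minimal Python sketch, not part of the original message: elements are modeled as (expected closing bracket, text position) tuples, positions are 0-based code point offsets into the example sentence, and "«"/"»" are treated as a bracket pair only because the example does so. All of these are assumptions of the illustration.

```python
# Illustrative state only: elements are (expected closing bracket, position)
# tuples, positions are 0-based code point offsets into the example sentence,
# and "«"/"»" are treated as brackets just because the example does so.
stack = [("]", 11), ("»", 12)]  # after "[" and "«" have been seen
match_index = 0                 # scanning from the tail, "]" matches at index 0

# Reading proposed in the quoted message: remove only the matched element.
removed_one = list(stack)
del removed_one[match_index]
print(removed_one)  # [('»', 12)]  (the guillemet would stay available)

# BD16 as written: pop through the current element inclusively, i.e.
# drop the whole tail of the list starting at the match.
trimmed = list(stack)
del trimmed[match_index:]
print(trimmed)      # []  (the guillemet is discarded too)
```

Under the first reading, the later "»" could still pair with "«", giving the overlapping pairs described in the quote; under BD16 as written, nothing is left for it to match.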

BD16. A /bracket pair/ is a pair of characters consisting of an /opening
paired bracket/ and a /closing paired bracket/ such that the
Bidi_Paired_Bracket property value of the former or its canonical
equivalent equals the latter or its canonical equivalent and which are
algorithmically identified at specific text positions within an
/isolating run sequence/. The following algorithm identifies all of the
/bracket pairs/ in a given /isolating run sequence/:

  * Create a stack for elements each consisting of a bracket character
    and a text position. Initialize it to empty.
  * Create a list for elements each consisting of two text positions,
    one for an opening paired bracket and the other for a corresponding
    closing paired bracket. Initialize it to empty.
  * Inspect each character in the isolating run sequence in logical order.
      o If an opening paired bracket is found, push its
        Bidi_Paired_Bracket property value and its text position onto
        the stack.
      o If a closing paired bracket is found, do the following:
         1. Declare a variable that holds a reference to the current
            stack element and initialize it with the top element of the
            stack.
         2. Compare the closing paired bracket being inspected or its
            canonical equivalent to the bracket in the current stack
            element.
         3. If the values match, meaning the two characters form a
            bracket pair, then
              + Append the text position in the current stack element
                together with the text position of the closing paired
                bracket to the list.
              + **Pop the stack _through the current stack element
                inclusively_.**
         4. Else, if the current stack element is not at the bottom of
            the stack, advance it to the next element deeper in the
            stack and go back to step 2.
         5. Else, continue with inspecting the next character without
            popping the stack.
  * Sort the list of pairs of text positions in ascending order based on
    the text position of the /opening paired bracket/.
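
The BD16 steps above can be sketched in Python as follows. This is a minimal, non-normative sketch under stated assumptions: the whole string is treated as a single isolating run sequence, the table OPENING_TO_CLOSING is a hypothetical stand-in for the Bidi_Paired_Bracket data in BidiBrackets.txt (with "«"/"»" added only to mirror the example in this thread), and the canonical-equivalence comparison is omitted.

```python
# Hypothetical, minimal bracket table standing in for BidiBrackets.txt.
# "«" and "»" are included only to follow the example in this thread.
OPENING_TO_CLOSING = {"(": ")", "[": "]", "{": "}", "«": "»"}
CLOSING = set(OPENING_TO_CLOSING.values())


def identify_bracket_pairs(text):
    """Sketch of BD16 over one isolating run sequence (here: the whole string).

    Canonical-equivalence comparison is omitted in this sketch.
    """
    stack = []  # elements: (Bidi_Paired_Bracket value, text position)
    pairs = []  # elements: (opening position, closing position)
    for pos, ch in enumerate(text):
        if ch in OPENING_TO_CLOSING:
            # Push the paired (closing) bracket and the text position.
            stack.append((OPENING_TO_CLOSING[ch], pos))
        elif ch in CLOSING:
            # Steps 1, 2 and 4: start at the top and move deeper.
            for i in range(len(stack) - 1, -1, -1):
                expected, open_pos = stack[i]
                if expected == ch:
                    # Step 3: record the pair, then pop the stack through
                    # the current element inclusively (drop the tail).
                    pairs.append((open_pos, pos))
                    del stack[i:]
                    break
            # If the loop ends without a match (step 5), the stack is unchanged.
    # Final step: sort by the position of the opening bracket.
    pairs.sort(key=lambda pair: pair[0])
    return pairs


example = "This is an [«] ARABIC EXAMPLE [»] for demonstration only."
print(identify_bracket_pairs(example))  # [(11, 13), (30, 32)]
```

In the example, the first "]" matches the first "[" two elements down, so `del stack[i:]` discards the "«" entry as well; the later "»" then finds no match and is skipped, and only the two square-bracket pairs are reported.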

>
> But still there's a problem:

The remainder of the problems can't be discussed, because the premise is
wrong (see above).

A./

_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode
Received on Thu Apr 24 2014 - 02:30:14 CDT
