Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Mon, 21 Apr 2014 10:55:59 +0300

> Date: Sun, 20 Apr 2014 12:58:23 -0700
> From: Asmus Freytag <asmusf_at_ix.netcom.com>
>
> On 4/20/2014 3:24 AM, Eli Zaretskii wrote:
> > Would someone please help understand the following subtleties and
> > obscure language in the UBA document found at
> > http://www.unicode.org/reports/tr9/? Thanks in advance.
>
> Eli,
>
> I've tried to give you some explanations

Thanks!

> in some places, I concur with you that the wording could be improved
> and that such improved wording should be proposed to the UTC (or its
> editorial committee) for incorporation into a future update.

How do we do that?

> For details, see below.
> >
> > 1. In paragraph 3.1.2, near its very end, we have this sentence (with
> > my emphasis):
> >
> > As rule X10 will specify, an isolating run sequence is the unit to
> > which the rules following it are applied, and the last character of
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > one level run in the sequence is considered to be immediately
> > followed by the first character of the next level run in the
> > sequence during this phase of the algorithm.
> >
> > What does it mean here by "the rules following it"? Following what?
>
> That looks like a bad referent, but from context, this "it" must be X10

Ah, so simply saying "the following rules" or "rules following X10"
would be enough.

> Bullet 1 could be changed to
>
> . Create a stack for elements each consisting of a *code point* (Bidi_Paired_Bracket property value)
> and a text position. Initialize it to empty.
>
> to make things more clear. And a slight wording change might help the
> reader with item 2:
>
> 2. Compare the *code point for the* closing paired bracket being inspected or its
> canonical equivalent to the *code point* (Bidi_Paired_Bracket property value) in the current stack
> element.
>
>
> And, to continue
>
> 3. If the values match, meaning *the character being inspected and the character
> at the text position in the stack* form a bracket pair, then [...]

Right, this makes the description a whole lot more clear.
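
A minimal Python sketch of the stack procedure as reworded above may help
check the reading. It rests on assumptions that are not in the UBA text:
the PAIRED and CANON tables are a tiny hypothetical stand-in for the real
Bidi_Paired_Bracket data and canonical equivalences, the function name is
made up, and the sketch ignores both the limit on stack depth and the
requirement that only characters whose current type is ON take part.

```python
# Hypothetical miniature of the Bidi_Paired_Bracket data: opening -> closing.
# The real property table in the Unicode Character Database is much larger.
PAIRED = {"(": ")", "[": "]", "{": "}", "\u2329": "\u232A", "\u3008": "\u3009"}
CLOSING = set(PAIRED.values())

# U+2329 and U+232A are canonically equivalent to U+3008 and U+3009.
CANON = {"\u2329": "\u3008", "\u232A": "\u3009"}


def canon(ch):
    """Map a bracket to its canonical equivalent (identity for most)."""
    return CANON.get(ch, ch)


def find_bracket_pairs(text):
    """Return (open_pos, close_pos) tuples, sorted by opening position."""
    stack = []  # each element: (expected closing code point, text position)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRED:
            # Opening bracket: push its paired code point and its position.
            stack.append((PAIRED[ch], pos))
        elif ch in CLOSING:
            # Closing bracket: walk from the top of the stack downward.
            for i in range(len(stack) - 1, -1, -1):
                expected, open_pos = stack[i]
                if canon(expected) == canon(ch):
                    pairs.append((open_pos, pos))
                    del stack[i:]  # pop through the matched element
                    break
            # No match anywhere in the stack: move on without popping.
    return sorted(pairs)


print(find_bracket_pairs("a(b[c)d]"))
```

In the sample string the "[" is discarded when ")" pairs with "(", so the
later "]" finds nothing on the stack and stays unpaired.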

> Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run sequences.
> For each sequence, [completely] apply each rule in the order in which they appear below.
> The order that one isolating run sequence is treated relative to another does not matter.
>
> I believe the above restatement expresses the same thing in fewer words.

It does, thanks.
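
The restated rule can be sketched as two nested loops. This is only an
illustration under assumptions: a sequence is modelled as a hypothetical
(sos, list-of-types) tuple, the function names are invented, and only W2
and W3 are implemented, since they are enough to show that the order of
the rules matters while the order of the sequences does not.

```python
def w2(types, sos):
    """W2: an EN whose nearest preceding strong type is AL becomes AN."""
    last_strong = sos
    for i, t in enumerate(types):
        if t in ("L", "R", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"


def w3(types, sos):
    """W3: every AL becomes R."""
    for i, t in enumerate(types):
        if t == "AL":
            types[i] = "R"


def resolve(sequences, rules):
    """Apply each rule completely, in order, to each sequence on its own."""
    resolved = []
    for sos, types in sequences:      # sequences are independent of each other
        types = list(types)           # work on a copy
        for rule in rules:            # rule order is significant
            rule(types, sos)          # one rule finishes before the next starts
        resolved.append(types)
    return resolved


seqs = [("L", ["AL", "EN"]), ("R", ["L", "EN"])]
print(resolve(seqs, [w2, w3]))
print(resolve(seqs, [w3, w2]))  # wrong rule order: the EN after AL is missed
```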

> > 5. Rule N0 says:
> >
> > . For each bracket-pair element in the list of pairs of text positions
> >
> > a. Inspect the bidirectional types of the characters enclosed
> > within the bracket pair.
> > b. If any strong type (either L or R) matching the embedding
> > direction is found, set the type for both brackets in the pair
> > to match the embedding direction.
> >
> > First, what is meant here by "strong type [...] matching the embedding
> > direction"? Does the "match" here consider only the odd/even value of
> > the current embedding level vs R/L type, in the sense that odd levels
> > "match" R and even levels "match" L? Or does this mean some other
> > kind of matching? Table 3, which is the only place that seems to refer
> > to the issue, is not entirely clear, either:
> >
> > e The text ordering type (L or R) that matches the embedding level
> > direction (even or odd).
> >
> > Again, the sense of the "match" here is not clear.
>
> even/odd --- R/L match, might be made more explicit

I agree this should be made more explicit, as this is a somewhat
subtle issue that might trip the reader.
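
Spelled out as code, the even/odd reading is tiny. The function names
below are hypothetical, and the sketch looks only at the strong types L
and R:

```python
def embedding_direction(level):
    """Even embedding levels are left-to-right (L), odd ones right-to-left (R)."""
    return "R" if level % 2 else "L"


def matches_embedding_direction(bidi_type, level):
    """True when a strong type (L or R) agrees with the level's direction."""
    return bidi_type == embedding_direction(level)


for level in (0, 1, 2):
    print(level, embedding_direction(level))
```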

> > Next, what is meant here by "the characters enclosed within the
> > bracket pair"? If the bracket pair encloses another bracket pair,
> > which is inner to it, do the characters inside the inner pair count
> > for the purposes of resolving the level of the outer pair?
> They do, so there's no need to change the text.

It might be a good idea to say that explicitly, e.g. as a note, or at
least provide another example where the strong characters are only
inside an inner bracket pair, which will send the same message to the
reader.
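
Such an example could look like the sketch below, where the only strong
character sits inside the inner pair. The function name and the sample
data are hypothetical, and the check covers only the "matching strong
type" step (N0 step b), not the rest of the rule.

```python
def encloses_matching_strong(types, pair, level):
    """Check whether any character strictly between the two brackets has a
    strong type equal to the embedding direction. Characters inside nested
    pairs are part of the enclosed span, so they count as well."""
    open_pos, close_pos = pair
    direction = "R" if level % 2 else "L"  # odd level -> R, even level -> L
    return any(t == direction for t in types[open_pos + 1:close_pos])


# Bidi types for a string shaped like "( [ X ] )" where X is a strong R
# character: the outer pair is (0, 4), the inner pair is (1, 3).
types = ["ON", "ON", "R", "ON", "ON"]
print(encloses_matching_strong(types, (0, 4), level=1))
print(encloses_matching_strong(types, (1, 3), level=1))
```

Both calls report a match at embedding level 1, even though the outer
pair contains no strong character outside the inner brackets.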

Thanks again for the clarifications.
Received on Mon Apr 21 2014 - 02:57:38 CDT
