L2/06-010

Editorial suggestions for UAX #9: 5.0.0

Asmus Freytag

I find section 3.4 in UAX#9 a bit tough to follow. The problem is that the text starts off with a number of digressions before getting to the actual 'reordering rules' themselves.

** Here's the current text, with digressions marked in green and with [] (and my comments in red and with **):

3.4 Reordering Resolved Levels

The following algorithm describes the logical process of finding the correct display order. [As described before, this logical process is not necessarily the actual implementation, which may diverge for efficiency as long as it produces the same results]. As opposed to resolution phases, this algorithm acts on a per-line basis, and is applied after any line wrapping is applied to the paragraph.

[The process of breaking a paragraph into one or more lines that fit within particular bounds is outside the scope of the bidirectional algorithm. Where character shaping is involved, it can be somewhat more complicated (see Section 8.2 Arabic of [Unicode]).] Logically there are the following steps:

The levels of the text are determined according to the bidirectional algorithm.
The characters are shaped into glyphs according to their context (taking the embedding levels into account for mirroring!).
The accumulated widths of those glyphs (in logical order) are used to determine line breaks.
- [Note that the soft-hyphen (SHY) works as it does in other scripts. (** the context for "other scripts" is buried above in a reference!) That is, it indicates a point where the line could be broken in the middle of a word. If the rendering system breaks at that point, the display — including shaping — should be what is appropriate for the given language. For more information on this and other line-breaking issues, see [UAX14].]
For each line, rules L1-L4 are used to reorder the characters on that line.
The glyphs corresponding to the characters on the line are displayed in that order.

L1. On each line, reset.....

** Here are suggested tweaks:

3.4 Reordering Resolved Levels

The following rules describe the logical process of finding the correct display order. As opposed to resolution phases, this algorithm acts on a per-line basis, and is applied after any line wrapping is applied to the paragraph.

Logically there are the following steps:

The levels of the text are determined according to the bidirectional algorithm.
The characters are shaped into glyphs according to their context (taking the embedding levels into account for mirroring!). (See also Section 3.5 Shaping.)
The accumulated widths of those glyphs (in logical order) are used to determine line breaks (see [UAX14]).
For each line, rules L1-L4 are used to reorder the characters on that line.
The glyphs corresponding to the characters on the line are displayed in that order.

L1. On each line, reset.....
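
** For illustration only, the per-line reordering step can be sketched in Python. This is a minimal sketch of rule L2 of UAX #9 (reverse every contiguous run at a given level or higher, working from the highest level down to the lowest odd level). The function name reorder_line and the plain list of resolved levels are assumptions made for this example; rules L1, L3 and L4, shaping, and mirroring are not covered, and a real implementation may be organized quite differently as long as it produces the same results.

```python
def reorder_line(chars, levels):
    """Return the characters of one line in visual order (rule L2 only).

    chars  -- the characters of a single line, in logical order
    levels -- the resolved embedding level of each character (same length)
    Both parameters are hypothetical inputs for this sketch.
    """
    if len(chars) != len(levels):
        raise ValueError("chars and levels must have the same length")

    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        # Nothing is right-to-left on this line, so the order is unchanged.
        return list(chars)

    # order[k] is the logical index of the character shown at visual slot k.
    order = list(range(len(chars)))
    highest = max(levels)
    lowest_odd = min(odd_levels)

    # Visit every level from the highest down to the lowest odd level,
    # including intermediate levels that do not occur in the text.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                # Find the end of the contiguous run at this level or higher.
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1

    return [chars[k] for k in order]


# Upper case stands in for right-to-left letters, as in the UAX #9 examples.
line = "car means CAR."
line_levels = [0] * 10 + [1] * 3 + [0]
print("".join(reorder_line(line, line_levels)))  # prints "car means RAC."
```

Because the sketch takes one line at a time, it reflects the point made above: reordering is applied per line, after line breaks have been chosen.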

** I suggest talking about rules here rather than an algorithm: everything
happens under the umbrella of the Bidi Algorithm, so it's confusing to have
reordering be another algorithm. That also makes it unnecessary to reassert
the qualification about implementation vs. logical algorithm.

** However, it might be worth considering putting such a statement into section 3.1, right
before the paragraph "Combining characters". This is the place where the text
talks about the algorithm as a whole, and noting there that an implementation
may differ from the logical specification makes a lot more sense than burying
the point in section 3.4.

** In Section 3.5, it makes sense to talk about Arabic (and related scripts).
Currently the text starts off as if the reader were already familiar with the concept:

3.5 Shaping

Shaping is logically applied after the bidirectional algorithm is used, ....

** Instead, I suggest:

3.5 Shaping

Cursively connected scripts, such as Arabic or Syriac, require the selection of positional character shapes that depend on adjacent characters (see Section 8.2 Arabic of [Unicode]). Shaping is logically applied after the bidirectional algorithm is used,...

** To capture the fine points about shaping and line breaking, why not introduce a numbered (or unnumbered) sub-section at the end of 3.5? That's a good place to give the issue the visibility it needs, without distracting from the main explanation. And by that time, the reader has considered the generic issues with shaping. Here's a suggested draft:

3.5.1 Shaping and line breaking

The process of breaking a paragraph into one or more lines that fit within particular bounds is outside the scope of the bidirectional algorithm. Where character shaping is involved, the width calculations must be based on the shaped glyphs.

Note that the soft-hyphen (SHY) works in cursively connected scripts as it does in other scripts. That is, it indicates a point where the line could be broken in the middle of a word. If the rendering system breaks at that point, the display, including shaping, should be what is appropriate for the given language. For more information on this and other line-breaking issues, see [UAX14].
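
** To illustrate the point about width calculations, here is a rough Python sketch of choosing line breaks from the accumulated widths of shaped glyphs in logical order. Everything in it is an assumption made for the example: the function name break_lines, the one-width-per-position input, and the can_break_after flags, which stand in for whatever break opportunities a line breaker following [UAX14] would report. It does not perform shaping itself; the widths are presumed to come from glyphs that have already been shaped.

```python
def break_lines(widths, can_break_after, max_width):
    """Greedily split one paragraph into lines.

    widths          -- width of each shaped glyph, in logical order
    can_break_after -- True where a line may end after that position
    max_width       -- available line width
    Returns (start, end) index pairs, end exclusive. All names are
    hypothetical; this is not taken from UAX #9 or UAX #14.
    """
    lines = []
    n = len(widths)
    start = 0
    while start < n:
        total = 0
        end = start      # exclusive end of the text that fits so far
        last_ok = None   # latest permitted break position that still fits

        # Accumulate widths in logical order until the line is full.
        while end < n and total + widths[end] <= max_width:
            total += widths[end]
            end += 1
            if end < n and can_break_after[end - 1]:
                last_ok = end

        if end == n:
            # The rest of the paragraph fits on this line.
            lines.append((start, n))
            break

        if last_ok is None:
            # No permitted break fits: break where the text overflows,
            # but always consume at least one glyph to guarantee progress.
            last_ok = max(end, start + 1)

        lines.append((start, last_ok))
        start = last_ok
    return lines


# Ten glyphs of width 1, break opportunities after positions 3 and 7.
flags = [i in (3, 7) for i in range(10)]
print(break_lines([1] * 10, flags, 5))  # prints [(0, 4), (4, 8), (8, 10)]
```

Each resulting range would then be handed to the reordering rules one line at a time.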