PRI #279, Background for Proposed Update to UAX #9 for Unicode 8.0

2014/08/25

The substantial modifications introduced in Version 6.3 of the Unicode Bidirectional Algorithm added a level of complexity which made the resolution of certain edge cases inconsistent with the overall intent of the algorithm and the results obtained in the more common usage patterns. The update being proposed for Version 8.0 consists of changes to the formal definition of the algorithm to address the inconsistencies in those edge cases.

The proposed modifications for Version 8.0 of the algorithm will result in a slightly different reordering compared to that obtained by a literal interpretation of the current version of the specification. Given the rare incidence of the affected cases, the benefit of correcting the specification outweighs the cost that a formal change incurs.

Furthermore, some implementations of the Unicode Bidirectional Algorithm as of Versions 6.3 and 7.0 may have already interpreted the specification as intended rather than as explicitly stated, and may have handled the respective edge cases accordingly. The proposed update eliminates any ambiguity in the interpretation of the affected parts of the algorithm, helping achieve better interoperability.

The proposed update for Version 8.0 addresses three specific issues, as described below. To locate the proposed changes in the text of UAX #9, follow the links from its Modifications section.

1. The proposed update corrects the behavior of an isolating run sequence in the particular context where it is tightly flanked by embeddings and enclosed within overrides. Prior to the proposed change, an isolating run sequence did not behave like a neutral in that context, which was contrary to the role of isolates. The fix consists of an update to rules X5a, X5b, and X6a.

2. When the bidirectional type of a bracket character is overridden to a strong L or R by an explicit directional override, the bracket should no longer be a candidate for a bracket pair, as its overridden bidirectional type indicates the author’s intent to treat the bracket in a specific way rather than neutrally. Without this change, in certain conditions, a paired bracket may end up resolving to a direction opposite to that intended by the directional override. The fix consists of updates to definitions BD14 and BD15 and a clarification in the preamble to rule N0.

3. When a paired bracket is accompanied by nonspacing marks, the resolved direction of the marks should always be the same as that of the bracket base, such that the sequence of bracket and nonspacing marks stays whole, as any combining character sequence would. The fix consists of an update to rule N0 of the algorithm.
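
The intent of issues 2 and 3 can be illustrated with a small Python sketch. It is not the normative algorithm and is not taken from UAX #9: the function names, the three-entry bracket table, and the list-of-type-strings representation are assumptions made for illustration only, and the stack depth limit and canonical equivalence handling of the real bracket pair definition are omitted. The first function treats a bracket as a pairing candidate only while its current bidirectional type is still ON, so a bracket overridden to L or R is skipped. The second function gives any characters that were originally NSM and immediately follow a resolved paired bracket the same type as that bracket.

```python
# Hypothetical, heavily reduced bracket table; the real data comes from the
# Bidi_Paired_Bracket property in the Unicode Character Database.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def find_bracket_pairs(chars, current_types):
    """Return sorted (opener_index, closer_index) pairs.

    Issue 2: only brackets whose current type is still ON are candidates,
    so a bracket overridden to L or R never takes part in a pair.
    """
    stack = []  # entries are (opening bracket, index)
    pairs = []
    for i, (ch, bidi_type) in enumerate(zip(chars, current_types)):
        if bidi_type != "ON":
            continue  # overridden or non-neutral: not a candidate
        if ch in OPEN_TO_CLOSE:
            stack.append((ch, i))
        elif ch in CLOSE_TO_OPEN:
            # Search from the top of the stack for the matching opener.
            for j in range(len(stack) - 1, -1, -1):
                if stack[j][0] == CLOSE_TO_OPEN[ch]:
                    pairs.append((stack[j][1], i))
                    del stack[j:]  # discard the opener and anything above it
                    break
    return sorted(pairs)


def propagate_to_marks(original_types, resolved_types, pairs):
    """Issue 3: marks following a resolved paired bracket take its type.

    original_types holds the types before any resolution (so NSM is still
    visible); resolved_types holds the types after brackets were resolved.
    """
    out = list(resolved_types)
    for opener, closer in pairs:
        for idx in (opener, closer):
            if out[idx] not in ("L", "R"):
                continue  # bracket was not resolved to a strong type
            k = idx + 1
            while k < len(out) and original_types[k] == "NSM":
                out[k] = out[idx]
                k += 1
    return out
```

For example, with the characters of "a(b)c" and the types L, ON, L, ON, L, `find_bracket_pairs` returns `[(1, 3)]`; if an override has changed both brackets to R, it returns `[]`. With original types L, ON, NSM, R, ON, NSM, resolved types L, R, ON, R, R, ON, and the pair `(1, 4)`, `propagate_to_marks` returns L, R, R, R, R, R, so each mark stays with its bracket.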