L2/11-418

On the Stability of and Proposed Extensions to the Unicode Bidi Algorithm

Asmus Freytag
October 28, 2011

Background

The various recent proposals for amending the Bidi Algorithm strike me as a not insignificant departure from the basic Bidi Algorithm, and as essentially contradicting the spirit, if not the letter, of the stability guarantees. They therefore bring with them the risks of instability and incompatibility. The Bidi Algorithm is one of only a few algorithms required for Unicode conformance, and at the same time it has been held very stable. This has reduced the amount of divergence among implementations. Recently, most "changes" to the algorithm have been "clarifications" of edge cases rather than modifications. Because the UBA is such a basic and strictly required algorithm, stability guarantees are especially important. This includes the implicit guarantee that a character's Bidi class is its complete description under the UBA.

Effective Stability

The way I have always parsed the "spirit" of the stability guarantees for the Bidi Algorithm is that the algorithm is effectively stable, except for the addition of new letters (and perhaps minor bug fixes to the rules). The default property values for unassigned characters were carefully chosen to minimize disruption when those characters were eventually assigned.

The written and unwritten policies for maintaining the Bidi Algorithm effectively provided several important guarantees:

Implementations could be written so that only their property tables needed updating to account for new characters (leaving aside the occasional "bug fix").
In addition, the default character properties were designed so that adding characters would cause minimal disruption.
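To illustrate the first two guarantees, here is a minimal sketch, assuming a Python implementation, in which the only version-dependent component is the character-to-class lookup. Python's `unicodedata.bidirectional` returns an empty string for unassigned code points, so the sketch falls back to default classes. The `DEFAULT_RANGES` table and the `bidi_class` function are hypothetical: only two of the published default ranges are reproduced, and the catch-all `"L"` is a simplification of the full UCD defaults.

```python
import unicodedata

# Hypothetical, partial table of default Bidi classes for *unassigned* code
# points. The authoritative defaults are published in the UCD; only two
# ranges are reproduced here for illustration.
DEFAULT_RANGES = [
    (0x0590, 0x05FF, "R"),   # Hebrew block: unassigned code points default to R
    (0x0600, 0x07BF, "AL"),  # Arabic, Syriac, Arabic Supplement, Thaana: default AL
]


def bidi_class(ch: str) -> str:
    """Map one character to its Bidi class.

    This lookup is the only part that changes when new characters are
    encoded; the leveling and reordering code never sees code points.
    """
    cls = unicodedata.bidirectional(ch)
    if cls:                      # assigned in this Python's Unicode tables
        return cls
    cp = ord(ch)
    for lo, hi, default in DEFAULT_RANGES:
        if lo <= cp <= hi:
            return default       # script-appropriate default for unassigned
    return "L"                   # simplified catch-all default (assumption)
```

Usage: `bidi_class("\u05D0")` gives `"R"` from the table, while an unassigned Hebrew-block code point such as `chr(0x05F5)` also resolves to `"R"` through the default, so text using it orders sensibly before the implementation is updated.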

As a result, you could expect any existing implementation to show the same Bidi ordering for the vast majority of texts containing characters beyond the ones that it was explicitly updated for. This maximizes interoperability.

After a long quiescent period, there are now many ideas and suggestions for fixing perceived or real shortcomings of the existing Bidi Algorithm. As Martin Duerst wrote recently: "it looks like these changes are being added piecemeal without yet seeing a new horizon of stability...but the Bidi Algorithm isn't an area where constant tinkering is advisable. It would therefore be very important that all these new initiatives are carefully checked against each other, and coordinated both in timing and in substance. It may be well advisable to wait with some of them so that many changes can be made 'in bulk' (the idea of an UBA 2.0), which will also help implementers."

UBA 2.0

I share this concern, and would support an effort towards a UBA 2.0 which addresses a comprehensive set of updates.

New set of Bidi classes

Because of the nature of the proposed changes, this new specification would be disruptive. Therefore, I see little benefit in subjecting it to the formal limitations (on the number of Bidi classes, etc.) that apply to the existing Bidi Algorithm. However, like the existing algorithm, any updated algorithm should take Bidi classes as its input, cleanly separating the mapping of characters to Bidi classes from the leveling and reordering calculus.

The temptation to mix specification in terms of character codes with specification in terms of Bidi classes should be firmly resisted.
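The separation argued for here can be sketched in two stages, assuming a Python implementation: one function maps characters to classes, and the resolution logic accepts only a sequence of class names. The function names `char_to_classes` and `paragraph_level` are hypothetical, and the second stage implements only a simplified form of rules P2/P3 (the first strong class sets the paragraph embedding level), ignoring explicit embeddings and isolates.

```python
import unicodedata
from typing import List, Sequence

STRONG = {"L", "R", "AL"}


def char_to_classes(text: str) -> List[str]:
    """Stage 1: characters -> Bidi classes. The only stage that sees code points."""
    return [unicodedata.bidirectional(ch) for ch in text]


def paragraph_level(classes: Sequence[str]) -> int:
    """Stage 2: operates on Bidi classes only, never on code points.

    Simplified P2/P3: the first strong class decides the paragraph
    embedding level (0 for L, 1 for R or AL); 0 if there is none.
    Explicit embeddings and isolates are not handled in this sketch.
    """
    for cls in classes:
        if cls in STRONG:
            return 0 if cls == "L" else 1
    return 0
```

Usage: `paragraph_level(char_to_classes("123 \u0628"))` yields 1, because the digits (EN) and space (WS) are skipped until the Arabic letter (AL). Because the second stage takes classes, it can also be exercised directly on synthetic sequences such as `["EN", "WS", "R", "L"]`, with no reference to particular characters.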

New Versioning

The new algorithm should be versioned cleanly, independently of the mere versioning of the Unicode Standard. Existing implementations of the "old" Bidi Algorithm are not going to go away overnight, and, because of the way the Bidi Algorithm is designed, they may well continue to be updated to handle future repertoire additions.

The "new" Bidi Algorithm could formally be an extension or a replacement. It is too early to tell which makes better sense, but it should carry a designation that decouples it from questions of repertoire (and from versioning for "bug fixes" or other minor tweaks).

Separate Development and Beta Periods

The proposed and contemplated changes to the Bidi Algorithm call for a separate development effort that is not a priori tied to the schedule of Unicode versions. As with all significant developments in specifying algorithms, an extended beta (testing) period is desirable; it should end only after significant stakeholders have been able to produce actual working testbeds.