Re: Unicode Bidi Algorithm – Java reference implementation

From: Ken Whistler <kenwhistler_at_att.net>
Date: Sun, 18 Sep 2016 17:16:50 -0700

On 9/17/2016 10:26 AM, Deepak Jois wrote:
> I now need to make the updates to support the changes in Unicode 8.0,
> and I am finding it a bit hard to grok the changes in C at a glance.
>

The UBA 7.0 --> UBA 8.0 changes were rather subtle. They did not change
much about the gross behavior of the algorithm, but there were some
fixes for edge cases in a couple of rules. Also, the specification of
behavior on stack overflow became exact, rather than implementation-defined.

The C bidi reference code is a bit complicated, because it supports
*all* UBA versions from 6.2 through 8.0, which means it has to
special-case rule processing by version wherever the specification
itself changed.
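
For a Java port that wants the same multi-version support, one way to
picture that special-casing is a small version gate like the sketch
below. Every name in it (UbaVersion, RuleGates, and the methods) is made
up for illustration and is not taken from bidiref; the only facts it
encodes are that bracket pairing arrived in UBA 6.3 and that the stack
overflow behavior became exact in 8.0.

```java
// Hypothetical sketch of per-version rule gating. None of these names
// come from the C reference code.
enum UbaVersion {
    V6_2, V6_3, V7_0, V8_0;

    /** True if this version is the same as or later than {@code other}. */
    boolean atLeast(UbaVersion other) {
        return compareTo(other) >= 0;
    }
}

final class RuleGates {
    private RuleGates() {}

    /** Bracket pairing (BD16 and rule N0) entered the algorithm in UBA 6.3. */
    static boolean resolvesBracketPairs(UbaVersion v) {
        return v.atLeast(UbaVersion.V6_3);
    }

    /** From UBA 8.0 on, bracket stack overflow has exactly specified behavior. */
    static boolean hasExactBracketStackOverflow(UbaVersion v) {
        return v.atLeast(UbaVersion.V8_0);
    }

    public static void main(String[] args) {
        for (UbaVersion v : UbaVersion.values()) {
            System.out.println(v + " brackets=" + resolvesBracketPairs(v)
                    + " exactOverflow=" + hasExactBracketStackOverflow(v));
        }
    }
}
```

A port that only targets 8.0 would not need any of this, which is a
large part of why it can be simpler than the C code.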

If you diff the 7.0 version of brrule.c and the 8.0 version of brrule.c
you'll find the heart of the differences there, along with explanations
in comments for the changes. The new function br_SetBracketPairBC
handles an edge case for combining marks following a bracket. The code
using a new flag testONisNotRequired deals with an edge case for the
current Bidi_Class of brackets being tested for pairing. The changes in
br_PushBracketStack preserve the pre-8.0 behavior as it was in earlier
versions of bidiref, while adding the explicit stack-overflow behavior
specified for 8.0.
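
To make the stack overflow point concrete, here is a rough Java sketch
of BD16-style pairing with a fixed-size stack. It is not a translation
of brrule.c. The class, the Kind enum, and the pairIds array (an int
that is equal for an opening bracket and its matching closer, with
canonical equivalents already folded together) are all assumptions made
for illustration. The limit of 63 entries is the stack size given in
BD16 as of UAX #9 for 8.0. Keeping the pairs found before the overflow
is my reading of "stop processing BD16 for the remainder of the
isolating run sequence" and should be checked against the conformance
tests. The sketch also assumes the input has already been filtered to
bracket characters that are eligible for pairing, so it shows neither
the testONisNotRequired case nor the combining mark case.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of BD16 with the exact overflow behavior of UBA 8.0.
final class BracketPairSketch {
    /** Stack size from BD16 in UAX #9 (8.0). */
    static final int MAX_DEPTH = 63;

    enum Kind { NONE, OPEN, CLOSE }

    /**
     * Returns {openerPos, closerPos} pairs, sorted by opener position.
     * kinds[i] and pairIds[i] describe position i of one isolating run
     * sequence; pairIds is an assumed stand-in for the bracket identity.
     */
    static List<int[]> findPairs(Kind[] kinds, int[] pairIds) {
        List<int[]> pairs = new ArrayList<>();
        Deque<Integer> openers = new ArrayDeque<>(); // positions, top first
        for (int i = 0; i < kinds.length; i++) {
            if (kinds[i] == Kind.OPEN) {
                if (openers.size() == MAX_DEPTH) {
                    break; // overflow: stop pairing for the rest of the sequence
                }
                openers.push(i);
            } else if (kinds[i] == Kind.CLOSE) {
                // Search from the top of the stack down for a matching opener.
                int depth = 0;
                int matchDepth = 0;
                for (int opener : openers) {
                    depth++;
                    if (pairIds[opener] == pairIds[i]) {
                        pairs.add(new int[] {opener, i});
                        matchDepth = depth;
                        break;
                    }
                }
                // Pop through the matched opener, inclusive. An unmatched
                // closer leaves the stack untouched.
                for (int k = 0; k < matchDepth; k++) {
                    openers.pop();
                }
            }
        }
        pairs.sort(Comparator.comparingInt((int[] p) -> p[0]));
        return pairs;
    }

    public static void main(String[] args) {
        Kind[] kinds = {Kind.OPEN, Kind.OPEN, Kind.CLOSE, Kind.CLOSE};
        int[] ids = {1, 2, 2, 1}; // think "( [ ] )"
        for (int[] p : findPairs(kinds, ids)) {
            System.out.println(p[0] + "-" + p[1]);
        }
    }
}
```

A pre-8.0 code path would differ only at the overflow check, which is
where an implementation supporting several versions has to branch.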

It may also help to compare the 7.0 and 8.0 versions of UAX #9 itself,
so you can see the textual changes in the specification of the rules.
Try diffing:

http://www.unicode.org/reports/tr9/tr9-31.html (7.0)
http://www.unicode.org/reports/tr9/tr9-33.html (8.0)

The significant changes there are in BD11, BD14, BD15, BD16, and in
rules X5a, X5b, X6a, and N0. (The rest of the changes in the updated
document are cosmetic.)

--Ken
Received on Sun Sep 18 2016 - 19:17:29 CDT
