Re: A simpler definition of the Bidi Algorithm

From: Asmus Freytag (
Date: Sun Oct 17 2010 - 12:12:36 CDT

  • Next message: Michael D. Adams: "Re: A simpler definition of the Bidi Algorithm"

      On 10/17/2010 7:01 AM, Michael D. Adams wrote:
    > This is something that not even the C++ and Java reference
    > implementations do (though it appears that the C++ implementation of
    > the W rules was originally derived from a regular expression as it
    > uses state tables, but if so it is undocumented). (Which by the way
    > they have not been proven to be equivalent, they have merely been
    > tested. Proof is a much more complicated formalism.)
    Having written the C++ reference implementation, I know a thing or two
    about it :)

    The two implementations were not formally proven to be equivalent - in a
    way that wasn't the purpose. The purpose was to see whether several (in
    this case two) implementers could read the rules and come up with the
    same results.

    When someone creates a real-world implementation to a specification like
    the Bidi Algorithm, it's not usually verified by formal proof, but by
    testing. Therefore, the exercise had to with finding out what level of
    testing was sufficient to capture inadvertent misapplication of some of
    the less-well-worded rules. (Some of them have since been rewritten to
    make their intent less ambiguous).

    Most of the issues were found with the test pass that simply compared
    all possible sequences up to length 6. That is better than the
    BidiTest.txt file, which I understand only goes to length 4. Stochastic
    sampling of sequences up to length 20 resulted in fewer reported
    discrepancies - again, all of this is from memory.

    For the test, the maximal depth of embeddings was set to 15 instead of
    63, and the input were strings of bidi classes, not raw characters -
    therefore cutting down on the number of possible sequences.

    The Java implementation was deliberately designed to be transparent -
    matching the way the rules are formulated in an obvious way. For the C++
    implementation I wanted to do something different, and possibly faster,
    so I hand-coded a few state tables. The biggest challenge was not in
    creating those tables, but in understanding the nuances of the rules, by
    the way.

    The situation today is not fully comparable, since there was some
    feedback from the reference implementation project to the rules and
    especially their wording.


    This archive was generated by hypermail 2.1.5 : Sun Oct 17 2010 - 12:17:38 CDT