Re: A simpler definition of the Bidi Algorithm

From: Michael D. Adams (
Date: Sun Oct 17 2010 - 12:59:11 CDT

  • Next message: Asmus Freytag: "Re: A simpler definition of the Bidi Algorithm"

    "The biggest challenge was not in creating those tables, but in
    understanding the nuances of the rules, by the way."

    Two questions so I can understand better.

    First, by nuances do you mean the nuances of how the rules interact
    (which I think would be simplified by using a definition as I have
    proposed) or some other nuance?

    Second, how did you create the state tables? There are standard,
    automated, optimal techniques for turning a regular language into a
    state table by turning the regular language into a finite state
    automaton (this is partly why I'm advocating phrasing the rules as a
    single regular language). Did you use something like that or some
    other way?

    On Sun, Oct 17, 2010 at 1:12 PM, Asmus Freytag <> wrote:
    >  On 10/17/2010 7:01 AM, Michael D. Adams wrote:
    >> This is something that not even the C++ and Java reference
    >> implementations do (though it appears that the C++ implementation of
    >> the W rules was originally derived from a regular expression as it
    >> uses state tables, but if so it is undocumented).  (Which by the way
    >> they have not been proven to be equivalent, they have merely been
    >> tested.  Proof is a much more complicated formalism.)
    > Having written the C++ reference implementation, I know a thing or two about
    > it :)
    > The two implementations were not formally proven to be equivalent - in a way
    > that wasn't the purpose. The purpose was to see whether several (in this
    > case two) implementers could read the rules and come up with the same
    > results.
    > When someone creates a real-world implementation to a specification like the
    > Bidi Algorithm, it's not usually verified by formal proof, but by testing.
    > Therefore, the exercise had to with finding out what level of testing was
    > sufficient to capture inadvertent misapplication of some of the
    > less-well-worded rules. (Some of them have since been rewritten to make
    > their intent less ambiguous).
    > Most of the issues were found with the test pass that simply compared all
    > possible sequences up to length 6. That is better than the BidiTest.txt
    > file, which I understand only goes to length 4. Stochastic sampling of
    > sequences up to length 20 resulted in fewer reported discrepancies - again,
    > all of this is from memory.
    > For the test, the maximal depth of embeddings was set to 15 instead of 63,
    > and the input were strings of bidi classes, not raw characters - therefore
    > cutting down on the number of possible sequences.
    > The Java implementation was deliberately designed to be transparent -
    > matching the way the rules are formulated in an obvious way. For the C++
    > implementation I wanted to do something different, and possibly faster, so I
    > hand-coded a few state tables. The biggest challenge was not in creating
    > those tables, but in understanding the nuances of the rules, by the way.
    > The situation today is not fully comparable, since there was some feedback
    > from the reference implementation project to the rules and especially their
    > wording.
    > A./

    This archive was generated by hypermail 2.1.5 : Sun Oct 17 2010 - 13:02:44 CDT