From: Michael D. Adams (firstname.lastname@example.org)
Date: Sun Oct 17 2010 - 12:59:11 CDT
"The biggest challenge was not in creating those tables, but in
understanding the nuances of the rules, by the way."
Two questions so I can understand better.
First, by nuances do you mean the nuances of how the rules interact
(which I think would be simplified by using a definition as I have
proposed) or some other nuance?
Second, how did you create the state tables? There are standard,
automated, optimal techniques for turning a regular language into a
state table by turning the regular language into a finite state
automaton (this is partly why I'm advocating phrasing the rules as a
single regular language). Did you use something like that or some
On Sun, Oct 17, 2010 at 1:12 PM, Asmus Freytag <email@example.com> wrote:
> On 10/17/2010 7:01 AM, Michael D. Adams wrote:
>> This is something that not even the C++ and Java reference
>> implementations do (though it appears that the C++ implementation of
>> the W rules was originally derived from a regular expression as it
>> uses state tables, but if so it is undocumented). (Which by the way
>> they have not been proven to be equivalent, they have merely been
>> tested. Proof is a much more complicated formalism.)
> Having written the C++ reference implementation, I know a thing or two about
> it :)
> The two implementations were not formally proven to be equivalent - in a way
> that wasn't the purpose. The purpose was to see whether several (in this
> case two) implementers could read the rules and come up with the same
> When someone creates a real-world implementation to a specification like the
> Bidi Algorithm, it's not usually verified by formal proof, but by testing.
> Therefore, the exercise had to with finding out what level of testing was
> sufficient to capture inadvertent misapplication of some of the
> less-well-worded rules. (Some of them have since been rewritten to make
> their intent less ambiguous).
> Most of the issues were found with the test pass that simply compared all
> possible sequences up to length 6. That is better than the BidiTest.txt
> file, which I understand only goes to length 4. Stochastic sampling of
> sequences up to length 20 resulted in fewer reported discrepancies - again,
> all of this is from memory.
> For the test, the maximal depth of embeddings was set to 15 instead of 63,
> and the input were strings of bidi classes, not raw characters - therefore
> cutting down on the number of possible sequences.
> The Java implementation was deliberately designed to be transparent -
> matching the way the rules are formulated in an obvious way. For the C++
> implementation I wanted to do something different, and possibly faster, so I
> hand-coded a few state tables. The biggest challenge was not in creating
> those tables, but in understanding the nuances of the rules, by the way.
> The situation today is not fully comparable, since there was some feedback
> from the reference implementation project to the rules and especially their
This archive was generated by hypermail 2.1.5 : Sun Oct 17 2010 - 13:02:44 CDT