Reference implementations and test data for the Unicode BIDI algorithm?

From: Harald Alvestrand
Date: Thu Jun 07 2007 - 07:34:37 CDT

    Having failed to find anything, I appeal to this list...

    As part of the (slowly moving) investigation into the requirements for
    using RTL scripts in domain names, I have been checking out the
    properties of the Unicode BIDI algorithm.

    One problem I have is that there seems to be a dearth of test datasets
    to test an implementation against. My investigation of the Unicode
    "reference" implementations has revealed that the C++ and C versions
    are basically toys, fit for verifying the algorithm but totally useless
    for real data: they assign random directional properties to the ASCII
    characters and use those for testing the algorithm.
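
    To illustrate what "real" directional properties means here, the
    following is a minimal sketch, assuming Python's standard unicodedata
    module (which carries the Bidi_Class values from the Unicode Character
    Database). The helper name bidi_classes is made up for this example,
    and this is only the property lookup, not the BIDI algorithm itself.

```python
import unicodedata


def bidi_classes(text):
    """Return the Bidi_Class of each character in text.

    Hypothetical helper for illustration: it looks up the real property
    values from the Unicode Character Database bundled with Python,
    instead of assigning made-up classes to ASCII characters.
    """
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letter, Hebrew letter, Arabic letter, space, European digit.
sample = "a\u05d0\u0627 1"
for ch, cls in zip(sample, bidi_classes(sample)):
    print(f"U+{ord(ch):04X} {cls}")
```

    An implementation that works on real data would feed classes like
    these into the resolution and reordering steps.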

    (I have not looked at the Java one).

    Can anyone point me at:

    1) An implementation of the Unicode BIDI algorithm that can take real
    Unicode data and return something that I can verify (either the list of
    characters in display order or the list of indexes to which I should
    remap the characters)?

    2) A test dataset of "real" text (linguistically sensible, not just
    random characters) that has been verified by hand to display as expected
    after running through the Bidi algorithm? (Ideal would be input/output
    pairs for the implementation above, of course.)
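
    The kind of check described in the two points above could be sketched
    roughly as follows. Everything here is an assumption for illustration:
    the function names are invented, "reorder" stands for whatever
    implementation is under test (taking a logical-order string and
    returning the list of logical indexes in display order), and the single
    case shown is a trivial pure-RTL run, not hand-verified real text.

```python
def apply_visual_order(logical, order):
    """Build the display-order string from a list of logical indexes.

    order[i] is the logical index of the character shown at visual
    position i (counting from the left).
    """
    if sorted(order) != list(range(len(logical))):
        raise ValueError("order must be a permutation of the logical indexes")
    return "".join(logical[i] for i in order)


def check_cases(reorder, cases):
    """Run reorder over (logical, expected_visual) pairs.

    Returns a list of (logical, expected, actual) tuples for the pairs
    that did not match; an empty list means every pair passed.
    """
    failures = []
    for logical, expected in cases:
        actual = apply_visual_order(logical, reorder(logical))
        if actual != expected:
            failures.append((logical, expected, actual))
    return failures


# Stand-in for an implementation under test: it simply reverses the
# string, which is only correct for a single run of strong RTL letters.
def reverse_everything(logical):
    return list(range(len(logical) - 1, -1, -1))


# Three Hebrew letters; displayed left to right they appear reversed.
cases = [("\u05d0\u05d1\u05d2", "\u05d2\u05d1\u05d0")]
print(check_cases(reverse_everything, cases))
```

    With a real implementation and a hand-verified dataset, only the
    reorder function and the list of cases would need to change.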

    Any hints are greatly appreciated!

