Re: Reference implementations and test data for the Unicode BIDI algorithm?

From: Mark Davis (mark.davis@icu-project.org)
Date: Thu Jun 07 2007 - 10:39:35 CDT


    For #1 you can use the ICU implementation. Markus Scherer can tell you more
    about it as well.
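
    The quoted message below objects that the C and C++ reference code
    assigns made-up directional properties to ASCII characters. As a rough
    illustration of what "real" properties look like, here is a minimal
    Python sketch that reads each character's Bidi_Class from the Unicode
    Character Database through the standard unicodedata module. It is not
    ICU and does not run the Bidi algorithm; the helper name bidi_classes
    and the sample string are assumptions made for this example.

```python
import unicodedata


def bidi_classes(text):
    """Return (code point label, Bidi_Class) for each character in text.

    Hypothetical helper for illustration: it only looks up the class that
    the Bidi algorithm would take as input; it does no reordering.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    # Latin letter, digit, full stop, Hebrew alef, Arabic alef.
    sample = "a1.\u05D0\u0627"
    for code_point, bidi_class in bidi_classes(sample):
        print(code_point, bidi_class)
```

    Digits and the full stop are not strong left-to-right characters
    (they have the classes EN and CS), which is part of why labels in
    domain names need care when mixed with RTL scripts.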

    Mark

    On 6/7/07, Harald Alvestrand <harald@alvestrand.no> wrote:
    >
    > Having failed to find anything, I appeal to this list...
    >
    > As part of the (slowly moving) investigation into the requirements for
    > using RTL scripts in domain names, I have been checking out the
    > properties of the Unicode BIDI algorithm.
    >
    > One problem I have is that there seems to be a dearth of test datasets
    > to test an implementation against; my investigation of the Unicode
    > "reference" implementation has revealed that the C++ and C
    > implementations are basically toys, fit for verifying an algorithm, but
    > totally useless for real data; they assign random directional properties
    > to the ASCII characters and use that for testing the algorithm.
    >
    > (I have not looked at the Java one).
    >
    > Can anyone point me at:
    >
    > 1) An implementation of the Unicode BIDI algorithm that can take real
    > Unicode data and return something that I can verify (either the list of
    > characters in display order or the list of indexes to which I should
    > remap the characters)?
    >
    > 2) A test dataset of "real" text (linguistically sensible, not just random
    > characters) that has been verified by hand to display as expected after
    > running through the Bidi algorithm? (Ideal would be input/output pairs
    > for the implementation above, of course)
    >
    > Any hints are greatly appreciated!
    >
    > Harald
    >
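
    The two output forms mentioned in item 1 (characters in display order,
    or a list of indexes) are interchangeable once the index map is known,
    and item 2 amounts to a table of input/output pairs. A minimal Python
    sketch of that relationship follows. The function names, the tuple
    layout, and the single test pair are assumptions for illustration; the
    index map is written by hand for a simple case (Latin letters followed
    by Hebrew letters in a left-to-right paragraph), not produced by ICU
    or any other implementation.

```python
def display_order(logical_text, visual_to_logical):
    """Build the display-order string from a visual-to-logical index map.

    visual_to_logical[v] is the logical index of the character shown at
    visual position v. Indexes here count Python code points.
    """
    if sorted(visual_to_logical) != list(range(len(logical_text))):
        raise ValueError("index map must be a permutation of the text indexes")
    return "".join(logical_text[i] for i in visual_to_logical)


def failing_pairs(pairs):
    """Return the (logical, expected, actual) triples that do not match."""
    failures = []
    for logical, index_map, expected in pairs:
        actual = display_order(logical, index_map)
        if actual != expected:
            failures.append((logical, expected, actual))
    return failures


# Hand-written pair (an assumption, not verified test data):
# "abc" followed by alef, bet, gimel; the RTL run is shown reversed.
PAIRS = [
    ("abc\u05D0\u05D1\u05D2", [0, 1, 2, 5, 4, 3], "abc\u05D2\u05D1\u05D0"),
]

if __name__ == "__main__":
    print(failing_pairs(PAIRS))
```

    A real dataset would replace the hand-written map with the map
    returned by the implementation under test and keep the expected
    strings verified by hand.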



    This archive was generated by hypermail 2.1.5 : Thu Jun 07 2007 - 10:42:46 CDT