BidiTest.txt Improvements

L2/09-354
2009-oct-21
Markus Scherer
Note: Unicode 5.2 adds the BidiTest.txt file with conformance test data.

Technical Suggestions/Questions

T1. Each input string is processed with one of three paragraph levels. One of them is "automatic" and defaults to LTR. (The others are LTR and RTL.)


T2. There are @Type lines that list all of the characters for each Bidi_Class. They are unrelated to what the rest of the file is testing, and the data can be easily extracted from DerivedBidiClass.txt.
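
To illustrate how little code the extraction in T2 needs, here is a minimal Python sketch. It assumes the usual UCD data-line layout of "code point or range ; value # comment"; the function name and the two sample lines are illustrative only, and the sketch does not handle the default values that the file documents in comments.

```python
def parse_bidi_class_lines(lines):
    """Yield (start, end, bidi_class) for each data line.

    Assumes the common UCD layout: 'XXXX[..YYYY] ; value # comment'.
    Comment-only and empty lines are skipped.
    """
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_range, value = (field.strip() for field in data.split(";", 1))
        start, _, end = code_range.partition("..")
        # A single code point has no '..' part, so end falls back to start.
        yield int(start, 16), int(end or start, 16), value


# Illustrative sample lines in the assumed layout.
sample = [
    "# comment line",
    "0041..005A    ; L # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "05BE          ; R # Pd       HEBREW PUNCTUATION MAQAF",
]
ranges = list(parse_bidi_class_lines(sample))
```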

Editorial Feedback

I followed the format specification in the BidiTest.txt file for my test code implementation, without having been involved in the design of the file. The following is my feedback.

E1. The text does not specify the low-level format, such as which whitespace characters are used and what a meaningful group of @Levels, @Reorder and <input> lines looks like.
  • I suggest specifying that whitespace is allowed between all tokens, and can be a space (U+0020) or a TAB (U+0009) -- if TABs are retained.
  • I suggest specifying that a test group consists of at most one @Levels and at most one @Reorder line, in any order, followed by zero or more input lines.
  • I suggest specifying that lines starting with @ but followed by an unrecognized string can occur among the @Levels and @Reorder lines and should be ignored.
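
The three suggestions under E1 could be read as the following grammar. This is a sketch in Python, not the file's actual specification. Several details are assumptions: that "#" starts a comment, that an input line has the form "types ; bitset", and that a missing @Levels or @Reorder line is simply recorded as None. The name parse_groups is hypothetical.

```python
import re

TOKEN_SEP = re.compile(r"[ \t]+")  # space (U+0020) or TAB (U+0009)


def _tokens(text):
    return [t for t in TOKEN_SEP.split(text.strip(" \t")) if t]


def parse_groups(lines):
    """Group lines into test groups: header lines first, then input lines."""
    groups = []
    current = None
    in_header = False
    for raw in lines:
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip(" \t\r\n")
        if not line:
            continue
        if line.startswith("@"):
            name, _, rest = line[1:].partition(":")
            name = name.strip(" \t")
            if name not in ("Levels", "Reorder"):
                continue  # unrecognized @ line: ignore it
            if not in_header:
                # First header line after input lines starts a new group.
                current = {"levels": None, "reorder": None, "inputs": []}
                groups.append(current)
                in_header = True
            key = name.lower()
            if current[key] is not None:
                raise ValueError(f"more than one @{name} line in a test group")
            current[key] = _tokens(rest)
        else:
            if current is None:
                raise ValueError("input line before any @Levels/@Reorder line")
            in_header = False
            # Assumption: input lines look like 'types ; bitset'.
            types, _, bitset = line.partition(";")
            current["inputs"].append((_tokens(types), bitset.strip(" \t")))
    return groups
```

With this reading, a second @Levels line before any input line is reported as an error, while an unknown line such as "@Future: ..." is skipped without ending the group.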

E2. The file uses TABs (U+0009). TABs are rendered differently from program to program (editor, shell, ...).
  • I suggest only using spaces (U+0020).

E3. (If T2 is not done) The file has a few extremely long lines: @Type: L: [A-Za-z......] is 3495 characters long (plus the line ending). Half of the @Type lines are over 80 characters long. That can be hard on editors and side-by-side diffs.
  • If the @Type lines are retained, then I suggest a maximum line length of 80 characters. This would require new syntax for line breaks inside @Type or maybe @Anything.

E4. Specify that the "automatic" paragraph level defaults to LTR. While this is not formally necessary since that is the default in the spec, it would clarify the text.

E5. Unclear documentation for the "bitset for paragraph levels": Is this a decimal or a hexadecimal value? It will matter when more flags are added.
  • I suggest specifying that the bitset is a hexadecimal value.
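
A short Python sketch shows why the base in E5 does not matter yet but will later. The bit assignments (1 = automatic, 2 = LTR, 4 = RTL) are an assumption made for illustration, and the function name is hypothetical.

```python
# Assumed bit assignments for the three paragraph levels.
AUTO, LTR, RTL = 0x1, 0x2, 0x4


def paragraph_levels(field, base):
    """Return the names of the paragraph levels selected by a bitset field."""
    bits = int(field, base)
    flags = (("auto", AUTO), ("LTR", LTR), ("RTL", RTL))
    return [name for name, bit in flags if bits & bit]


# With three flags the field is at most 7, which reads the same in both bases.
same_today = paragraph_levels("7", 10) == paragraph_levels("7", 16)

# With more flags, a field such as "10" becomes ambiguous:
# decimal 10 sets bits 2 and 8, hexadecimal 10 sets bit 16.
as_decimal = int("10", 10)
as_hex = int("10", 16)
```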

E6. Unclear documentation for @Reorder values.
  • I suggest clarifying that ordering[x]=y means that visual index x corresponds to logical index y.
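
The reading suggested in E6 can be written out in a few lines of Python. The second function shows the opposite reading, which a reader could also arrive at without the clarification. Both function names and the sample permutation are illustrative. The permutation is arbitrary and chosen only because the two readings give different results for it.

```python
def visual_from_ordering(logical, ordering):
    """Suggested reading: visual index x shows logical index ordering[x]."""
    return [logical[y] for y in ordering]


def visual_from_inverse(logical, ordering):
    """Opposite reading: logical index x lands at visual index ordering[x]."""
    visual = [None] * len(ordering)
    for x, y in enumerate(ordering):
        visual[y] = logical[x]
    return visual


logical = ["a", "b", "c"]
ordering = [2, 0, 1]

suggested = visual_from_ordering(logical, ordering)
opposite = visual_from_inverse(logical, ordering)
```

For a simple reversal such as [1, 0] both readings agree, so the ambiguity only becomes visible with less symmetric orderings.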