Re: BIDI: possible fix

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun May 20 2001 - 18:31:26 EDT


As a point of information, implementing this change would seem to require
changing 10 bytes in my C/C++ bidi reference implementation. I write 'would
seem' since making any change would require verification with the Java
implementation. Such verification takes a lot of time and effort to make
sure no other changes are introduced inadvertently. Going beyond the two
reference implementations, the cost in terms of possible divergence of
implementations would be even higher.

Since any change, however well intentioned, is going to have such high
costs attached to it, it is reasonable to ask how compelling the case is
for this change. [Of course, had we had this discussion at the time of the
original design of the algorithm ten years ago, we might have come to a
different conclusion.]

We need to remember that the purpose of the bidi algorithm is to handle
plain text. Given that, a limitation of this kind does not seem very
troubling. In most practical situations where the purpose is to highlight
parts of the text, one would not use combining underscore (U+0332) but rich
text underlining (e.g. in HTML). This would produce the correct results
without any change to a stable algorithm.

The use of combining underscore for regular text underlining is surely
somewhat deprecated. Perhaps we need to review the language in the book as
part of our rewrite for Unicode 4.0.

A./

PS: BTW, I have posted an updated source listing of the C/C++ reference
implementation at
http://www.unicode.org/~asmus/bidi/bidi.cpp
I have made no changes to the implementation of the actual algorithm, but I
have corrected some of the comments in the source code and fixed some
issues with the code that invokes it for demo purposes.
However, the code samples for TR9 should be updated since the incorrect
demo mode driver has been generating spurious complaints that the actual
algorithm is not implemented correctly. UTC and bidi committee members:
Please review the updated code and help verify that the changes do not
affect the implementation of the algorithm as claimed.

At 09:34 AM 5/16/01 +0430, Roozbeh Pournader wrote:

>In the bidirectional algorithm, rule W1 states that all non-spacing marks
>should change to the type of the previous character. Rule W4, being the
>only rule referring to "a single" something, specifies that a single
>European separator between two European numbers changes to a European
>number, and a single common separator between two numbers of the same type
>changes to that type.
>
>Do you agree with me that rule W4 should be fixed to also change the type
>if there are NSMs over the separator? They should be counted as one I
>mean, so if I want to underline a separator, that underlined separator
>should count as one normal separator, not two.
>
>--roozbeh
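
To make the interaction between W1 and W4 described above concrete, here
is a minimal sketch in C++. It is not taken from either reference
implementation. The names Bidi, applyW1, applyW4, resolve and the
absorbMarks flag are hypothetical, the intermediate rules W2 and W3 are
skipped, and the "absorb" branch is only one possible reading of the
proposed change.

```cpp
#include <cstddef>
#include <vector>

// Simplified subset of the bidi character types; only what W1 and W4 need.
enum class Bidi { L, EN, AN, ES, CS, NSM };

// W1: every non-spacing mark takes the type of the previous character.
// `sor` stands in for the start-of-run type used when a mark comes first.
// Returns a mask of the positions that were originally NSM (an addition
// made for this sketch so that W4 can tell marks from real separators).
std::vector<bool> applyW1(std::vector<Bidi>& t, Bidi sor = Bidi::L) {
    std::vector<bool> wasNsm(t.size(), false);
    for (std::size_t i = 0; i < t.size(); ++i) {
        if (t[i] == Bidi::NSM) {
            wasNsm[i] = true;
            t[i] = (i == 0) ? sor : t[i - 1];
        }
    }
    return wasNsm;
}

// W4: a single ES between two ENs becomes EN; a single CS between two
// numbers of the same type becomes that type.
// With absorbMarks == true (hypothetical variant), marks that directly
// follow the separator are counted as part of that one separator.
void applyW4(std::vector<Bidi>& t, const std::vector<bool>& wasNsm,
             bool absorbMarks) {
    for (std::size_t i = 1; i + 1 < t.size(); ++i) {
        if (t[i] != Bidi::ES && t[i] != Bidi::CS) continue;

        // Find the first position after the separator (and, in the
        // variant, after any marks that were attached to it).
        std::size_t end = i + 1;
        if (absorbMarks) {
            while (end < t.size() && wasNsm[end]) ++end;
        }
        if (end >= t.size()) continue;

        const Bidi before = t[i - 1];
        const Bidi after = t[end];
        // ES only joins European numbers; CS joins EN-EN or AN-AN.
        const bool joins = (before == Bidi::EN) ||
                           (before == Bidi::AN && t[i] == Bidi::CS);
        if (joins && before == after) {
            for (std::size_t k = i; k < end; ++k) t[k] = before;
        }
    }
}

// Runs W1 followed by W4 on a copy of the input types.
std::vector<Bidi> resolve(std::vector<Bidi> t, bool absorbMarks) {
    const std::vector<bool> wasNsm = applyW1(t);
    applyW4(t, wasNsm, absorbMarks);
    return t;
}
```

For the sequence EN ES NSM EN (for example a digit, an underlined
separator, and another digit), resolve(..., false) leaves the separator
and its mark as ES, because after W1 there are two ES in a row and W4 no
longer sees "a single" separator. With absorbMarks set to true, both
positions resolve to EN.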


