Re: A real bug in bidi

From: Mark Davis (mark.davis@us.ibm.com)
Date: Tue Jan 16 2001 - 17:35:41 EST


My apologies for not answering you earlier. I knew I had sent you one
reply, but it was for a different report.

Doug Felt here has confirmed that this is a bug in the "Implementation Notes"
section. While it does not affect the conformance of the main algorithm, it
would affect people trying to use that strategy of retaining format codes.
(We don't use that strategy here, by the way.) We think the implementation
strategy could be changed so that it still works, but for now we recommend
removing the format characters, as rule X9 does.
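Removing the format characters is what rule X9 does: once rules X1-X8 have assigned explicit levels, the embedding and override codes (RLE, LRE, RLO, LRO, PDF) and BN characters are dropped before the remaining rules run. Below is a minimal Python sketch of that step applied to Roozbeh's example. The (label, bidi_type, level) tuple layout and the name `apply_x9` are hypothetical choices for illustration, and format codes carry `None` because X1-X8 give them no level.

```python
# Bidi types that rule X9 removes from further processing.
X9_REMOVED = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}


def apply_x9(chars):
    """Drop embedding/override codes and BN, keeping the rest in order.

    `chars` is a list of (label, bidi_type, level) tuples; this layout is a
    hypothetical representation chosen for illustration only.
    """
    return [c for c in chars if c[1] not in X9_REMOVED]


# Roozbeh's example after X1-X8 (format codes get no level, shown as None).
# "a" already has type L; the LRO would override it to L in any case.
example = [
    ("<RLE>", "RLE", None),
    ("BET", "R", 3),
    ("<PDF>", "PDF", None),
    ("1", "EN", 1),
    ("<LRO>", "LRO", None),
    ("a", "L", 2),
    ("<PDF>", "PDF", None),
]

remaining = apply_x9(example)
print([(label, level) for label, _, level in remaining])
# [('BET', 3), ('1', 1), ('a', 2)]
```

Rule I2 then raises the EN at odd level 1 to level 2, which gives the "3 2 2" levels reported below for the removed-codes case.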

Mark
___
Mark Davis, IBM GCoC, Cupertino
(408) 777-5850 [fax: 5891], mark.davis@us.ibm.com, president@unicode.org
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

Roozbeh Pournader <roozbeh@sharif.edu> on 01-05-2001 06:16:03

To: Mark Davis/Cupertino/IBM@IBMUS, Unicode List <unicode@unicode.org>,
      errata@unicode.org
cc: Behdad Esfahbod <behdad@bamdad.org>
Subject: A real bug in bidi

Dear Unicoders,

This time I think we have found a real bug in the Bidirectional Algorithm.
The problem is that the algorithm seems to contradict itself. We were
trying to use the "Implementation Notes" at the end of UTR #9 to retain the
format codes, but that doesn't produce the same results as removing them in
rule X9. We would really appreciate any comments.

Would you please take your pencils out? ;)

Our example is probably not the simplest case, but it is small enough:

     U+202B U+05D1 U+202C U+0031 U+202D U+0061 U+202C
     <RLE> BET <PDF> 1 <LRO> a <PDF>

When we run the algorithm with the notes in "Retaining Format Codes", we
get the following levels:

     <RLE> BET <PDF> 1 <LRO> a <PDF>
          1 3 3 2 1 2 1

which according to L2 becomes:

     <PDF> a <LRO> <PDF> BET 1 <RLE>

when rendered visually. Ignoring the invisible format codes, that's
"a BET 1". But when the format codes are removed in X9, the levels are:

     BET 1 a
      3 2 2

which becomes "BET 1 a" when rendered. So the order is different, you see.
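Both orders can be checked mechanically against rule L2. The following is a minimal Python sketch (the name `reorder_l2` is hypothetical): from the highest level down to the lowest odd level on the line, it reverses every maximal run of characters at that level or higher. It assumes "lowest odd level" means the lowest level on the line rounded up to an odd number. Fed the two level sequences above, it reproduces both visual orders.

```python
def reorder_l2(labels, levels):
    """Rule L2: from the highest level down to the lowest odd level on the
    line, reverse every maximal run of characters at that level or higher.
    Returns the labels in visual (left-to-right) order."""
    order = list(labels)
    lv = list(levels)
    highest = max(lv)
    lowest = min(lv)
    # Lowest level on the line, rounded up to the next odd level.
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(lv):
            if lv[i] >= level:
                # Find the end of this run of characters at >= level.
                j = i
                while j < len(lv) and lv[j] >= level:
                    j += 1
                # Reverse labels and levels together so later passes see
                # each character's level at its new position.
                order[i:j] = order[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return order


retained = reorder_l2(
    ["<RLE>", "BET", "<PDF>", "1", "<LRO>", "a", "<PDF>"],
    [1, 3, 3, 2, 1, 2, 1],
)
removed = reorder_l2(["BET", "1", "a"], [3, 2, 2])
print(" ".join(retained))  # <PDF> a <LRO> <PDF> BET 1 <RLE>
print(" ".join(removed))   # BET 1 a
```

So the disagreement is not in L2 itself: the two runs feed L2 different levels for the EN and the final characters, and L2 faithfully turns that into different orders.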

(I do not claim anything about user expectations in this example, because
both results go against my expectation: I expected "a 1 BET". I would also
appreciate comments on your expectations.)

We may have made a mistake, I know, but we have checked it many times.
Here are the intermediate results I obtained from running the algorithm
while retaining format codes:

Original character types: "RLE R PDF EN LRO L PDF"

      P1-P3: paragraph embedding level becomes 1.
      X1-X8: levels become "? 3 ? 1 ? 2 ?".
modified X9: types become "BN R BN EN BN L BN",
             levels become "1 3 3 1 1 2 2".
        X10: four runs, (sor, eor) are (R, R), (R, R), (R, L), (L, L).
      W1-W5: no change.
modified W6: types become "ON R ON EN ON L ON".
         W7: no change.
         N1: types become "R R R EN ON L L".
         N2: types become "R R R EN R L L".
      I1-I2: levels become "1 3 3 2 1 2 2".
modified L1: levels become "1 3 3 2 1 2 1".
         L2: the ordering becomes "<PDF> a <LRO> <PDF> BET 1 <RLE>".
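The explicit-level and level-run steps of this trace (X1-X8 and X10) can be reproduced with a short sketch. This is a simplified Python illustration, not a reference implementation: it handles only RLE, LRE, RLO, LRO and PDF, assumes no paragraph separators or BN characters, and omits the maximum-depth overflow checks of X2-X5. All function names are hypothetical. For X10 it takes the modified-X9 levels from the trace as input rather than re-deriving how the implementation notes assign levels to format codes.

```python
EMBEDDING_CODES = {"RLE", "LRE", "RLO", "LRO"}


def next_odd(level):
    """Least odd level greater than `level` (used by RLE and RLO)."""
    return level + 2 if level % 2 == 1 else level + 1


def next_even(level):
    """Least even level greater than `level` (used by LRE and LRO)."""
    return level + 2 if level % 2 == 0 else level + 1


def explicit_levels(types, para_level):
    """Simplified X1-X8: return (types, levels); format codes get None.
    Assumes nesting never exceeds the maximum depth (no overflow handling)."""
    stack = []                       # saved (level, override) pairs
    level, override = para_level, None
    out_types, out_levels = [], []
    for t in types:
        if t in EMBEDDING_CODES:     # X2-X5: push and raise the level
            stack.append((level, override))
            level = next_odd(level) if t in ("RLE", "RLO") else next_even(level)
            override = {"RLO": "R", "LRO": "L"}.get(t)
            out_types.append(t)
            out_levels.append(None)
        elif t == "PDF":             # X7: pop; an unmatched PDF is ignored
            if stack:
                level, override = stack.pop()
            out_types.append(t)
            out_levels.append(None)
        else:                        # X6: remaining types take current state
            out_types.append(override or t)
            out_levels.append(level)
    return out_types, out_levels


def direction(level):
    """Odd levels are right-to-left, even levels left-to-right."""
    return "R" if level % 2 else "L"


def level_runs(levels, para_level):
    """X10: split into runs of equal level and compute (sor, eor) for each."""
    runs, start = [], 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i
    result = []
    for k, (s, _) in enumerate(runs):
        here = levels[s]
        prev_level = levels[runs[k - 1][0]] if k > 0 else para_level
        next_level = levels[runs[k + 1][0]] if k + 1 < len(runs) else para_level
        result.append((direction(max(prev_level, here)),
                       direction(max(here, next_level))))
    return result


types, levels = explicit_levels(
    ["RLE", "R", "PDF", "EN", "LRO", "L", "PDF"], para_level=1)
print(["?" if l is None else l for l in levels])  # ['?', 3, '?', 1, '?', 2, '?']

kept = [l for l in levels if l is not None]       # X9: codes removed
print(level_runs(kept, 1))                        # [('R', 'R'), ('R', 'L'), ('L', 'L')]
print(level_runs([1, 3, 3, 1, 1, 2, 2], 1))       # trace's modified-X9 levels
# [('R', 'R'), ('R', 'R'), ('R', 'L'), ('L', 'L')]
```

The last call matches the four (sor, eor) pairs listed for X10 above. The middle run (the EN followed by the retained LRO) ends with eor = L, so N1 cannot resolve the ON from the LRO, and N2 falls back to the run's embedding direction (R) instead.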

--roozbeh



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT