RE: 3 big bidi bugs

From: Jonathan Rosenne
Date: Wed May 29 2002 - 15:10:48 EDT

I don't think anything to do with 5 levels of embedding or overrides can
be considered a big bug.


> -----Original Message-----
> From:
> [] On Behalf Of Bernard Miller
> Sent: Wednesday, May 29, 2002 6:57 PM
> To:
> Subject: 3 big bidi bugs
> This letter describes 3 major technical problems with the
> current Unicode bidirectional algorithm as described in UAX
> #9, version 3.20. Problems 1 and 3 have security
> implications. Other problems with the whole Unicode
> bidirectional encoding approach, and their solutions, are
> discussed in the recently updated Bytext FAQ and
> documentation.
>
> (1) Line width dependent mangling, general case:
> Step L2 of UAX #9 indicates that a line that resolves into a
> sequence of characters with homogeneous embedding levels will
> ALWAYS be displayed right to left, regardless of what the
> embedding level is.
> So, for example, a line with the L1 resolved embedding levels of:
>   2222222222222222222222222 will display right to left
>   3333333333333333333333333 will display right to left
>   4444444444444444444444444 will display right to left
>   etc.
> Likewise:
>   in 3333333333333333333333331, the 3’s will display left to right
>   in 5555555555555555555555551, the 5’s will display left to right
>   etc.
> It directly contradicts the writer’s intentions. It means that
> different Unicode compliant applications will display the
> same characters in a different order (depending on available
> line width). Examples of how this is bad are given in
> question 12 of the Bytext FAQ. This
> can be fixed by rewording step L2 such that a reversal
> happens from the highest embedding level to each lower
> contiguous embedding level, regardless of whether the embedding level
> is represented by a character on the line, until the
> embedding level of 1 is reached (or, as an optimization,
> until the first odd embedding level equal to or lower than
> the lowest embedding level represented by a character on the line).
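
The two readings of step L2 discussed in (1) can be compared with a small
sketch. This is only an illustration of the behaviour as the message
describes it, not the normative text of UAX #9. The names reverse_runs,
reorder and present_only are hypothetical, and the sample strings are made
up. With present_only=True a reversal pass is made only for levels that
actually occur on the line (the reading the message objects to); with
present_only=False a pass is made for every level from the highest down
to 1 (the rewording the message proposes).

```python
def reverse_runs(order, levels, threshold):
    """Reverse each maximal run of positions whose level is >= threshold."""
    out = list(order)
    i, n = 0, len(out)
    while i < n:
        if levels[out[i]] >= threshold:
            j = i
            while j < n and levels[out[j]] >= threshold:
                j += 1
            out[i:j] = out[i:j][::-1]  # flip this run in place
            i = j
        else:
            i += 1
    return out


def reorder(text, levels, present_only):
    """Return the display order of one line under either reading of L2.

    levels[i] is the resolved embedding level of text[i] (assumed given).
    """
    order = list(range(len(text)))
    if present_only:
        # Only levels represented by a character on the line get a pass.
        steps = sorted(set(levels), reverse=True)
    else:
        # Every level from the highest down to 1 gets a pass.
        steps = range(max(levels), 0, -1)
    for level in steps:
        order = reverse_runs(order, levels, level)
    return "".join(text[i] for i in order)


print(reorder("abc", [2, 2, 2], present_only=True))       # cba
print(reorder("abc", [2, 2, 2], present_only=False))      # abc
print(reorder("abcd", [3, 3, 3, 1], present_only=True))   # dabc
print(reorder("abcd", [3, 3, 3, 1], present_only=False))  # dcba
```

Under the first reading the all-2 line comes out right to left and the 3’s
in the mixed line come out left to right, which matches the examples given
in (1). Under the second reading both results flip.
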
>
> (2) Line width dependent mangling, spelling conventions for
> quotes: What is the purpose of step X10 if not to allow
> something like LEFT DOUBLE QUOTATION MARK to be used as if it
> was an OPEN DOUBLE QUOTATION MARK? One simply puts an
> embedding inside a quotation, such as “<RLE>quotation<PDF>”.
> The problem with this is that it only works if the quotation
> begins and ends on the same line. Examples of how the text is
> mangled when the quotation spans multiple lines are given in
> question 13 of the Bytext FAQ. This
> cannot really be fixed with minor changes other than to
> notify users that the whole left=open, right=closed idea may
> not work as such when the default automatic line breaking is
> used. Users should not rely on any spelling conventions that
> do not bypass the effects of step X10 and mirroring --how
> this can be done is described in the Bytext documentation.
>
> (3) Mirroring ambiguities:
> What if eor = sor?
> text: R RLO whatever PDF N LRO whatever PDF
> embedding level at step X9: 1 3 3 1 2 2
> directional type at step X10: R R R ? L L
> The above example should be in a monospace font. The original
> is at
> Step X10 is ambiguous about whether the “N” should be L or R.
> This means that if N has the
> mirrored property, some implementations might display the
> mirrored form, others the non mirrored form, and others might
> result in an error. This can be fixed by deciding on a single
> form for such cases. Also, the
> statement: “for two adjacent runs, the eor of the first run
> is the same as the sor of the second” needs to be removed
> because it is not true.
> Bernard
> ---
> Bernard Rafael Miller, email:
> Format enabling simplified 8 bit regexes of UCS characters:
> ---
