From: fantasai (fantasai.lists@inkedblade.net)
Date: Mon Jul 19 2004 - 15:56:51 CDT
I've been going through the Unicode BIDI Algorithm, and I'm having trouble
understanding the justification for the way rules W7 and N1 are formulated.
http://www.unicode.org/reports/tr9/#W7 says:
# W7. Search backwards from each instance of a European number until the first
# strong type (R, L, or sor) is found. If an L is found, then change the type
# of the European number to L.
# N1. A sequence of neutrals takes the direction of the surrounding strong text
# if the text on both sides has the same direction. European and Arabic numbers
# act as if they were R in terms of their influence on neutrals.
# Start-of-level-run (sor) and end-of-level-run (eor) are used at level run
# boundaries.
I understand that these rules are intended to handle things like "BMW 500" in
the middle of an Arabic text. But W7's backward search also passes through list
separators, as in the sequence
start> SEE SECTIONS 22a, 53, 62, 95c. >end [uppercase => rtl]
In this case, the entire list (although not the letters/numbers within each
item) would be ordered left-to-right instead of right-to-left like the rest of
the sentence. Is there a reason why W7 searches back through double-CS and N?
(Other than preparing for N1, which could have been written not to assume W7.)
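
To make the case above concrete, here is a rough Python sketch of only W7 and
N1, plus the N2 fallback to the embedding direction. It is not a conformant
implementation: classify(), apply_w7(), apply_n1_n2() and resolve() are
hypothetical names, uppercase stands in for R as in the example, every
non-alphanumeric character is simply treated as a neutral (so W1-W6 are
skipped), and sor/eor are assumed to be R.

```python
def classify(ch):
    """Toy bidi classes: uppercase plays the role of RTL letters."""
    if ch.isdigit():
        return "EN"
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"  # assumption: spaces, commas and periods are all neutral


def apply_w7(types, sor):
    """W7: an EN becomes L if the nearest preceding strong type is L."""
    out = list(types)
    last_strong = sor
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            out[i] = "L"
    return out


def apply_n1_n2(types, sor, eor, embedding):
    """N1: neutrals take the surrounding direction if both sides agree.
    N2: otherwise they take the embedding direction."""
    def direction(t):
        # Numbers act as R in terms of their influence on neutrals.
        return "R" if t in ("R", "EN", "AN") else t

    out = list(types)
    i, n = 0, len(types)
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1  # j ends one past the run of neutrals
        before = direction(types[i - 1]) if i > 0 else sor
        after = direction(types[j]) if j < n else eor
        fill = before if before == after else embedding
        out[i:j] = [fill] * (j - i)
        i = j
    return out


def resolve(text, embedding="R"):
    types = [classify(c) for c in text]
    types = apply_w7(types, sor=embedding)
    return apply_n1_n2(types, sor=embedding, eor=embedding, embedding=embedding)


text = "SEE SECTIONS 22a, 53, 62, 95c."
for ch, t in zip(text, resolve(text)):
    print(repr(ch), t)
```

Under these assumptions "22" stays EN, but "53", "62" and "95" become L because
the backward search reaches the "a" of "22a", and N1 then resolves the ", "
between the items to L as well. Everything from "a" through "c" ends up L,
which is the left-to-right list described above.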
I noticed, btw, that none of the examples for N1 have a neutral between two
numbers.
~fantasai
-- http://fantasai.inkedblade.net/contact