Re: Unicode plain text

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Mon May 26 1997 - 06:10:00 EDT


On May 24, 11:04, Timothy Partridge <timpart@perdix.demon.co.uk> wrote:
> We seem to have two different requirements for plain text here.
...
> The text has already been formatted by the author into lines and
> paragraphs. (Just as I have done with this e-mail. [...]
> Since NL usually does not denote any logical division in the text
> it is extremely annoying if the BiDi algorithm treats it as a new
> block.

On the contrary, it is annoying if it doesn't -- see below.

> [...] Page breaks are preformatted using FF and occur quite
> frequently. Like NL, page breaks usually do not denote any logical
> boundary in the text. Again the BiDi algorithm should not take any
> notice.

Again, wrong conclusion (IMHO).

You want to read the words of a paper in logical order, regardless
of writing direction. For bidirectional text, it is inevitable that
the focus of your eyes moves back and forth within a line as the
directionality changes. You will certainly not want to move your focus
back to the previous line, let alone to the previous page.

Example:
The English phrases "the United States of America" and "the United
Kingdom" are embedded in a run of Arabic text (represented here by
"<<<<<<"). (Note that the paragraphs are right-justified, as Arabic
is written right-to-left.) Just try to scan the following example with
your eyes, following the direction indicated by the "<" marks, read as
arrowheads. You would certainly prefer
  the United <<<< << <<<<< << <<<<< <<<<< <<< <<<<<<
  ------------------- page-break -------------------
     <<<<<< the United Kingdom <<< States of America
                              .<<<< <<<< <<<<<< <<<<
over
  of America <<<< << <<<<< << <<<<< <<<<< <<< <<<<<<
  ------------------- page-break -------------------
     <<<<<< the United Kingdom <<< the United States
                              .<<<< <<<< <<<<<< <<<<
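
The two layouts above can be reproduced with a small sketch. It is a
toy model, not the Unicode bidi algorithm: it assumes a right-to-left
base direction, treats any word made only of "<" as Arabic and every
other word as left-to-right, and breaks lines by word count. The
function names and the shortened sample sentence are made up for
illustration.

```python
from itertools import groupby


def is_ltr(word):
    # Assumption of this sketch: Arabic words are written as runs of "<".
    return set(word) != {"<"}


def reorder(words):
    """Return the visual left-to-right order of one block of words.

    `words` is in logical (reading) order; the base direction is
    assumed to be right-to-left.
    """
    visual = []
    # Reversing the block puts the first logical word at the right edge.
    for ltr, run in groupby(reversed(words), key=is_ltr):
        run = list(run)
        # A left-to-right run keeps its internal order, so undo the reversal.
        visual.extend(reversed(run) if ltr else run)
    return visual


def break_then_reorder(words, first_line_len):
    """Break into lines first, then reorder each line on its own."""
    lines = [words[:first_line_len], words[first_line_len:]]
    return [" ".join(reorder(line)) for line in lines]


def reorder_then_break(words, first_line_len):
    """Reorder the whole text as one block, then cut it into lines."""
    visual = reorder(words)
    cut = len(visual) - first_line_len
    # The first line is filled from the right edge of the visual order.
    return [" ".join(visual[cut:]), " ".join(visual[:cut])]


logical = ("<<<<<< <<< the United States of America "
           "<<< the United Kingdom <<<<<<").split()

for line in break_then_reorder(logical, 4):
    print(line.rjust(48))
print()
for line in reorder_then_break(logical, 4):
    print(line.rjust(48))
```

The first pair of printed lines keeps "the United" at the end of the
first line and "States of America" at the start of the second; the
second pair shows the English phrase torn apart, as in the second
layout above.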

Hence, I think the bidi algorithm should treat every line break,
and, a fortiori, every page break, as the start of a new block of
text (bidi-wise, not logically!). This means that the bidi algorithm
should recognize every ISO 6429 control code that implies a line
break, as well as every Unicode character that does so.
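
As a rough illustration, splitting text into such bidi blocks might
look like the sketch below. The set of break characters is my
assumption, not a list taken from any standard: LF, VT, FF, CR and NEL
from the control codes, plus the Unicode LINE SEPARATOR and PARAGRAPH
SEPARATOR, with CR LF counted as a single break. The function name is
hypothetical.

```python
import re

# Assumed set of line-breaking characters (see the note above):
# CR LF as one break, then LF, VT, FF, CR, NEL, U+2028, U+2029.
_BREAKS = re.compile("\r\n|[\n\v\f\r\x85\u2028\u2029]")


def bidi_blocks(text):
    """Split text into the blocks that would be reordered independently."""
    return _BREAKS.split(text)


# A form feed (page break) starts a new block just like a newline does.
print(bidi_blocks("first line\nsecond line\fnext page\r\nlast line"))
```

Each returned block would then be reordered on its own, so no word is
ever moved to a different line or page.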

Best wishes,
    Otto Stolz


