Re: Unicode plain text

From: Otto Stolz
Date: Mon May 26 1997 - 06:10:00 EDT

On May 24, 11:04, Timothy Partridge wrote:
> We seem to have two different requirements for plain text here.
> The text has already been formatted by the author into lines and
> paragraphs. (Just as I have done with this e-mail. [...]
> Since NL usually does not denote any logical division in the text
> it is extremely annoying if the BiDi algorithm treats it as a new
> block.

On the contrary, it is annoying if it doesn't -- see below.

> [...] Page breaks are preformatted using FF and occur quite
> frequently. Like NL, page breaks usually do not denote any logical
> boundary in the text. Again the BiDi algorithm should not take any
> notice.

Again, wrong conclusion (IMHO).

You want to read the words of a paper in logical order, regardless
of writing direction. For bidirectional text, it is inevitable that
the focus of your eyes moves back and forth in a line as the
directionality changes. You will certainly not want to move your focus
back to the previous line, let alone to the previous page.

Suppose the English phrases "the United States of America" and "the
United Kingdom" are embedded in a run of Arabic text (here represented
by "<<<<<<"). (Note that the paragraphs are right-justified, as Arabic
is written right-to-left.) Just try to scan the following example with
your eyes, following the direction indicated by the "<" marks when
viewed as arrow-points. You would certainly prefer
  the United <<<< << <<<<< << <<<<< <<<<< <<< <<<<<<
  ------------------- page-break -------------------
     <<<<<< the United Kingdom <<< States of America
                              .<<<< <<<< <<<<<< <<<<
to
  of America <<<< << <<<<< << <<<<< <<<<< <<< <<<<<<
  ------------------- page-break -------------------
     <<<<<< the United Kingdom <<< the United States
                              .<<<< <<<< <<<<<< <<<<
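
The two layouts can be reproduced with a toy model. This is only a
sketch, not the actual bidi algorithm: it assumes a right-to-left base
direction, whole-word tokens tagged by hand as left-to-right or not, and
line lengths picked to match the first two lines of the example. All
names (LOGICAL, reorder_rtl, split_then_reorder, reorder_then_split) are
made up for this illustration.

```python
from itertools import groupby

# Toy model: each token is (text, is_ltr). "<" runs stand in for Arabic words.
ARABIC_LEAD = ["<<<<<<", "<<<", "<<<<<", "<<<<<", "<<", "<<<<<", "<<", "<<<<"]
LOGICAL = (
    [(w, False) for w in ARABIC_LEAD]
    + [(w, True) for w in "the United States of America".split()]
    + [("<<<", False)]
    + [(w, True) for w in "the United Kingdom".split()]
    + [("<<<<<<", False)]
)
LINE_LENGTHS = [10, 8]  # tokens per line, chosen to match the example


def reorder_rtl(tokens):
    """Return tokens in visual (left-to-right) order for a right-to-left block."""
    runs = [list(group) for _, group in groupby(tokens, key=lambda t: t[1])]
    visual = []
    for run in reversed(runs):          # runs are laid out right to left
        is_ltr = run[0][1]
        visual.extend(run if is_ltr else reversed(run))
    return visual


def render(tokens):
    return " ".join(text for text, _ in tokens)


def split_then_reorder(tokens, lengths):
    """Break the logical text into lines first; each line is its own block."""
    lines, start = [], 0
    for n in lengths:
        lines.append(render(reorder_rtl(tokens[start:start + n])))
        start += n
    return lines


def reorder_then_split(tokens, lengths):
    """Reorder the whole text as one block, then cut lines from the right end."""
    visual = reorder_rtl(tokens)
    lines, end = [], len(visual)
    for n in lengths:
        lines.append(render(visual[end - n:end]))
        end -= n
    return lines


if __name__ == "__main__":
    for layout in (split_then_reorder, reorder_then_split):
        print(layout.__name__)
        for line in layout(LOGICAL, LINE_LENGTHS):
            print(line.rjust(50))
```

In this model, split_then_reorder gives the layout where "the United"
ends one line and "States of America" starts the next, while
reorder_then_split puts "of America" on the earlier line and "the
United States" on the later one.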

Hence, I think, the bidi algorithm should treat every line break,
and, a fortiori, every page break, as the start of a new block of
text (bidi-wise, not logically!). This means that the bidi algorithm
should recognize every ISO 6429 control code implying a line break,
as well as every Unicode character doing so.
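
A minimal sketch of such a splitting step, assuming the separator set is
LF, VT, FF, CR, NEL, LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR
(U+2029), with CR LF counted as a single break. Both the exact set and
the name split_bidi_blocks are assumptions made for illustration; a real
implementation may need a different list.

```python
import re

# Assumed line-breaking characters: CR LF pair, LF, VT, FF, CR, NEL, LS, PS.
_BREAK_RE = re.compile(r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def split_bidi_blocks(text):
    """Split text into blocks that would each be reordered independently."""
    return _BREAK_RE.split(text)


if __name__ == "__main__":
    sample = "first line\nsecond line\x0cnext page\r\nlast\u2028line"
    for block in split_bidi_blocks(sample):
        print(repr(block))
```

Each returned block would then be handed to the reordering step on its
own, so that no directional run extends across a line or page break.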

Best wishes,
    Otto Stolz
