Re: about P1 part of BIDI algorithm from Eli Zaretskii on 2011-10-11 (Unicode Mail List Archive)

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Tue, 11 Oct 2011 08:19:18 -0400

> Date: Tue, 11 Oct 2011 18:54:10 +0900
> From: "Martin J. Dürst" <duerst_at_it.aoyama.ac.jp>
> CC: libo.imc_at_gmail.com, unicode_at_unicode.org
>
> There is absolutely no problem to treat the algorithm in UAX#9 as a set
> of requirements, and come up with a totally different implementation
> that produces the same results. I think actually UAX#9 says so somewhere.

I didn't say this was a problem, and yes, UAX#9 does say what you
said. I was trying to explain why arguments about the order of
processing described in UAX#9 do not apply easily to the Emacs
implementation (or in fact, to any implementation that does not
reorder text in batches).

> But what is, strictly speaking, not allowed is to change the
> requirements. One requirement of the algorithm is that when lines are
> broken, logically earlier characters stay on earlier lines, and
> logically later characters move to later lines.

It is arguable whether Emacs continuation lines fit that description,
as you point out:

> The external reason is that continuation lines in Emacs are in
> general just an overflow device, text in Emacs isn't supposed to be
> broken into lines in the same way as e.g. word processors break
> lines to form paragraphs.

> I'm not sure how much it is true

It was 100% true until the word-wrap feature was added to the Emacs
display engine in Emacs 23.

> The internal reason is the one you describe below. It may indeed be a
> strong reason from an implementation perspective, but from a user
> perspective, it's a very weak reason.

That's true, but the situation where this issue is visible is quite
rare in practice. You need all of the following conditions to be
true:

  . a line of text that is longer than the display width of the window

  . mixed L2R and R2L text in the same line

  . the text that overflows is of the directionality that goes against
    the base direction of the paragraph, e.g., a line in a L2R
    paragraph that overflows in a stretch of R2L text

If any of these conditions is false, the problem does not show at all.

> Also, I don't understand it fully.
> You say that the Emacs display engine examines each character in turn.
> Assuming these are in logical order, you would just examine them up to
> the point where you have "about one line" of glyphs. There would indeed
> be a bit of back and forth there because of the interaction between bidi
> algorithm and glyph selection

It is impossible to know in advance how much is "about one line",
because Emacs supports different fonts and typefaces in the same line.
So you only know that the display line overflowed when you try to fit
one more character, and the result is wider than the window width.
Then you back up and go on to produce the next display line. With
your suggestion, there will be a need to back up much more, or even
iterate until you hit the "right" number of characters.
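
To make the overflow check concrete, here is a minimal sketch in
Python. It is not the Emacs code (the display engine is written in C
and is far more involved); the names fill_display_lines and width_of
are made up for illustration. It only shows that, when every glyph can
have its own width, you find out a line is full by trying one more
glyph and starting a new line when it doesn't fit.

```python
from typing import Callable, List, Sequence


def fill_display_lines(
    glyphs: Sequence[str],
    width_of: Callable[[str], int],
    window_width: int,
) -> List[List[str]]:
    """Greedily fill display lines; widths are only known per glyph."""
    lines: List[List[str]] = []
    current: List[str] = []
    used = 0
    for g in glyphs:
        w = width_of(g)
        # Overflow is detected only now, when one more glyph is tried.
        if current and used + w > window_width:
            lines.append(current)      # close the line without this glyph
            current, used = [], 0      # "back up": the glyph starts a new line
        current.append(g)
        used += w
    if current:
        lines.append(current)
    return lines


# Hypothetical widths: 'W' is three times as wide as 'a'.
WIDTHS = {"a": 1, "W": 3}
print(fill_display_lines("aWaWa", WIDTHS.__getitem__, 4))
```

The number of characters per line falls out of the widths; it cannot
be computed up front from a character count.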

Besides, characters are not the only things displayed in an Emacs
window. There could be images, stretches of white space for
alignment, etc. These all affect the calculation of how much space is
left on the line. Making the UBA implementation aware of these
display features would be incorrect from the software design POV, and
terribly inefficient. So in Emacs, the UBA is called to reorder
characters _before_ they are examined by the layout mechanism. The
latter is fed characters in visual order, not in logical order. It
then proceeds normally, being relatively ignorant of the fact that
the characters were reordered.
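
A toy model of that pipeline, again in Python and again only a sketch
under loud assumptions: toy_visual_order is NOT the UBA (it merely
reverses runs of uppercase letters, which stand in for R2L characters
as in the UAX#9 examples), it reorders the whole string in one batch
rather than one character at a time, and wrap_fixed pretends every
glyph has width 1. Both names are invented for this example.

```python
from typing import List


def toy_visual_order(logical: str) -> str:
    """Reverse each maximal run of uppercase letters (stand-ins for R2L).

    This is a deliberate oversimplification, not an implementation of UAX#9.
    """
    out: List[str] = []
    run: List[str] = []
    for ch in logical:
        if ch.isupper():
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


def wrap_fixed(visual: str, window_width: int) -> List[str]:
    """Layout step: consumes visual order and knows nothing about bidi."""
    return [visual[i:i + window_width]
            for i in range(0, len(visual), window_width)]


logical = "abc DEFGH"                  # L2R paragraph ending in an R2L stretch
visual = toy_visual_order(logical)     # "abc HGFED"
print(wrap_fixed(visual, 6))           # ['abc HG', 'FED']
```

With a 6-column window the first display line gets G and H, which are
logically last, while D, E and F land on the continuation line. That
is the visible effect under discussion, and it needs all three
conditions listed earlier to show up.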

> as far as I know, mirrored glyphs mostly have the same width as
> their originals

The display engine should not make such assumptions, if it wants to be
robust. One issue is that the mirrored glyph can come from a
different font, because the default font doesn't have a corresponding
glyph. I had such a problem just the other day with U+2215 that was
in the default font, mirrored to U+29F5 that wasn't.
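
A small Python sketch of why the width can change. The two "fonts"
below are hypothetical tables of advance widths in pixels, and the
numbers are invented; only the character pair U+2215 / U+29F5 comes
from the case above. The helper glyph_width is likewise made up.

```python
from typing import Dict, List

# Hypothetical advance widths; the default font lacks U+29F5.
DEFAULT_FONT: Dict[str, int] = {"\u2215": 7}
FALLBACK_FONT: Dict[str, int] = {"\u29F5": 10}
FONTS: List[Dict[str, int]] = [DEFAULT_FONT, FALLBACK_FONT]


def glyph_width(ch: str, fonts: List[Dict[str, int]]) -> int:
    """Return the width from the first font that covers the character."""
    for font in fonts:
        if ch in font:
            return font[ch]
    raise KeyError(f"no font covers U+{ord(ch):04X}")


original = glyph_width("\u2215", FONTS)   # found in the default font
mirrored = glyph_width("\u29F5", FONTS)   # found only in the fallback font
print(original, mirrored)
```

As soon as the mirrored character is served by another font, nothing
guarantees the two widths agree, so the layout code has to measure the
glyph it will actually draw.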

> Anyway, that bit of back
> and forth seems to be much less of a problem than the back and forth
> that you get when you have to reorder over much larger distances because
> you're essentially considering a whole paragraph as a single line.

That's a misunderstanding: there's no excess reordering in Emacs. It
only reorders as much as fits the last display line on the screen, and
even that is done one character at a time. So no effort is wasted,
except what the UBA itself mandates (where a rule depends on a
potentially far-away character). And a newline ends the reordering
anyway.

But I feel that this is becoming off-topic for this list. If you want
to continue this discussion, I suggest switching to
emacs-devel_at_gnu.org.
Received on Tue Oct 11 2011 - 07:24:04 CDT
