Re: expansion of bidi planned?

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Dec 20 2000 - 19:32:40 EST


Elaine Keown asked:

> Thanks to Kim Peck (and Michael Everson!) for the lovely text orientations example.
>
> So now the obvious question: does Unicode have a formal plan to expand
> the bidi algorithm to cover more text orientation options?

No.

You need to understand what the bidirectional algorithm is and is not
intended to do:

1. It *is* intended to standardize the logical order storage of text, by
   describing how text is then to be legibly displayed when it consists
   of mixtures of characters in a script traditionally laid out left-to-right
   and characters in a script traditionally laid out right-to-left.

2. It is *not* intended to be a general purpose layout engine for dealing
   with all possible "ploughing orders" for placing characters on a
   medium of display.

Thus, the bidi algorithm is essentially addressing a plain text issue: if
I mix RTL and LTR characters in a text stream (as inevitably happens when
dealing with Arabic, Hebrew, etc.), then what order do these characters
go in the "backing store", and how do I get from that to a legible display?
If the Unicode Standard did not answer this question clearly, then no one
would be able to guarantee that they could generate a stream of Arabic
text in Unicode that anyone else could interpret correctly.
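
To make the backing-store-to-display step concrete, here is a minimal
sketch in Python. It is *not* the Unicode bidi algorithm: the function
name naive_visual_order is hypothetical, a left-to-right paragraph is
assumed, European digits are simply treated as LTR, and embedding levels,
explicit controls, and mirroring are ignored. It only shows the idea of
storing text in logical order and reversing RTL runs for display.

```python
import unicodedata

def naive_visual_order(logical: str) -> str:
    """Toy logical-to-visual reordering (hypothetical helper, LTR paragraph assumed)."""
    # Step 1: classify each character as strongly L, strongly R, or neutral (None).
    dirs = []
    for ch in logical:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):
            dirs.append("R")
        elif cls in ("L", "EN"):      # simplification: digits treated as LTR
            dirs.append("L")
        else:
            dirs.append(None)

    # Step 2: a run of neutrals takes direction R only if both neighbours are R;
    # otherwise it falls back to the (assumed LTR) paragraph direction.
    n, i = len(dirs), 0
    while i < n:
        if dirs[i] is None:
            j = i
            while j < n and dirs[j] is None:
                j += 1
            before = dirs[i - 1] if i > 0 else "L"
            after = dirs[j] if j < n else "L"
            fill = "R" if before == after == "R" else "L"
            for k in range(i, j):
                dirs[k] = fill
            i = j
        else:
            i += 1

    # Step 3: reverse each maximal R run; L runs are displayed as stored.
    out, i = [], 0
    while i < n:
        j = i
        while j < n and dirs[j] == dirs[i]:
            j += 1
        segment = logical[i:j]
        out.append(segment[::-1] if dirs[i] == "R" else segment)
        i = j
    return "".join(out)

# Logical order: Latin, two Hebrew words, Latin.
logical = "abc \u05d0\u05d1 \u05d2\u05d3 def"
visual = naive_visual_order(logical)
```

Even this toy version goes wrong for digits sitting between RTL words,
which is the kind of case the real algorithm spends most of its rules on.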

Things like boustrophedon are inherently very different from bidi, in that
they do not require *reordering* from backing store to display, or have
to deal with issues like embedded runs. For boustrophedon, conceptually
what you do in rendering is just take the logical order of characters,
deal them out, and when you get to the margin, you turn the *medium*.
How you implement that in a text and graphic word processor is another
problem, but it isn't a *bidi* problem.
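
The "deal them out and turn at the margin" idea can be sketched as
follows. The function name boustrophedon and the fixed-width character
grid are assumptions made for illustration; glyph mirroring on the
right-to-left rows is ignored. The string reversal below is only an
artifact of printing to a left-to-right terminal; the backing store is
consumed strictly in logical order, with no run analysis at all.

```python
def boustrophedon(logical: str, width: int) -> list[str]:
    """Deal characters out in logical order, turning at each margin (hypothetical helper)."""
    lines = []
    for row, start in enumerate(range(0, len(logical), width)):
        chunk = logical[start:start + width]
        if row % 2 == 0:
            # Even rows progress left to right from the left margin.
            lines.append(chunk)
        else:
            # Odd rows progress right to left, starting at the right margin.
            lines.append(chunk[::-1].rjust(width))
    return lines

for line in boustrophedon("abcdefgh", 3):
    print(line)
```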

Vertical text rendering is also a rather different issue. Effectively, it
is a matter of inverting the x and y dimensions (and adjusting all other
concepts of margins and text progression, accordingly) and then laying
out text. Vertical text *can* have bidi issues, if you embed Arabic or
Hebrew in it, for example. Effectively, you have to rotate the Arabic
chunk 90 degrees, and you have to apply the bidi algorithm on that
little rotated chunk.
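
A rough sketch of "inverting the x and y dimensions", assuming a simple
fixed-size character grid. Both function names are hypothetical. Columns
here advance left to right for simplicity; traditional East Asian
vertical layout advances columns right to left, which would additionally
flip the x coordinate.

```python
def horizontal_positions(text: str, width: int) -> list[tuple[int, int]]:
    """(x, y) grid cell for each character: x along the line, y down the page."""
    return [(i % width, i // width) for i in range(len(text))]

def vertical_positions(text: str, height: int) -> list[tuple[int, int]]:
    """Same layout with the axes swapped: characters run down, columns run across."""
    return [(y, x) for (x, y) in horizontal_positions(text, height)]

cells = vertical_positions("abcde", 2)
```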

But any issue of embedding rotated LTR or RTL text in vertical text,
or otherwise mixing vertical and horizontal text on a page, essentially
engages some kind of graphic framing mechanism in a word processing
(or other layout) system. These are rather complex, and have essentially
nothing to do with the bidi algorithm, per se.
  
>
> Is it extraordinarily difficult to do this, at the level of a Ph.D. thesis in
> algorithms, or is it a simpler problem? A Yiddishist and specialist in
> algorithms whom I wrote to said he heard the bidi algorithm is "really hairy."

The bidi algorithm is, indeed, "really hairy". Not that the algorithm itself
is all that complex, as algorithms go. But rather because it has lots of
debatable edge cases regarding the handling of punctuation and numbers,
where the desired outcome varies slightly depending on which language
and area you are dealing with and whether you are trying to get numerals,
dates, part numbers, or something else to come out "right" by default.
So most of the argumentation went into trying to determine the class
values of the problematical characters like "/", ".", ",", etc., and dealing
with other edge cases.
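
The class values in question can be inspected with Python's standard
unicodedata module, whose bidirectional() function returns the bidi
class recorded in whatever version of the Unicode Character Database
ships with the interpreter. The helper name bidi_classes is hypothetical.
The classes assigned to punctuation have been revised between Unicode
versions, so the output for "/", "." and "," depends on the Python build
and may differ from the values in force when this was written.

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its bidi class from the bundled UCD (hypothetical helper)."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letter, Hebrew alef, Arabic alef, European digit,
# Arabic-Indic digit, then the problematical punctuation.
for ch, cls in bidi_classes("A\u05d0\u06277\u0661/.,"):
    print(f"U+{ord(ch):04X} {cls}")
```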

The basic concepts, on the other hand, are relatively straightforward.

--Ken

>
> An H-P interface specialist I met once told me his research group tried
> to build a text processor for Hebrew and Chinese because they thought it
> was the worst possible combination of languages.
>
> Elaine Keown


