RE: Unicode editing (RE: Unicode complaints)

From: Marco Cimarosti (
Date: Mon Mar 19 2001 - 06:20:01 EST

Roozbeh Pournader wrote:
> Now I like this. This is getting near to what I had in mind.
> Characters,
> together with their embedding levels (and possibly more).

You are right!

I have been thinking about this the whole week-end, and I too came to the
conclusion that the resolved embedding levels are what really need to be
maintained during editing. Once you have these, you can safely throw away
all the bidi controls and be sure that you'll be able to re-create them when
going back to logical order.

Anyway, the embedding levels have the drawback of being invisible, and
invisible things are not good in WYSIWYG editing.

So there must be a way to optionally *visualize* the embedding levels in a
line of text.

One way that comes to mind could be underlining text with *arrows* that show
the text direction.

Multiple embedding levels would thus be visualized with a downwards stack of
arrows. Each arrow must be shorter than the arrow that precedes it, and
point in the opposite direction.

E.g., a LTR paragraph (level 0) would be completely underlined by an arrow
heading to the right. If the paragraph contains a RTL embedding (level 1), a
second arrow, heading to the left, will underline the RTL phrase. If the RTL
embedding contains itself a LTR embedding (level 2), a third shorter arrow
would be drawn under the embedded LTR, and so on.
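
A minimal sketch of how such arrow rows could be drawn from the resolved
levels, one row per level. The function name `arrow_rows` and the ASCII
rendering are my own assumptions for illustration; a real editor would draw
graphical underlines instead.

```python
def arrow_rows(levels):
    """Return one text row per embedding level, lowest level first.

    `levels` holds one resolved level per displayed character and is
    assumed to be non-empty. Even levels are drawn as '-->' (LTR),
    odd levels as '<--' (RTL).
    """
    rows = []
    for lvl in range(max(levels) + 1):
        row = [" "] * len(levels)
        i = 0
        while i < len(levels):
            if levels[i] >= lvl:
                # Find the end of the run at this level or deeper.
                j = i
                while j < len(levels) and levels[j] >= lvl:
                    j += 1
                for k in range(i, j):
                    row[k] = "-"
                # The arrow head goes at the end the text runs towards.
                if lvl % 2 == 0:
                    row[j - 1] = ">"
                else:
                    row[i] = "<"
                i = j
            else:
                i += 1
        rows.append("".join(row).rstrip())
    return rows


if __name__ == "__main__":
    for line in arrow_rows([int(c) for c in "000111222111000"]):
        print(line)
```

Each row only covers characters at that level or deeper, so every arrow is
automatically no longer than the one above it.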

Once the user activates this "show bidi" mode, she can understand and
even *edit* the levels.

I would say that two separate commands are needed to edit the levels:

- "Bidi Embed": adds or subtracts *two* to the embedding level of the
selected text.
- "Bidi Override": adds or subtracts *one* to the embedding level of the
selected text.
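
On a plain list of levels, the two commands could be sketched as follows.
The names `bidi_embed`, `bidi_override` and the `remove` flag are
hypothetical, and the only check made here is that a level never goes
negative; the validity checks a real editor would need are left out.

```python
def _shift(levels, start, end, delta):
    """Return a copy of `levels` with `delta` added on [start, end)."""
    result = list(levels)
    for i in range(start, end):
        if result[i] + delta < 0:
            raise ValueError("embedding level would become negative")
        result[i] += delta
    return result


def bidi_embed(levels, start, end, remove=False):
    """'Bidi Embed': add (or subtract) two on the selection."""
    return _shift(levels, start, end, -2 if remove else 2)


def bidi_override(levels, start, end, remove=False):
    """'Bidi Override': add (or subtract) one on the selection."""
    return _shift(levels, start, end, -1 if remove else 1)
```

Adding two keeps the parity of the level, so the selected text keeps its
direction but becomes nested one step deeper; adding one flips the parity,
so the selected text changes direction.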

As an example, consider this *displayed* paragraph:

        abcDEFghiJKLmno

where, as usual, lowercase letters represent left-to-right characters and
uppercase right-to-left characters.

With default embedding levels, the text above would be LTR text with two RTL
embeddings:

        Visual order:   abcDEFghiJKLmno
        Bidi levels:    000111000111000
        Level 0 arrows: -------------->
        Level 1 arrows:    <--   <--
        Logical order:  abcFEDghiLKJmno
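
Going from visual order back to logical order with nothing but the levels
can be sketched as below. This is my own illustration, assuming the
reversal rule of the bidirectional algorithm (reverse every run at a given
level or higher, from the highest level down to the lowest odd one) is
simply undone by applying the same reversals in the opposite order. The
name `visual_to_logical` is hypothetical, and no bidi controls are
generated here.

```python
def visual_to_logical(text, levels):
    """Return (logical_text, logical_levels) for one displayed line.

    `text` is in visual order and `levels` holds one resolved
    embedding level per character of `text`.
    """
    cells = list(zip(text, levels))
    if not cells:
        return "", []
    top = max(levels)
    # With no odd level at all, nothing is reversed.
    lowest_odd = min((l for l in levels if l % 2), default=top + 1)
    for lvl in range(lowest_odd, top + 1):
        i = 0
        while i < len(cells):
            if cells[i][1] >= lvl:
                j = i
                while j < len(cells) and cells[j][1] >= lvl:
                    j += 1
                # Reverse the whole run at this level or deeper.
                cells[i:j] = cells[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(ch for ch, _ in cells), [l for _, l in cells]


if __name__ == "__main__":
    levels = [int(c) for c in "000111000111000"]
    print(visual_to_logical("abcDEFghiJKLmno", levels)[0])
```

With the levels of the table above, this gives back the logical order shown
in its last row.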

If you select "ghi" and use the "Bidi Embed" command, you obtain a LTR text
with one RTL embedding, itself containing an LTR embedding. As you have two
levels of embedding, the resulting Unicode string will need specific
embedding controls:

        Visual order:   abcDEFghiJKLmno
        Bidi levels:    000111222111000
        Level 0 arrows: -------------->
        Level 1 arrows:    <--------
        Level 2 arrows:       -->
        Logical order:  abc<RLE>LKJghiFED<PDF>mno

If you select "ghi" and use the "Bidi Override" command, you have a LTR text
with one RTL embedding. Part of the RTL text has an unnatural directionality
(LTR characters forced to RTL), so the resulting Unicode string will need
specific override controls:

        Visual order:   abcDEFghiJKLmno
        Bidi levels:    000111111111000
        Level 0 arrows: -------------->
        Level 1 arrows:    <--------
        Logical order:  abcLKJ<RLO>ihg<PDF>FEDmno

These are of course very simple examples. The algorithm gets much more
complicated with validity checks, such as ensuring that the user doesn't
create nonsensical embeddings (like LTR text embedded in other LTR text).

Moreover, complications certainly arise with cut&paste: I think that the
levels must be adjusted to avoid nonsensical situations.
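
One possible check, sketched under my own assumption of what "nonsense"
means in terms of levels: two neighbouring characters (or a character and
the paragraph edge) whose levels differ by more than one, which is what
LTR text embedded directly in LTR text looks like. The name
`find_level_jumps` and the `paragraph_level` parameter are hypothetical.

```python
def find_level_jumps(levels, paragraph_level=0):
    """Return the boundary positions where the level jumps by more than one.

    Boundary i lies between character i-1 and character i; boundaries 0
    and len(levels) are compared against the paragraph level.
    """
    padded = [paragraph_level] + list(levels) + [paragraph_level]
    return [
        i
        for i in range(len(padded) - 1)
        if abs(padded[i + 1] - padded[i]) > 1
    ]
```

An editor could refuse a command, or adjust pasted text, whenever this list
would become non-empty.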

_ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:20 EDT