[unicode] Re: Unicode editing

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Mar 28 2001 - 14:25:13 EST


Jonathan Coxhead wrote:
> Consider
>
> RLE a b c PDF RLE d e f PDF
>
> in an LTR region (where a, b, ... are neutral). This displays as
>
> cbafed

No, I think it displays as:

        fedcba

(Read on...)

> i e, 2 RTL runs in LTR order. If you encode that as
>
> a b c d e f
> 1 1 1 1 1 1
>
> it's indistinguishable from
>
> RLE a b c d e f PDF

In fact, according to the current definition of the algorithm (or to my
understanding of it), your two examples are strictly equivalent.

(Read on...)

> which displays as
>
> f e d c b a
>
> So I don't see how this can work. Am I missing something?

You made quite a good point. For a moment, I was on the point of
giving up and saying that you were right.

I think that you started from rule P2:

        "P2. In each paragraph, find the first character of type L, AL, or R."
        "[...] Note that the characters of type LRE, LRO, RLE, RLO are ignored in this rule. [...]"

So, embedding controls don't count in determining the paragraph
directionality. This means that the paragraph in your first example is
actually LTR, even though all it contains is two RTL embeddings.
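To make this concrete, here is a minimal Python sketch of rules P2/P3 as I read them. The function name `paragraph_level` is hypothetical; bidi classes come from Python's `unicodedata.bidirectional`, and the neutrals a..f are stood in for by ASCII punctuation of class ON. It is simplified: it only decides between level 0 and level 1.

```python
import unicodedata

def paragraph_level(text):
    """Simplified P2/P3: 1 (RTL) if the first strong character is R or AL,
    0 (LTR) if it is L or if there is no strong character at all.
    Embedding controls (LRE, LRO, RLE, RLO, PDF) are simply skipped."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
        # Neutrals, weak types and embedding controls are ignored by P2.
    return 0  # P3: no strong character found, so the paragraph is LTR

RLE, PDF = "\u202b", "\u202c"
first = RLE + "!&*" + PDF + RLE + ";?@" + PDF  # two RTL embeddings of neutrals
second = RLE + "!&*;?@" + PDF                  # one RTL embedding of neutrals
print(paragraph_level(first), paragraph_level(second))  # 0 0
```

Both inputs fall through to the P3 default, so both paragraphs are LTR.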

You are right so far. I would add that the same is also true for your second
example.

I would also add that I don't quite understand the reason for ignoring
embedding controls here, at least in those cases where no strong
directional characters are present. Could it be that Jonathan has caught a
dark spot in the current definition of the algorithm?

However, here comes the mistake: the fact that the paragraph direction
is LTR does *not* imply that the first embedding must be displayed to the
left of the second embedding.

The two facts are totally unrelated, although there must be some sort of
unconscious link between them because, at first sight, Jonathan's explanation
sounds very convincing.

The visual order of characters on the screen is actually determined *only*
by the resolved numerical values of the levels, as explained in the section
"Reordering Resolved Levels".

More precisely, it is rule L2 that governs this:

        "L2. From the highest level found in the text to the lowest odd level on each line, reverse any contiguous sequence of characters that are at that level or higher."
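Rule L2 is mechanical enough to sketch in a few lines of Python. The function `reorder_line` below is a hypothetical illustration: it takes one line of text plus one already-resolved level per character, and ignores the other L rules (whitespace reset, combining marks, mirroring).

```python
def reorder_line(chars, levels):
    """Rule L2: from the highest level down to the lowest odd level,
    reverse every maximal run of characters at that level or higher."""
    if not chars:
        return chars
    out, lv = list(chars), list(levels)
    lowest = min(lv)
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    for level in range(max(lv), lowest_odd - 1, -1):
        i = 0
        while i < len(out):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(out) and lv[j] >= level:
                j += 1
            # Reverse the run [i, j) together with its levels.
            out[i:j] = out[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return "".join(out)

print(reorder_line("abcdef", [1, 1, 1, 1, 1, 1]))      # fedcba
print(reorder_line("abc def", [1, 1, 1, 0, 1, 1, 1]))  # cba fed
```

With all six characters at level 1, the whole run is reversed as one unit. Getting "cba" and "fed" as two separate runs requires something at a lower level between them, as in the second call.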

I don't know what the resolved level of the embedding controls themselves
would be. But this clearly doesn't matter, because there is a rule that
explicitly requires *removing* them as soon as the explicit levels have been
resolved:

        "X9. Remove all RLE, LRE, RLO, LRO, PDF, and BN codes."
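Here is a hypothetical Python sketch of the consequence. It resolves explicit levels for RLE, LRE and PDF only (overrides and the maximum embedding depth are left out), then drops the controls as X9 requires. Tokens use the same notation as Jonathan's examples: control names as words, and every other token is a single character.

```python
def explicit_levels(tokens, paragraph_level=0):
    """Simplified X2, X3 and X7 for RLE, LRE and PDF only, followed by X9:
    the controls adjust the level stack and are then dropped from the output."""
    stack = [paragraph_level]
    chars, levels = [], []
    for tok in tokens:
        top = stack[-1]
        if tok == "RLE":    # X2: next odd level above the current one
            stack.append(top + 1 if top % 2 == 0 else top + 2)
        elif tok == "LRE":  # X3: next even level above the current one
            stack.append(top + 2 if top % 2 == 0 else top + 1)
        elif tok == "PDF":  # X7: return to the enclosing level, if any
            if len(stack) > 1:
                stack.pop()
        else:               # ordinary character: keep it with its level
            chars.append(tok)
            levels.append(top)
    return "".join(chars), levels

print(explicit_levels("RLE a b c PDF RLE d e f PDF".split()))
print(explicit_levels("RLE a b c d e f PDF".split()))
# both print: ('abcdef', [1, 1, 1, 1, 1, 1])
```

Once the controls are gone, both inputs are the same six neutrals forming a single level run at level 1. Its sos and eos are both R, because level 1 is higher than the surrounding paragraph level 0. The neutrals therefore resolve to R and stay at level 1, and rule L2 reverses the whole run into "fedcba".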

By the way, Roozbeh, how did we fail to notice this rule when we were
discussing whether or not it is legal to remove embedding controls?!

It is not only allowed: it is *mandatory*!!

Well, OK, not really mandatory: a note explains that an implementation is
*allowed* to keep them, but only if it behaves "as though the codes
were not present". But there is no question that the default is to delete
them.

_ Marco



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:15 EDT