Roozbeh Pournader wrote:
> On Wed, 21 Mar 2001, Marco Cimarosti wrote:
> > struct MyWysiwygGlyph
> > {
> >     wchar_t GlyphCode;
> >     int EmbeddingLevel;
> > };
> > I think that Roozbeh had something quite similar in mind.
>
> Yes. I was not sure that if that's enough, but after this
> discussion, I believe them to be enough.
It depends. Enough for what?
Storing the level with each character is enough for generating *one* valid
Unicode logical order. This logical string should have the same logical
order as the original string, and the same embedding relationships (who
embeds whom).
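As a rough illustration of this point, the C sketch below takes glyphs stored in visual order, each tagged with its level, and recovers a logical order by applying the run reversal of UAX #9 rule L2. It assumes that this reversal is its own inverse when each level travels with its glyph, and it ignores line breaking and mirroring. The names reorder_by_levels and reverse_run are hypothetical; only the struct comes from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

struct MyWysiwygGlyph {
    wchar_t GlyphCode;
    int EmbeddingLevel;
};

/* Reverse glyphs[first..last] in place (inclusive bounds). */
static void reverse_run(struct MyWysiwygGlyph *glyphs, size_t first, size_t last)
{
    while (first < last) {
        struct MyWysiwygGlyph tmp = glyphs[first];
        glyphs[first++] = glyphs[last];
        glyphs[last--] = tmp;
    }
}

/*
 * Hypothetical helper: from the highest level down to the lowest odd
 * level, reverse every maximal run of glyphs at that level or higher.
 * Levels move together with their glyphs.
 */
void reorder_by_levels(struct MyWysiwygGlyph *glyphs, size_t count)
{
    int highest = 0;
    int lowest_odd = 0;   /* 0 means "no odd level seen" */
    int level;
    size_t i;

    for (i = 0; i < count; i++) {
        int l = glyphs[i].EmbeddingLevel;
        assert(l >= 0);
        if (l > highest)
            highest = l;
        if ((l & 1) && (lowest_odd == 0 || l < lowest_odd))
            lowest_odd = l;
    }
    if (lowest_odd == 0)
        return;           /* purely left-to-right: nothing to reverse */

    for (level = highest; level >= lowest_odd; level--) {
        i = 0;
        while (i < count) {
            if (glyphs[i].EmbeddingLevel >= level) {
                size_t start = i;
                while (i < count && glyphs[i].EmbeddingLevel >= level)
                    i++;
                reverse_run(glyphs, start, i - 1);
            } else {
                i++;
            }
        }
    }
}
```

The result is only *one* valid logical string: turning the levels back into explicit controls is a separate step, and the original controls themselves are not recoverable.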
But it is *not* enough to recreate *exactly* the same embedding controls
that you had in the original text.
For instance, imagine that the original text contained a stand-alone <PDF>.
That control is just a terminator and, used alone, is totally meaningless.
So, once you remove it, you have lost it forever.
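The stand-alone <PDF> case can be sketched in C as follows. This is a simplified take on the explicit-level rules of UAX #9, handling only LRE, RLE and PDF (no overrides) and assuming a depth limit of 61. The function name explicit_levels, the enum constants and the LEVEL_REMOVED marker are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

enum {
    BIDI_LRE = 0x202A,   /* LEFT-TO-RIGHT EMBEDDING */
    BIDI_RLE = 0x202B,   /* RIGHT-TO-LEFT EMBEDDING */
    BIDI_PDF = 0x202C,   /* POP DIRECTIONAL FORMATTING */
    MAX_LEVEL = 61,      /* depth limit assumed for this sketch */
    LEVEL_REMOVED = -1   /* marks a control that carries no level */
};

/*
 * Hypothetical helper: write the explicit embedding level of each
 * character into levels[]. Controls get LEVEL_REMOVED, since they
 * are dropped once the levels have been computed.
 */
void explicit_levels(const wchar_t *text, size_t count,
                     int paragraph_level, int *levels)
{
    int stack[MAX_LEVEL + 1];
    int depth = 0;
    int overflow = 0;    /* embeddings ignored for exceeding MAX_LEVEL */
    int current = paragraph_level;
    size_t i;

    assert(paragraph_level == 0 || paragraph_level == 1);

    for (i = 0; i < count; i++) {
        wchar_t c = text[i];
        if (c == BIDI_LRE || c == BIDI_RLE) {
            /* RLE: next odd level; LRE: next even level. */
            int next = (c == BIDI_RLE) ? ((current + 1) | 1)
                                       : ((current + 2) & ~1);
            if (next <= MAX_LEVEL && overflow == 0) {
                stack[depth++] = current;
                current = next;
            } else {
                overflow++;
            }
            levels[i] = LEVEL_REMOVED;
        } else if (c == BIDI_PDF) {
            if (overflow > 0)
                overflow--;
            else if (depth > 0)
                current = stack[--depth];
            /* else: stand-alone PDF, nothing to pop, so no effect */
            levels[i] = LEVEL_REMOVED;
        } else {
            levels[i] = current;
        }
    }
}
```

With this sketch, "ab<PDF>cd" and "abcd" produce the same levels for the four letters, so nothing in the per-character levels records that the <PDF> was ever there.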
If you ask me, this behavior is perfectly OK. But I was reasoning along your
lines of being 100% prepared for future meanings of bidi controls.
> I will try to implement that as part of
> fribidi, the first GPL-compatible Unicode-conformant bidi
> engine in the wild.
Is this project online already? Where?
> > 3) The lowest level in each paragraph *must* be either 0 (for a LTR
> > paragraph) or 1 (for a RTL paragraph).
>
> I have read UAX #9 many times; where have you concluded that from?
I was referring to the "paragraph embedding level" (a.k.a. "paragraph
direction" or "base direction"; see definitions BD3 and BD4), which is
always 0 or 1 (see rule P3).
My assumption was that the "paragraph embedding level" may always be
considered one of the embedding levels present in the paragraph.
Of course, this is not true (or is only "virtually true") for a paragraph
that is completely enclosed in an explicit embedding.
But, as I explained before, I was thinking about a sort of normalization (or
optimization?) of embeddings. So, an "embedding" which is not embedded in
anything would be flattened down.
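One possible reading of this normalization, sketched in C: subtract the largest even number that does not exceed the lowest level in the paragraph. Parity is preserved, so every run keeps its direction, and the lowest level ends up being 0 or 1. The function name flatten_levels is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: shift all levels of one paragraph down by an
 * even amount so that the lowest level becomes 0 (LTR) or 1 (RTL).
 */
void flatten_levels(int *levels, size_t count)
{
    int lowest;
    int shift;
    size_t i;

    if (count == 0)
        return;

    lowest = levels[0];
    for (i = 1; i < count; i++) {
        if (levels[i] < lowest)
            lowest = levels[i];
    }
    assert(lowest >= 0);

    shift = lowest & ~1;   /* largest even number <= lowest */
    for (i = 0; i < count; i++)
        levels[i] -= shift;
}
```

For example, a paragraph wholly enclosed in an explicit embedding with levels 2, 3, 3, 2 would be flattened to 0, 1, 1, 0.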
> > (e.g. XML), embedding levels do not necessarily follow the rule. E.g., see
> > how tagging and Unicode embedding overlap in: "<BOLD> abc &RLE; def </BOLD>
> > ghi &PDF; ijk".
>
> Oh, oh! That kind of thing is illegal in XML. Just take a
> look at "Bidi embedding controls" section in UTR #20 at:
>
> http://www.unicode.org/unicode/reports/tr20/#Bidi
I read it, but I don't see clear evidence that it is "illegal".
An authoring tool may convert bidi controls to markup and, in *this* case,
it would detect the nesting problem, but it is not mandatory to use a
specific tool to develop XML or HTML files.
A parser which treats plain text between two tags as an atomic node would
fail to spot the problem, and it would hand the mess over to the renderer.
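To make the nesting problem concrete, here is a minimal C sketch of the check that a tool could run if it tokenized bidi controls alongside tags instead of treating the text as atomic. The token kinds and the function find_misnesting are hypothetical. Tag names are not compared, and embeddings left open at the end are not reported.

```c
#include <assert.h>
#include <stddef.h>

enum TokenKind {
    TOK_TEXT,
    TOK_TAG_OPEN,      /* e.g. <BOLD> */
    TOK_TAG_CLOSE,     /* e.g. </BOLD> */
    TOK_EMBED_OPEN,    /* RLE or LRE */
    TOK_EMBED_CLOSE    /* PDF */
};

enum { MAX_NESTING = 64 };

/*
 * Hypothetical helper: return the index of the first token where a tag
 * and an embedding cross each other, or -1 if none is found.
 */
int find_misnesting(const enum TokenKind *tokens, size_t count)
{
    enum TokenKind open[MAX_NESTING];
    size_t depth = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        switch (tokens[i]) {
        case TOK_TAG_OPEN:
        case TOK_EMBED_OPEN:
            if (depth == MAX_NESTING)
                return (int)i;          /* too deep for this sketch */
            open[depth++] = tokens[i];
            break;
        case TOK_TAG_CLOSE:
            /* A tag closes while an embedding inside it is still open. */
            if (depth == 0 || open[depth - 1] != TOK_TAG_OPEN)
                return (int)i;
            depth--;
            break;
        case TOK_EMBED_CLOSE:
            if (depth == 0)
                break;                  /* stand-alone PDF: no effect */
            /* A PDF would close an embedding opened outside this tag. */
            if (open[depth - 1] != TOK_EMBED_OPEN)
                return (int)i;
            depth--;
            break;
        default:
            break;
        }
    }
    return -1;
}
```

For the token sequence of "<BOLD> abc &RLE; def </BOLD> ghi &PDF; ijk", this sketch flags the </BOLD> token, since the embedding opened inside the element is still open when the element closes.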
HTML 4.0 says: "If both methods are used, great care should be exercised to
insure proper nesting of markup and directional embedding or override,
otherwise, rendering results are undefined."
But it is not clear *who* has to exercise great care. I suspect that they
meant "the author" more than "the authoring tool".
_ Marco