Bidi in HTML

From: Jonathan Rosenne (100320.1303@CompuServe.COM)
Date: Wed May 22 1996 - 12:18:07 EDT


>From: Martin J Duerst <mduerst@ifi.unizh.ch>

>>HTML should allow the specification of bidi formatting, when required,
>>by means of the Unicode formatting characters and corresponding named character
>>entities.

>I guess this was and still is the main point of disagreement.

>There are two main arguments against this solution:
>- Raw text HTML editing.
>- Interference of bidi structure and markup structure.

>The other main argument, which I have mainly explained to you
>in private mail, and which I do not in any way see answered
>here, is the question of interference of markup structure and
>bidi structure. By moving bidi embedding and override to markup,
>we can assure that these two structures are in sync. This helps
>keep documents clean and nicely structured. This may not be
>of extremely high importance in some cases and for some users,
>but for users working with large document collections and using
>SGML techniques, structural integrity is very important, and can
>best be guaranteed by making bidi embedding and override markup.
>
>To give an example, does it make sense to have something like:
>
><Q>text text text &RLE; text text text</Q> text text text &PDF;?

Probably not, but is this an HTML issue? I have seen lots of
HTML pages that don't make sense :-).

>Whatever directionality the "text" snippets are, it does not make
>sense. If we have a quote (<Q>), then it is the quote that is
>embedded; indeed, this is the most frequent embedding case,
>and I do not expect many other in-line elements with a DIR
>attribute. Now the above degenerate case cannot be checked
>if we have &RLE; and &PDF; as characters like any other;
>they need to be markup of some form.

I don't think the browser should check it. What will it do if it
detects this error? Tell the client? What's the use?

It is agreed that block level elements, such as <P>, reset the bidi
algorithm and thus close all open "pairs". Any malformed structure
will not spread too far.
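
To make the kind of check under discussion concrete, here is a minimal sketch in Python of what an authoring tool (rather than a browser) might do. It is only an illustration: the function name is hypothetical, the entity names are the ones used in this thread, and it assumes well-nested tags with no empty elements. It reports elements whose embedding or override pairs cross the element boundary, and does not model the reset at block-level elements.

```python
import re

# Explicit bidi openers and the closer, as raw characters and as the
# entity spellings used in this thread (assumed names, not a published DTD).
OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e",
           "&LRE;", "&RLE;", "&LRO;", "&RLO;"}
CLOSERS = {"\u202c", "&PDF;"}

TOKEN = re.compile(r"</?[A-Za-z][^>]*>|&[A-Za-z]+;|[\u202a-\u202e]")
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def find_bidi_markup_conflicts(html):
    """Return names of elements whose bidi pairs cross the element boundary."""
    conflicts = []
    # Each entry: [element name, pairs still open inside it, saw a PDF
    # that closes a pair opened outside the element].
    stack = [["#root", 0, False]]
    for token in TOKEN.findall(html):
        if token.startswith("<"):
            name = TAG_NAME.match(token).group(1).upper()
            if not token.startswith("</"):
                stack.append([name, 0, False])
            elif len(stack) > 1 and stack[-1][0] == name:
                _, depth, crossed = stack.pop()
                if depth != 0 or crossed:
                    conflicts.append(name)
        elif token.upper() in OPENERS:
            stack[-1][1] += 1
        elif token.upper() in CLOSERS:
            if stack[-1][1] > 0:
                stack[-1][1] -= 1
            elif len(stack) > 1:
                stack[-1][2] = True  # PDF reaches outside this element
    return conflicts


print(find_bidi_markup_conflicts("<Q>text &RLE; text</Q> text &PDF;"))  # ['Q']
```

What the tool does with such a report (warn the author, repair the pairs) is a separate question that the sketch leaves open.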

>The reason is not the frequency of these cases, but their structural
>meaning.

But what if the structure is bad? Only a proper authoring tool can
do a meaningful check of this, so this is not an HTML issue.

>>Another misunderstanding is the implied assumption that the author is
>>aware of the bidi formatting codes. In fact, these are produced by the
>>bidi editor without the author's knowledge based on certain interactions
>>between the author and the editor, mainly the keyboard language. See,
>>for example, the Accent editor. Since the author is not normally aware
>>of these codes, making them markup places an additional burden on the
>>author, especially as the authors of the subject draft expressed a
>>desire to support HTML authoring with a raw text editor, without an HTML
>>authoring tool.

>As for tools and editors, the aim is indeed that the author does not
>have to be aware of the issues. But I have the suspicion that the
>current proposal of having bidi formatting only as characters is
>more related to the fact that although bidi tools have some ways
>to manipulate bidi text up to a point at which the author is satisfied,
>the current tools don't really "understand" much about bidi.
>
>What I mean by this is that appropriate formatting characters
>are inserted whenever the user changes something explicitly
>or during a copy/paste operation, but that the tool has
>no or only very limited ways of reducing formatting codes
>to equivalent representations with fewer formatting codes.
>
>This could mean that bidi is still not very well understood,
>that it was designed with too much complexity, or that the
>problem of reducing formatting codes is intractable per se.
>
>Maybe the above is just speculation (and I would be happy
>to hear it actually is), but such issues should be discussed
>in detail and not just be brushed over.

I agree that it is important that the bidi codes be optimised (reduced),
that this is not done, and that it should be done. I'm glad you have
brought this up; I have been saying it since my first involvement
with Unicode and 10646.

But this is not an HTML issue.
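
As a rough illustration of what the simplest part of such a reduction could look like, the Python sketch below removes only two kinds of redundant codes: a PDF with no open embedding or override before it, and an opener immediately followed by its PDF. The function name is hypothetical, the input is assumed to be a single paragraph, and it relies on the assumption that the bidi algorithm ignores unmatched PDFs and that empty pairs have no visible effect. It does not attempt the harder problem of finding a minimal equivalent set of codes.

```python
OPENERS = "\u202a\u202b\u202d\u202e"  # LRE, RLE, LRO, RLO
PDF = "\u202c"


def reduce_bidi_codes(text):
    """Drop unmatched PDFs and empty embedding/override pairs."""
    out = []
    open_positions = []  # indexes in `out` of openers not yet closed
    for ch in text:
        if ch in OPENERS:
            open_positions.append(len(out))
            out.append(ch)
        elif ch == PDF:
            if not open_positions:
                continue  # nothing to close: assumed to be ignored anyway
            start = open_positions.pop()
            if start == len(out) - 1:
                out.pop()  # opener directly followed by PDF: empty pair
            else:
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)
```

Because an emptied inner pair exposes the enclosing pair to the same test, nested empty pairs collapse in a single pass.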

I would like to add that neither Accent nor Microsoft have implemented
embedding in their bidi products, and the 200,000 users of Hebrew
Windows somehow manage without embedding. We discussed it at the SII
and agreed that embedding is nevertheless needed, but I don't think we
should encumber HTML with special requirements to support such a rarely
used feature when alternatives (if less structured) are available.

Overrides are widely used when offered (e.g. Accent), but I suspect that with
the optimisation discussed above they would mostly vanish or be replaced
by LRMs and RLMs. Users don't get along well with invisible marks:
when the text doesn't come out as they want, they just go into override mode.

>7. Conformance

>If there are no displayable right to left characters, there
>is no requirement to apply bidi processing.

Good. (But change to "character")
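
A minimal sketch of how an implementation might apply this conformance rule, using Python's standard `unicodedata` module. The function name is hypothetical, and two assumptions are made: "right to left character" is read as bidi class R or AL, and "displayable" (whether the system can actually render the character) is not modelled. Text that contains only an override such as RLO around left-to-right characters is therefore reported as not needing processing, which may or may not be what the rule intends.

```python
import unicodedata

# Bidi classes treated as "right to left characters" (an assumption):
# R covers Hebrew letters, AL covers Arabic letters.
RTL_CLASSES = {"R", "AL"}


def needs_bidi_processing(text):
    """True if the text contains at least one right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)
```

A renderer could call this once per block and skip the bidi algorithm entirely when it returns False.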

Jonathan Rosenne


