Re: Bidi in HTML

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Thu May 23 1996 - 12:05:32 EDT


Jonathan Rosenne wrote:

>>From: Martin J Duerst <mduerst@ifi.unizh.ch>
>
>>To give an example, does it make sense to have something like:
>>
>><Q>text text text &RLE; text text text</Q> text text text &PDF;?
>
>Probably not, but is this an HTML issue? I have seen lots of
>HTML pages that don't make sense -).

One of the ideas of document markup is to be able to eliminate
such nonsense with simple tools. HTML, insofar as it is an
application of SGML, allows this, provided the markup is
appropriately defined.

>>Whatever directionality the "text" snippets are, it does not make
>>sense. If we have a quote (<Q>), then it is the quote that is
>>embedded; indeed, this is the most frequent embedding case,
>>and I do not expect many other in-line elements with a DIR
>>attribute. Now the above degenerate case cannot be checked
>>if we have &RLE; and &PDF; as characters like any other;
>>they need to be markup of some form.
>
>I don't think the browser should check it. What will it do if it
>detects this error? Tell the client? What's the use?

The basic idea is not that the browser should check it, but
the author should use a tool, such as a validation service.
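
To make the degenerate case concrete, here is a minimal sketch of the
check such a tool would have to perform by hand when embeddings are
written as &RLE;/&PDF; entities. It is an illustration only: the
function name check_bidi_nesting and the regex tokenizer are
assumptions made for this example, it handles only &RLE;, &LRE; and
&PDF;, and it ignores SGML features such as omitted end tags. A
generic SGML parser would not do this, since it sees the entities as
ordinary characters.

```python
import re

# Matches a start tag, an end tag, or one of the bidi entities from this thread.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|&(RLE|LRE|PDF);")

def check_bidi_nesting(text):
    """Return messages for embeddings that are not closed inside their element."""
    stack = []      # entries are ("elem", NAME) or ("embed", CODE)
    problems = []
    for m in TOKEN.finditer(text):
        closing, tag, entity = m.group(1), m.group(2), m.group(3)
        if entity in ("RLE", "LRE"):
            stack.append(("embed", entity))
        elif entity == "PDF":
            # A PDF may only close an embedding opened in the same element.
            if stack and stack[-1][0] == "embed":
                stack.pop()
            else:
                problems.append(
                    "&PDF; at offset %d closes nothing in this element" % m.start())
        elif closing:
            # Report and discard embeddings still open when the element ends.
            while stack and stack[-1][0] == "embed":
                problems.append("&%s; still open at </%s>" % (stack.pop()[1], tag))
            if stack and stack[-1] == ("elem", tag.upper()):
                stack.pop()
            else:
                problems.append("unmatched </%s>" % tag)
        else:
            stack.append(("elem", tag.upper()))
    for kind, name in stack:
        problems.append("unclosed %s %s" % (kind, name))
    return problems
```

Run on the <Q> example quoted above, this reports two problems: the
&RLE; left open at </Q>, and the trailing &PDF; that closes nothing.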

>It is agreed that block level elements, such as <P>, reset the bidi
>algorithm and thus close all open "pairs". Any malformed structure
>will not spread too far.

"not too far" is a good consolation if you can't do anything else.
But "not at all" is definitely better, and it can easily be achieved.

>>The reason is not the frequency of these cases, but their structural
>>meaning.
>
>But what if the structure is bad? Only a proper authoring tool can
>do a meaningful check of this, so this is not an HTML issue.

I really don't know why you are writing such comments as somebody
who has, as far as I understand, been involved in the development
of SGML DTDs. The development of such a DTD is a way of clearly
saying what structures are okay and what are bad, and to help
tools check it. Actually, using the current i18n DTD, I have a
proper tool NOW to do a meaningful check. I don't even have
to wait for special editors and such, I can just take some SGML parser.

>I agree that it is important the bidi codes be optimised (reduced),
>that it is not done and that it should be done. I'm glad you have
>brought this up, I have been saying it since my first involvement
>with Unicode and 10646.
>
>But this is not an HTML issue.

It's not directly an HTML issue, but insofar as markup can help
avoid cluttering documents with bidi formatting characters that
are difficult to get rid of, it is definitely related to the current
discussion.

>I would like to add that neither Accent nor Microsoft have implemented
>embedding in their bidi products, and the 200,000 users of Hebrew
>Windows somehow manage without embedding. We discussed it at the SII
>and agreed that embedding is nevertheless needed, but I don't think we
>should encumber HTML with special requirements to support such a rarely
>used feature when alternatives (if less structured) are available.
>
>Overrides are widely used when offered (e.g. Accent), but I suspect that with
>the optimisation discussed above they would mostly vanish or be replaced
>by lrm's and rlm's. Users don't get along well with invisible marks,
>when the text doesn't come out as they want they just go into override mode.

Nice to get this additional information. It actually gives me quite some
new arguments, and strengthens my old ones.

First, if some editors currently don't support embedding, wouldn't it be
nice if I could define an HTML document with embedding, for use by
other browsers? And wouldn't markup be a nice way to do this, as
embedding is anyway usually connected with document structure?
Note that while it is rather difficult to implement good Bidi using
all features of the Unicode bidi algorithm for an editor, it is not
that difficult for a browser that only has to do display and not manipulation.

Second, if users have problems with invisible formatting codes (which I
can very well understand), is not this also an argument for clear structural
markup? If a user has problems with invisible markup, understanding the
exact consequences of &RLE; and &PDF;, etc., might not be that easy either.
But <Q DIR=RTL>text</Q> says very clearly what it means, and a user should
have little trouble understanding its meaning and structure.
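
As a rough illustration of why this is manageable for a display-only
browser, the sketch below turns DIR attributes on elements into the
Unicode embedding controls (U+202A LRE, U+202B RLE, U+202C PDF) that
the bidi algorithm already understands. The function name
markup_to_controls and the regex-based tag handling are assumptions
made for this example, and the mapping of DIR to an embedding is the
proposal under discussion here, not settled behaviour. Tags are
dropped from the output, and end tags are assumed to be present.

```python
import re

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")
DIR = re.compile(r'\bDIR\s*=\s*"?(RTL|LTR)"?', re.IGNORECASE)

def markup_to_controls(text):
    """Replace elements carrying DIR with LRE/RLE ... PDF around their content."""
    out = []
    stack = []   # one flag per open element: did it start an embedding?
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])   # copy the text before this tag
        pos = m.end()
        closing, attrs = m.group(1), m.group(3)
        if closing:
            # Close the embedding only if this element opened one.
            if stack and stack.pop():
                out.append(PDF)
        else:
            d = DIR.search(attrs)
            stack.append(d is not None)
            if d:
                out.append(RLE if d.group(1).upper() == "RTL" else LRE)
    out.append(text[pos:])
    return "".join(out)
```

Because each PDF is emitted at the element's own end tag, the
embedding can never extend past the element that opened it.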

Third, could not the availability of appropriate markup reduce the need
to use overrides more than is really advisable?

Regards, Martin.


