Bidi in HTML

From: Jonathan Rosenne (100320.1303@CompuServe.COM)
Date: Sat May 18 1996 - 13:38:46 EDT


This is a summary of my understanding of the discussion on bidi so far:

1. HTML, as a higher level protocol in the sense used by Unicode,
provides the base directionality for each "block-type" element. The
directionality may be specified by means of the DIR attribute or
inherited from a higher level element or from the global directionality
of the page.

For each such element, the embedding level is reset according to the
base directionality.

2. HTML also provides the global directionality of the page by means of
a DIR attribute in the HTML element. If not specified, it is left to
right. This global directionality is the default directionality for all
the block-type elements in the page.
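
Points 1 and 2 can be sketched as a walk up the element tree. This is only
an illustration: the Element class, the function names and the lowercase
"ltr"/"rtl" values are assumptions made for the example, not part of any
draft. The mapping of a left to right base to embedding level 0 and a
right to left base to level 1 follows the Unicode bidi algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical minimal element node (not a real HTML parser type)."""
    tag: str
    dir: Optional[str] = None            # "ltr", "rtl", or None if DIR is absent
    parent: Optional["Element"] = None


def base_direction(el: Element) -> str:
    """Use the nearest DIR on the element or an ancestor, else left to right."""
    node = el
    while node is not None:
        if node.dir is not None:
            return node.dir.lower()
        node = node.parent
    return "ltr"                          # default when nothing is specified


def base_embedding_level(el: Element) -> int:
    """Embedding level a block-type element is reset to: 0 for LTR, 1 for RTL."""
    return 0 if base_direction(el) == "ltr" else 1


# A right to left page containing one explicitly left to right block.
html = Element("HTML", dir="rtl")
body = Element("BODY", parent=html)
para = Element("P", parent=body)                       # inherits rtl from HTML
quote = Element("BLOCKQUOTE", dir="ltr", parent=body)  # overrides the page
```

Here para resolves to "rtl" through the HTML element, while quote keeps its
own "ltr" value.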

3. The subject draft had proposed an elaborate system with tags and
attributes. For embedding, in addition to the Unicode codes, a DIR
attribute on in-line elements was proposed. For overrides, a BDO tag
had been proposed.

In fact, the actual use of embeddings and overrides is very rare
(although necessary) and it is not justified to burden HTML with these
rare occurrences, especially as they are available in the underlying
character set and the proposed HTML extensions are an alternative way of
doing the same thing.

HTML should allow the specification of bidi formatting, when required,
by means of the Unicode formatting characters and corresponding named character
entities. The full complement of Unicode formatting characters should be
supported, including those used by Arabic.

Of course, the Unicode characters may be used directly, but since HTML
allows other character sets, these names are needed.

Providing character entity names for all these codes (instead of the
partial list proposed) makes HTML more consistent, avoids the need to
redefine bidi formatting, and avoids the possibility that the
re-definition in HTML differs from that of Unicode.

The proposed solution, in which an attribute on an in-line element is
equivalent to automatically generating formatting codes at the start and
at the end of the element (e.g. &lre; and &pdf;), would have been useful
had the need been common. As the need for these codes is rare, and as a
simple alternative is available, the solution is not justified.
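
The equivalence in question is small enough to show directly. The sketch
below assumes the attribute takes the values "ltr" and "rtl" (an assumption
made for the example); the function name embed is likewise hypothetical.
The three code points are the Unicode LRE, RLE and PDF characters.

```python
LRE = "\u202A"   # left-to-right embedding
RLE = "\u202B"   # right-to-left embedding
PDF = "\u202C"   # pop directional formatting


def embed(text: str, direction: str) -> str:
    """Wrap text in an embedding code and a closing PDF.

    This is what an in-line direction attribute would amount to: one code
    in front of the element content and one at the end.
    """
    opener = {"ltr": LRE, "rtl": RLE}[direction.lower()]
    return opener + text + PDF


wrapped = embed("abc", "rtl")   # RLE + "abc" + PDF, five characters in all
```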

Another misunderstanding is the implied assumption that the author is
aware of the bidi formatting codes. In fact, these are produced by the
bidi editor without the author's knowledge based on certain interactions
between the author and the editor, mainly the keyboard language. See,
for example, the Accent editor. Since the author is not normally aware
of these codes, making them markup places an additional burden on the
author, especially as the authors of the subject draft expressed a
desire to support HTML authoring with a raw text editor, without an HTML
authoring tool.

Following is the list of additional named entities:

    <!ENTITY lre CDATA "&#8234;"--=left-to-right embedding-->
    <!ENTITY rle CDATA "&#8235;"--=right-to-left embedding-->
    <!ENTITY pdf CDATA "&#8236;"--=pop directional formatting-->
    <!ENTITY lro CDATA "&#8237;"--=left-to-right override-->
    <!ENTITY rlo CDATA "&#8238;"--=right-to-left override-->

I suggest that the other formatting characters also be included.
I copied them from ISO-10646 and invented abbreviations.

    <!ENTITY iss CDATA "&#8298;"--=inhibit symmetric swapping-->
    <!ENTITY ass CDATA "&#8299;"--=activate symmetric swapping-->
    <!ENTITY iafs CDATA "&#8300;"--=inhibit Arabic form shaping-->
    <!ENTITY aafs CDATA "&#8301;"--=activate Arabic form shaping-->
    <!ENTITY nads CDATA "&#8302;"--=national digit shapes-->
    <!ENTITY nods CDATA "&#8303;"--=nominal digit shapes-->
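
As a cross-check, the code points can be derived from the Unicode character
names rather than typed by hand. The sketch below does that with Python's
unicodedata module and expands the named entities in a string. The helper
names are hypothetical, and the sketch uses lro and rlo as the names of the
two override codes, which is an assumption of the example.

```python
import re
import unicodedata

# Entity name -> Unicode character name.
ENTITY_NAMES = {
    "lre": "LEFT-TO-RIGHT EMBEDDING",
    "rle": "RIGHT-TO-LEFT EMBEDDING",
    "pdf": "POP DIRECTIONAL FORMATTING",
    "lro": "LEFT-TO-RIGHT OVERRIDE",
    "rlo": "RIGHT-TO-LEFT OVERRIDE",
    "iss": "INHIBIT SYMMETRIC SWAPPING",
    "ass": "ACTIVATE SYMMETRIC SWAPPING",
    "iafs": "INHIBIT ARABIC FORM SHAPING",
    "aafs": "ACTIVATE ARABIC FORM SHAPING",
    "nads": "NATIONAL DIGIT SHAPES",
    "nods": "NOMINAL DIGIT SHAPES",
}

# Entity name -> the actual character, looked up by its Unicode name.
ENTITIES = {name: unicodedata.lookup(uname) for name, uname in ENTITY_NAMES.items()}

_ENTITY_RE = re.compile(r"&([a-z]+);")


def expand_entities(text: str) -> str:
    """Replace the bidi entities above; leave every other entity untouched."""
    return _ENTITY_RE.sub(lambda m: ENTITIES.get(m.group(1), m.group(0)), text)


def declarations() -> list:
    """Regenerate the declarations in the same form as the list above."""
    return [
        f'<!ENTITY {name} CDATA "&#{ord(ENTITIES[name])};"--={uname.lower()}-->'
        for name, uname in ENTITY_NAMES.items()
    ]
```

Entities that are not in the table, such as &amp;, pass through unchanged,
so the function can run before or after ordinary entity processing.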

4. Unusual sequences

The interaction of unusual sequences of codes and markup should not be
addressed by this specification.

This includes the cases of unmatched pairs of formatting codes, of
markup between characters that would not normally be separated etc.
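
To make "unmatched pairs" concrete, here is a minimal sketch of a check an
authoring tool might run. It is an illustration only, not proposed
behaviour for user agents, and the function name is hypothetical. It
counts embedding and override codes that are never closed, and PDF codes
that have nothing to close.

```python
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}   # LRE, RLE, LRO, RLO
PDF = "\u202C"                                       # pop directional formatting


def unmatched_codes(text: str) -> tuple:
    """Return (unclosed_openers, stray_pdfs) for a run of text."""
    depth = 0    # embeddings/overrides currently open
    stray = 0    # PDFs seen while nothing was open
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth:
                depth -= 1
            else:
                stray += 1
    return depth, stray
```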

5. The LANG attribute

The LANG attribute has no effect on bidi.

It is neither easy nor useful to specify the list of bidi languages, since
the number of languages that are by default written RTL is not really
that small, and since there are languages, such as the Turkic family of
languages, that can be written with different scripts and directions.

6. Preformatted text

Text under the influence of a <PRE> tag and other tags indicating
preformatting should be considered preformatted only as far as HTML is
concerned, not on the character level.

7. Conformance

Conforming user-agents are required to apply the bidi presentation
algorithm if they display right to left characters.

If a right to left character is not displayable, there is no
requirement to apply bidi processing to that character.
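
A minimal sketch of this test, assuming the user agent can say which
characters it is able to display. The displayable callback and the
function name are hypothetical. Strong right to left characters are
recognised by their Unicode bidirectional class, "R" (e.g. Hebrew) or
"AL" (Arabic letters), as reported by Python's unicodedata module.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # strong right to left bidi classes


def needs_bidi(text: str, displayable=lambda ch: True) -> bool:
    """True if the text has at least one displayable right to left character.

    Right to left characters that the user agent cannot display are
    ignored, so they do not by themselves trigger bidi processing.
    """
    return any(
        unicodedata.bidirectional(ch) in RTL_CLASSES and displayable(ch)
        for ch in text
    )
```

A user agent with no right to left glyphs at all would pass a displayable
function that rejects those characters and so never need the algorithm.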

8. Additional items

The following items should allow international values, i.e. the full
character set:

     IMG ALT

     INPUT VALUE

     OPTION VALUE

Regards,

Jonathan Rosenne


