[Please note that this message was sent to the html-wg but
that my CC field ran out of room!]
I have been following the progress of the HTML working group for
some time now, especially with regard to the issues concerning
the i18n draft by F.Yergeau, G.Nicol, G.Adams, and M.Duerst. I
have also had the opportunity to chat with both Glen (Adams) and
Martin (Duerst) in person regarding various aspects of the
At this point, as someone involved in the implementation of a
Unicode Web Browser and multilingual HTML authoring tool which
supports BiDi (Hebrew and Arabic) and many other scripts
(www.accentsoft.com), I would like to raise some points
regarding the i18n draft. My comments are broken up by subject.
I. Language Marking
The i18n draft recommends that language be marked:
1. at the document level, using information from the HTTP
"Content-Language" field, or, presumably, its <META>
equivalent inside the document.
2. at the block level via the use of the LANG attribute
in block HTML tags, such as <P>, <H1>, etc.
3. at the character level using the newly proposed <SPAN>
1. Document level marking is fine. This would be the
default for the entire page.
2. We find no need or benefit from having language markings
specified at the block level. Even the HTML 3 draft
includes the LANG attribute on far too many tags.
Language is a word level -- or even character level --
marking. While it might be useful to mark a block of
text as a single unit, the same can be accomplished by
enclosing as much as you want in a character level tag,
such as <SPAN>.
Do we really need three "granularities" of language
marking, given that most other HTML markups can be
expressed in only one way?
3. The HTML 3.0 spec calls for a <LANG> tag. Ostensibly,
the syntax for using this would be:
What is the disposition of this tag? Though somewhat
awkward in appearance, we implemented it in our browser
since it was part of the HTML 3 draft. This is
something that could be replaced easily with the
proposed <SPAN> tag, but unless it is taken out of HTML
3, the use of <SPAN> for language marking is not
4. The separator between the language and "ethnologue" is a
period in the HTML 3 draft, while it is a "dash" in the
i18n draft. Which one is it? Both?
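
To illustrate point 4: until the separator question is settled, a reader
of LANG values could simply accept both forms. The Python sketch below is
only an assumption about how that might be done; the function names are
hypothetical, and neither draft specifies this behaviour.

```python
def normalize_lang(value):
    """Split a LANG value into (language, variant).

    Accepts either '-' (i18n draft) or '.' (HTML 3 draft) as the
    separator. Treating both as equivalent is an assumption made
    for illustration only.
    """
    value = value.strip().lower()
    for sep in ("-", "."):
        if sep in value:
            lang, variant = value.split(sep, 1)
            return lang, variant
    # No separator: a bare language code with no variant.
    return value, None


def same_language(a, b):
    """True if two LANG values name the same language and variant."""
    return normalize_lang(a) == normalize_lang(b)
```

With this, same_language("en.uk", "en-UK") is true, so documents written
against either draft would be treated alike.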
II. Direction of Text
Here we are in basic agreement with the i18n draft regarding
the use of the DIR attribute. We understand its use as follows:
1. When used in the <HTML> tag, the value specifies the
default direction (also called reading order) of the
2. The direction of a block of text can be specified
explicitly by using DIR as an attribute of a block
container tag such as <P>, etc.
3. The direction of individual characters can be set by
using DIR inside the proposed <SPAN> tag.
All of this is good stuff; however, we have the following items
1. The <TABLE> tag can also accept DIR. The first cell of
a right-to-left table (used in Hebrew, Arabic) would be
in its upper right hand corner. The use of DIR here is
2. The <UL> and <OL> tags need DIR to specify on which
side the bullets or numbering is to appear. This is not
the same as the alignment of the list. Again, the use
of DIR in list tags is required.
3. By default, DIR="rtl" text blocks should be aligned
III. BiDi Issues
1. No BiDi layout should be performed on text marked with
the <PRE> tag.
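
The DIR rules in section II and the <PRE> rule above could be sketched as
follows. This is a minimal illustration in Python, not a description of
any existing browser: the function names are hypothetical, and the
left-to-right fallback when no DIR is given anywhere is an assumption.

```python
def effective_dir(html_dir=None, block_dir=None, span_dir=None):
    """Return the direction that applies to a run of text.

    The innermost explicit DIR wins: <SPAN> over the block container,
    and the block container over <HTML>. Falling back to "ltr" when
    nothing is specified is an assumption.
    """
    for value in (span_dir, block_dir, html_dir):
        if value:
            return value.lower()
    return "ltr"


def needs_bidi_layout(tag):
    """Text inside <PRE> is left alone; everything else is laid out."""
    return tag.upper() != "PRE"
```

For example, effective_dir(html_dir="rtl", block_dir="ltr") gives "ltr"
for a left-to-right paragraph inside a right-to-left document.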
IV. Character Set Identification
Here we agree on all the methods for identifying the character
set of an HTML document; however, we feel the order of preference
for obtaining this information should be:
1. From a <META> tag embedded in the document itself.
2. From the HTTP header.
3. From link semantics (though since links, URLs, etc.
change so often, we feel that this is of limited use).
4. From the byte ordering mark in UCS-2 encoded files.
5. Any other heuristic for identifying character set.
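
The order of preference above could be sketched in Python as follows.
This is only an illustration under stated assumptions: the function
names, the 1024-byte scan window, the charset label returned for a byte
ordering mark, and the ISO-8859-1 fallback standing in for step 5 are
all hypothetical choices, not part of the i18n draft.

```python
import re

# Looks for charset=... inside a <META ...> tag near the top of the file.
META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def charset_from_meta(body):
    """Step 1: charset named by a <META> tag in the document itself."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii") if match else None


def charset_from_bom(body):
    """Step 4: a byte ordering mark at the start of a UCS-2 file."""
    if body[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "iso-10646-ucs-2"
    return None


def pick_charset(body, http_charset=None, link_charset=None,
                 fallback="iso-8859-1"):
    """Try each source in the proposed order; first one found wins."""
    candidates = (
        charset_from_meta(body),   # 1. <META> tag
        http_charset,              # 2. HTTP header
        link_charset,              # 3. link semantics
        charset_from_bom(body),    # 4. byte ordering mark
    )
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    return fallback                # 5. placeholder for other heuristics
```

Under this ordering, a document whose <META> tag names ISO-8859-8 is
read as ISO-8859-8 even if the HTTP header names a different character
set.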
While the i18n draft is basically a sound document, the issues
raised above deserve consideration for inclusion in the i18n
section of the HTML standard. Your constructive feedback will
be most appreciated.
Accent Software Intl.
28 Pierre Koenig St.
Jerusalem, Israel 91530
Robert N. Goldrich