Re: I18N of HTML - Hebrew

From: Martin J Duerst
Date: Wed May 08 1996 - 03:38:07 EDT

I included my answers to the discussion below between Gavin Nicol and
Ken Whistler in my very long answer to Jonathan Rosenne's comments.

However, it occurred to me that most people won't have the time
to read such a long mail, so I am giving some short answers here,
intended to serve as a management summary.

>>This is because the directionality codes are not *characters* in the
>>true sense of the word, but rather codes used to switch functionality
>>from one mode to another. As such, they deserve the role of markup.
>>In the I18N draft group, we discussed whether they should be markup,
>>or codes, and decided on markup, because they are not really

The first reason for having RLE, LRE, RLO, LRO, and PDF as markup
was that they can then be associated directly with the structure
of the text (for many people and applications, it is more natural
to say that a quote has a different directionality than to say
that there is a quote and everything inside it has a different
directionality). The second reason, less obvious but much more
compelling, is that markup avoids very nasty problems of
interference between markup and in-line directionality (due to
the meta-ness of markup).

>For the record, I concur with Jony Rosenne's analysis of this
>issue. HTML support for the Unicode Standard should be for ALL of it.
>It is not meaningful to try to pick out the bidi formatting characters
>(or any other formatting subset, for that matter) and create ad hoc new
>markup for them which differs from the text behavior specified
>in the Unicode Standard.

HTML, as proposed in the I18N draft, supports the full Unicode
standard, and bidi in particular. There is absolutely no need to
change your Unicode bidi implementation if your HTML parser
does some simple conversions from markup to formatting
characters.
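
To make the "simple conversions" concrete, here is a minimal sketch in
Python of a parser pass that turns directional markup into the
corresponding formatting characters. The element and attribute names
(SPAN/BDO with DIR) and the function name are assumptions made for
illustration, not quotations from the draft; only the five formatting
characters themselves are taken from Unicode.

```python
from html.parser import HTMLParser

# Unicode bidi formatting characters.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"

# Assumed mapping: (element, dir value) -> opening formatting character.
OPENERS = {
    ("span", "ltr"): LRE,
    ("span", "rtl"): RLE,
    ("bdo", "ltr"): LRO,
    ("bdo", "rtl"): RLO,
}


class BidiMarkupConverter(HTMLParser):
    """Replace directional markup with formatting characters.

    All other tags are dropped; a real parser would keep them.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # One entry per open span/bdo: (tag, whether an opener was emitted).
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("span", "bdo"):
            return
        direction = (dict(attrs).get("dir") or "").lower()
        opener = OPENERS.get((tag, direction))
        if opener:
            self.out.append(opener)
        self.stack.append((tag, opener is not None))

    def handle_endtag(self, tag):
        # Only close when the end tag matches the innermost open element.
        if self.stack and self.stack[-1][0] == tag:
            _, emitted = self.stack.pop()
            if emitted:
                self.out.append(PDF)

    def handle_data(self, data):
        self.out.append(data)


def markup_to_formatting_chars(html_text):
    """Hypothetical helper: return plain text for a Unicode bidi engine."""
    parser = BidiMarkupConverter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The output of such a pass can be handed unchanged to an existing
Unicode bidi implementation, since every element opened with a
direction is closed by exactly one PDF.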

HTML is not a character standard; it is a higher-level protocol.
As such, it takes the liberties that the Unicode standard gives
to higher-level protocols and uses them in the way most suitable
for its various areas of use. In contrast to some internal,
application-dependent higher-level protocols, these areas include
the editing of HTML text with raw text editors, where the
meta-level problem becomes obvious.
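
One way to look at the raw-text-editor case, offered as an
illustration and not as the only reading of the meta-level problem:
the characters that delimit markup are themselves subject to the bidi
algorithm when the source is displayed. The sketch below uses Python's
standard unicodedata module to list the bidi class of each character;
the helper name and the sample snippet are made up for this example.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Hypothetical source line: Hebrew letters around a <b> element.
snippet = "\u05d0<b>\u05d1</b>"

for ch, cls in bidi_classes(snippet):
    print(f"U+{ord(ch):04X} {cls}")
```

The delimiters `<` and `>` come out as neutrals (ON) and `/` as a
common separator (CS), so where they appear on screen depends on the
directionality of the surrounding text and not on their role as
markup.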

>Many encoded entities, graphic or otherwise, which make their
>way into character encoding standards, engender disagreement
>about their appropriateness for encoding as characters. However,
>once they are in the standards, implementations of the standards
>must treat them as such.

Agreed, unless the standard itself says (for good reasons,
actually) that higher-level protocols may supplement or override
some of these formatting characters (see the bidi section of the
Unicode 2.0 draft).

Regards, Martin.
