Re: pre-HTML5 and the BOM from Jukka K. Korpela on 2012-07-17 (Unicode Mail List Archive)

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Tue, 17 Jul 2012 17:31:46 +0300

2012-07-17 17:11, Leif Halvard Silli wrote:

>>> For instance, early on in 'the Web', some
>>> appeared to think that all non-ASCII had to be represented as entities.
>>
>> Yes indeed. There's still some such stuff around. It's mostly
>> unnecessary, but it doesn't hurt.
>
> Actually, above I described an example where it did hurt ...

The situation is comparable to the BOM issue. In the old days, it was
considered (with good reasons presumably) safer to omit the BOM than to
use it in UTF-8, and it was considered safer to use entity references
rather than direct non-ASCII data. It has changed now, but people are
conservative, and people read old warnings.

We should now say that the BOM is not required in UTF-8, but it is safer
to use it, unless you have good reasons not to (e.g., an authoring
environment that dislikes it). Similarly, character data should
preferably be in UTF-8, unless you have good reasons (mostly on the
authoring side, not clients) to avoid it and use entity and character
references instead.
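
To make the two choices concrete, here is a minimal sketch in Python. It
only illustrates the byte-level difference; the variable names are mine,
and the sample character "å" is just an example.

```python
import codecs

text = "å"  # U+00E5, used here only as a sample non-ASCII character

# UTF-8 without the BOM, and with it ("utf-8-sig" prepends EF BB BF).
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")
assert with_bom == codecs.BOM_UTF8 + plain

# A reader that decodes with "utf-8-sig" accepts both forms and
# strips the BOM when it is present.
assert plain.decode("utf-8-sig") == with_bom.decode("utf-8-sig") == text

# The ASCII-only alternative: replace the character with a numeric
# character reference.
as_reference = text.encode("ascii", "xmlcharrefreplace")
print(plain, with_bom, as_reference)
```

The BOM costs three bytes at the start of the file; the character
reference keeps the data pure ASCII at the cost of readability in the
source.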

> I have discovered one browser where it does hurt more directly: In W3M,
> the text browser, which is also included in Emacs. W3M doesn't handle
> (all) entities. E.g. it renders &aring; and &#229; as an 'aa' instead
> of as an 'å', for instance.

To take a more modern example, the native e-mail client on my Android
seems to systematically display character and entity references
literally when displaying message headers with small excerpts of
content, even though it correctly interprets them when displaying the
message itself.
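
The difference between the two displays is simply whether the references
get decoded before the text is shown. A small sketch, using Python's
standard html module; the sample excerpt is made up for illustration and
is not taken from any real message:

```python
from html import unescape

# Hypothetical excerpt containing an entity reference, a decimal
# character reference, and a hexadecimal character reference.
excerpt = "&aring; &#229; &#xE5;"

# What a client shows if it skips decoding: the references, literally.
literal_display = excerpt

# What it shows if it decodes them first: the same character three times.
decoded_display = unescape(excerpt)

print(literal_display)
print(decoded_display)
```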

Yucca
Received on Tue Jul 17 2012 - 09:34:15 CDT
