Re: pre-HTML5 and the BOM from Martin J. Dürst on 2012-07-17 (Unicode Mail List Archive)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Wed, 18 Jul 2012 11:09:07 +0900

Hello Jukka,

On 2012/07/17 23:31, Jukka K. Korpela wrote:
> 2012-07-17 17:11, Leif Halvard Silli wrote:
>
>>>> For instance, early on in 'the Web', some
>>>> appeared to think that all non-ASCII had to be represented as entities.
>>>
>>> Yes indeed. There's still some such stuff around. It's mostly
>>> unnecessary, but it doesn't hurt.
>>
>> Actually, above I described an example where it did hurt ...
>
> The situation is comparable to the BOM issue.

In a very general sense, probably yes.

> In the old days, it was
> considered (with good reasons presumably) safer to omit the BOM than to
> use it in UTF-8,

Yes indeed.
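
To make the risk concrete, here is a minimal sketch in Python of what a
UTF-8 BOM does to consumers that are not prepared for it. The sample
string and variable names are only illustrative assumptions; the codec
names are the standard library's.

```python
import codecs

text = "Dürst"
without_bom = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom  # prepends the bytes EF BB BF

# A plain UTF-8 decoder keeps the BOM as a leading U+FEFF character.
decoded_plain = with_bom.decode("utf-8")

# The "utf-8-sig" codec strips a leading BOM if one is present.
decoded_sig = with_bom.decode("utf-8-sig")

# A BOM-unaware consumer that assumes Latin-1 sees three stray characters.
stray = with_bom[:3].decode("latin-1")

print(repr(decoded_plain), repr(decoded_sig), repr(stray))
```

Omitting the BOM avoids both the stray U+FEFF and the stray Latin-1
characters, which is roughly why leaving it out was seen as the safer
choice.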

> and it was considered safer to use entity references
> rather than direct non-ASCII data.

Well, the "considered" in the BOM case applies to everybody (including
the W3C), but in the character references case, it applies only to
people who didn't understand how things were working. In fact, although
RFC 2070 and HTML4 clearly nailed down the interpretation of numeric
character references as Unicode, there were implementations (the ones I
know of were in the mobile space) that still got this wrong past 2000.
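
As a small illustration of the rule, here is a sketch using Python's
standard html module, with my own name as an arbitrary sample. A numeric
character reference names a Unicode code point, whatever the encoding of
the document that contains it.

```python
import html

# U+00FC (LATIN SMALL LETTER U WITH DIAERESIS) written three ways.
decimal_ref = html.unescape("D&#252;rst")   # decimal code point
hex_ref = html.unescape("D&#xFC;rst")       # hexadecimal code point
named_ref = html.unescape("D&uuml;rst")     # named entity reference

# The number in the reference is the Unicode code point itself.
code_point_char = chr(252)

print(decimal_ref, hex_ref, named_ref, code_point_char)
```

An implementation that instead looked the number up in its local
character encoding would only produce the right result where that
encoding happens to coincide with Unicode.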

> To take a more modern example, the native e-mail client on my Android
> seems to systematically display character and entity references
> literally when displaying message headers with small excerpts of
> content, even though it correctly interprets them when displaying the
> message itself.

The reason for this may simply be that email bodies can be in HTML, but
that there is no way at all to use HTML in email header fields.
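
For illustration, a minimal sketch with Python's standard email.header
module; the two sample header values are made up. Non-ASCII text in a
header field is carried by RFC 2047 encoded words, not by HTML, so a
character reference in a header is just literal text.

```python
from email.header import decode_header, make_header

# Hypothetical header values, used only for this sketch.
encoded_word = "=?UTF-8?Q?D=C3=BCrst?="   # RFC 2047 encoded word
html_style = "D&uuml;rst"                 # HTML entity reference

# The encoded word is decoded to the intended character.
decoded_word = str(make_header(decode_header(encoded_word)))

# The entity reference has no meaning in a header and stays as it is.
decoded_html_style = str(make_header(decode_header(html_style)))

print(decoded_word, decoded_html_style)
```

A client that shows the second form literally in a header is therefore
not necessarily doing anything wrong.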

Regards, Martin.
Received on Tue Jul 17 2012 - 21:12:02 CDT