Re: pre-HTML5 and the BOM

From: Jukka K. Korpela
Date: Wed, 18 Jul 2012 07:26:31 +0300

2012-07-18 5:09, "Martin J. Dürst" wrote:

> Well, the "considered" in the BOM case applies to everybody (including
> the W3C), but in the character references case, it applies only to
> people who didn't understand how things were working. In fact, although
> RFC 2070 and HTML4 clearly nailed down the interpretation of numeric
> character references to Unicode, there were implementations (the ones I
> know were in the mobile space) past 2000.

I presume that you mean that there were *faulty* implementations, with
wrong interpretations of numbers in character references. That’s true,
but it’s a different issue. What I meant is that it was widely said, and
it is still said by many people, that entity references like “&aring;”
were safer than directly entering characters like “å”. Part of this was
that not all transmissions were 8-bit safe; another part was that it was
not clear which encodings user agents could handle, so ASCII + entities
was described as the safe solution.

And these safety considerations were reversed long ago, just as
with the BOM.
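
To make the "ASCII + entities" idea concrete, here is a minimal Python
sketch using the standard library's html.unescape. It only illustrates
that the named reference, the decimal reference, and the hexadecimal
reference all denote the same Unicode character (U+00E5), and that the
entity form stays within ASCII while the directly entered character needs
an agreed encoding such as UTF-8. The variable names are illustrative
only.

```python
import html

# Three ASCII-only spellings of the same character, U+00E5 (å):
# a named entity reference, a decimal and a hexadecimal character reference.
references = ["&aring;", "&#229;", "&#xE5;"]
decoded = [html.unescape(ref) for ref in references]

# The entity form survives a 7-bit channel because it is pure ASCII.
ascii_form = "&aring;".encode("ascii")

# The directly entered character only works if sender and receiver
# agree on the encoding; UTF-8 is used here as an example.
utf8_form = "å".encode("utf-8")

print(decoded)
print(ascii_form, utf8_form)
```

The numeric forms are defined in terms of Unicode code points, so they
decode the same way regardless of the encoding of the document that
contains them.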

>> To take a more modern example, the native e-mail client on my Android
>> seems to systematically display character and entity references
>> literally when displaying message headers with small excerpts of
>> content, even though it correctly interprets them when displaying the
>> message itself.
> The reason for this may simply be that email bodies can be in HTML, but
> that there is no way at all to use HTML in email header fields.

Good guess, but the &aring; etc. do not appear in header fields; they
appear in excerpts taken from the HTML-format body, which is sort of
parsed but apparently without interpreting entity references. And this is
modern software, not Netscape 1.
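
The symptom is what one would expect if the excerpt were produced by
stripping tags without decoding references. The Python sketch below is
only a guess at that kind of behavior, not a description of how the
Android client actually works; naive_excerpt, decoded_excerpt, and the
sample body are hypothetical names and data chosen for illustration.

```python
import re
from html.parser import HTMLParser


def naive_excerpt(body, limit=40):
    # Removes tags only; entity and character references stay literal.
    return re.sub(r"<[^>]+>", "", body)[:limit]


class TextCollector(HTMLParser):
    """Collects text content, with references decoded by the parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def decoded_excerpt(body, limit=40):
    # Parses the markup so that &aring; and &#229; become real characters.
    collector = TextCollector()
    collector.feed(body)
    collector.close()
    return "".join(collector.parts)[:limit]


body = "<p>Bl&aring;b&aelig;r og &#248;l</p>"
print(naive_excerpt(body))
print(decoded_excerpt(body))
```

The first function reproduces the literal "&aring;" seen in the excerpts;
the second shows the text as a reader would expect it.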

Received on Tue Jul 17 2012 - 23:33:23 CDT
