Re: pre-HTML5 and the BOM from Jukka K. Korpela on 2012-07-17 (Unicode Mail List Archive)

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Wed, 18 Jul 2012 07:26:31 +0300

2012-07-18 5:09, "Martin J. Dürst" wrote:

> Well, the "considered" in the BOM case applies to everybody (including
> the W3C), but in the character references case, it applies only to
> people who didn't understand how things were working. In fact, although
> RFC 2070 and HTML4 clearly nailed down the interpretation of numeric
> character references to Unicode, there were implementations (the ones I
> know were in the mobile space) past 2000.

I presume that you mean that there were *faulty* implementations, with
wrong interpretations of numbers in character references. That’s true,
but it’s a different issue. What I meant is that it was widely said, and
it is still said by many people, that entity references like “&aring;”
were safer than directly entering characters like “å”. Part of this was
that not all transmissions were 8-bit safe; another part was that it was
not clear what encodings user agents could handle, so ASCII + entities was
described as the safe solution.

And these safety considerations were reversed long ago, just as
with the BOM.
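
To make the comparison concrete, here is a minimal sketch in Python (the names `SPELLINGS` and `decode_refs` are made up for illustration). It shows that the entity reference, the two numeric character references and the directly entered character all denote U+00E5, and that only the reference forms are pure ASCII on the wire; the directly entered character depends on the encoding being declared and understood.

```python
import html

# Four ways to write the same character in HTML source:
# named entity reference, decimal and hexadecimal numeric
# character references, and the character entered directly.
SPELLINGS = ["&aring;", "&#229;", "&#xE5;", "\u00e5"]


def decode_refs(source: str) -> str:
    """Resolve entity and numeric character references to characters."""
    return html.unescape(source)


decoded = [decode_refs(s) for s in SPELLINGS]

# Numeric references always denote Unicode code points,
# whatever the encoding of the document itself.
code_points = [ord(d) for d in decoded]

# The reference form is plain ASCII, so it passes a 7-bit channel.
ascii_bytes = "&aring;".encode("ascii")

# The direct form is one character but different bytes per encoding.
utf8_bytes = "\u00e5".encode("utf-8")
latin1_bytes = "\u00e5".encode("iso-8859-1")
```

The last two lines are the reason the old advice existed: the same character becomes two bytes in UTF-8 and one byte in ISO-8859-1, so the recipient must know which encoding was used.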

>> To take a more modern example, the native e-mail client on my Android
>> seems to systematically display character and entity references
>> literally when displaying message headers with small excerpts of
>> content, even though it correctly interprets them when displaying the
>> message itself.
>
> The reason for this may simply be that email bodies can be in HTML, but
> that there is no way at all to use HTML in email header fields.

Good guess, but the &aring; etc. do not appear in headers; they are
excerpts from the body in HTML format—which is sort-of parsed but
apparently without interpreting entity references. And this is modern
software, not Netscape 1.
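
For illustration only, here is a sketch in Python of an excerpt routine that does interpret the references. I do not know how the Android client builds its excerpts; `ExcerptBuilder` and `make_excerpt` are hypothetical names, and the only assumption is the standard library's `html.parser.HTMLParser`, which decodes character and entity references when `convert_charrefs` is true.

```python
from html.parser import HTMLParser


class ExcerptBuilder(HTMLParser):
    """Collect the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser turn &aring;, &#229;
        # and similar references into characters before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def make_excerpt(html_body: str, limit: int = 40) -> str:
    """Return up to `limit` characters of plain text from an HTML body."""
    builder = ExcerptBuilder()
    builder.feed(html_body)
    builder.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace so the excerpt fits on one line.
    text = " ".join("".join(builder.parts).split())
    return text[:limit]


excerpt = make_excerpt("<p>Bl&aring;b&aelig;r &amp; kr&#229;kor</p>")
```

Stripping the tags without this decoding step would leave the references in the excerpt literally, which matches the behavior I described.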

Yucca
Received on Tue Jul 17 2012 - 23:33:23 CDT
