Re: pre-HTML5 and the BOM

From: David Starner <>
Date: Fri, 13 Jul 2012 14:42:51 -0700

On Fri, Jul 13, 2012 at 1:29 PM, Jukka K. Korpela <> wrote:
> 2012-07-13 22:37, David Starner wrote:
>> Wikipedia says "The Unicode standard recommends against the BOM for
>> UTF-8." and refers to page 30 of the Unicode Standard, version 6.0,
>> that says "Use of a BOM is neither required nor recommended for
>> UTF-8..." Calling it a myth seems bizarre.
> “Not recommended” is distinct from “recommends against”.

I disagree; the meanings of the two phrases overlap in my idiolect, and
while it would be somewhat laconic, I might use "not recommended" to
mean "if you insist on doing that, please give us a chance to get the
fire extinguisher first".

> A
> more appropriate formulation would be “Use of a BOM is not required for BOM,
> but may be used as a signature that indicates, with practical certainty,
> that data is UTF-8 encoded.”

In the environment that UTF-8 was developed for, a BOM is a nuisance;
a BOM will stop the shell from properly interpreting a hashbang, and
other existing programs will lose the BOM, duplicate the BOM, and
scatter BOMs throughout files. Given the number of text-like file
formats (like old-school PNM) and the number of scripts depending on
existing behavior, these programs aren't going to be changed.
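
A minimal sketch of the hashbang and duplication problems, in Python.
It assumes the usual Unix convention that the program loader looks for
the two bytes "#!" at offset 0 of the file. The helper names
(starts_with_shebang, strip_utf8_bom) and the sample script are
hypothetical, made up for illustration only.

```python
import codecs

def starts_with_shebang(data: bytes) -> bool:
    """Return True if the file's first two bytes are '#!'."""
    return data[:2] == b"#!"

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop one leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical script contents, with and without a leading BOM.
script = b"#!/bin/sh\necho hello\n"
with_bom = codecs.BOM_UTF8 + script

print(starts_with_shebang(script))                    # True
print(starts_with_shebang(with_bom))                  # False: BOM precedes '#!'
print(starts_with_shebang(strip_utf8_bom(with_bom)))  # True

# Byte-level concatenation (what a tool like cat does) keeps every BOM,
# so the second one ends up in the middle of the combined file.
joined = with_bom + with_bom
print(joined.count(codecs.BOM_UTF8))                  # 2
```

Stripping the BOM fixes a single file, but every byte-oriented tool in
the pipeline would need the same special case, which is the point about
existing programs above.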

As I said before, Unicode simplified but did not solve the problem that
text from other operating systems requires some modification before
it works just right. But I don't think that Unicode should
unconditionally recommend the UTF-8 BOM, because it is problematic in
the environment UTF-8 was created for and is still used in.

Where there is life, there is hope.
Received on Fri Jul 13 2012 - 16:44:10 CDT
