UTF-8 BOM (Re: Charset declaration in HTML)

From: Steven Atreju <snatreju_at_googlemail.com>
Date: Thu, 12 Jul 2012 12:32:46 +0200

 |> As for editors: If your own editor has no problems with the BOM, then
 |> what? But I think Notepad can also save as UTF-8 but without the BOM -
 |> it should be possible to get an option for choosing when you save
 |> it.
 |Perhaps there should be such an option in Notepad, but there isn't. The
 |decision to have Notepad always write the signature to UTF-8 files, and
 |always rely on it to read them, has been documented to death.
 |The bottom line is, there are zillions of editors available for Windows,
 |many of them free, and people who want to create or modify UTF-8 files
 |which will be consumed by a process that is intolerant of the signature
 |should not use Notepad. That goes for HTML (pre-5) pages, Unix shell
 |scripts, and others.

In the meantime the UTF-8 BOM is in the standard and thus
contradicts forty years of (well) good (Unix/POSIX) engineering
and craftsmanship. Where a file is a file and everything is a
file, holistically. Where small tools which do their thing well
can be plugged together to achieve complex tasks. Unicode is
very, very important. Really.

In the future simple things like '$ cat File1 File2 > File3' will
no longer work that easily. Currently this works with *whatever* file,
and even program code that has been written more than thirty years
ago will work correctly. No! You have to modify content to get it
Unicode is very, very important. Really.
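
To make the 'cat' point concrete, here is a minimal sketch in Python.
It assumes two small in-memory inputs that both start with the UTF-8
signature (EF BB BF), as Notepad would write them. The names cat and
cat_bom_aware are hypothetical helpers made up for this illustration,
not existing tools. Plain byte-wise concatenation leaves the second
signature in the middle of the result, where a decoder sees it as
U+FEFF inside the text.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def cat(*chunks: bytes) -> bytes:
    """Byte-wise concatenation, which is all that cat(1) does."""
    return b"".join(chunks)


def cat_bom_aware(*chunks: bytes) -> bytes:
    """Hypothetical variant: drop the signature from every input but the first."""
    out = []
    for index, chunk in enumerate(chunks):
        if index > 0 and chunk.startswith(BOM):
            chunk = chunk[len(BOM):]
        out.append(chunk)
    return b"".join(out)


# Assumed sample inputs, each saved "with BOM".
file1 = BOM + "first\n".encode("utf-8")
file2 = BOM + "second\n".encode("utf-8")

plain = cat(file1, file2)

# "utf-8-sig" strips only a signature at the very start of the data,
# so the second one survives as U+FEFF in the decoded text.
text = plain.decode("utf-8-sig")
print(repr(text))

fixed = cat_bom_aware(file1, file2).decode("utf-8-sig")
print(repr(fixed))
```

The second helper is exactly the kind of content modification that a
plain 'cat' never had to do.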

Adding the UTF-8 BOM is an incarnation of the malicious evil.
(In at least its incarnations ignorance and foolishness, and what
about pretension.)
Tomorrow is Friday the 13th. Good luck.

I couldn't respond to the thread because the Digest doesn't include
the message-ID.

P.S., 2.:
Microsoft and IBM are involved in the standard groups that have been

Received on Thu Jul 12 2012 - 05:37:27 CDT
