Re: UTF-8 BOM (Re: Charset declaration in HTML)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Wed, 18 Jul 2012 16:59:58 +0900

Hello Doug,

On 2012/07/18 0:35, Doug Ewell wrote:
> For those who haven't had enough of this debate yet, here's a link
> to an informative blog (with some informative comments) from Michael
> Kaplan:
>
> "Every character has a story #4: U+feff (alternate title: UTF-8 is the
> BOM, dude!)"
> http://blogs.msdn.com/b/michkap/archive/2005/01/20/357028.aspx
>
> What should be interesting is that this blog dates to January 2005,
> seven and a half years ago, and yet includes the following:
>
> "But every 4-6 months another huge thread on the Unicode List gets
> started

Well, sometimes more often and sometimes less often than every 4-6
months, but yes.

> about how bad the BOM is for UTF-8 and how it breaks UNIX tools
> that have been around and able to support UTF-8 without change for
> decades

Yes indeed. The BOM and Unix/Linux tools don't work well together.

> and about how Microsoft is evil for shipping Notepad that causes
> all of these problems

That's a bit overblown, but I guess that is how it looks to a Microsoft
employee.

> and how neither the W3C nor Unicode would have
> ever supported a UTF-8 BOM if Microsoft did not have Notepad doing it,

That's true, too. It was indeed Notepad that brought the UTF-8
BOM/signature to the attention of the W3C and the browser makers.

The problem with the BOM in UTF-8 is that it can be quite helpful (for
quickly distinguishing between UTF-8 and legacy-encoded files) and quite
damaging (for programs that use the Unix/Linux model of text
processing), and that's why it creates so much controversy.
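
To make both sides concrete, here is a rough sketch in Python. It is
only an illustration under my own assumptions: the helper names
has_utf8_bom and strip_utf8_bom are made up for this example, and the
shell script is just sample data. The first check shows the helpful
side (the three signature bytes EF BB BF identify the data as UTF-8);
the second shows the damaging side (a script that starts with a BOM no
longer starts with the two bytes "#!").

```python
import codecs

# codecs.BOM_UTF8 is the three-byte UTF-8 signature b"\xef\xbb\xbf".


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 signature, if present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample data (hypothetical): a small shell script saved with a BOM.
script = "#!/bin/sh\necho hello\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + script
without_bom = strip_utf8_bom(with_bom)

# Helpful: the signature marks the data as UTF-8 rather than a legacy
# encoding.
print(has_utf8_bom(with_bom))      # True
print(has_utf8_bom(script))        # False

# Damaging: byte-oriented tools that look for "#!" in the first two
# bytes no longer find it when the signature comes first.
print(with_bom[:2] == b"#!")       # False
print(without_bom[:2] == b"#!")    # True
```

When reading text rather than raw bytes, Python's "utf-8-sig" codec
drops a leading signature on decoding, so with_bom.decode("utf-8-sig")
gives the script text without U+FEFF at the start.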

Regards, Martin.
Received on Wed Jul 18 2012 - 03:02:44 CDT