Re: pre-HTML5 and the BOM

From: Doug Ewell <doug_at_ewellic.org>
Date: Sat, 14 Jul 2012 14:57:13 -0600

David Starner wrote:

> In the environment that UTF-8 was developed for, a BOM is a nuisance;
> a BOM will stop the shell from properly interpreting a hashbang, and
> other existing programs will lose the BOM, duplicate the BOM, and
> scatter BOMs throughout files. Given the number of text-like file
> formats (like old-school PNM) and number of scripts depending on
> existing behavior, these aren't going to be changed.

We've been hearing the story about hashbang for many, many years now,
and I still don't understand why the following logic hasn't been made
part of the low-level I/O process in such environments:

"When reading a text file that could be UTF-8 or some other
ASCII-compatible encoding, if the first three bytes of the file are
EF BB BF, discard them and assume the file is UTF-8."
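
To make the rule concrete, here is a minimal sketch in Python. It is only
an illustration of the logic quoted above, not a description of how any
existing shell or I/O library behaves. The function name read_text and the
Latin-1 fallback are assumptions made for the example.

```python
import codecs

# The UTF-8 signature: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def read_text(raw, fallback="latin-1"):
    """Return (encoding, text) for the raw bytes of a text file.

    Hypothetical helper: if the data starts with EF BB BF, discard those
    three bytes and decode the rest as UTF-8. Otherwise decode with the
    caller's fallback encoding (Latin-1 here, purely as an assumption).
    """
    if raw.startswith(UTF8_BOM):
        return "utf-8", raw[len(UTF8_BOM):].decode("utf-8")
    return fallback, raw.decode(fallback)


if __name__ == "__main__":
    # A script saved with a BOM: the hashbang is intact once the BOM is gone.
    encoding, text = read_text(b"\xef\xbb\xbf#!/bin/sh\necho hi\n")
    print(encoding, text.startswith("#!"))
```

Python's own "utf-8-sig" codec does the BOM-stripping half of this on
decode. The sketch spells the check out by hand only to show how little
logic is involved.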

> As I said before, Unicode simplified but did not solve the fact that
> text from other operating systems requires some modification before
> working just right. But I don't think that Unicode should recommend
> unconditionally the UTF-8 BOM, because it is problematic in the field
> of use UTF-8 was created for and is still used for.

I think there is a middle ground of tolerance between unconditionally
recommending it and unconditionally recommending against it.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Sun Jul 15 2012 - 20:26:13 CDT