BOM Bogle Bombination

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 20 2005 - 17:50:01 CST

Next message: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

Previous message: Marcin 'Qrczak' Kowalczyk: "Re: UTF-8 'BOM'"

In a certain, recent thread from hell, certain posters have
been repeatedly making otherwise unsubstantiated statements
such as:

> On the contrary, the Unicode standard defines that a BOM
> should be used at the start of a plain text file under
> certain circumstances.

> So, in short, you say that nobody has the need for BOM's.
> But still Unicode does require them.

> The UTF-8 without BOM's is already taking off. But formally,
> in the eyes of Unicode, that is a corrupted UTF-8.

And similar fallacious claims about the standard.

Lest such bogles cause undue consternation among the country
folk, I think it might be helpful to review what the standard
*actually* says:

BOM in UTF-8:

  "When represented in UTF-8, the byte order mark turns into
   the byte sequence <EF BB BF>. Its usage at the beginning
   of a UTF-8 data stream is NEITHER REQUIRED NOR RECOMMENDED
   BY THE UNICODE STANDARD, but its presence does not affect
   conformance to the UTF-8 encoding scheme."

   [emphasis added]
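
A minimal Python sketch of what that passage implies for a reader
of UTF-8 data: a leading <EF BB BF> is tolerated and skipped, and
its absence is equally acceptable. The helper name strip_utf8_bom
and the sample text are illustrative assumptions, not anything
taken from the standard.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading EF BB BF, if one is present.

    Hypothetical helper: the BOM is optional, so both inputs
    below are treated as valid UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

sample = "caf\u00e9"
with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")
without_bom = sample.encode("utf-8")

# Both byte streams decode to the same text once any BOM is skipped.
text_a = strip_utf8_bom(with_bom).decode("utf-8")
text_b = strip_utf8_bom(without_bom).decode("utf-8")
```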

BOM in UTF-16:

  "The UTF-16 encoding scheme MAY OR MAY NOT begin with a BOM.
   However, when there is no BOM, and in the absence of a
   higher-level protocol, the byte order of the UTF-16
   encoding scheme is big-endian."

   [emphasis added]

If you care to read those in the original, they can be found
on p. 79 of The Unicode Standard, Version 4.0. And if you do
not have the book itself, please refer to:

http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
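
For the UTF-16 encoding scheme, the rule quoted above can be
sketched in Python roughly as follows. The function name
decode_utf16_scheme is a hypothetical helper, and the sketch
assumes there is no higher-level protocol that already specifies
the byte order.

```python
import codecs

def decode_utf16_scheme(data: bytes) -> str:
    """Decode bytes in the UTF-16 encoding scheme (illustrative only).

    A leading BOM, if present, selects the byte order and is removed.
    With no BOM (and no higher-level protocol, which this sketch
    assumes), the bytes are read as big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")             # no BOM: big-endian

be_with_bom = decode_utf16_scheme(b"\xfe\xff\x00A")
le_with_bom = decode_utf16_scheme(b"\xff\xfeA\x00")
no_bom = decode_utf16_scheme(b"\x00A")
```

All three inputs carry the single character "A"; only the
framing differs.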

I -- and I suspect many other readers of this list -- would
appreciate it if certain posters would meditate on these
specifications until they *understand* them, and would, in
the meantime, refrain from posting repeated claims to the
contrary.

--Ken


This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 17:52:41 CST