From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 20 2005 - 17:50:01 CST
In a certain, recent thread from hell, certain posters have
been repeatedly making otherwise unsubstantiated statements
such as:
> On the contrary, the Unicode standard defines that a BOM
> should be used at the start of a plain text file under
> certain circumstances.

> So, in short, you say that nobody has the need for BOM's.
> But still Unicode does require them.

> The UTF-8 without BOM's is already taking off. But formally,
> in the eyes of Unicode, that is a corrupted UTF-8.
And similar fallacious claims about the standard.
Lest such bogles cause undue consternation among the country
folk, I think it might be helpful to review what the standard
*actually* says:
BOM in UTF-8:
"When represented in UTF-8, the byte order mark turns into
the byte sequence <EF BB BF>. Its usage at the beginning
of a UTF-8 data stream is NEITHER REQUIRED NOR RECOMMENDED
BY THE UNICODE STANDARD, but its presence does not affect
conformance to the UTF-8 encoding scheme."
[emphasis added]
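
As a rough illustration of the UTF-8 statement quoted above, here is a
minimal Python sketch. It is not part of the standard's text. The helper
name decode_utf8 and the sample string are made up for the example; the
"utf-8-sig" codec and codecs.BOM_UTF8 come from the Python standard
library. The sketch shows that U+FEFF becomes <EF BB BF> in UTF-8, and
that a reader can accept a stream with or without that signature.

```python
import codecs

# U+FEFF encoded in UTF-8 is the three-byte sequence EF BB BF.
UTF8_BOM = "\ufeff".encode("utf-8")


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, tolerating an optional leading BOM.

    Hypothetical helper for illustration: the "utf-8-sig" codec drops
    one leading EF BB BF if it is present and otherwise behaves like
    plain "utf-8".
    """
    return data.decode("utf-8-sig")


# Sample data (an assumption for the example, not from the standard).
plain = "caf\u00e9".encode("utf-8")   # no BOM
signed = UTF8_BOM + plain             # same text with a leading BOM

text_plain = decode_utf8(plain)
text_signed = decode_utf8(signed)
```

Both byte strings decode to the same text, which is one way to read
"its presence does not affect conformance": the BOM is optional in
either direction.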
BOM in UTF-16:
"The UTF-16 encoding scheme MAY OR MAY NOT begin with a BOM.
However, when there is no BOM, and in the absence of a
higher-level protocol, the byte order of the UTF-16
encoding scheme is big-endian."
[emphasis added]
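
The UTF-16 rule quoted above can be sketched in Python along the same
lines. This is an illustrative reading of the quoted sentence, not an
official algorithm. The function name decode_utf16_scheme is
hypothetical, and the sketch assumes there is no higher-level protocol
that specifies the byte order. It is written out by hand because
Python's generic "utf-16" codec falls back to the platform's native
byte order when no BOM is present, which is not the default the quoted
text describes.

```python
import codecs


def decode_utf16_scheme(data: bytes) -> str:
    """Decode bytes in the UTF-16 encoding scheme (hypothetical helper).

    Assumes no higher-level protocol supplies the byte order.
    """
    if data.startswith(codecs.BOM_UTF16_BE):      # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):      # FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: per the quoted text, the byte order is big-endian.
    return data.decode("utf-16-be")


# "AB" with no BOM, read as big-endian by default.
no_bom = decode_utf16_scheme(b"\x00A\x00B")
# "AB" with a little-endian BOM.
le_bom = decode_utf16_scheme(b"\xff\xfeA\x00B\x00")
# "AB" with a big-endian BOM.
be_bom = decode_utf16_scheme(b"\xfe\xff\x00A\x00B")
```

All three inputs decode to the same two-character string; only the
first relies on the big-endian default.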
If you care to read those in the original, they can be found
on p. 79 of The Unicode Standard, Version 4.0. And if you do
not have the book itself, please refer to:
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
I -- and I suspect many other readers of this list -- would
appreciate it if certain posters would meditate on these
specifications until they *understand* them, and would, in
the meantime, refrain from posting repeated claims to the
contrary.
--Ken