BOM Bogle Bombination

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jan 20 2005 - 17:50:01 CST


    In a certain recent thread from hell, certain posters have
    been repeatedly making otherwise unsubstantiated statements
    such as:

    > On the contrary, the Unicode standard defines that a BOM
    > should be used at the start of a plain text file under
    > certain circumstances.

    > So, in short, you say that nobody has the need for BOM's.
    > But still Unicode does require them.

    > The UTF-8 without BOM's is already taking off. But formally,
    > in the eyes of Unicode, that is a corrupted UTF-8.

    And similar fallacious claims about the standard.

    Lest such bogles cause undue consternation among the country
    folk, I think it might be helpful to review what the standard
    *actually* says:

    BOM in UTF-8:

      "When represented in UTF-8, the byte order mark turns into
       the byte sequence <EF BB BF>. Its usage at the beginning
       of a UTF-8 data stream is NEITHER REQUIRED NOR RECOMMENDED
       BY THE UNICODE STANDARD, but its presence does not affect
       conformance to the UTF-8 encoding scheme."
       
       [emphasis added]
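
    In practice, this means a reader should accept UTF-8 data with
    or without a leading <EF BB BF>. The following is a minimal
    Python sketch, not anything the standard itself specifies, and
    the name decode_utf8 is hypothetical. It drops one leading BOM
    if present and otherwise decodes strictly.

```python
import codecs

# Hypothetical helper (not defined by the Unicode Standard): the UTF-8
# BOM <EF BB BF> is treated as an optional signature, neither required
# nor an error.
UTF8_BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, dropping a single leading BOM if one is present."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Strict decoding: malformed sequences raise UnicodeDecodeError.
    return data.decode("utf-8")


# Both inputs are conformant UTF-8 and yield the same text.
print(decode_utf8(b"caf\xc3\xa9"))
print(decode_utf8(UTF8_BOM + b"caf\xc3\xa9"))
```

    Python's standard library already offers the same decoding
    behavior as the "utf-8-sig" codec; the explicit version simply
    makes the optional nature of the signature visible.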
       
    BOM in UTF-16:

      "The UTF-16 encoding scheme MAY OR MAY NOT begin with a BOM.
       However, when there is no BOM, and in the absence of a
       higher-level protocol, the byte order of the UTF-16
       encoding scheme is big-endian."
       
       [emphasis added]
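
    A matching sketch for the UTF-16 encoding scheme, again an
    illustration in Python rather than text from the standard: a
    BOM, if present, selects the byte order and is dropped; with no
    BOM, the data is read as big-endian. The default_big_endian
    parameter is a hypothetical stand-in for the "higher-level
    protocol" the quoted text mentions.

```python
import codecs


def decode_utf16(data: bytes, default_big_endian: bool = True) -> str:
    """Decode bytes in the UTF-16 encoding scheme.

    A leading BOM (FE FF or FF FE) determines the byte order and is
    removed. Without a BOM the data is treated as big-endian, unless the
    hypothetical default_big_endian flag (standing in for a higher-level
    protocol) says otherwise.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return data[2:].decode("utf-16-le")
    # No BOM: big-endian by default, per the rule quoted above.
    return data.decode("utf-16-be" if default_big_endian else "utf-16-le")


print(decode_utf16(b"\x00A\x00B"))  # no BOM, read as big-endian
```

    The explicit "utf-16-be" and "utf-16-le" codecs are used so the
    no-BOM fallback is spelled out rather than left to a library's
    own default.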
       
    If you care to read those in the original, they can be found
    on p. 79 of The Unicode Standard, Version 4.0 (Chapter 3,
    "Conformance"). And if you do not have the book itself, please
    refer to:

    http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
       
    I -- and I suspect many other readers of this list -- would
    appreciate it if certain posters would meditate on these
    specifications until they *understand* them, and would, in
    the meantime, refrain from posting repeated claims to the
    contrary.

    --Ken


