Re: UTF-8 'BOM'

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 18:52:37 CST


    On 2005/01/20 20:10, Addison Phillips [wM] at aphillips@webmethods.com
    wrote:

    >> The BOM in UTF-8 is not the 0xFEFF UTF-8 encoded number, but 0xFEFF
    >> appearing as though in UTF-16. 0xFEFF is a Unicode number, and could
    >> still be translated into UTF-8. So the BOM in UTF-8 is a really
    >> strange animal.
    >
    > I hesitate to feed the thread, but what the heck.
    >
    > This is confusingly written, but I believe it is wrong.

    Yes, I misunderstood that one.

    > The Unicode scalar value (for the BOM character) is U+FEFF. In UTF-8 this is
    > encoded as the byte sequence:
    >
    > 0xEF 0xBB 0xBF
    >
    > This is the byte sequence that Notepad writes at the start of UTF-8 files
    > saved from that editor.

    So they say.
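
The byte sequence quoted above can be checked directly. What follows is only a minimal sketch in Python, not anything taken from the thread or the FAQ: it encodes U+FEFF with the standard codecs and compares the result with the bytes Addison lists. The helper name strip_utf8_bom is hypothetical and exists only for illustration.

```python
import codecs

# U+FEFF is the code point used as the byte order mark.
BOM = "\ufeff"

# Encoding the same code point in different encoding forms gives
# different byte sequences; only the UTF-8 one is EF BB BF.
utf8_bytes = BOM.encode("utf-8")
utf16_be_bytes = BOM.encode("utf-16-be")
utf16_le_bytes = BOM.encode("utf-16-le")

print(utf8_bytes.hex(" "))      # UTF-8 form of U+FEFF
print(utf16_be_bytes.hex(" "))  # UTF-16 big-endian form
print(utf16_le_bytes.hex(" "))  # UTF-16 little-endian form


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Python's "utf-8-sig" codec does the same stripping on decode, so a file written with the leading signature can be read with open(path, encoding="utf-8-sig") without the U+FEFF character showing up in the text.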

    > Given all the misinformation on this thread, I direct your attention to the
    > FAQ:
    >
    > http://www.unicode.org/faq/utf_bom.html#BOM

    Thanks for the pointer.

      Hans Aberg


