Re: UTF-8 'BOM'

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 18:52:35 CST


    On 2005/01/20 19:38, Andrew C. West at andrewcwest@alumni.princeton.edu
    wrote:

    >> The BOM in UTF-8 is not the UTF-8-encoded number 0xFEFF, but 0xFEFF
    >> appearing as though in UTF-16. 0xFEFF is a Unicode number, and could still be
    >> translated into UTF-8. So the BOM in UTF-8 is a really strange animal.

    > The BOM generated by Notepad and other Windows applications at the start of
    > UTF-8 files is 0xEF 0xBB 0xBF, which is the UTF-8 transformation of the
    > valid Unicode character U+FEFF, and so no process that claims to process UTF-8
    > files should have any problem. If you do get 0xFEFF at the start of (or
    > anywhere in) a UTF-8 file, then that IS very wrong ... but I've never seen
    > such an animal.
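    To check that 0xEF 0xBB 0xBF really is just U+FEFF run through the ordinary
    UTF-8 rules, here is a minimal Python sketch that packs a code point into the
    three-byte form 1110xxxx 10xxxxxx 10xxxxxx by hand and compares the result
    with the standard library. encode_utf8_3byte is a hypothetical helper written
    for illustration; only str.encode, bytes.decode and codecs.BOM_UTF8 are
    standard Python. The last part confirms the point about a bare 0xFE 0xFF
    pair: those bytes never occur in well-formed UTF-8, so a strict decoder
    rejects them.

```python
import codecs

def encode_utf8_3byte(cp: int) -> bytes:
    """Hypothetical helper: encode a non-surrogate code point U+0800..U+FFFF as 3 UTF-8 bytes."""
    if not (0x0800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point needs a different UTF-8 length or is a surrogate")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])

bom = encode_utf8_3byte(0xFEFF)
print(bom.hex(" "))                     # ef bb bf
print(bom == "\ufeff".encode("utf-8"))  # True
print(bom == codecs.BOM_UTF8)           # True

# The raw pair FE FF is not valid UTF-8: bytes 0xFE and 0xFF never occur in UTF-8.
try:
    b"\xfe\xff".decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)     # not UTF-8: invalid start byte
```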

    Sorry, then I misunderstood that. Then it is even more meaningless, because
    the point of the UTF-16 BOM is that it can detect byte swapping, and UTF-8,
    being a byte-oriented encoding, has no byte order to detect. Unicode has
    decided that text files should be prefixed with an ad hoc character of no
    particular use.
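    The byte-swapping argument can be made concrete with a small Python sketch
    using the standard codecs module. sniff_bom is a hypothetical helper, and it
    deliberately ignores UTF-32 (whose little-endian BOM, FF FE 00 00, begins
    with the UTF-16LE one). It shows that U+FEFF serialized as UTF-16 comes out
    as FE FF or FF FE depending on byte order, so a reader can tell the two
    apart. Its UTF-8 form, EF BB BF, is identical on every machine and can
    serve only as an encoding signature.

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess a codec name from a leading BOM (UTF-8/UTF-16 only)."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8-sig"                    # this codec strips the signature on decode
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    return None

# The same character U+FEFF, serialized three ways.
print("\ufeff".encode("utf-16-be").hex(" "))  # fe ff
print("\ufeff".encode("utf-16-le").hex(" "))  # ff fe
print("\ufeff".encode("utf-8").hex(" "))      # ef bb bf

# In UTF-16 the BOM reveals byte order; in UTF-8 it only marks the encoding.
samples = {
    "utf-16-be": "\ufeffhi".encode("utf-16-be"),
    "utf-16-le": "\ufeffhi".encode("utf-16-le"),
    "utf-8-sig": "hi".encode("utf-8-sig"),  # the utf-8-sig codec writes EF BB BF first
}
for label, data in samples.items():
    enc = sniff_bom(data)
    # utf-16-be/-le keep U+FEFF in the decoded text, so strip it explicitly.
    print(label, "->", enc, repr(data.decode(enc).lstrip("\ufeff")))
```

    Returning "utf-8-sig" rather than plain "utf-8" is a design choice: that
    codec drops the leading U+FEFF when decoding, which is usually what a
    reader of such a file wants.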

      Hans Aberg



    This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 18:54:35 CST