Re: UTF-8 'BOM'

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 18:52:35 CST

Next message: Hans Aberg: "Re: UTF-8 'BOM'"

Previous message: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"
In reply to: Andrew C. West: "Re: UTF-8 'BOM'"

On 2005/01/20 19:38, Andrew C. West at andrewcwest@alumni.princeton.edu
wrote:

>> The BOM in UTF-8 is not the UTF-8 encoding of the number 0xFEFF, but 0xFEFF
>> appearing as though in UTF-16. 0xFEFF is a Unicode code point, and could still be
>> translated into UTF-8. So the BOM in UTF-8 is a really strange animal.

> The BOM generated by Notepad and other Windows applications at the start of
> UTF-8 files is 0xEF 0xBB 0xBF, which is the UTF-8 transformation of the
> valid Unicode character U+FEFF, and so no process that claims to process UTF-8
> files should have any problem with it. If you do get 0xFEFF at the start of (or
> anywhere in) a UTF-8 file, then that IS very wrong ... but I've never seen such an
> animal.
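
A minimal Python sketch of Andrew's point (not from the thread): encoding U+FEFF as UTF-8 gives exactly EF BB BF, while the raw UTF-16 BOM bytes FE FF can never occur in well-formed UTF-8. The helper names strip_utf8_bom and is_valid_utf8 are hypothetical; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of Python's standard library.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three-byte signature Notepad writes.
UTF8_BOM = "\ufeff".encode("utf-8")
assert UTF8_BOM == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading EF BB BF signature, if present (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(strip_utf8_bom(sample))      # b'hello'
print(sample.decode("utf-8-sig"))  # hello  ("utf-8-sig" strips the signature)
# The bytes 0xFE and 0xFF never appear in well-formed UTF-8.
print(is_valid_utf8(b"\xfe\xff"))  # False
```

So a UTF-8 reader either skips EF BB BF or treats it as an ordinary U+FEFF character; only a literal FE FF byte pair would be malformed.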

Sorry, then I misunderstood that. Then it is even more meaningless, because
the point of the UTF-16 BOM is that it lets a reader detect byte swapping,
and UTF-8, being a sequence of single bytes, has no byte order to detect. So
Unicode has decided that text files should be prepended with an ad hoc
character of no particular use.
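
To illustrate the byte-swapping argument, here is a small Python sketch (an illustration added here, not anyone's proposed method): in UTF-16 the same character U+FEFF serializes as FE FF or FF FE depending on byte order, so the first two bytes reveal the order, whereas its UTF-8 form is the same on every platform. detect_utf16_order is a hypothetical helper; the codecs.BOM_UTF16_BE and codecs.BOM_UTF16_LE constants are from Python's standard library.

```python
import codecs


def detect_utf16_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    return "unknown"


# U+FEFF followed by "A", serialized in the two UTF-16 byte orders.
big = "\ufeffA".encode("utf-16-be")     # b'\xfe\xff\x00A'
little = "\ufeffA".encode("utf-16-le")  # b'\xff\xfeA\x00'

print(detect_utf16_order(big))     # utf-16-be
print(detect_utf16_order(little))  # utf-16-le

# UTF-8 has one fixed byte sequence per code point, so its "BOM"
# carries no byte-order information at all.
print("\ufeff".encode("utf-8").hex())  # efbbbf
```

In UTF-8 the three bytes can at most act as a signature saying "this is probably UTF-8", which is the only use left for them.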

Hans Aberg

This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 18:54:35 CST