Re: UTF-8 'BOM'

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 12:16:30 CST

Next message: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

Previous message: Rick McGowan: "Public Review Issue update"
In reply to: Christopher Fynn: "Re: UTF-8 'BOM'"
Next in thread: Addison Phillips [wM]: "RE: UTF-8 'BOM'"
Reply: Addison Phillips [wM]: "RE: UTF-8 'BOM'"

On 2005/01/20 14:14, Christopher Fynn at cfynn@gmx.net wrote:

> Hans Aberg wrote:
>
>
>> It is much better if the BOM is illegal in UTF-8. It does not prevent MS from
>> using it, instead labelling it as a file format marker for MS text files. A
>> program that then deals with MS text files must then know about the BOM and
>> remove it when and if appropriate. At the same time, it does not cause any
>> problems for programs that normally do not handle MS text files but only
>> plain text: They are fine as they are. Everyone should be able to be happy.
>
> Since BOM is a valid Unicode & ISO 10646 character and UTF-8 is a
> transformation format of Unicode & 10646, if BOM were illegal in UTF-8
> it couldn't be used for *all* Unicode characters.

The BOM in UTF-8 is not the UTF-8 encoding of the number 0xFEFF, but 0xFEFF
appearing as though in UTF-16. 0xFEFF is a Unicode number, and could still be
translated into UTF-8. So the BOM in UTF-8 is a really strange animal.
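
For anyone who wants to check the byte-level picture, here is a minimal
Python sketch. It only computes how the code point U+FEFF serializes in UTF-8
and in big-endian UTF-16 under Python's standard codecs, and shows one way a
program could drop a leading three-byte signature before treating the rest as
plain text. The names utf8_bom, utf16be_bom and strip_utf8_bom are
illustrative assumptions, not anything from this thread.

```python
import codecs

BOM = "\ufeff"  # the code point U+FEFF

# Serialize the same code point in two encoding forms.
utf8_bom = BOM.encode("utf-8")         # three bytes: EF BB BF
utf16be_bom = BOM.encode("utf-16-be")  # two bytes: FE FF


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present.

    Hypothetical helper: only the first three bytes are examined, so a
    U+FEFF occurring later in the stream is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A program that reads such files could call strip_utf8_bom(raw) once on the
raw bytes and then decode the result as UTF-8; input without the signature
passes through unchanged.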

This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 12:18:04 CST