RE: Subject: Re: 32'nd bit & UTF-8

From: Martin Duerst (duerst@w3.org)
Date: Mon Jan 24 2005 - 02:02:26 CST

Next message: Martin Duerst: "Re: 32'nd bit & UTF-8"

Previous message: Peter Constable: "RE: Actually, this wasn't rhetorical"
Maybe in reply to: Arcane Jill: "Subject: Re: 32'nd bit & UTF-8"
Next in thread: Martin Duerst: "RE: Subject: Re: 32'nd bit & UTF-8"

At 13:54 05/01/20, Peter Constable wrote:

>As for whether plain text files can have a BOM, that is one of the few
>unending debates that arise with certain (fortunately not too frequent)
>regularity, each time with vociferous expressions of deeply-held beliefs
>but never any resolution. I'll just observe that the formal grammar for
>XML does not make reference to a BOM, yet the XML spec certainly assumes
>that a well-formed XML document may begin with a UTF-8 BOM (or a BOM in
>any Unicode encoding form/scheme). Rather than have a philosophical
>debate about the definition of "plain text file", I suggest a more
>pragmatic approach: for better or worse, plain text processes that
>support UTF-8 are going to encounter UTF-8 data beginning with a BOM:
>learn to live with it!

Just for your reference, I'd like to point out the following
historical facts:

- The BOM isn't part of the XML grammar because it was always
   required for UTF-16 (but not for encodings such as UTF-16BE and
   UTF-16LE, which were defined later).
- When XML was first defined and issued as a recommendation (Feb 1998),
   nobody in the XML community as far as I know was thinking about
   a BOM for UTF-8. The first edition of the XML Recommendation didn't
   say anything about a BOM for UTF-8. Also, the early XML parsers
   didn't accept BOMs in the case of UTF-8.
- When Notepad started to use a BOM for UTF-8, the responsible Working
   Group went back and read the lack of any statement about a BOM for
   UTF-8 in the XML Recommendation as ambiguous (it could mean either
   that the BOM was allowed or that it was not), and clarified that
   the BOM was indeed allowed for UTF-8. Many parsers have since
   been upgraded.

So the fact that XML allows a UTF-8 BOM cannot be taken as an indication
of how 'good' the BOM for UTF-8 is, but it can certainly be taken as
an indication of its practical occurrence.
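Both Peter's advice and Martin's closing point come down to the same practical rule: a UTF-8 reader should tolerate a leading BOM (the bytes EF BB BF). The following is a minimal Python sketch of that rule, not taken from either message. The helper name `decode_utf8_tolerant` is hypothetical; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs

def decode_utf8_tolerant(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, dropping one leading BOM if present."""
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")

with_bom = codecs.BOM_UTF8 + "<doc/>".encode("utf-8")
without_bom = "<doc/>".encode("utf-8")

# Both inputs decode to the same text.
print(decode_utf8_tolerant(with_bom) == decode_utf8_tolerant(without_bom))  # True
# The standard "utf-8-sig" codec behaves the same way when decoding.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))      # True
# A plain "utf-8" decode keeps the BOM as U+FEFF at the start of the string.
print(repr(with_bom.decode("utf-8")[0]))                                     # '\ufeff'
```

Stripping at most one leading BOM matches what `utf-8-sig` does. A plain `utf-8` decode leaves U+FEFF at the start of the text, which is how the BOM tends to leak into application data when a reader does not expect it.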

Regards, Martin.


This archive was generated by hypermail 2.1.5 : Mon Jan 24 2005 - 19:27:27 CST