From: Martin Duerst (duerst@w3.org)
Date: Mon Jan 24 2005 - 02:02:26 CST
At 13:54 05/01/20, Peter Constable wrote:
>As for whether plain text files can have a BOM, that is one of the few
>unending debates that arise with certain (fortunately not too frequent)
>regularity, each time with vociferous expressions of deeply-held beliefs
>but never any resolution. I'll just observe that the formal grammar for
>XML does not make reference to a BOM, yet the XML spec certainly assumes
>that a well-formed XML document may begin with a UTF-8 BOM (or a BOM in
>any Unicode encoding form/scheme). Rather than have a philosophical
>debate about the definition of "plain text file", I suggest a more
>pragmatic approach: for better or worse, plain text processes that
>support UTF-8 are going to encounter UTF-8 data beginning with a BOM:
>learn to live with it!
Just for your reference, I'd like to point out the following
historical facts:
- The BOM isn't part of the XML grammar because the BOM was always
required for UTF-16 (but not for things such as UTF-16BE and
UTF-16LE, which got defined later).
- When XML was first defined and issued as a recommendation (Feb 1998),
nobody in the XML community as far as I know was thinking about
a BOM for UTF-8. The first edition of the XML Recommendation didn't
say anything about a BOM for UTF-8. Also, the early XML Parsers
didn't accept BOMs in the case of UTF-8.
- When Notepad started to use a BOM for UTF-8, the responsible Working
Group went back and took the lack of any statement about a BOM for
UTF-8 in the XML Recommendation to say that this could mean either
that the BOM was allowed or it was not allowed, and clarified that
the BOM was indeed allowed for UTF-8. Many parsers have in the meantime
been upgraded.
So the fact that XML allows a UTF-8 BOM cannot be taken as an indication
of how 'good' the BOM for UTF-8 is, but it can certainly be taken as
an indication of its practical occurrence.
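To make the practical side concrete, here is a minimal sketch, assuming
Python, of how a plain text process might tolerate a leading UTF-8 BOM.
The function names strip_utf8_bom and decode_utf8_text are hypothetical
and chosen for illustration only; codecs.BOM_UTF8 and the "utf-8-sig"
codec come from the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting input with or without a leading BOM."""
    # The "utf-8-sig" codec skips one leading BOM when decoding and
    # otherwise behaves like plain "utf-8".
    return data.decode("utf-8-sig")
```

Either function accepts input with or without the BOM, so a caller does
not need to know which kind of file it was given.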
Regards, Martin.