Date: Mon Feb 17 2003 - 13:52:16 EST
Roozbeh Pournader wrote,
> And some people find it annoying and dangerous. A BOM-ed UTF-8 file breaks
> the Unix text file model to some degree. I can post a link if anyone's
One report seen recently during various searches was that the
BOM caused a core dump in certain cases.
Seems like it should be easy to fix.
* * *
A Google search for "utf-8 bom" produces thousands of hits. In
several instances a bug involving the UTF-8 BOM has been filed
against an application; somebody sends in a patch and the
problem is solved.
Mention was made of problems with "cat" or "grep". If a series of
files is concatenated into one file, either each incoming file can
be checked for a BOM and stripped of it if found, or the character
can simply be passed along. How could the presence of a few ZWNBSPs
cause a problem?
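
For what it's worth, the first option (check each incoming file and
strip the BOM if found) could look roughly like the sketch below. It
is only an illustration, not how "cat" or any existing tool behaves;
the function name concat_without_boms is made up for this example, and
it assumes the inputs are already in memory as byte strings.

```python
import codecs

def concat_without_boms(chunks):
    """Join byte strings, dropping a leading UTF-8 BOM from each one.

    Hypothetical helper: only a BOM at the very start of a chunk is
    removed; any U+FEFF elsewhere in the data is passed along as-is.
    """
    out = bytearray()
    for data in chunks:
        # codecs.BOM_UTF8 is the three bytes EF BB BF.
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        out += data
    return bytes(out)

if __name__ == "__main__":
    parts = [b"\xef\xbb\xbfabc\n", b"\xef\xbb\xbfdef\n", b"ghi\n"]
    print(concat_without_boms(parts))
```

The second option, passing the character along, needs no code at all,
which is rather the point of the question.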
List members Markus Kuhn and Tex Texin both have pages commenting
upon the BOM for UTF-8 and otherwise.
* * *
The Tidy application supports BOM for UTF-8 in HTML,
(page is in shift-jis)
(If the BOM is present in the input HTML, Tidy will pass
it along to the output HTML if Tidy is in auto mode.)
* * *
The XMLmind XML Editor posted a bug fix in Jan 2003,
the application "can now load XML files having the UTF-8
Byte Order Mark (BOM = 0xEF 0xBB 0xBF) as allowed by
XML 1.0 Second Edition Specification Errata:
http://www.w3.org/XML/xml-V10-2e-errata. Such files are
typically created using Windows 2000 notepad."
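
As a rough sketch of what accepting such a file involves (this is not
XMLmind's actual fix, which I have not seen): look for the three bytes
0xEF 0xBB 0xBF at the start of the input and drop them before the text
reaches the parser. The name decode_utf8_entity is hypothetical, and
the sketch assumes the input is known to be UTF-8.

```python
import codecs

def decode_utf8_entity(raw):
    """Return (had_bom, text) for UTF-8 bytes with an optional BOM.

    Hypothetical helper. The "utf-8-sig" codec decodes UTF-8 and
    discards a leading BOM if one is present.
    """
    had_bom = raw.startswith(codecs.BOM_UTF8)  # EF BB BF
    text = raw.decode("utf-8-sig")
    return had_bom, text

if __name__ == "__main__":
    sample = b'\xef\xbb\xbf<?xml version="1.0"?><a/>'
    print(decode_utf8_entity(sample))
```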
* * *
Rick Jelliffe wrote on the xml-dev list on 2001-07-04,
"Things like UTF-8 BOMs belong in entity management (like line-feed
handling, transcoding, and Unicode normalization) that should be
as transparent to XML as possible. XML does really well in this regard: the
XML-in-MIME RFCS and the use of Unicode have served us well I think."
( http://lists.xml.org/archives/xml-dev/200107/msg00115.html )
Doesn't Rick Jelliffe's concept make sense for HTML as well?
This archive was generated by hypermail 2.1.5 : Mon Feb 17 2003 - 14:32:40 EST