Re: BOM's at Beginning of Web Pages?

From: jameskass@att.net
Date: Mon Feb 17 2003 - 13:52:16 EST

    Roozbeh Pournader wrote,

    > And some people find it annoying and dangerous. A BOM-ed UTF-8 file breaks
    > the Unix text file model to some degree. I can post a link if anyone's
    > interested.

    One report seen recently during various searches was that the
    BOM caused a core dump in certain cases.

    Seems like it should be easy to fix.

    * * *

    A Google search for "utf-8 bom" produces thousands of hits. In
    several cases a bug involving the UTF-8 BOM was reported against
    an application, somebody sent in a patch, and the problem was
    solved.

    Mention was made of problems with "cat" or "grep". If a series of
    files is concatenated into one file, either the incoming files can
    be checked for the BOM and be stripped of same if found, or
    the character can simply be passed along. How could the presence
    of a few ZWNBSPs cause a problem?
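
The first option, checking each incoming file for the BOM and stripping
it, might look something like the sketch below. It is only an
illustration: the function names strip_utf8_bom and concat_without_boms
are made up for this message, and it works on byte strings held in
memory rather than on real files.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def concat_without_boms(chunks):
    """Concatenate byte strings, dropping a BOM at the start of each one.

    Hypothetical helper: a BOM that appears anywhere other than the very
    start of a chunk is left alone, since only the leading one is a signature.
    """
    return b"".join(strip_utf8_bom(chunk) for chunk in chunks)


# Naive concatenation passes every BOM along, so the second file's BOM
# ends up in the middle of the result as U+FEFF (ZWNBSP).
first = codecs.BOM_UTF8 + b"abc\n"
second = codecs.BOM_UTF8 + b"def\n"
naive = first + second
cleaned = concat_without_boms([first, second])
```

With the naive version the only difference is the three extra bytes at
each file boundary, which is the "passed along" case described above.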

    List members Markus Kuhn and Tex Texin both have pages commenting
    upon the BOM for UTF-8 and otherwise.
    http://www.cl.cam.ac.uk/~mgk25/unicode.html
    http://www.xencraft.com/resources/unicodebom.html

    * * *

    The Tidy application supports the BOM for UTF-8 in HTML
    (see "output-bom"; the page is in Shift-JIS):
    http://kwatch.tripod.co.jp/web/htmltidy.html
    If the BOM is present in the input HTML and Tidy is in auto mode,
    Tidy will pass it along to the output HTML.

    * * *

    The XMLmind XML Editor posted a bug fix in Jan 2003:
    the application "can now load XML files having the UTF-8
    Byte Order Mark (BOM = 0xEF 0xBB 0xBF) as allowed by
    XML 1.0 Second Edition Specification Errata:
    http://www.w3.org/XML/xml-V10-2e-errata. Such files are
    typically created using Windows 2000 notepad."

    http://www.xmlmind.com/xmleditor/changes.html
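
For what it's worth, accepting the optional signature when loading a
file can be very little code. The sketch below is not how XMLmind does
it (their source was not consulted); decode_utf8_entity is a made-up
name, and it relies on Python's "utf-8-sig" codec, which decodes UTF-8
and drops one leading BOM if it is there.

```python
def decode_utf8_entity(data: bytes) -> str:
    """Decode UTF-8 bytes to text, ignoring an optional leading BOM.

    Hypothetical helper: the "utf-8-sig" codec removes the signature
    EF BB BF only when it is the first thing in the input.
    """
    return data.decode("utf-8-sig")


with_bom = b"\xef\xbb\xbf<?xml version='1.0'?><doc/>"
without_bom = b"<?xml version='1.0'?><doc/>"

# Both inputs decode to the same text, so whatever parses the result
# never has to know whether the file started with a BOM.
same = decode_utf8_entity(with_bom) == decode_utf8_entity(without_bom)
```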

    * * *

    Rick Jelliffe wrote on the xml-dev list on 2001-07-04,
    "Things like UTF-8 BOMs belong in entity management (like line-feed
    handling, transcoding, and Unicode normalization) that should be
    as transparent to XML as possible. XML does really well in this regard: the
    XML-in-MIME RFCS and the use of Unicode have served us well I think."

    ( http://lists.xml.org/archives/xml-dev/200107/msg00115.html )

    Doesn't Rick Jelliffe's concept make sense for HTML as well?

    Best regards,

    James Kass



    This archive was generated by hypermail 2.1.5 : Mon Feb 17 2003 - 14:32:40 EST