Re: Problem with SSI and BOM

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Sep 22 2006 - 15:12:25 CDT


    On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:

    > I'm using SSI to include UTF-8 encoded files within a UTF-8 encoded
    > HTML page on IIS (Internet Information Services). The problem is that
    > the byte order mark is not being stripped by the SSI parser,
    > resulting in BOMs within the HTML body.

    Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It
    might be thought of as a "signature" from which it is possible to deduce
    (guess) the encoding. But for HTML files, you can and should explicitly
    specify the encoding in HTTP headers (when they are transmitted via HTTP)
    or in <meta> tags or both.
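
    As a minimal sketch of the "just remove the BOM" option, assuming the
    included files can be preprocessed with Python before IIS serves them:
    the helper names strip_utf8_bom and strip_bom_in_file are made up for
    illustration and are not part of IIS or SSI.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: Path) -> bool:
    """Rewrite the file without its BOM; return True if the file was changed.

    Hypothetical helper: it works on raw bytes, so the rest of the
    file's content is left untouched.
    """
    original = path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False
```

    Running strip_bom_in_file over each include file once, for example
    after saving from an editor that insists on writing the signature,
    would leave nothing for the SSI parser to pass through.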

    If you can't do that for some reason, and if you can't make the inclusion
    mechanism remove the BOM, it shouldn't be an issue, since within data,
    BOM (U+FEFF, ZERO WIDTH NO-BREAK SPACE) should be treated as an
    invisible character that "glues" the characters around it together for the
    purposes of rendering, and this should normally do no harm. Is there some
    reason to suspect that some browsers neither treat BOM that way nor
    simply ignore it (which usually amounts to the same thing in contexts
    where BOM would normally appear as a result of inclusion)?
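
    To check whether an assembled page actually contains such embedded
    characters, something like the following Python sketch could be used.
    The sample bytes and the name find_embedded_boms are illustrative
    assumptions, not output captured from IIS.

```python
import codecs
from typing import List

BOM_CHAR = "\ufeff"  # U+FEFF, ZERO WIDTH NO-BREAK SPACE


def find_embedded_boms(text: str) -> List[int]:
    """Return the character offsets at which U+FEFF occurs in text."""
    return [i for i, ch in enumerate(text) if ch == BOM_CHAR]


# Simulate an include mechanism that copies a BOM-prefixed file verbatim
# into the middle of a page.
included = codecs.BOM_UTF8 + "included text".encode("utf-8")
page = b"<p>" + included + b"</p>"

# Decoded as UTF-8, the three BOM bytes become one U+FEFF character
# inside the body rather than a signature at the start of the data.
text = page.decode("utf-8")
positions = find_embedded_boms(text)
```

    Note that Python's "utf-8-sig" codec only drops a signature at the very
    start of the data, so it would not remove a U+FEFF that ends up in the
    middle of a page like this one.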

    See also the Unicode BOM FAQ,
    http://www.unicode.org/unicode/faq/utf_bom.html

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

