Re: Problem with SSI and BOM

From: Addison Phillips
Date: Fri Sep 22 2006 - 16:39:23 CDT




    The BOM is often rendered in the page, throwing off other display
    elements. One common cause on Windows is the prevalence of editors
    (Notepad!!) that add the UTF-8 BOM to text files saved as "UTF-8".
    While one might expect the BOM to act as a "no-op" character, in
    practice it doesn't.
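
    If the include files can be cleaned up at the source, a minimal
    sketch of removing the signature could look like the following.
    It assumes the files are UTF-8 and that only a leading BOM (bytes
    EF BB BF) should be dropped; the function names are illustrative
    and not part of IIS or any SSI tooling.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file at path without its leading BOM.

    Returns True if the file was changed, False if it had no BOM.
    """
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

    Working on raw bytes avoids re-encoding the file, so nothing
    other than the three signature bytes is touched.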


    Jukka K. Korpela wrote:
    > On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:
    >> I'm using SSI to include UTF-8 encoded files within a UTF-encoded
    >> HTML page on IIS (Internet Information Services). The problem is that
    >> the byte order mark is not being stripped by the SSI parser,
    >> resulting in BOMs within the HTML body.
    > Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It
    > might be thought of as a "signature" from which it is possible to deduce
    > (guess) the encoding. But for HTML files, you can and should explicitly
    > specify the encoding in HTTP headers (when they are transmitted via
    > HTTP) or in <meta> tags or both.
    > If you can't do that for some reason, and if you can't make the
    > inclusion mechanism remove the BOM, it shouldn't be an issue, since
    > within data, BOM (U+FEFF, ZERO WIDTH NO-BREAK SPACE) should be
    > treated as an invisible character that "glues" the characters around
    > it together for the purposes of rendering, and this should normally
    > do no harm. Is there some reason to suspect that some browsers don't
    > treat BOM either that way or simply ignore it (which is usually the
    > same thing, for contexts where BOM would normally appear as a result
    > of inclusion)?
    > See also the Unicode BOM FAQ,
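
    To make the scenario concrete, here is a rough sketch of how a
    byte-for-byte include leaves U+FEFF in the middle of the page, and
    how one might locate it. The include marker and function names are
    hypothetical stand-ins; this does not model the actual IIS SSI
    parser.

```python
import codecs
from typing import List

BOM_CHAR = "\ufeff"  # U+FEFF as it appears after decoding UTF-8


def simulate_include(page: bytes, marker: bytes, fragment: bytes) -> bytes:
    """Splice fragment's raw bytes into page at marker, BOM and all."""
    return page.replace(marker, fragment)


def find_stray_boms(html: bytes) -> List[int]:
    """Return character offsets of U+FEFF found after the first position."""
    text = html.decode("utf-8")
    return [i for i, ch in enumerate(text) if ch == BOM_CHAR and i > 0]


page = b"<body><!--inc--></body>"
fragment = codecs.BOM_UTF8 + b"<p>x</p>"
combined = simulate_include(page, b"<!--inc-->", fragment)
stray = find_stray_boms(combined)
```

    A BOM at offset 0 is left alone, since that is where a signature
    belongs; anything later is a candidate for removal.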

    Addison Phillips
    Globalization Architect -- Yahoo! Inc.
    Internationalization is an architecture.
    It is not a feature.
