From: Mark Cilia Vincenti (email@example.com)
Date: Tue Sep 26 2006 - 02:07:32 CST
Thanks all for your answers. Addison Phillips's email below
summarizes everything neatly. I have 3 SSI includes, and each of them
is breaking the page by putting in an empty line (tested under the
latest versions of IE and Firefox).
If the BOM weren't being rendered, it wouldn't be a problem,
but it is being rendered.
Now, some of these SSIs will be edited by a number of users. I haven't
yet found a text editor which always saves in UTF-8 AND without BOM, no
matter what settings you have.
Besides the fact that I'd be limiting which editors users can use
(which also increases the chance of human error), the BOM has a very
important use. Every text editor I tried assumes that a file containing
only English-language characters and saved without a BOM is an ANSI
file. Indeed, once saved, the two are identical.
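For text that uses only ASCII characters, UTF-8 without a BOM and the Windows "ANSI" code page produce exactly the same bytes, so an editor has nothing to tell them apart by. This small Python sketch shows that. It assumes "ANSI" means Windows-1252 (`cp1252`), the usual code page on Western-European Windows systems, and the sample string is arbitrary.

```python
import codecs

text = "Hello, world"  # arbitrary ASCII-only sample text

utf8_no_bom = text.encode("utf-8")
ansi = text.encode("cp1252")  # assumption: "ANSI" = Windows-1252
utf8_with_bom = codecs.BOM_UTF8 + utf8_no_bom

# Without a BOM the two encodings are byte-for-byte identical.
print(utf8_no_bom == ansi)        # True

# The BOM adds three bytes (EF BB BF) that mark the file as UTF-8.
print(utf8_with_bom[:3].hex())    # efbbbf
print(utf8_with_bom == ansi)      # False
```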
This poses a big problem. Here's a scenario: the SSI file is saved with
English language characters and without BOM. The user opens up the file
in his favourite text editor. The text editor assumes the file is ANSI.
The user proceeds to add characters with accents in them (e.g. the name of
a French person) and re-saves the file. Since the text editor opened the
file as ANSI, it will most likely assume you want to save it as ANSI as
well, so the default save format is going to be ANSI.
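The scenario can be reproduced in a few lines. This sketch assumes the editor falls back to Windows-1252 when it finds no BOM, and the accented name is a made-up example. Once the file has been re-saved as ANSI, its bytes are no longer valid UTF-8, so a page served as UTF-8 shows a replacement character where the accent should be.

```python
name = "Fran\u00e7ois Dupont"  # hypothetical accented content the user types

# The editor opened the BOM-less file as "ANSI" and re-saves it that way.
saved = name.encode("cp1252")  # 'ç' becomes the single byte 0xE7

try:
    saved.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xE7 starts a 3-byte UTF-8 sequence, but 'o' follows instead.
    valid_utf8 = False

print(valid_utf8)  # False

# Decoding as UTF-8 with replacement yields U+FFFD in place of 'ç'.
print(saved.decode("utf-8", errors="replace"))
```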
Conclusion: the BOM is important to have. Some text editors, e.g. Notepad,
don't even allow you to save the file without it. But a BOM inside
HTML code is also bad, as it puts in an empty line each time. I'm just
wondering if there's some other way to apply the includes that
recognises the BOM and strips it rather than including it.
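One way to get that behaviour is to do the inclusion in code that decodes each fragment with a BOM-aware codec instead of copying raw bytes. Python's `utf-8-sig` codec drops a leading UTF-8 BOM if there is one and otherwise behaves like plain UTF-8. This is only a sketch, not IIS SSI: `include_naive`, `include_fragments` and the sample fragments are hypothetical, and the point is the contrast between byte copying and per-file decoding.

```python
import codecs

def include_naive(fragments: list[bytes]) -> str:
    """Join raw bytes, as an include mechanism that ignores BOMs would."""
    return b"".join(fragments).decode("utf-8")

def include_fragments(fragments: list[bytes]) -> str:
    """Decode each fragment on its own, dropping a leading UTF-8 BOM."""
    # 'utf-8-sig' strips one BOM at the start; BOM-less data decodes normally.
    return "".join(f.decode("utf-8-sig") for f in fragments)

page_start = "<html><body>\n".encode("utf-8")
header = codecs.BOM_UTF8 + "<div>Header</div>\n".encode("utf-8")  # saved with BOM
body = "<p>Caf\u00e9</p>\n".encode("utf-8")                        # saved without BOM

naive = include_naive([page_start, header, body])
clean = include_fragments([page_start, header, body])

print("\ufeff" in naive)   # True: the BOM lands mid-page as U+FEFF
print("\ufeff" in clean)   # False
```

Decoding each fragment separately is the important design choice: a BOM is only a signature at the start of a file, so it has to be removed before the fragments are joined, not afterwards.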
Mark Cilia Vincenti - Internal Developer - Marketing
GFI Software - www.gfi.com
From: Addison Phillips [mailto:firstname.lastname@example.org]
Sent: 22 September 2006 11:39 PM
To: Jukka K. Korpela
Cc: Mark Cilia Vincenti; email@example.com
Subject: Re: Problem with SSI and BOM
One common problem on Windows is the prevalence of editors
(Notepad!!) that add the UTF-8 BOM to text files stored as "UTF-8".
While one might expect this to act as a "no-op" character, in practice
the BOM is often rendered in the page, throwing off other display
elements.
Jukka K. Korpela wrote:
> On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:
>> I'm using SSI to include UTF-8 encoded files within a UTF-8 encoded
>> HTML page on IIS (Internet Information Services). The problem is that
>> the byte order mark is not being stripped by the SSI parser,
>> resulting in BOMs within the HTML body.
> Can't you just remove the BOM? It's not needed in UTF-8 encoded data.
> It might be thought of as a "signature" from which it is possible to
> guess the encoding. But for HTML files, you can and should
> specify the encoding in HTTP headers (when they are transmitted via
> HTTP) or in <meta> tags or both.
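Treating the BOM purely as a signature looks roughly like this. `sniff_bom` is a hypothetical helper, not part of any server: it checks the leading bytes and returns None when there is no BOM, in which case the charset declared in the HTTP header or meta tag has to decide.

```python
import codecs

# Longer signatures first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so the order of this list matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None  # no signature: fall back to the declared charset

print(sniff_bom(codecs.BOM_UTF8 + b"<p>hi</p>"))  # ('utf-8', 3)
print(sniff_bom(b"<p>hi</p>"))                    # None
```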
> If you can't do that for some reason, and if you can't make the
> inclusion mechanism remove the BOM, it shouldn't be an issue, since
> within data, BOM (U+FEFF, ZERO-WIDTH NON-BREAKING SPACE) should be
> treated as an invisible character that "glues" the characters around
> it together for the purposes of rendering, and this should normally do
> no harm. Is there some reason to suspect that some browsers don't
> either treat BOM that way or simply ignore it (which is usually the
> same thing, in contexts where BOM would normally appear as a result of
> inclusion)?
> See also the Unicode BOM FAQ,
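Python's `unicodedata` module shows how Unicode itself classifies the character Jukka describes: its formal name is ZERO WIDTH NO-BREAK SPACE and it is a format character (general category Cf), not a visible glyph. Later Unicode versions moved the "glue" role to U+2060 WORD JOINER, so in current text an interior U+FEFF is most likely a leftover BOM rather than an intentional joiner. How a particular browser renders it is outside what this check can show.

```python
import unicodedata

bom = "\ufeff"
print(unicodedata.name(bom))        # ZERO WIDTH NO-BREAK SPACE
print(unicodedata.category(bom))    # Cf (format character)
print(unicodedata.name("\u2060"))   # WORD JOINER, the preferred joiner today
```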
-- Addison Phillips Globalization Architect -- Yahoo! Inc. Internationalization is an architecture. It is not a feature.
This archive was generated by hypermail 2.1.5 : Tue Sep 26 2006 - 08:29:39 CST