Re: Charset declaration in HTML

From: Jean-François Colson <jf_at_colson.eu>
Date: Wed, 11 Jul 2012 13:04:36 +0200

On 11/07/12 06:32, Philippe Verdy wrote:
> 2012/7/10 Naena Guru <naenaguru_at_gmail.com>
>
> I wanted to see how hard it is to edit a page in Notepad. So I
> made a copy of my LIYANNA page and replaced the character entities
> I used for Unicode Sinhala, accented Pali and Sanskrit with their
> raw letters. Notepad forced me to save the file in UTF-8 format. I
> ran it through W3C Validator. It passed HTML5 test with the
> following warning:
>
> #
>
> Warning Byte-Order Mark found in UTF-8 File.
>
> #
>
> The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is
> known to cause problems for some text editors and older
> browsers. You may want to consider avoiding its use until it
> is better supported.
>
> The BOM is the first character of the file. There are myriad hoops
> that non-Latin users go through to do things that we routinely do.
> This problem I saw right at the inception. I already know why
> romanizing is so good. Don't you?
>
>
> You should probably ignore this non-critical warning now; it only
> matters for extremely strict compatibility with outdated software that
> should have been updated long ago for obvious security and
> performance reasons.

There are a few cases where a BOM may cause trouble.

For example, PHP's header() function lets you redirect to another
page.

If you create a PHP document containing

<?php
header("Location: http://unicode.org");
?>

you will be redirected to the Unicode website.

No output may be sent before header() is called; otherwise you get an
error message.

If your document contains

text
<?php
header("Location: http://unicode.org");
?>

"text" will be sent and you'll get an error message such as:

text Warning: Cannot modify header information - headers already sent
by (output started at /customers/0/1/f/colson.eu/httpd.www/test.php:2)
in /customers/0/1/f/colson.eu/httpd.www/test.php on line 3

If your document contains only

<?php
header("Location: http://unicode.org");
?>

but you save it with a BOM, the BOM bytes precede the opening <?php tag,
so they are sent as output before header() runs and you'll get an error
message like

Warning: Cannot modify header information - headers already sent by
(output started at /customers/0/1/f/colson.eu/httpd.www/test.php:1) in
/customers/0/1/f/colson.eu/httpd.www/test.php on line 2

(Tested with Firefox 13.0 and Google Chrome 20.0.1132.47 on Ubuntu 12.04.)
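
If you suspect that a file was saved with a BOM, you can check for it and
remove it from PHP itself. What follows is only a minimal sketch: the
constant and function names (UTF8_BOM, hasUtf8Bom, stripUtf8Bom,
removeBomFromFile) are hypothetical helpers made up for this message, not
part of PHP. The only fixed fact it relies on is that the UTF-8 BOM is the
three bytes EF BB BF at the very start of the file.

```php
<?php
// Hypothetical helper names; only the three BOM bytes are standard.
const UTF8_BOM = "\xEF\xBB\xBF";

// True when the byte string starts with the UTF-8 BOM.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, strlen(UTF8_BOM)) === 0;
}

// Returns the byte string without a leading BOM, unchanged otherwise.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes)
        ? (string) substr($bytes, strlen(UTF8_BOM))
        : $bytes;
}

// Rewrites the file in place if it starts with a BOM.
// Returns true only when the file was actually changed.
function removeBomFromFile(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false || !hasUtf8Bom($bytes)) {
        return false;
    }
    return file_put_contents($path, stripUtf8Bom($bytes)) !== false;
}
```

Assuming the code above is saved in a BOM-free file and included, calling
removeBomFromFile("test.php") once should leave test.php starting directly
with <?php, so that header() is no longer preceded by any output.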

On Windows, Notepad++ <http://notepad-plus-plus.org/> lets you choose
among several encodings, including UTF-8 and UTF-8 without BOM. Always
choose the latter and you'll avoid such problems.

Received on Wed Jul 11 2012 - 06:06:31 CDT
