Re: Charset declaration in HTML (was: Romanized Singhala - Think about it again)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Wed, 11 Jul 2012 06:58:28 +0200

Naena Guru, Tue, 10 Jul 2012 01:40:19 -0500:

> HTML5 assumes UTF-8 as the character set if you do not declare one
> explicitly. My current pages are in HTML 4.

There is in principle no difference between what HTML5 parsers assume
and what HTML4 parsers assume: when no encoding is declared, all of
them fall back to the default encoding for the locale.

> Notepad forced
> me to save the file in UTF-8 format. I ran it through W3C Validator. It
> passed HTML5 test with the following warning:
>
> [image: Warning] Byte-Order Mark found in UTF-8 File.

I assume that you used the validator at

        http://validator.w3.org.

But if you instead use the more up-to-date, HTML5-compatible validators
at

        http://www.validator.nu
or      http://validator.w3.org/nu/

then you will not get any warning just because your file uses the
Byte-Order Mark. HTML5 explicitly allows you to use the BOM.
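
If you want to check for yourself whether a saved file starts with the
BOM, a minimal Python sketch could look like the one below. The
function name has_utf8_bom and the sample byte strings are made up for
illustration; the only thing it relies on is that the UTF-8 BOM is the
three bytes EF BB BF, which Python exposes as codecs.BOM_UTF8.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF).

    Hypothetical helper for illustration only.
    """
    return data.startswith(codecs.BOM_UTF8)


# Made-up sample content: the same page saved with and without the BOM.
page_with_bom = codecs.BOM_UTF8 + b"<!DOCTYPE html><title>Test</title>"
page_without_bom = b"<!DOCTYPE html><title>Test</title>"

print(has_utf8_bom(page_with_bom))
print(has_utf8_bom(page_without_bom))
```

For a real file you would read it in binary mode, open(path, "rb"),
and pass the first few bytes to the function.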

> The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause
> problems for some text editors and older browsers. You may want to consider
> avoiding its use until it is better supported.

Weasel words from the validator. The notion about "older browsers" is
not very relevant. How old are they? IE6 has no problems with the BOM,
for instance, and it is probably one of the few old browsers that are
still somewhat relevant.

As for editors: if your own editor has no problems with the BOM, then
what is the issue? But I think Notepad can also save as UTF-8 without
the BOM; there should be an option for choosing that when you save.
Otherwise you can use the free Notepad++, among many others. In Vim,
you set or unset the BOM via the commands

        set bomb
        set nobomb
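
If you would rather do the same thing outside an editor, here is a
rough Python sketch of the equivalent of those two commands. The name
set_bom is made up for illustration, and the sketch assumes the input
is already UTF-8 encoded bytes; it does not convert from any other
encoding.

```python
import codecs


def set_bom(data: bytes, bom: bool) -> bytes:
    """Return data with the UTF-8 BOM added (bom=True) or removed (bom=False).

    Hypothetical helper; assumes data is already UTF-8 encoded.
    """
    # Drop an existing BOM first so that the BOM is never doubled.
    if data.startswith(codecs.BOM_UTF8):
        body = data[len(codecs.BOM_UTF8):]
    else:
        body = data
    return codecs.BOM_UTF8 + body if bom else body


# Made-up sample content.
html = b"<!DOCTYPE html><title>Test</title>"

with_bom = set_bom(html, True)        # like ":set bomb" followed by a save
without_bom = set_bom(with_bom, False)  # like ":set nobomb" followed by a save

print(with_bom[:3] == codecs.BOM_UTF8)
print(without_bom == html)
```

Applying it twice with the same flag gives the same result as applying
it once, so it is safe to run on files whose BOM status you do not
know.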

-- 
Leif H Silli
Received on Wed Jul 11 2012 - 00:01:52 CDT