Re: Charset declaration in HTML (was: Romanized Singhala - Think about it again)

From: Naena Guru <>
Date: Sun, 15 Jul 2012 15:47:04 -0500

On Tue, Jul 10, 2012 at 11:58 PM, Leif Halvard Silli <> wrote:

> Naena Guru, Tue, 10 Jul 2012 01:40:19 -0500:
> > HTML5 assumes UTF-8 as the character set if you do not declare one
> > explicitly. My current pages are in HTML 4.
> There is in principle no difference between what HTML5-parsers assume
> and what HTML4-parsers assume: All of them default to the default
> encoding for the locale.
I see. That is, for the transliteration, the locale should be Sinhala
(Latin). Yes. I know that it is not official. I loathe the spelling
Sinhala. Oh, well, you cannot have it all.

> > Notepad forced
> > me to save the file in UTF-8 format. I ran it through W3C Validator. It
> > passed HTML5 test with the following warning:
> >
> > Warning: Byte-Order Mark found in UTF-8 File.
> I assume that you used the validator at

Yes, and it validated it. I was talking about the BOM in a different context.
It showed up when I opened, in HTML-Kit, a file that was first created in
Notepad and saved as UTF-8. HTML-Kit Tools asked me to specify the
character set. It accepted my choice, but messed up the macron and dot
letters anyway. What I was trying to emphasize is that it is hard for
people who try to make web pages in those 'character sets'. I have been
making web pages since the 1990s and never had these problems because they
were written by hand in English.
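
The garbling of macron and dot letters described above is what typically
happens when UTF-8 bytes are read under a single-byte default encoding. A
minimal Python sketch, assuming windows-1252 as the legacy default; the
sample string is a hypothetical illustration, not text from the pages in
question:

```python
# Hypothetical sample: Latin letters with macron and dot below,
# of the kind used in romanized transliteration.
text = "ā ṭ ḍ ṇ"

# What an editor writes to disk when asked to save as UTF-8.
utf8_bytes = text.encode("utf-8")

# Read back with the correct encoding, the text round-trips cleanly.
round_trip = utf8_bytes.decode("utf-8")

# Read back as windows-1252 (assumed here as the locale default), every
# byte becomes its own character. errors="replace" substitutes U+FFFD for
# the few bytes that windows-1252 leaves undefined instead of raising.
garbled = utf8_bytes.decode("cp1252", errors="replace")

print(round_trip)
print(garbled)
```

Each accented letter occupies two or three bytes in UTF-8, so the wrongly
decoded string comes out longer than the original, with one stray character
per byte.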

> But if you instead use the most updated HTML5-compatible validators at
> or

> then you will not get any warning just because your file uses the
> Byte-Order Mark. HTML5 explicitly allows you to use the BOM.
Thanks. This too validated all seven pages as HTML5 (I upgraded from HTML

> > The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to
> > cause problems for some text editors and older browsers. You may want
> > to consider avoiding its use until it is better supported.
> Weasel words from the validator. The notion about "older browsers" is
> not very relevant. How old are they? IE6 has no problems with the BOM,
> for instance. And that is probably one of the few, somewhat relevant,
> old browsers.
As I said before, the BOM was no problem for me.
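
For reference, the UTF-8 BOM that the validator warns about is the three
bytes EF BB BF at the very start of the file. A minimal Python sketch for
checking whether a file's bytes begin with it; the function name and the
sample byte strings are assumptions made for illustration:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical file contents: the same markup saved with and without a BOM.
with_bom = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")
without_bom = "<!DOCTYPE html>".encode("utf-8")

print(has_utf8_bom(with_bom))
print(has_utf8_bom(without_bom))
```

For a real file, the bytes could be obtained with open(path, "rb").read(3),
where path is whatever page is being checked.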

> As for editors: If your own editor has no problems with the BOM, then
> what? But I think Notepad can also save as UTF-8 but without the BOM -
> it should be possible to get an option for choosing when you save
> it. Else you can use the free Notepad++. And many others. In VIM, you
> set or unset the BOM via the commands
> set bomb
> set nobomb
Yes, yes. I've seen it before. I have Notepad++. It intimidated me the
first time and I never used it, haha!
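
Outside an editor, the same effect as the bomb / nobomb settings quoted
above can be had with a few lines of Python. This is a sketch under stated
assumptions: the two helper names are hypothetical, and it relies on
Python's standard "utf-8-sig" codec, which drops a leading BOM when
decoding and writes one when encoding:

```python
def rewrite_without_bom(data: bytes) -> bytes:
    """Return the same UTF-8 text with any leading BOM removed."""
    text = data.decode("utf-8-sig")   # drops a leading BOM if one is present
    return text.encode("utf-8")       # plain "utf-8" never writes a BOM


def rewrite_with_bom(data: bytes) -> bytes:
    """Return the same UTF-8 text with exactly one leading BOM."""
    text = data.decode("utf-8-sig")   # normalise first so the BOM is not doubled
    return text.encode("utf-8-sig")   # "utf-8-sig" writes EF BB BF, then the text


# Hypothetical page content, as a BOM-writing editor would save it.
saved = b"\xef\xbb\xbf<!DOCTYPE html>"

print(rewrite_without_bom(saved))
print(rewrite_with_bom(rewrite_without_bom(saved)))
```

Decoding first and re-encoding afterwards keeps both helpers safe to run
repeatedly: a file never ends up with two BOMs or loses real content.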

> --
> Leif H Silli
Received on Sun Jul 15 2012 - 17:21:50 CDT
