Re: Charset declaration in HTML (was: Romanized Singhala - Think about it again)

From: Naena Guru <naenaguru_at_gmail.com>
Date: Sun, 15 Jul 2012 15:47:04 -0500

On Tue, Jul 10, 2012 at 11:58 PM, Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no> wrote:

> Naena Guru, Tue, 10 Jul 2012 01:40:19 -0500:
>
> > HTML5 assumes UTF-8 as the character set if you do not declare one
> > explicitly. My current pages are in HTML 4.
>
> There is in principle no difference between what HTML5-parsers assume
> and what HTML4-parsers assume: All of them default to the default
> encoding for the locale.
>
I see. That is, for the transliteration, the locale should be Sinhala
(Latin). Yes. I know that it is not official. I loathe the spelling
Sinhala. Oh, well, you cannot have it all.

>
> > Notepad forced
> > me to save the file in UTF-8 format. I ran it through W3C Validator. It
> > passed HTML5 test with the following warning:
> >
> > [image: Warning] Byte-Order Mark found in UTF-8 File.
>
> I assume that you used the validator at
>
> http://validator.w3.org.
>
Yes, and it validated it. I was talking about the BOM in a different context.
It showed up when I opened, in HTML-Kit, a file that was first created in
Notepad and saved as UTF-8. HTML-Kit Tools asked me to specify the
character set. It accepted my choice but messed up the macron and dot
letters anyway. What I was trying to emphasize is that it is hard for
people who try to make web pages in those 'character sets'. I have been
making web pages since the 1990s and never had these problems because they
were written by hand in English.
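
For illustration only, here is a minimal Python sketch of how macron and dot
letters can get garbled when a file saved as UTF-8 is read under a different
character set. The sample string and the choice of windows-1252 (cp1252) as
the wrong charset are assumptions made for the example; the thread does not
say which character set was actually selected in HTML-Kit.

```python
# Hypothetical sample: romanized letters with macrons and dots below.
text = "ā ī ū ṭ ḍ"

# What an editor writes to disk when saving as UTF-8 (no BOM here).
raw = text.encode("utf-8")


def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes under a given charset, marking undecodable bytes."""
    return data.decode(charset, errors="replace")


# Read back with the charset the file was actually saved in.
as_utf8 = decode_as(raw, "utf-8")

# Read back with a single-byte Western charset (an assumed wrong guess).
# Every byte becomes its own character, so each letter turns into two
# or three unrelated symbols.
as_cp1252 = decode_as(raw, "cp1252")

print(as_utf8 == text)            # the round trip is lossless
print(len(text), len(as_cp1252))  # character counts before and after
```

The bytes on disk are the same in both cases; only the charset assumed by the
reader differs, which is why declaring the encoding in the page matters.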

>
> But if you instead use the most updated HTML5-compatible validators at
>
> http://www.validator.nu
> or http://validator.w3.org/nu/
>
> then you will not get any warning just because your file uses the
> Byte-Order Mark. HTML5 explicitly allows you to use the BOM.
>
Thanks. This one too validated all seven pages as HTML5 (I upgraded from
HTML 4).

>
> > The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files is known to cause
> > problems for some text editors and older browsers. You may want to consider
> > avoiding its use until it is better supported.
>
> Weasel words from the validator. The notion about "older browsers" is
> not very relevant. How old are they? IE6 has no problems with the BOM,
> for instance. And that is probably one of the few, somewhat relevant,
> old browsers.
>
As I said before, the BOM was no problem for me.

>
> As for editors: If your own editor has no problems with the BOM, then
> what? But I think Notepad can also save as UTF-8 without the BOM -
> it should be possible to choose that option when you save
> the file. Else you can use the free Notepad++. And many others. In VIM, you
> set or unset the BOM via the commands
>
> set bomb
> set nobomb
>
Yes, yes. I've seen it before. I have Notepad++. It intimidated me the
first time and I never used it, haha!
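
For anyone who would rather check a file than trust an editor setting, the
following is a rough Python sketch of detecting and removing a UTF-8 BOM. The
helper names has_utf8_bom and strip_utf8_bom and the sample bytes are
hypothetical, made up for this example; codecs.BOM_UTF8 and the "utf-8-sig"
codec are part of the Python standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content, as an editor might save it "as UTF-8" with a BOM.
sample = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")

print(has_utf8_bom(sample))                  # BOM present
print(has_utf8_bom(strip_utf8_bom(sample)))  # BOM gone after stripping

# The "utf-8-sig" codec skips a leading BOM while decoding.
print(sample.decode("utf-8-sig"))
```

This only inspects the first three bytes; it does not verify that the rest of
the file is valid UTF-8.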

> --
> Leif H Silli
>
Received on Sun Jul 15 2012 - 17:21:50 CDT
