Re: UTF-8 isn't the default for HTML (was: xkcd: LTR)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Thu, 29 Nov 2012 17:08:58 +0100

Philippe Verdy, Thu, 29 Nov 2012 16:10:14 +0100:
> Thanks a lot, this was really hard to see and understand, because I
> was only reading the XHTML specs, and the Validator did not complain.

Glad to find we are on the same page!

Philippe Verdy, Thu, 29 Nov 2012 16:27:13 +0100:
> <?html version="5.0" encoding="utf-8">

HTML5 already has four *conforming* methods for setting the UTF-8
encoding:

1. byte-order mark
2. HTTP server,
   Content-Type:text/html;charset=UTF-8
3. meta http-equiv,
   <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
4. meta charset,
   <meta charset="UTF-8"/>
   (Note that no content type is mentioned here, so the meta charset
   method is "cleaner" to use in a file served as XHTML.)
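To make the relative priority of the four methods concrete, here is a rough Python sketch. It is a simplified illustration, not the HTML5 encoding sniffing algorithm itself: the function name detect_encoding is hypothetical, the regular expressions are far cruder than the real meta prescan, and the windows-1252 fallback is an assumption (browsers pick a locale-dependent default).

```python
import re
from typing import Optional, Tuple

# Method 1: byte-order marks, checked before anything else.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Method 2: charset parameter of an HTTP Content-Type header value.
HTTP_CHARSET = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)

# Methods 3 and 4: crude match for <meta http-equiv=... content="...charset=X">
# as well as <meta charset="X">.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def detect_encoding(body: bytes, content_type: Optional[str] = None) -> Tuple[str, str]:
    """Return (encoding, source) using a simplified BOM > HTTP > meta order."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "byte-order mark"
    if content_type:
        match = HTTP_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "HTTP Content-Type"
    # Assumption: only the first 1024 bytes are scanned for a <meta> element.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    # Assumption: a fixed fallback; real browsers choose one per locale.
    return "windows-1252", "fallback"


print(detect_encoding(b'<meta charset="UTF-8"/>'))                  # ('utf-8', 'meta')
print(detect_encoding(b"<p>hi</p>", "text/html;charset=UTF-8"))     # ('utf-8', 'HTTP Content-Type')
```

The point of the ordering is that a BOM beats the HTTP header, which in turn beats whatever the document says about itself in a meta element.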

In addition, other things have an effect:

6. Sniffing is an official but largely unimplemented method of
   detecting the encoding (Chrome and Opera use it, and Firefox
   has it as an option and also uses it by default for some locales).
7. The XML prologue (sic) takes effect in *some* browsers.
8. Simply serving the page as application/xhtml+xml is
   yet another method of setting the encoding to UTF-8.
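Points 7 and 8 rest on XML's own rules: with no BOM and no encoding declaration, an XML parser must read the document as UTF-8, while an encoding declaration in the XML prologue overrides that default. The sketch below uses Python's standard xml.etree.ElementTree (backed by expat) as a stand-in for a browser's XML parser, which is an assumption made for illustration; the helper root_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def root_text(doc: bytes) -> str:
    """Parse an XML byte string and return the root element's text."""
    return ET.fromstring(doc).text or ""


# No BOM and no XML declaration: the parser assumes UTF-8,
# so the two bytes C3 A6 decode to U+00E6.
undeclared = b"<p>\xc3\xa6</p>"

# The XML declaration switches the parser to ISO-8859-1,
# where the single byte E6 is the same character.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>\xe6</p>'

print(root_text(undeclared) == "\u00e6")  # True
print(root_text(declared) == "\u00e6")    # True

# A lone Latin-1 byte with no declaration is invalid UTF-8,
# so the XML parser rejects the document instead of guessing.
try:
    root_text(b"<p>\xe6</p>")
except ET.ParseError as err:
    print("rejected:", err)
```

This is why serving a page as application/xhtml+xml amounts to choosing UTF-8 unless something says otherwise: the XML parser has no sniffing fallback to fall back on.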

Thus I can guarantee you that your idea of a method number 9 is
not going to be met with enthusiasm.

-- 
leif halvard silli
Received on Thu Nov 29 2012 - 10:10:25 CST

This archive was generated by hypermail 2.2.0 : Thu Nov 29 2012 - 10:10:25 CST