Re: UTF-8 isn't the default for HTML (was: xkcd: LTR)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Thu, 29 Nov 2012 15:05:22 +0100

Philippe Verdy, Thu, 29 Nov 2012 14:24:29 +0100:
> And you forget the important part of Appendix A:
>
> *Consequence*: Remember, however, that when the XML declaration is not
> included in a document, AND the character encoding is not specified by a
> higher level protocol such as HTTP, the document can only use the default
> character encodings UTF-8 or UTF-16. See, however, guideline
> 9<http://www.w3.org/TR/xhtml-media-types/#C_9>
> below.
>
> Here we have an XHTML site that is already encoded with the default UTF-8.
> There's no reason then for Firefox or IE to render it with windows-1252,
> even if they ignore the XML prolog. the "text/html" content-type remains
> appropriate for XHTML 1.0, 1.1 or 5.0.

Note that point 1, which you quoted,[1] like the rest of the note, is
about how *authors* should behave when they create XHTML documents.
The note is *not* about how user agents should behave. Also note that
what you refer to as "the important part of Appendix A" ends in a
sentence that points to guideline 9,[2] which in turn tells authors to
'DO set the encoding via a "meta http-equiv"'. The example in
guideline 9 itself uses UTF-8, quote: '(e.g., <meta
http-equiv="Content-Type" content="text/html; charset=utf-8" />)'.
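
To make concrete what such a declaration gives a consumer to work with,
here is a minimal Python sketch that looks for a charset label in a
"meta" element near the start of a byte stream. The helper name
sniff_meta_charset, the regular expression and the 1024-byte window are
my own assumptions for illustration; this is not the prescan algorithm
of any particular browser or of the validator.

```python
import re

# Hypothetical pattern: a <meta ...> tag that carries a charset label,
# either as charset="..." or inside content="text/html; charset=...".
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(head: bytes):
    """Return the lowercased charset label found in a <meta> tag, or None."""
    # Only the first 1024 bytes are inspected (an assumed window size).
    match = _META_CHARSET.search(head[:1024])
    return match.group(1).decode("ascii").lower() if match else None


with_meta = (
    b'<head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=utf-8" /></head>'
)
without_meta = b"<head><title>No declaration</title></head>"

declared = sniff_meta_charset(with_meta)
undeclared = sniff_meta_charset(without_meta)
```

With the guideline 9 example the label is found; without it, nothing in
the document itself tells an HTML consumer which encoding to use.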

 ...
> But why ? Isn't UTF-8 (or alternatively UTF-16) already the default
> encoding of XHTML?
>
> If not, then we should file a bug in the W3C Validator for not honoring the
> Guideline 9 (even though it is not part of the standard itself, but just a
> recommendation, it should issue at least a warning).

This is exactly the problem. Your "if not" does apply! Because, if one
presents an XHTML document to the browser as HTML, then windows-1252,
and not UTF-8, becomes the default encoding. And, in fact, as a
consequence of our dialog, I have notified the developers of Unicorn
about the shortcoming, asking them to issue a warning.
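
A small Python sketch of the effect, assuming the windows-1252 fallback
described above (the fallback a given browser actually uses can depend
on its locale and settings). The function effective_encoding and its
precedence order (HTTP charset, then meta declaration, then fallback)
are a simplification of mine, not a description of any real browser's
code.

```python
def effective_encoding(http_charset=None, meta_charset=None):
    """Pick the encoding for a text/html document (simplified sketch)."""
    # Assumed precedence: HTTP header, then <meta>, then the legacy default.
    return http_charset or meta_charset or "windows-1252"


# A UTF-8 encoded page with non-ASCII letters and no encoding declaration.
utf8_bytes = "blåbær".encode("utf-8")

# No HTTP charset and no <meta>: the bytes are read as windows-1252.
garbled = utf8_bytes.decode(effective_encoding())

# With the guideline 9 <meta> declaration the text survives intact.
intact = utf8_bytes.decode(effective_encoding(meta_charset="utf-8"))

print(garbled)
print(intact)
```

The first print shows each non-ASCII letter turned into two unrelated
characters, which is what a reader sees when the UTF-8 page is served
as text/html without any declaration.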

[1] http://www.w3.org/TR/xhtml-media-types/#C_1
[2] http://www.w3.org/TR/xhtml-media-types/#C_9

-- 
leif halvard silli
Received on Thu Nov 29 2012 - 08:06:47 CST
