Re: UTF-8 isn't the default for HTML (was: xkcd: LTR) from Philippe Verdy on 2012-11-29 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 29 Nov 2012 14:24:29 +0100

And you forgot the important part of Appendix A:

*Consequence*: Remember, however, that when the XML declaration is not
included in a document, AND the character encoding is not specified by a
higher level protocol such as HTTP, the document can only use the default
character encodings UTF-8 or UTF-16. See, however, guideline 9
<http://www.w3.org/TR/xhtml-media-types/#C_9> below.
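The default quoted above can be illustrated with a minimal Python sketch. It assumes a simplified reading of the XML 1.0 detection rules: a higher-level charset (for example from HTTP) wins, then a byte-order mark, then the encoding declaration, and otherwise UTF-8. The function name xml_document_encoding is hypothetical, and the EBCDIC and UCS-4 cases of the real algorithm are ignored.

```python
import re
from typing import Optional

# Byte-order marks checked first (simplified from XML 1.0 Appendix F).
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Encoding pseudo-attribute of an ASCII-compatible XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def xml_document_encoding(data: bytes, http_charset: Optional[str] = None) -> str:
    """Return the encoding an XML parser would use (hypothetical, simplified)."""
    # 1. A higher-level protocol such as HTTP takes precedence.
    if http_charset:
        return http_charset.lower()
    # 2. A byte-order mark identifies UTF-8 or UTF-16.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 3. An explicit encoding in the XML declaration.
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. No declaration and no protocol information: the default is UTF-8.
    return "utf-8"
```

For example, a document starting with `<html` and served without a charset resolves to "utf-8", which is the case the post describes for this XHTML site.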

Here we have an XHTML site that is already encoded in the default UTF-8.
So there is no reason for Firefox or IE to render it as windows-1252, even
if they ignore the XML prolog. The "text/html" content type remains
appropriate for XHTML 1.0, 1.1 or 5.0. The other content type is
"application/xhtml+xml" (and similar types for integrating other XML
schemas), but it is only needed if you use a schema other than XHTML, if
you want to support an external or internal non-standard DTD, or if you
want to support XML processing instructions (such as XML schemas, which
are not used here, or XML stylesheets, which this site does use, but only
to render its technical code when viewing the source, not to render the
described page content itself).
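The difference between the two content types, when the document itself declares nothing, can be sketched roughly as follows. This is a simplified model under stated assumptions, not how any particular browser is implemented: the set XML_MEDIA_TYPES, the function names, and the windows-1252 locale default are illustrative assumptions.

```python
from typing import Optional, Tuple

# Media types a browser hands to its XML parser (illustrative subset, an assumption).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (media type, charset or None)."""
    parts = [p.strip() for p in value.split(";")]
    charset = None
    for param in parts[1:]:
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            charset = val.strip().strip('"').lower()
    return parts[0].lower(), charset


def fallback_encoding(content_type: str, locale_default: str = "windows-1252") -> str:
    """Encoding picked when the document has no BOM and no in-document declaration."""
    media_type, charset = parse_content_type(content_type)
    if charset:
        # An explicit charset parameter from HTTP wins.
        return charset
    if media_type in XML_MEDIA_TYPES:
        # XML parsing: no declaration means UTF-8.
        return "utf-8"
    # text/html: the XML prolog is ignored, so the browser falls back to a
    # locale-dependent legacy default (assumed windows-1252 here).
    return locale_default
```

Under this model, the same UTF-8 bytes become "windows-1252" when served as plain "text/html" and "utf-8" when served as "application/xhtml+xml", which is the mismatch complained about above.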

The problem here is "guideline 9", which is not part of the standard and
which relies on one of the worst parts of HTML, the meta element. Its
design as an empty element that carries the content type is flawed: it
overrides the encoding and can force the parser to restart from the
beginning after it has already parsed some or all of the other required
elements (html, body, head, title).
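Why the meta element is awkward shows up in a small sketch: the declared charset can only be found by first reading the markup with some provisional encoding, and if the result differs from that guess, a browser has to reparse from the start. This uses Python's standard html.parser module; the class and function names are hypothetical, and it is a simplified stand-in for a real browser prescan.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared by a <meta> element (simplified)."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        a = {k: (v or "") for k, v in attrs}
        if "charset" in a:
            # HTML5 form: <meta charset="utf-8">
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Guideline 9 form: charset is a parameter inside the content attribute.
            for param in a.get("content", "").split(";")[1:]:
                name, sep, value = param.partition("=")
                if sep and name.strip().lower() == "charset":
                    self.charset = value.strip().strip('"').lower()
                    break


def meta_charset(data: bytes) -> Optional[str]:
    # Provisional decode: latin-1 maps every byte, so this never fails.
    # If the charset found differs, a browser would restart parsing.
    finder = MetaCharsetFinder()
    finder.feed(data.decode("latin-1"))
    finder.close()
    return finder.charset
```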

But why? Isn't UTF-8 (or alternatively UTF-16) already the default
encoding of XHTML?

If not, then we should file a bug against the W3C Validator for not
honoring Guideline 9 (even though it is not part of the standard itself,
but just a recommendation, the validator should at least issue a warning).
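If the validator did issue such a warning, the check could look roughly like the hypothetical lint below. The name guideline9_warnings is invented here and is not part of the W3C Validator or Unicorn. The regular expression is a crude heuristic that assumes http-equiv appears before content inside the meta element.

```python
import re
from typing import List

_XHTML_NS = 'xmlns="http://www.w3.org/1999/xhtml"'
# Heuristic: <meta ... http-equiv="Content-Type" ... charset=...>
_META_HTTP_EQUIV = re.compile(
    r'<meta\s[^>]*http-equiv\s*=\s*["\']content-type["\'][^>]*charset\s*=',
    re.IGNORECASE)


def guideline9_warnings(content_type: str, source: str) -> List[str]:
    """Warn when XHTML served as text/html lacks a meta http-equiv charset."""
    warnings = []
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html" and _XHTML_NS in source:
        if not _META_HTTP_EQUIV.search(source):
            warnings.append(
                "XHTML served as text/html without a meta http-equiv "
                "Content-Type charset (XHTML Media Types, guideline 9)")
    return warnings
```

A warning rather than an error fits the status of the guideline, which is a recommendation in an informative note and not part of the standard.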

2012/11/29 Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>

> Philippe Verdy, Thu, 29 Nov 2012 13:26:28 +0100:
> > You're wrong. XHTML1 is integrated in the W3C validator and
> > recognized automatically.
>
> Indeed, yes. What I meant by "doesn't integrate XHTML1" was that
> Unicorn doesn't 100% adhere to the two sections of XHTML1 that I
> quoted.[1][2]
>
> > The document you cite in the XHTML1 specs has just not been updated.
>
> The validator must of course implement what XHTML1 says.
>
> > Anyway this http://www.xn--elqus623b.net/XKCD/1137.html site is
> > actually using XHTML1.1 (in its strict schema, not a transitional
> > schema)
>
> A relevant point, of course. But XHTML11 says the same thing:
>
> [3] 'XHTML 1.1 documents SHOULD be labeled with the Internet Media Type
> "application/xhtml+xml" as defined in [RFC3236]. For further
> information on using media types with XHTML, see the informative note
> [XHTMLMIME].'
>
> The XHTMLMIME note says:
>
> [4] 'The 'text/html' media type [RFC2854] is primarily for HTML, not
> for XHTML. In general, this media type is NOT suitable for XHTML except
> when the XHTML is conforms to the guidelines in Appendix A.'
>
> [5] 'DO set the encoding via a "meta http-equiv" statement in the
> document (e.g., <meta http-equiv="Content-Type" content="text/html;
> charset=utf-8" />)'
>
> [1] http://www.w3.org/TR/xhtml1/#media
> [2] http://www.w3.org/TR/xhtml1/#C_9
> [3] http://www.w3.org/TR/xhtml11/xhtml11.html#strict
> [4] http://www.w3.org/TR/xhtml-media-types/#text-html
> [5] http://www.w3.org/TR/xhtml-media-types/#C_9
> --
> leif halvard silli
Received on Thu Nov 29 2012 - 07:26:55 CST

This archive was generated by hypermail 2.2.0 : Thu Nov 29 2012 - 07:26:55 CST