Re: xkcd: LTR

From: Leif Halvard Silli
Date: Wed, 28 Nov 2012 04:05:45 +0100

Philippe Verdy, Wed, 28 Nov 2012 01:10:45 +0100:
> 2012/11/27 Leif Halvard Silli

>> The fact that XHTML 1 permits the XML prolog regardless how the
>> document is served, is just a shortcoming of the XHTML 1 specification.
> No, it was by design. Making HTML an application of XML. Only XML but
> with all rules of XML.

It was by design. But nevertheless a shortcoming. They should/could
have defined more restrictions on the syntax than they did, and
then it would have been OK. But don't forget that XHTML1 also permits
you to use the meta element - which works in all web browsers - for
setting the encoding:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This is described in the famous Appendix C of XHTML 1:

>>> So these browsers must find
>>> something else: given the XML prolog they should then use HTML5 in
>>> its XHTML profile, not in its HTML profile
>> No, that is not how things works. The decision to parse the document as
>> HTML is taken before the browser sees the XML prologue. So the prologue
>> should not - and does not - change anything with regard to parsing as
>> HTML or as XML.
> Then explain why the W3C validator sees absolutley no problem in the
> way these XHTML1 pages are encoded and transported.

Because it only checks the syntax, without asking you how you are
actually going to use that syntax - whether you want to serve it to an
XML parser as XHTML or you are going to serve it to an HTML parser. For
a new version of the validator, that asks more of those questions,
please try - it happens to be developed, for the most
part, by one of the Firefox developers, btw. And it allows
you to check XHTML1 syntax as well (but only if you serve it as XHTML -
if you serve it as HTML, then it validates it as HTML).

>>> ; in this profile, they
>>> MUST honor the XML prolog and notably its XML encoding declaration
>>> (given that the encoding is not specified in the HTTP Content-type.
>> Again: Absolutely not. They must not, will not and must not honour the
>> XML prologue. (It is another matter that some user agents sometimes use
>> the prologue to look for encoding information.)
> Sure they can because this XHTML1 site violates  HTML5 rules, missing
> its required prologue.

Not sure how you understand the phrase "honour the XML prologue". It
also sounds as if you say that HTML5 has its own prologue. But HTML5
does not contain any code that is commonly known as "prologue". For
instance, if you refer to the code "<!DOCTYPE html>", then this is not
a prologue even if it occurs at the start of the document.

Also, since there are two flavours of XML - XML 1.0 and XML 1.1 - the
prologue may potentially have an effect on how the document is parsed,
but only if the parser already knows that the file is XML. But the XML
prologue does not *cause* parsers to choose XML-mode rather than
HTML-mode.
(Opera introduced the opposite thing some time ago: If the document is
an XHTML document - for real, but contains XML wellformedness errors,
then it will switch to HTML-mode.)

>>> Now given the XML prolog and the DTD declaration, the file is clearly
>>> not even HTML5 in XML/XHTML (i.e. XHTML 5), but is XHTML 1 (based on
>>> a stable subset of HTML4, but working in strict mode without the
>>> quirks modes). Once again, this excludes using the HTML5 rules again.
>> In a way the names and the numbers (HTML4, XHTML1, HTML5) are just
>> confusing. There is just one way to parse HTML. When it comes to HTML
>> (text/html),then HTML5 differs from HTML4 and XHTML1 in that it is not
>> based on a *another* format than HTML itself. Because HTML4 and XHTML1
>> are not based on how HTML actually works, and - in addition - does not
>> take fully account of that (or whatever the reason), they allow
>> syntaxes, such as DTD declarations, which have no effect (except
>> side-effects such as quirks-mode) in HTML.
> HTML5 admits the two syntaxes : SGML-based like it is used primarily
> (in a simplified profile), and XML.

From one angle, you are of course right. But HTML5 actually explains
that what you call "SGML-based" is not SGML-based but only SGML
*inspired*. Thus, HTML5 is much simpler and less cryptic than the
(official) SGML syntax of HTML4.

>>> I'm still convinced that these are bugs in Firefox and IE, which
>>> support only HTML5 in its basic HTML profile, but not HTML5 in its
>>> XML/XHTML profile (which is also part of the HTML5 standard and where
>>> processing the XML prolog is NOT an option but a requirement).
>> Just for the record: HTML5 defines the most up-to-date parsing
>> mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour
>> of XHTML served as HTML. HTML5 does not allow authors to use the XML
>> prologue.
> Where ?

Here: (As you can see,
it doesn't say that it is allowed, hence it is not.) You can also see
the bottom of this page:

> The required HTML5 prolog applies to its SGML based syntax ;

Please note that prolog is one thing, and the DOCTYPE is another, see
XML 1.0:

> it makes no sense in XHTML as it voluntarily violates the validity of
> the XML document declaration.

If you are speaking about the HTML5 doctype, then its only effect is to
make sure that the HTML parser stays in no-quirks (aka standards) mode.
In XHTML then, you are right that it is not needed. But you are wrong
if you say that it is a problem to include it in XHTML, as it causes no
harm. In fact, in XHTML, you can drop both the DOCTYPE and the XML
prologue.

> The absence of the HTML5 required prolog (in its standard basic-SGML
> profile), or the presence of another incompatible XML prolog is
> enough to make the distinction between the two syntaxes.

You mean: Visually? Yes. However, that is not how parsers think. What
parsers normally do is that they look at the Content-Type "flag",
before they decide how to parse the document.
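
To illustrate that point, here is a minimal sketch in Python, under the
assumption that the choice depends on nothing but the media type of the
Content-Type header. The function name choose_parser and the list of XML
media types are made up for the illustration, and real browsers do more
than this; the only thing it is meant to show is that the start of the
document is never inspected when the choice is made.

```python
# Hypothetical sketch: pick a parser from the Content-Type header alone.
# The document itself is not an input, so an XML prologue at the top of
# the file cannot influence the result.

XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_parser(content_type: str) -> str:
    """Return "xml" or "html" based only on the media type."""
    # Drop parameters such as "; charset=UTF-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_MEDIA_TYPES or media_type.endswith("+xml"):
        return "xml"
    # Everything else, including text/html, goes to the HTML parser here.
    return "html"


if __name__ == "__main__":
    # The same XHTML 1 file, prologue included, served in two ways.
    print(choose_parser("text/html; charset=UTF-8"))  # HTML parser
    print(choose_parser("application/xhtml+xml"))     # XML parser
```
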

> But both
> syntaxes will generate the same HTML DOM, which is just enough to
> make the proper rendering intended, and make HTML5 compatible with
> both syntaxes.

Yes, if you use XHTML5 syntax, then it can be parsed as both XML and
HTML. But you should not include the XML prologue, since it is illegal
in HTML5 and without any specified effect. Also, in a conforming XML
parser, as long as your document is UTF-8 encoded anyway, the prologue
is actually without effect in XHTML5 as well!
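
The last point can be checked with any XML parser. Below is a small
sketch that uses xml.etree.ElementTree from the Python standard library
as a stand-in for "a conforming XML parser" (that choice is mine, not
something the specifications name). The sample document and the helper
title_of are invented for the illustration. It parses the same UTF-8
bytes with and without the prologue and compares what comes out.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# A tiny XHTML document with non-ASCII text in the title.
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Bl\u00e5b\u00e6r</title></head><body/></html>"
)

# The same document as UTF-8 bytes, with and without the XML prologue.
with_prologue = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")
without_prologue = BODY.encode("utf-8")


def title_of(data: bytes) -> str:
    """Parse the bytes as XML and return the text of the title element."""
    root = ET.fromstring(data)
    return root.find("x:head/x:title", XHTML_NS).text


if __name__ == "__main__":
    # UTF-8 is a default encoding in XML, so the prologue changes nothing.
    print(title_of(with_prologue) == title_of(without_prologue))
```
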

> Now HTML5 is still not completely polished, finished and approved.
> Such interoperability rules are not clearly defined even if they are
> the "most up-to-date" to make it work seamlessly with the claimed
> compatibility with all flavors of HTML or XHTML. And the fact that
> Firefox and IE behave differently from Chorme and Safari in this
> domain is a proof of this unfinished status.

I would not conclude like that … But it could probably have saved us
this discussion if Firefox/IE, like the other dominating browsers, did
use it as a fallback method for setting the encoding …

leif halvard silli
Received on Tue Nov 27 2012 - 21:05:45 CST