Re: xkcd: LTR from Leif Halvard Silli on 2012-11-27 (Unicode Mail List Archive)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Tue, 27 Nov 2012 22:18:37 +0100

Philippe Verdy, Tue, 27 Nov 2012 21:07:31 +0100:
> Ahhhh ! I see now the problem: the XHTML file is being served as HTML
> instead of XHTML (but this is not invalid for XHTML 1).

Both SGML-based HTML4 and XML-based XHTML 1 operate with syntax rules
that are not - and have never been - compatible with the way text/html
operates. Thus, both HTML4 and XHTML1 permit syntaxes whose semantics
are ignored when the document is parsed as HTML (as opposed to parsed
as SGML or as XML).

If you are interested in creating XHTML syntax that is compatible
with HTML, then you should look at Polyglot Markup:
http://www.w3.org/TR/html-polyglot/

> But anyway you're also right that the XML prolog found is NOT valid
> for HTML5 when the file is served as HTML instead of XHTML.

The fact that XHTML 1 permits the XML prolog regardless of how the
document is served is just a shortcoming of the XHTML 1 specification.

> So these browsers must find
> something else: given the XML prolog they should then use HTML5 in
> its XHTML profile, not in its HTML profile

No, that is not how things work. The decision to parse the document as
HTML is taken before the browser sees the XML prologue. So the prologue
should not - and does not - change anything with regard to parsing as
HTML or as XML.
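
To illustrate the order of events, here is a minimal sketch in Python.
It is not any browser's actual code; the function name choose_parser
and the set of XML media types are my own assumptions for the example.
The point is only that the choice is made from the Content-Type header,
without looking at a single byte of the document body:

```python
# Hypothetical sketch: pick a parser from the Content-Type header alone.
# The document body (and any XML prologue in it) is never consulted.

# Assumed list of media types that trigger XML parsing (illustrative only).
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def choose_parser(content_type_header: str) -> str:
    """Return "html", "xml" or "other" for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime in XML_TYPES or mime.endswith("+xml"):
        return "xml"
    return "other"


if __name__ == "__main__":
    # An XHTML 1 file served as text/html still goes to the HTML parser.
    print(choose_parser("text/html; charset=utf-8"))
    print(choose_parser("application/xhtml+xml"))
```

Since the function never receives the body, a prologue at the top of
the file has no way to influence the result.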

> ; in this profile, they
> MUST honor the XML prolog and notably its XML encoding declaration
> (given that the encoding is not specified in the HTTP Content-type.

Again: Absolutely not. They must not and will not honour the
XML prologue. (It is another matter that some user agents sometimes use
the prologue to look for encoding information.)
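
As a rough sketch of that "other matter", the following Python shows
how a user agent *might* peek at the prologue for an encoding hint when
HTTP gives none. Everything here is an assumption made for
illustration: the function names, the 1024-byte window, and the
windows-1252 fallback are mine and do not describe any particular
browser. Note that the hint only affects decoding; the document is
still parsed as HTML.

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the byte stream and
# captures the value of its encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def encoding_hint_from_prologue(data: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML prologue, if any."""
    match = _XML_DECL.match(data[:1024])  # assumed sniffing window
    return match.group(1).decode("ascii").lower() if match else None


def pick_encoding(http_charset: Optional[str], data: bytes,
                  default: str = "windows-1252") -> str:
    """Prefer the HTTP charset; otherwise fall back to the prologue hint."""
    if http_charset:
        return http_charset.lower()
    return encoding_hint_from_prologue(data) or default


if __name__ == "__main__":
    page = b'<?xml version="1.0" encoding="UTF-8"?><html></html>'
    print(pick_encoding(None, page))
```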

> Now given the XML prolog and the DTD declaration, the file is clearly
> not even HTML5 in XML/XHTML (i.e. XHTML 5), but is XHTML 1 (based on
> a stable subset of HTML4, but working in strict mode without the
> quirks modes). Once again, this excludes using the HTML5 rules again.

In a way the names and the numbers (HTML4, XHTML1, HTML5) are just
confusing. There is just one way to parse HTML. When it comes to HTML
(text/html), HTML5 differs from HTML4 and XHTML1 in that it is not
based on *another* format than HTML itself. Because HTML4 and XHTML1
are not based on how HTML actually works, and - in addition - do not
fully take account of that (or whatever the reason), they allow
syntaxes, such as DTD declarations, which have no effect (except
side-effects such as quirks-mode) in HTML.

> I'm still convinced that these are bugs in Firefox and IE, which
> support only HTML5 in its basic HTML profile, but not HTML5 in its
> XML/XHTML profile (which is also part of the HTML5 standard and where
> processing the XML prolog is NOT an option but a requirement).

Just for the record: HTML5 defines the most up-to-date parsing
mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour
of XHTML served as HTML. HTML5 does not allow authors to use the XML
prologue. So while XHTML1 allows you to use the prologue, the best
description of how to parse anything that purports to be HTML - HTML5
- does not require user agents/browsers to pay any attention to the
prologue. Thus the correct one to blame in this case for the fact that
it doesn't work in Firefox seems to be the author. (Though we could
also blame "the history of how HTML developed".)

-- 
leif halvard silli

Received on Tue Nov 27 2012 - 15:45:30 CST
