Re: xkcd: LTR from Philippe Verdy on 2012-11-27 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 28 Nov 2012 04:23:10 +0100

2012/11/28 Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>

> For
> a new version of the validator, that asks more of those questions,
> please try http://validator.w3.org/nu/ - it happens, for the most
> part, to be developed by one of the Firefox developers, btw. And it allows
> you to check XHTML1-syntax as well (but only if you serve it as XHTML -
> if you serve it as HTML, then it validates it as HTML.)
>

This "new" validator is not the one the W3C promotes and supports. I use the
"Unicorn" validator, which checks all W3C-supported markup languages
(including HTML5).

> >>> ; in this profile, they
> >>> MUST honor the XML prolog and notably its XML encoding declaration
> >>> (given that the encoding is not specified in the HTTP Content-Type).
> >>
> >> Again: Absolutely not. They must not, will not and must not honour the
> >> XML prologue. (It is another matter that some user agents sometimes use
> >> the prologue to look for encoding information.)
> >
> > Sure they can because this XHTML1 site violates HTML5 rules, missing
> > its required prologue.
>
> Not sure how you understand the phrase "honour the XML prologue". It
> also sounds as if you say that HTML5 has its own prologue. But HTML5
> does not contain any code that is commonly known as "prologue". For
> instance, if you refer to the code "<!DOCTYPE html>", then this is not
> a prologue even if it occurs at the start of the document.
>

This is a question of terminology specific to this version. I consider
"<!DOCTYPE html>" part of the prolog, and it is not valid XML, so not valid XHTML.

>
> From one angle, you are of course right. But HTML5 actually explains
> that what you call "SGML-based" is not SGML-based but only SGML
> *inspired*. Thus, HTML5 is much simpler and less cryptic than the
> (official) SGML syntax of HTML4.

It is evident that here I mean the legacy HTML syntax, which is not compatible
with XML (it allows end tags to be omitted, and does not require empty elements
to use self-closing tags).
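To make the incompatibility concrete, the Python sketch below (standard library only) feeds a legacy-style fragment, with omitted `</p>` end tags and a `<br>` that is not self-closed, to an XML parser and to a lenient HTML tokenizer. The fragments and helper names are illustrative assumptions; `html.parser` only tokenizes and builds no HTML5 tree, so it stands in for "a parser that tolerates this syntax", not for a conforming HTML5 parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Legacy HTML style: the <p> elements have no end tags and <br> is not self-closed.
FRAGMENT = "<div><p>one<p>two<br></div>"
# The same content written as well-formed XHTML.
XHTML_FRAGMENT = "<div><p>one</p><p>two<br/></p></div>"


def is_well_formed_xml(text: str) -> bool:
    """Return True if the (expat-based) XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class StartTagCollector(HTMLParser):
    """Record start tags; html.parser never insists on matching end tags."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


collector = StartTagCollector()
collector.feed(FRAGMENT)
collector.close()

print(is_well_formed_xml(FRAGMENT))        # False: </div> does not match the open <br>
print(collector.start_tags)                # ['div', 'p', 'p', 'br']
print(is_well_formed_xml(XHTML_FRAGMENT))  # True
```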

> >>> I'm still convinced that these are bugs in Firefox and IE, which
> >>> support only HTML5 in its basic HTML profile, but not HTML5 in its
> >>> XML/XHTML profile (which is also part of the HTML5 standard and where
> >>> processing the XML prolog is NOT an option but a requirement).
> >>
> >> Just for the record: HTML5 defines the most up-to-date parsing
> >> mechanism for *all* HTML documents - HTML1,2,3,5 as well as any flavour
> >> of XHTML served as HTML. HTML5 does not allow authors to use the XML
> >> prologue.
> >
> > Where ?
>
> Here: http://dev.w3.org/html5/spec/syntax.html#writing (As you can see,
> it doesn't say that it is allowed, hence it is not.) You can also see
> the bottom of this page:
> http://dev.w3.org/html5/spec/the-meta-element.html#charset
>
> > The required HTML5 prolog applies to its SGML based syntax ;
>
> Please note that prolog is one thing, and the DOCTYPE is another, see
> XML 1.0: http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

Yes, I know the terminology, but it's evident that I'm including the document
type declaration as part of the "prolog" (i.e. everything that is not a comment
and that appears before the root element).
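For reference, the XML 1.0 section linked above defines the prolog as `prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?`, where `Misc` is a comment, a processing instruction or whitespace; in that grammar the document type declaration is an optional part of the prolog. A minimal regex-based sketch of that production follows. It assumes no internal DTD subset (so the DOCTYPE ends at its first `>`), and `split_prolog` and `SAMPLE` are hypothetical names for illustration.

```python
import re

# Simplified pieces of the XML 1.0 prolog grammar:
#   prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
#   Misc   ::= Comment | PI | S
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")  # must be the very first thing
MISC = re.compile(r"(?:\s|<!--.*?-->|<\?(?!xml\s).*?\?>)*", re.DOTALL)
# Assumption: no internal subset, so the DOCTYPE ends at its first '>'.
DOCTYPE = re.compile(r"<!DOCTYPE\s[^>]*>")


def split_prolog(doc: str) -> dict:
    """Split a document into XML declaration, DOCTYPE and the rest (root element on)."""
    pos = 0
    decl = XML_DECL.match(doc, pos)
    if decl:
        pos = decl.end()
    pos = MISC.match(doc, pos).end()  # MISC can match the empty string, so never None
    doctype = DOCTYPE.match(doc, pos)
    if doctype:
        pos = MISC.match(doc, doctype.end()).end()
    return {
        "xml_decl": decl.group(0) if decl else None,
        "doctype": doctype.group(0) if doctype else None,
        "rest": doc[pos:],
    }


SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<!-- comments may appear in the prolog -->\n"
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
)
parts = split_prolog(SAMPLE)
print(parts["xml_decl"])  # <?xml version="1.0" encoding="UTF-8"?>
print(parts["doctype"])   # <!DOCTYPE html>
print(parts["rest"])      # <html xmlns="http://www.w3.org/1999/xhtml"><body/></html>
```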

> > it makes no sense in XHTML as it voluntarily violates the validity of
> > the XML document declaration.
>
> If you are speaking about the HTML5 doctype, then its only effect is to
> make sure that the HTML parser stays in no-quirks (aka standards) mode.
> In XHTML then, you are right that it is not needed. But you are wrong
> if you say that it is a problem to include it in XHTML, as it causes no
> harm. In fact, in XHTML, you can drop both the DOCTYPE and the XML
> prologue.
>
> > The absence of the HTML5 required prolog (in its standard basic-SGML
> > profile), or the presence of another incompatible XML prolog is
> > enough to make the distinction between the two syntaxes.
>
> You mean: Visually? Yes. However, that is not how parsers think. What
> parsers normally do is that they look at the Content-Type "flag",
> before they decide how to parse the document.

True, but then, when the HTML5 parser detects a violation of the required
extended "prolog" (sorry, the HTML5 document type declaration, which is not a
valid document type declaration for XHTML, for HTML4 and earlier, or even for
SGML, because no schema identifier follows the short schema name), it should
catch this exception and try another parser. The XML declaration alone is
enough to throw the exception, and it is so easy to detect that it allows
switching from an HTML parser to an XML parser for XHTML. If even the XML
parser fails, then retry with a legacy HTML parser working in quirks mode.
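The two strategies in this exchange can be contrasted in a short Python sketch. `parser_for_content_type` reflects the rule Leif describes: the MIME type alone selects the parser. `fallback_chain` is a hypothetical model of the cascade proposed above (HTML5 parser, then XML parser, then a legacy quirks-mode HTML parser); it is not an algorithm HTML5 specifies or any browser implements. All names are invented, the "HTML5 check" is reduced to "the document starts with the short `<!DOCTYPE html>`", and the XML step only tests well-formedness with `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# Rule described by Leif: the Content-Type label picks the parser.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_for_content_type(content_type: str) -> str:
    """Map a Content-Type header value to a parser, ignoring its parameters."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime in XML_MIME_TYPES or mime.endswith("+xml"):
        return "xml"
    return "other"


# Hypothetical cascade following the proposal above (not the HTML5 algorithm).
HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


class NotHtml5Syntax(Exception):
    """Raised when the simplified HTML5 syntax check fails."""


def parse_as_html5(doc: str) -> str:
    # Simplified check: only whitespace may precede the short doctype,
    # so a leading XML declaration makes it fail.
    if not HTML5_DOCTYPE.match(doc):
        raise NotHtml5Syntax("document does not start with <!DOCTYPE html>")
    return "html5 (no-quirks)"


def parse_as_xml(doc: str) -> str:
    ET.fromstring(doc)  # raises ET.ParseError if the document is not well-formed
    return "xml"


def fallback_chain(doc: str) -> str:
    """Try HTML5, then XML, then fall back to legacy quirks-mode HTML."""
    try:
        return parse_as_html5(doc)
    except NotHtml5Syntax:
        pass
    try:
        return parse_as_xml(doc)
    except ET.ParseError:
        return "legacy html (quirks)"


XHTML_DOC = ('<?xml version="1.0" encoding="UTF-8"?>\n'
             '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')
print(parser_for_content_type("text/html; charset=utf-8"))  # html
print(fallback_chain(XHTML_DOC))                            # xml
print(fallback_chain("<html><p>one<p>two</html>"))          # legacy html (quirks)
```

Under `parser_for_content_type`, a document served as `text/html` goes to the HTML parser whatever its first line says; the cascade instead lets the content override the label, which is the point of disagreement here.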

> Now HTML5 is still not completely polished, finished and approved.

> > Such interoperability rules are not clearly defined even if they are
> > the "most up-to-date" to make it work seamlessly with the claimed
> > compatibility with all flavors of HTML or XHTML. And the fact that
> > Firefox and IE behave differently from Chrome and Safari in this
> > domain is a proof of this unfinished status.
>
> I would not conclude like that … But it could probably have saved us
> this discussion if Firefox/IE, like the other dominating browsers, did
> use it as a fallback method for setting the encoding …
>

I agree.
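The encoding fallback discussed just above, which per this thread Chrome and Safari apply and Firefox and IE do not, can be sketched as: take the `charset` parameter of the HTTP `Content-Type` if present, otherwise read the `encoding` pseudo-attribute of an XML declaration at the very start of the document. This is a deliberately reduced model, not HTML5's encoding sniffing algorithm: BOM detection, the `<meta charset>` prescan and the remaining steps are left out, and the helper names are hypothetical.

```python
import re
from typing import Optional

# encoding="..." inside an XML declaration at byte 0 (EncName per XML 1.0).
XML_DECL_ENCODING = re.compile(
    rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_from_header(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        value = value.strip().strip("\"'")
        if name.strip().lower() == "charset" and value:
            return value.lower()
    return None


def pick_encoding(content_type: str, body: bytes) -> Optional[str]:
    """HTTP charset first; otherwise fall back to the XML declaration."""
    header_charset = charset_from_header(content_type)
    if header_charset:
        return header_charset
    match = XML_DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # a real browser would continue with further sniffing steps


PAGE = (b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        b'<html xmlns="http://www.w3.org/1999/xhtml"></html>')
print(pick_encoding("text/html", PAGE))                 # iso-8859-1
print(pick_encoding("text/html; charset=UTF-8", PAGE))  # utf-8
print(pick_encoding("text/html", b"<!DOCTYPE html>"))   # None
```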
Received on Tue Nov 27 2012 - 21:25:50 CST

This archive was generated by hypermail 2.2.0 : Tue Nov 27 2012 - 21:25:51 CST