Re: pre-HTML5 and the BOM from Leif Halvard Silli on 2012-07-17 (Unicode Mail List Archive)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Tue, 17 Jul 2012 10:22:06 +0200

Philippe Verdy, Tue, 17 Jul 2012 03:40:37 +0200:
> 2012/7/16 Leif Halvard Silli:

HTML5:

> (ASCII is considered now an alias of Windows-1252, also for
> compatibility reasons, even if strict US-ASCII resources could be
> interpreted without changes as UTF-8)

I agree that HTML5 ought to ask UAs to try more aggressively to
detect UTF-8. An argument was also put forward on the WHATWG mailing
list earlier this year or at the end of last year: a page that contains
only strict ASCII characters could still contain character
entities/references for characters outside ASCII. For instance, early
on in 'the Web', some appeared to think that all non-ASCII characters
had to be represented as entities.
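
To make that point concrete, here is a minimal Python sketch. The
sample markup is made up for illustration, and Python's html.unescape
only stands in for what a browser's parser does with character
references. It shows that a byte stream restricted to ASCII decodes
identically as UTF-8 and as Windows-1252, yet the resulting text
contains non-ASCII characters once the references are resolved:

```python
import html

# Hypothetical page: every byte is ASCII, but the markup uses
# character references for letters outside ASCII.
page = b"<p>bl&aring;b&aelig;r og r&#248;mme</p>"
assert all(byte < 0x80 for byte in page)

# For pure ASCII bytes, both decodings yield the same string.
as_utf8 = page.decode("utf-8")
as_cp1252 = page.decode("windows-1252")
assert as_utf8 == as_cp1252

# Resolving the references yields text that is no longer ASCII.
text = html.unescape(as_utf8)
print(text)
```

So the label chosen for such a page does not change how its bytes
decode, even though the text it represents is not limited to ASCII.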

> and require explicit encoding
> (sniffing no longer works for something else as UTF-8 for its leading
> BOM interpreted as a data signature and not as a character)

If that were true, then Firefox's (for most locales) optional character
encoding detector would not be compatible with HTML5, and Chrome would
violate HTML5 as well. I do not think that HTML5 rules out detection of
encodings that HTML5 permits/requires UAs to support. However, the
encoding sniffing algorithm specifies at which stage in the sniffing
process the 'try-to-guess' step should happen.
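
As a rough illustration of what "at which stage" means, here is a
simplified Python sketch. It is not the HTML5 algorithm itself: the
function name sniff_encoding, the order of the steps, the use of a
strict UTF-8 decode as the 'guess', and the Windows-1252 fallback are
all assumptions made for the example. The point is only that a BOM
check, a declared encoding and a heuristic guess can coexist as ordered
steps, so that guessing is a late step rather than something ruled out:

```python
import codecs

# BOM signatures checked first (illustrative subset).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data, declared=None):
    """Return (encoding, source) using a simplified, assumed step order."""
    # Step 1: a leading BOM is treated as a signature.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, "bom"
    # Step 2: an explicit declaration (e.g. HTTP header or meta element).
    if declared:
        return declared, "declared"
    # Step 3: the 'try-to-guess' step; here, simply whether the bytes
    # are valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8", "guess"
    except UnicodeDecodeError:
        # Step 4: a fallback default (a locale-dependent choice in practice).
        return "windows-1252", "default"

print(sniff_encoding(codecs.BOM_UTF8 + b"abc"))
print(sniff_encoding(b"abc", declared="iso-8859-1"))
print(sniff_encoding("bl\u00e5b\u00e6r".encode("windows-1252")))
```

In this sketch the guess only runs when neither a BOM nor a declaration
is present, which is the kind of ordering I mean.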

-- 
Leif Halvard Silli

Received on Tue Jul 17 2012 - 03:29:13 CDT
