Re: HTML5 encodings (was: Re: BOCU patent)

From: Doug Ewell (doug@ewellic.org)
Date: Mon Dec 28 2009 - 08:17:10 CST

    "verdy_p" <verdy underscore p at wanadoo dot fr> wrote:

    > But anyway, isn't there a default ordering in UTF-32 when no BOM is
    > present? Why does HTML5 want to change the default ordering and still
    > maintain its name as "UTF-32", in contradiction with TUS? Shouldn't
    > HTML5 rename its modified encoding as "HTML5-UTF-32" (even if it then
    > requires using the BOM... which was also proposed, and also
    > contradicts TUS, which only allows optional BOMs in UTF-32 and forbids
    > all BOMs in UTF-32BE and UTF-32LE)...

    The HTML5 draft uses disclaimers such as this one to justify such
    decisions:

    "This algorithm is a willful violation of the HTTP specification, which
    requires that the encoding be assumed to be ISO-8859-1 in the absence of
    a character encoding declaration to the contrary, and of RFC 2046, which
    requires that the encoding be assumed to be US-ASCII in the absence of a
    character encoding declaration to the contrary. This specification's
    third approach is motivated by a desire to be maximally compatible with
    legacy content. [HTTP] [RFC2046]"

    This was from a table in Section 9.2.2.1, where browser developers are
    encouraged to choose a default encoding, based on "the user's locale,"
    that is not Unicode 2/3 of the time.

    More "willful violations" appear in Section 9.2.2.2, in which browsers
    are required to "misinterpret for compatibility" ISO and
    national-standard character sets as Windows code pages, even when the
    author specified the ISO or national character set.
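
    To make the effect concrete, here is a minimal Python sketch of what
    such an override does to the bytes. The OVERRIDES table and both
    function names are hypothetical and cover only two illustrative
    labels; this is not the draft's actual table or algorithm.

```python
# Hypothetical, partial override table: declared label -> Python codec
# name of the Windows code page used instead. Illustrative only.
OVERRIDES = {
    "iso-8859-1": "cp1252",
    "iso-8859-9": "cp1254",
}


def decode_as_declared(data: bytes, label: str) -> str:
    """Decode using exactly the encoding the author declared."""
    return data.decode(label)


def decode_with_override(data: bytes, label: str) -> str:
    """Decode the way an overriding browser would: swap in the Windows
    code page when the declared label has an entry in OVERRIDES."""
    codec = OVERRIDES.get(label.lower(), label)
    return data.decode(codec)


# 0x80, 0x93 and 0x94 are C1 control characters in ISO-8859-1, but the
# euro sign and curly quotation marks in windows-1252.
SAMPLE = b"\x80100 \x93quoted\x94"

if __name__ == "__main__":
    print(repr(decode_as_declared(SAMPLE, "iso-8859-1")))
    print(repr(decode_with_override(SAMPLE, "iso-8859-1")))
```

    The two calls return different strings for the same bytes and the
    same declared label, which is exactly the divergence at issue.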

    The implications are that (1) the authors of the present draft know
    better than authors of previous works on character encoding and (2)
    compatibility with existing, incorrectly or incompletely marked HTML
    documents is more important than adherence to standards. This is a
    departure from all other HTML and XHTML specifications I've ever seen
    from the W3C.

    --
    Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
    RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
    

