RE: BOM in HTML (was Conformance (was UTF, BOM, etc))

From: Jon Hanna (
Date: Sat Jan 22 2005 - 12:00:38 CST


    > or if there's no charset
    > specification in HTTP headers, but there's an internal charset
    > specified in the document that indicates it's using the UTF-8
    > "charset"

    *Strictly*, in the absence of a charset parameter, the header "Content-Type:
    text/html" is supposed to be taken as having a default charset parameter of
    "charset=iso-8859-1". This is one of the minor changes RFC 2616 (HTTP) made
    in its use of MIME (under which the default charset parameter would have
    been "charset=us-ascii").
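
    As a rough illustration of that strict reading, here is a minimal Python
    sketch. The function name effective_charset is made up for this example,
    and it assumes the header value is well-formed enough for the standard
    library's email parser to handle.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type header value.

    Hypothetical helper: applies the RFC 2616 default of iso-8859-1
    to text/* types that carry no explicit charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # get_content_charset() returns the charset parameter in lower case,
    # or None when the parameter is absent.
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit

    # Strict HTTP/1.1 reading: text/* without a charset means iso-8859-1.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"

    # Non-text types have no default charset under this rule.
    return None
```

    Under plain MIME rules the fallback for text/* would be "us-ascii"
    instead; only the default differs, not the lookup.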

    In practice browsers tend to give <meta /> elements priority in such a case,
    and even the MIME registration for text/html notes that the whole area of
    default charset parameters is problematic. As such, while it is strictly
    against the letter of the standards, it is probably within the general
    spirit of being "tolerant in what you accept".
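
    The tolerant behaviour can be sketched roughly as follows. This is not
    how any particular browser does it; pick_charset and the regular
    expression are assumptions for illustration, and real user agents use a
    far more careful prescan of the document.

```python
import re
from typing import Optional

# Loose pattern for a charset declared inside a <meta> element, e.g.
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def pick_charset(http_charset: Optional[str], body: bytes) -> str:
    """Choose a charset for an HTML body (hypothetical helper).

    Order of preference in this sketch:
    1. an explicit charset parameter from the HTTP header,
    2. a charset found in a <meta> element near the start of the body,
    3. the strict HTTP default of iso-8859-1.
    """
    if http_charset:
        return http_charset.lower()

    # Only inspect the first 1024 bytes; the limit is arbitrary here.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()

    return "iso-8859-1"
```

    Step 2 is the part that departs from the letter of the standards: a
    strict reader would go straight from step 1 to step 3.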

    When content is served as application/xhtml+xml, or if an XML declaration is
    present, then really only the XML rules for dealing with absent charset
    information in HTTP headers should be used; <meta /> elements should be

    > There's absolutely no need for the HTML or XML standard to
    > say anything
    > about the BOM, because this is specified elsewhere, in the charset
    > definition (using the IANA definition of charsets, also referenced
    > normatively by the optional MIME "content-type:" charset
    > specifier) and
    > its related standards.

    For the most part, yes: they both work at a layer above the encoding, and
    the encoding deals with the BOM. XML, however, does have rules for
    determining the encoding in the absence of any information about it, and
    those rules therefore do have to deal with the BOM.
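
    A simplified sketch of that kind of detection, loosely modelled on the
    non-normative autodetection appendix of XML 1.0. The function name
    guess_xml_encoding is invented here, and the sketch skips the BOM-less
    UTF-16/UCS-4 and EBCDIC cases that the appendix also covers.

```python
import codecs
import re

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Encoding pseudo-attribute of an XML declaration in an ASCII-compatible
# encoding, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def guess_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity with no external charset info."""
    # 1. A BOM, if present, identifies the encoding family directly.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. No BOM: look for an encoding declaration at the very start.
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Neither BOM nor declaration: XML defaults to UTF-8.
    return "utf-8"
```

    The ordering of the BOM table is the one design point worth noting:
    testing UTF-16 first would misreport little-endian UTF-32 input.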

    Jon Hanna
