RE: Bad Content-type headers on Unicode web site?

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Mar 04 2005 - 00:34:42 CST


    On Fri, 4 Mar 2005, Dean Harding wrote:

    > According to this section of the HTTP/1.1 protocol:
    >
    > http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
    >
    > the default encoding is iso-8859-1, unless otherwise stated.

    More exactly, it says:
    "When no explicit charset parameter is provided by the sender, media
    subtypes of the "text" type are defined to have a default charset value of
    "ISO-8859-1" when received via HTTP. Data in character sets other than
    "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset
    value."

    Thus, the protocol requires a charset parameter unless the encoding
    is ISO-8859-1 (or a subset of it). It also specifies what the receiving
    agent should or shall assume when this requirement is violated.
    Unfortunately, this default conflicts with other specifications: e.g.,
    RFC 2046, the document that defines the media type text/plain, says in
    its clause on text/plain:
    "The default character set, which must be assumed in the absence of a
    charset parameter, is US-ASCII."
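    A minimal Python sketch of how a receiver would resolve the charset
    under each of the two conflicting rules. The function effective_charset
    and its via_http flag are hypothetical. Following the quoted clause, the
    sketch applies the US-ASCII default only to text/plain and returns None
    wherever neither rule applies.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str, via_http: bool = True) -> Optional[str]:
    """Charset a receiver assumes for the given Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always takes precedence.
    explicit = msg.get_param("charset")
    if isinstance(explicit, str) and explicit:
        return explicit.lower()

    if msg.get_content_maintype() != "text":
        return None  # neither default discussed here applies

    if via_http:
        # RFC 2616, sec. 3.7.1: unlabeled text/* received via HTTP is ISO-8859-1.
        return "iso-8859-1"

    if msg.get_content_type() == "text/plain":
        # RFC 2046: unlabeled text/plain is US-ASCII.
        return "us-ascii"

    return None  # other text subtypes: not covered by this sketch


print(effective_charset("text/plain; charset=UTF-8"))     # utf-8
print(effective_charset("text/plain"))                    # iso-8859-1
print(effective_charset("text/plain", via_http=False))    # us-ascii
```

    The same unlabeled text/plain document therefore resolves to
    ISO-8859-1 under the HTTP rule and to US-ASCII under the MIME rule.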

    But even HTTP/1.1 clearly says that a text/plain document that is
    utf-8 encoded _must_ be sent with charset=utf-8.
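    The sender-side rule can be sketched in Python as well. The helpers
    label_required and text_content_type are hypothetical. The sketch treats
    only ISO-8859-1 and its subset US-ASCII as exempt from labeling (other
    subsets are ignored), and it relies on Python's codecs.lookup to
    normalize encoding names such as "UTF-8" or "latin-1".

```python
import codecs

# Codec names (as Python's codecs.lookup reports them) for which RFC 2616
# sec. 3.7.1 allows the charset label to be omitted: ISO-8859-1 and its
# subset US-ASCII. Other subsets are ignored in this sketch.
_LABEL_OPTIONAL = {"iso8859-1", "ascii"}


def label_required(encoding: str) -> bool:
    """True if text/* sent in this encoding MUST carry a charset parameter."""
    return codecs.lookup(encoding).name not in _LABEL_OPTIONAL


def text_content_type(encoding: str, subtype: str = "plain") -> str:
    """Build a Content-Type value that always names the charset explicitly."""
    codecs.lookup(encoding)  # raises LookupError for unknown encoding names
    return f"text/{subtype}; charset={encoding}"


print(label_required("UTF-8"))        # True
print(label_required("ISO-8859-1"))   # False
print(text_content_type("utf-8"))     # text/plain; charset=utf-8
```

    Labeling unconditionally, as text_content_type does, also means the
    receiver never has to choose between the HTTP and MIME defaults.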

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    


    This archive was generated by hypermail 2.1.5 : Fri Mar 04 2005 - 00:35:48 CST