From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Mar 04 2005 - 00:34:42 CST
On Fri, 4 Mar 2005, Dean Harding wrote:
> According to this section of the HTTP/1.1 protocol:
>
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
>
> the default encoding is iso-8859-1, unless otherwise stated.
More exactly, it says:
"When no explicit charset parameter is provided by the sender, media
subtypes of the "text" type are defined to have a default charset value of
"ISO-8859-1" when received via HTTP. Data in character sets other than
"ISO-8859-1" or its subsets MUST be labeled with an appropriate charset
value."
Thus, the protocol requires a charset parameter unless the encoding
is ISO-8859-1 or a subset of it. It also specifies what the receiving
agent is to assume when the parameter is missing. It is unfortunate that this
default conflicts with other specifications - e.g., RFC 2046, the document
that defines the media type text/plain, says, in clause (for text/plain):
"The default character set, which must be assumed in the absence of a
charset parameter, is US-ASCII."
But even HTTP/1.1 clearly says that a text/plain document that is
utf-8 encoded _must_ be sent with charset=utf-8.
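
To illustrate the two defaults quoted above, here is a minimal Python
sketch of how a receiving agent might pick the charset. The function
name effective_charset and the via_http flag are my own inventions for
illustration, not anything from the specifications, and the sketch only
covers the two rules quoted in this message.

```python
from email.message import Message


def effective_charset(content_type, via_http=True):
    """Return the charset a receiver would assume for a Content-Type value.

    Hypothetical helper: applies only the two defaults quoted above.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins (returned lower-cased).
    explicit = msg.get_content_charset()
    if explicit is not None:
        return explicit

    # The quoted defaults apply to the "text" type only.
    if msg.get_content_maintype() != "text":
        return None

    if via_http:
        # HTTP/1.1 (RFC 2616, section 3.7.1): text/* defaults to ISO-8859-1.
        return "iso-8859-1"

    # RFC 2046: the default stated for text/plain is US-ASCII.
    return "us-ascii" if msg.get_content_type() == "text/plain" else None


# Why the label matters: unlabeled utf-8 data read with the HTTP default.
garbled = "é".encode("utf-8").decode(effective_charset("text/plain"))
```

With these assumptions, an unlabeled text/plain gets "iso-8859-1" over
HTTP and "us-ascii" otherwise, and the last line turns "é" into "Ã©",
which is the kind of damage the MUST in the quoted passage guards against.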
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/