From: Jon Hanna
Date: Tue Jan 25 2005 - 11:51:14 CST


    > Whatever the HTTP protocol specs say, it is not mandating
    > anything about how to interpret Content-Types.

    Yes it is. RTFRFC.

    > HTTP just
    > offers a way to transport the Content-Type information, and
    > then leaves the interpretation of this content type to MIME
    > specifications.

    No it doesn't. HTTP is based on MIME, but it is not identical to MIME. This is
    what RFC 2616 means by such phrases as "MIME-like messages", "similar to
    that used by Internet mail as defined by the Multipurpose Internet Mail
    Extensions (MIME)", "analogous to ... MIME", and in particular "HTTP is not
    a MIME-compliant protocol." Even the ability to send a MIME-Version
    header, indicating that the message complies with MIME, comes with
    the caveat "However, HTTP/1.1 message parsing and semantics are defined by
    this document and not the MIME specification."


    > In other words, it DOES NOT specify any default charset for
    > the transported document, should it be "text/html", or
    > whatever other "text/*" content-type !

    RFC 2616, section 3.7.1:

    'The "charset" parameter is used with some media types to define the
    character set (section 3.4) of the data. When no explicit charset parameter
    is provided by the sender, media subtypes of the "text" type are defined to
    have a default charset value of "ISO-8859-1" when received via HTTP. Data in
    character sets other than "ISO-8859-1" or its subsets MUST be labeled with
    an appropriate charset value. See section 3.4.1 for compatibility problems.'
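
    As a rough illustration of the RFC 2616 rule quoted above, the following
    Python sketch picks the charset a recipient would assume for a given
    Content-Type value. The function name effective_charset is hypothetical,
    and the parameter parsing is deliberately naive (it does not handle a ";"
    inside a quoted parameter value).

```python
def effective_charset(content_type):
    """Return the charset an RFC 2616 recipient would assume, or None.

    Hypothetical helper for illustration only; parsing is simplified.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # An explicit charset parameter always wins.
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')

    # No explicit charset: subtypes of "text" default to ISO-8859-1
    # when received via HTTP.
    if media_type.startswith("text/"):
        return "ISO-8859-1"

    # The default is defined only for "text" subtypes.
    return None
```

    Under these assumptions, "text/html" with no parameter yields ISO-8859-1,
    while a non-text type with no parameter yields no default at all.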

    > It's important to note this because the relevant information
    > is not in RFC 2616. HTTP is ONLY a transport protocol that
    > allows querying documents along with their meta-data.

    HTTP is not a transport protocol; it is an application protocol that sits on
    top of a transport protocol, though it is frequently abused to serve as a
    transport protocol (e.g. in SOAP). It is transport-protocol neutral (and
    some of the changes between 1.0 and 1.1 increase the range of transport
    protocols it can work on top of). The most common transport protocol it is
    used on top of is TCP, but other reliable protocols can be and are used.
    Generally these in turn sit on top of TCP and offer privacy or other
    advantages, but this doesn't have to be the case: one could implement HTTP
    immediately on top of Ethernet, for example, though one isn't likely to find
    this useful.
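
    To make the layering concrete, here is a minimal Python sketch in which an
    HTTP/1.1 message is just bytes handed to whatever reliable stream is
    available. It uses socket.socketpair(), which on Unix-like systems gives a
    connected pair of Unix-domain stream sockets rather than a TCP connection.
    The function names and the example.org host are assumptions made for
    illustration, and the canned 204 response stands in for a real server.

```python
import socket


def build_get_request(host, path="/"):
    """Serialize a minimal HTTP/1.1 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def exchange_over_stream(client, server, request):
    """Send one request and one canned response over a connected stream pair.

    A single recv() is enough for these tiny messages; real code would
    loop until the whole message has arrived.
    """
    client.sendall(request)
    received = server.recv(4096)
    server.sendall(b"HTTP/1.1 204 No Content\r\n\r\n")
    status_line = client.recv(4096).split(b"\r\n", 1)[0]
    return received, status_line


def demo():
    """Run the exchange over a local socket pair and return both sides' view."""
    client, server = socket.socketpair()
    try:
        return exchange_over_stream(client, server, build_get_request("example.org"))
    finally:
        client.close()
        server.close()
```

    Nothing in the two HTTP-level functions refers to TCP; they only need an
    object that delivers bytes reliably and in order.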

    > does not describe or mandate any of these meta-data.

    The majority of the text of the spec does exactly that. RTFRFC.

    > The only mandatory requirements in HTTP for the
    > interpretation of headers are those effectively used in HTTP,
    > to specify the origin host of the document, to sign its
    > content or certify it against alteration, to see if the
    > document can be replicated or cached, or to change its
    > transport encoding syntax to bypass some limitations (most
    > HTTP gateways however are binary safe today, so the only
    > current use of transfer-encoding syntaxes in HTTP is for data
    > compression or security, for example by inserting partial
    > checksums, and allowing altered parts of the document to be
    > reloaded from the source)...

    Actually, clients are free to assume the origin host based on the connection,
    to ignore MD5 hashes, to implement their own caching rules if they are at the
    end of the connection (i.e. not a proxy), and to decline to cache any content
    on arbitrary grounds even if it is marked as cacheable. You've come pretty
    close to identifying the set of HTTP headers that aren't mandatory.

    > Don't forget that: HTTP is only a transport protocol, but not

    Again it is not. It is an application protocol. TCP is a transport protocol.

    > fact that HTTP is most often used for HTML documents since
    > its origin does not conceptually bind it to the HTML
    > requirements.

    Of course the only way to determine what is an HTML document and what is a
    text file that happens to contain the likes of <head> etc. is by examining
    the content-type (IE is buggy in this regard though).

    > In fact HTTP does not even specify that all
    > HTML documents should be transported with a "text/html"
    > content-type (it could be other types including some XML
    > variants, or application specific content-types, even if the
    > document will first be parsed as HTML, depending on the
    > client application requirements.)

    This is noted in the specs which *do* specify the text/html and
    application/xhtml+xml MIME types. Notably the most recent registration for
    text/html notes the feature of HTTP with regard to default charset
    parameters you are claiming does not exist.

    > The "Content-Type:" header is then only standardized as a
    > container for MIME related information, but it's not to the
    > HTTP spec to say how it will be interpreted.

    Again, MIME-like; not MIME, merely based on it. RTFRFC.

    > Notably the
    > absence of a "charset=..." attribute in the content-type
    > value means or implies NOTHING in HTTP,


    > which is open to any
    > other content-types for which the simple concept of a
    > "charset" is not significant.

    Yes, the rule for 'media subtypes of the "text" type' does not apply to
    media subtypes that are not of the "text" type.

    > So please don't assert such things. HTTP will just indicate
    > to you that the document is of a "text/html" content-type, and
    > then it's up to the client to interpret it according to the
    > definition of this content-type in MIME, where it is
    > registered and bound in reference to the HTML standard.

    No, it's up to the client to interpret it according to the adaptation of
    MIME used by HTTP. RTFRFC.

    > Then comes the HTML standard: this is where the "text/html"
    > content-type will be described with its charset attribute.
    > The HTML standard is clear:

    The HTML standards explicitly refer to exactly the feature of HTTP that I
    mentioned, which you claim does not exist, and the practical issues with it
    that I also mentioned. In other words they say to RTFRFC.

    > (8) In some cases, the browser will need to reload the
    > document from its source by performing a new request to its
    > URL (this will be true if the source indicates that the
    > document is not cacheable and generated on the fly, or
    > secured. Unfortunately, if the source document came from a
    > dynamic POST request, the document may be the result of an
    > active query, so generally the browser will first ask the
    > user whether they want to resend the last form to get the
    > generated document).

    Browsers are free to retrieve data from private caches when they are merely
    refreshing a *view* on a page or if the user is going backwards and forwards
    through the browser history. It is therefore not necessary to repeat the
    POST to comply with a document being labelled as not cacheable (RTFRFC).
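
    A minimal Python sketch of that distinction, assuming a browser history
    that keeps the body it originally displayed: going back shows the stored
    view and never touches the network, so no POST is repeated. The class
    HistoryList and its fetch callable are hypothetical names for illustration,
    not any browser's actual design.

```python
class HistoryList:
    """Toy browser history that replays what the user saw."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable(method, url) -> response body.
        self._fetch = fetch
        self._entries = []
        self._position = -1

    def navigate(self, method, url):
        """Issue a real request and record the resulting view."""
        body = self._fetch(method, url)
        # Visiting a new page discards any "forward" entries.
        del self._entries[self._position + 1:]
        self._entries.append((method, url, body))
        self._position = len(self._entries) - 1
        return body

    def back(self):
        """Return the previously displayed body without re-requesting it."""
        if self._position <= 0:
            raise IndexError("no earlier history entry")
        self._position -= 1
        return self._entries[self._position][2]
```

    Whether the stored response was marked cacheable plays no part here,
    because the history list is not acting as a cache for new requests.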

    > However, more modern browsers will cache internally the bytes
    > stream coming from HTTP, to be able to change its meta-data
    > on the fly without having to reload the source: this may
    > cause problems if the HTML document was parsed a first time,
    > and referred to other objects whose URL may be active;
    > reparsing the document will possibly change the list of
    > referred active objects.
    > To avoid this nightmare, notably for actively generated
    > documents, all of them should really specify reliably the
    > charset needed to parse them without needing to requery the
    > source. This is to the website designer to ensure this!

    This is both unlikely and not a "nightmare", merely an efficiency issue,
    since those sub-objects of an HTML document would be retrieved through GET,
    which (unlike POST, PUT and DELETE) has safe semantics.
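
    A small Python sketch of how a client might act on that: RFC 2616
    (section 9.1.1) names GET and HEAD as the methods that should be safe, so
    a request using one of them can be repeated without asking the user. The
    names SAFE_METHODS and may_repeat_without_asking are hypothetical.

```python
# Methods RFC 2616 section 9.1.1 says SHOULD NOT have the significance
# of taking an action other than retrieval.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def may_repeat_without_asking(method):
    """True if re-issuing the request should not need user confirmation."""
    return method.upper() in SAFE_METHODS
```

    With this check, re-fetching an image or stylesheet referenced by a
    reparsed page is at worst wasted bandwidth, whereas repeating a POST, PUT
    or DELETE would call for confirmation.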

    Jon Hanna
