quoted-string in MIME Content-Type charset parameter

From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Feb 27 2003 - 17:07:55 EST


    I am not sure this is the right forum to discuss this issue. I found this
    "problem" while I was debugging a UTF-8 email message.

    When I looked into some email that we had a problem with, I saw a
    Content-Type header like the following:

    Content-Type: text/html; charset="UTF-8"

    As I remembered it, the MIME specification does not allow "" around the
    charset parameter value, and a mailer should only accept

    Content-Type: text/html; charset=UTF-8

    but not charset="UTF-8"

    So I checked the MIME specs to figure out whether it is allowed or not.
    What shocked me is that the original MIME specification, RFC 1521,
    disallowed it:

        The formal grammar for the content-type header field for text is as
        follows:

        text-type := "text" "/" text-subtype [";" "charset" "=" charset]

        text-subtype := "plain" / extension-token

        charset := "us-ascii"/ "iso-8859-1"/ "iso-8859-2"/ "iso-8859-3"
        / "iso-8859-4"/ "iso-8859-5"/ "iso-8859-6"/ "iso-8859-7"
        / "iso-8859-8" / "iso-8859-9" / extension-token

    but RFC 2045, which obsoleted RFC 1521, allows a "-quoted charset name:

    see http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2045.html#sec-5.1

         parameter := attribute "=" value

         attribute := token
                      ; Matching of attributes
                      ; is ALWAYS case-insensitive.

         value := token / quoted-string

        Note that the value of a quoted string parameter does not include
        the quotes. That is, the quotation marks in a quoted-string are not
        a part of the value of the parameter, but are merely used to delimit
        that parameter value. In addition, comments are allowed in
        accordance with RFC 822
        <http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc0822.html> rules for
        structured header fields. Thus the following two forms

        Content-type: text/plain; charset=us-ascii (Plain text)

        Content-type: text/plain; charset="us-ascii"

        are completely equivalent.
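
    To make the equivalence concrete, here is a minimal sketch of a reader
    that accepts both forms. It is not taken from any mailer; the names
    strip_comments and parse_charset are made up for illustration, and
    splitting on ";" is a simplification that would break on a quoted value
    containing ";".

```python
import re


def strip_comments(text):
    """Drop RFC 822 style (possibly nested) comments outside quoted-strings."""
    out = []
    depth = 0          # current nesting depth of "(" ... ")" comments
    in_quotes = False  # True while inside a quoted-string
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and (in_quotes or depth):
            # A backslash escapes the next character in quotes and comments.
            if not depth:
                out.append(text[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif depth:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
        else:
            out.append(ch)
            if ch == '"':
                in_quotes = True
        i += 1
    return "".join(out)


def parse_charset(header_value):
    """Return the charset value without its quotes, or None if absent."""
    cleaned = strip_comments(header_value)
    for part in cleaned.split(";")[1:]:
        name, sep, value = part.partition("=")
        if not sep or name.strip().lower() != "charset":
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            # quoted-string: remove the delimiters and undo backslash escapes
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        return value
    return None


# The argument is the header value, i.e. the text after "Content-type:".
print(parse_charset('text/plain; charset=us-ascii (Plain text)'))
print(parse_charset('text/plain; charset="us-ascii"'))
```

    Both calls should yield the same bare name, which is what the RFC 2045
    text above requires of a receiver.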

    I was never aware of this difference between RFC 1521 and RFC 2045. I am
    not sure whether you folks were aware of it or not.

    I also checked HTTP 1.1 (RFC 2068) and HTTP 1.0 (RFC 1945). It looks like
    each specification contains conflicting language about this issue
    within itself:

    In one place they say:

         charset = "US-ASCII"
                 | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
                 | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
                 | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
                 | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
                 | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
                 | token


           token = 1*<any CHAR except CTLs or tspecials>

           tspecials = "(" | ")" | "<" | ">" | "@"
                          | "," | ";" | ":" | "\" | <">
                          | "/" | "[" | "]" | "?" | "="
                          | "{" | "}" | SP | HT
    which rules out the use of quoted-string.

    In the other place they say:

    3.6 Media Types

       HTTP uses Internet Media Types [13] in the Content-Type header field
       (Section 10.5) in order to provide open and extensible data typing.

           media-type = type "/" subtype *( ";" parameter )
           parameter = attribute "=" value
           value = token | quoted-string

    :( :( :( :(

    Therefore we need to make sure that:
    1. Every mailer that receives email handles not only charset=value
    but also charset="value". I am not sure whether Mozilla can deal with
    it or not. How about your email program?

    2. The browser can deal with
    Content-Type: text/html; charset="value"
    in addition to
    Content-Type: text/html; charset=value

    3. Because we also use the META tag in HTML to mirror the HTTP header,
    the browser has to deal not only with the following kinds of
    meta tag

    <meta http-equiv="content-type" content="text/html; charset=value">
    <meta http-equiv="content-type" content='text/html; charset=value'>
    but also
    <meta http-equiv="content-type" content='text/html; charset="value"'>
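
    As a rough way to test case 3, the following sketch pulls the charset
    out of all three meta tag variants using only the Python standard
    library. The class and function names (MetaCharsetFinder,
    find_meta_charset) are invented for this example, and the regular
    expression is an assumption about what a lenient reader might accept,
    not what any browser actually does.

```python
import re
from html.parser import HTMLParser

# Accept charset=value as well as charset="value" inside the content attribute.
CHARSET_RE = re.compile(r'charset\s*=\s*(?:"([^"]*)"|([^\s;"]+))', re.IGNORECASE)


class MetaCharsetFinder(HTMLParser):
    """Remember the charset of the first content-type meta tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        match = CHARSET_RE.search(attributes.get("content") or "")
        if match:
            quoted, bare = match.group(1), match.group(2)
            self.charset = quoted if quoted is not None else bare


def find_meta_charset(html_text):
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset


samples = [
    '<meta http-equiv="content-type" content="text/html; charset=value">',
    "<meta http-equiv=\"content-type\" content='text/html; charset=value'>",
    "<meta http-equiv=\"content-type\" content='text/html; charset=\"value\"'>",
]
for sample in samples:
    print(find_meta_charset(sample))
```

    All three samples should report the same name, with no quotation marks
    left attached to it.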

    :( :( :( :(

    I am not sure whether Mozilla handles 2 or 3. How about IE?

    However, for email, since RFC 1521 does NOT allow it, to make sure it
    works with most email programs, when we send out Internet
    email we should try to use

    Content-Type: text/html; charset=UTF-8

    instead of
    Content-Type: text/html; charset="UTF-8"
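
    On the sending side, one way to follow this advice is to emit the bare
    token form whenever the charset name is a valid token, and to fall back
    to a quoted-string only when it is not. This is only a sketch: is_token
    and format_content_type are hypothetical helpers, and the tspecials set
    is the one from RFC 2045.

```python
# tspecials from RFC 2045; a token may not contain these, SPACE, or CTLs.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value):
    """True if value is a non-empty run of printable ASCII token characters."""
    return bool(value) and all(
        32 < ord(ch) < 127 and ch not in TSPECIALS for ch in value
    )


def format_content_type(media_type, charset):
    """Build a Content-Type line, preferring the unquoted charset form."""
    if is_token(charset):
        return f"Content-Type: {media_type}; charset={charset}"
    # Not a token: quote it, escaping backslashes and double quotes.
    escaped = charset.replace("\\", "\\\\").replace('"', '\\"')
    return f'Content-Type: {media_type}; charset="{escaped}"'


print(format_content_type("text/html", "UTF-8"))
```

    Every registered charset name I know of is a plain token, so the quoted
    branch should rarely, if ever, be taken in practice.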

    Can you check this issue with the product that you are working on?
