quoted-string in MIME Content-Type charset parameter

From: Yung-Fong Tang (ftang@netscape.com)
Date: Thu Feb 27 2003 - 17:07:55 EST


    Not sure this is the right forum to discuss this issue. I found this
    "problem" when I was debugging a UTF-8 email message.

    When I looked into some email that we have problems with, I saw
    Content-Type headers like the following:

    Content-Type: text/html; charset="UTF-8"

    As I remember, the MIME specification does not allow "" around the
    charset parameter value, and it should only accept

    Content-Type: text/html; charset=UTF-8

    but not charset="UTF-8"

    So... I checked the MIME spec to figure out whether it is allowed or
    not. What shocked me is that the original MIME specification, RFC 1521,
    disallowed it:
    http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html#sec-7.1.1
    and
    http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html#sec-7.1.2

        The formal grammar for the content-type header field for text is as
        follows:

        text-type := "text" "/" text-subtype [";" "charset" "=" charset]

        text-subtype := "plain" / extension-token

        charset := "us-ascii"/ "iso-8859-1"/ "iso-8859-2"/ "iso-8859-3"
        / "iso-8859-4"/ "iso-8859-5"/ "iso-8859-6"/ "iso-8859-7"
        / "iso-8859-8" / "iso-8859-9" / extension-token

    but RFC 2045, which obsoleted RFC 1521, allows the "-quoted charset name:

    see http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2045.html#sec-5.1

         parameter := attribute "=" value

         attribute := token
                      ; Matching of attributes
                      ; is ALWAYS case-insensitive.
      

    ....
         value := token / quoted-string
      

        Note that the value of a quoted string parameter does not include
        the quotes. That is, the quotation marks in a quoted-string are not
        a part of the value of the parameter, but are merely used to delimit
        that parameter value. In addition, comments are allowed in
        accordance with RFC 822
        <http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc0822.html> rules for
        structured header fields. Thus the following two forms

        Content-type: text/plain; charset=us-ascii (Plain text)

        Content-type: text/plain; charset="us-ascii"

        are completely equivalent.
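
    A minimal sketch, in Python, of how a receiver could treat the two
    forms above as equivalent. The function name charset_of is made up
    for illustration, and the regular expression only approximates the
    RFC 2045 grammar: it does not reject non-ASCII characters, and it
    does not handle comments placed before the value or a "charset="
    that happens to sit inside another quoted parameter.

```python
import re

# token: one or more characters that are not SPACE, CTLs, or RFC 2045 tspecials
_TOKEN = r"[^\x00-\x20\x7f()<>@,;:\\\"/\[\]?=]+"
# quoted-string: text between double quotes, allowing backslash quoted-pairs
_QUOTED = r'"((?:[^"\\]|\\.)*)"'

_CHARSET_RE = re.compile(
    r';\s*charset\s*=\s*(?:' + _QUOTED + r'|(' + _TOKEN + r'))',
    re.IGNORECASE,  # attribute names are matched case-insensitively
)

def charset_of(content_type):
    """Return the charset parameter value without quotes, or None."""
    m = _CHARSET_RE.search(content_type)
    if m is None:
        return None
    quoted, token = m.group(1), m.group(2)
    if quoted is not None:
        # The quotes only delimit the value; also undo backslash escapes.
        return re.sub(r"\\(.)", r"\1", quoted)
    return token

print(charset_of('text/plain; charset=us-ascii (Plain text)'))
print(charset_of('text/plain; charset="us-ascii"'))
```

    Both calls print us-ascii, because the token stops at the space
    before the comment and the quotes are dropped from the quoted form.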

    I was never aware of this difference between RFC 1521 and RFC 2045. Not
    sure whether you folks are aware of it or not.

    I also checked HTTP 1.1 (RFC 2068) and HTTP 1.0 (RFC 1945). It looks like
    each specification has conflicting language within itself
    about this issue:
    http://www.w3.org/Protocols/rfc1945/rfc1945
    http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2068.html

    In one place it says:

         charset = "US-ASCII"
                 | "ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
                 | "ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
                 | "ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
                 | "ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
                 | "UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
                 | token

    and

           token = 1*<any CHAR except CTLs or tspecials>

           tspecials = "(" | ")" | "<" | ">" | "@"
                          | "," | ";" | ":" | "\" | <">
                          | "/" | "[" | "]" | "?" | "="
                          | "{" | "}" | SP | HT
    which rules out the use of quoted-string
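
    To see why that grammar excludes the quoted form, here is a small
    Python sketch of the token rule quoted above. The names TSPECIALS
    and is_token are my own, chosen for illustration only.

```python
# tspecials from the HTTP grammar quoted above, including SP and HT
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')

def is_token(s):
    """True if s is one or more CHARs that are neither CTLs nor tspecials."""
    return bool(s) and all(
        32 <= ord(c) < 127 and c not in TSPECIALS for c in s
    )

print(is_token('UTF-8'))    # True
print(is_token('"UTF-8"'))  # False: the double quote is a tspecial
```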
      

    In the other place it says:

    3.6 Media Types

       HTTP uses Internet Media Types [13] in the Content-Type header field
       (Section 10.5) in order to provide open and extensible data typing.

           media-type = type "/" subtype *( ";" parameter )
    ....
           parameter = attribute "=" value
    ....
           value = token | quoted-string

    :( :( :( :(

    Therefore we need to make sure that
    1. every mailer that receives email can deal not only with charset=value
    but also with charset="value". I am not sure whether Mozilla can deal
    with it or not. How about your email program?

    2. The browser can deal with
    Content-Type: text/html; charset="value"
    in addition to
    Content-Type: text/html; charset=value

    3. Because we also use the META tag in HTML to reflect the HTTP header,
    the browser has to deal not only with the following kinds of
    meta tag

    <meta http-equiv="content-type" content="text/html; charset=value">
    <meta http-equiv="content-type" content='text/html; charset=value'>
    but also
    <meta http-equiv="content-type" content='text/html; charset="value"'>

    :( :( :( :(

    Not sure whether Mozilla handles 2 or 3. How about IE?
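
    For case 3, a rough Python sketch of pulling the charset out of all
    three meta tag forms, using the standard html.parser module. The
    class MetaCharsetFinder and the function meta_charset are
    hypothetical names, and this is not how any particular browser
    does it; it only shows that the inner quotes can be stripped after
    the HTML attribute has been parsed.

```python
import re
from html.parser import HTMLParser

# charset value: either "quoted" or a bare run up to whitespace, ';' or '"'
_CHARSET = re.compile(r'charset\s*=\s*(?:"([^"]*)"|([^\s;"]+))', re.IGNORECASE)

class MetaCharsetFinder(HTMLParser):
    """Remembers the charset of the first http-equiv content-type meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        a = dict(attrs)  # attribute values arrive with the HTML quotes removed
        if (a.get("http-equiv") or "").lower() != "content-type":
            return
        m = _CHARSET.search(a.get("content") or "")
        if m:
            self.charset = m.group(1) if m.group(1) is not None else m.group(2)

def meta_charset(html_text):
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset

print(meta_charset(
    '<meta http-equiv="content-type" content=\'text/html; charset="UTF-8"\'>'
))
```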

    However, for email, since RFC 1521 does NOT allow it, to make sure it
    works with most email programs, when we send out internet
    email we should try to use

    Content-Type: text/html; charset=UTF-8

    instead of
    Content-Type: text/html; charset="UTF-8"
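
    On the sending side, one way to follow this advice is to emit the
    bare token whenever the charset name is a valid token and to fall
    back to quoting only when it is not. A sketch in Python, with
    made-up helper names (_is_token, content_type_header):

```python
# tspecials as listed in RFC 2045
_TSPECIALS = set('()<>@,;:\\"/[]?=')

def _is_token(value):
    """True if value is non-empty printable ASCII with no space or tspecials."""
    return bool(value) and all(
        32 < ord(c) < 127 and c not in _TSPECIALS for c in value
    )

def content_type_header(media_type, charset):
    """Build a Content-Type header, quoting the charset only if required."""
    if _is_token(charset):
        param = charset  # the unquoted form is accepted by older parsers too
    else:
        escaped = charset.replace("\\", "\\\\").replace('"', '\\"')
        param = '"' + escaped + '"'
    return "Content-Type: {}; charset={}".format(media_type, param)

print(content_type_header("text/html", "UTF-8"))
# Content-Type: text/html; charset=UTF-8
```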

    Can you check this issue with the product that you are working on?


