Re: SMTP and unicode

From: Philippe VERDY
Date: Wed May 18 2005 - 10:30:07 CDT

    From: "Hans Aberg" To: "Stephane Bortzmeyer" Cc: "faraz siddiqi"
    > At 21:53 +0200 2005/05/17, Stephane Bortzmeyer wrote:
    > >The default channel in SMTP is only 7 bits wide, for historical
    > >reasons. For many, many years, almost all SMTP servers have accepted and
    > >properly carried 8-bit data (whether UTF-8 or something else). See RFC 2821, "2.4
    > >General Syntax Principles and Transaction Model".
    > When 8-bit mail servers started to appear at the beginning of the
    > 1990s, it proved difficult to ensure that all the servers the mail
    > passed through were 8-bit. Thus, when using an 8-bit character encoding,
    > the mail frequently got corrupted. Therefore, people switched to
    > MIME, which encodes 8-bit data into 7-bit data. That situation seems
    > to remain.

    MIME is NOT an encoding. It is an envelope format that specifies the recommended structure and encapsulation of an RFC 822 (and successor) message transmitted, for example, over SMTP. It also defines the limits to respect and the format and interpretation of headers.
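
    To make the "envelope" idea concrete, here is a minimal sketch using Python's standard `email` package. The addresses, subject, and body text are made up for illustration. It only shows that MIME adds descriptive headers (MIME-Version, Content-Type with a charset, Content-Transfer-Encoding) around a body, while the actual 8-bit to 7-bit conversion is done by a separately named transfer encoding.

```python
from email.message import EmailMessage

# Hypothetical addresses and text, used only for illustration.
msg = EmailMessage()
msg["From"] = "sender@example.org"
msg["To"] = "recipient@example.org"
msg["Subject"] = "MIME structure demo"

# The body is UTF-8 text; we ask for Base64 as the transfer encoding.
msg.set_content("Grüße aus Köln\n", charset="utf-8", cte="base64")

# MIME itself contributes the headers that describe the body.
print(msg["MIME-Version"])
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# The serialized message contains only 7-bit characters.
wire_form = msg.as_string()
print(wire_form)
```

    Printing `wire_form` shows the headers followed by a Base64 body: the charset says how to read the text, and the transfer encoding says how the bytes were made 7-bit safe.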

    MIME suggests TWO "transfer encoding syntaxes" to allow 8-bit data to be transmitted safely:
    - Base64, which is best for general binary attachments.
    - Quoted-Printable, which is best only for Western European languages or for HTML files in which most characters are ASCII. It writes each non-ASCII byte as an equal sign followed by a pair of hex digits, and it can escape line ends ("soft" line breaks) so that long lines survive the many SMTP servers that restrict maximum line length. It does not apply as-is within MIME headers, which use another system (the "encoded words" of RFC 2047) together with "line continuation" conventions.
    - Some email agents also recognize Hex encoding... but it is not standard (Base64 is better).
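
    The trade-off between the two standard transfer encodings can be sketched with Python's standard `base64` and `quopri` modules. The helper name `transfer_encodings` and the sample strings are illustrative assumptions, and UTF-8 is assumed as the charset.

```python
import base64
import quopri


def transfer_encodings(text: str, charset: str = "utf-8") -> dict:
    """Encode text to bytes, then apply both MIME transfer encodings."""
    raw = text.encode(charset)
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "quoted-printable": quopri.encodestring(raw).decode("ascii"),
    }


# Mostly-ASCII Western European text: only the accented letter is escaped.
western = transfer_encodings("café au lait")
# Greek text: every byte is non-ASCII, so every byte becomes "=XX".
greek = transfer_encodings("Καλημέρα")

for label, result in (("western", western), ("greek", greek)):
    for name, encoded in result.items():
        print(f"{label:8} {name:17} {len(encoded):3} chars  {encoded}")
```

    For the mostly-ASCII sample, Quoted-Printable is shorter and stays readable; for the Greek sample it triples the size of every byte, and Base64 comes out smaller.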

    For international text (other than Western European Latin), using the UTF-7 encoding ("charset") will often be better, but it is not specified in MIME as a content-transfer encoding. Instead, text in the UTF-7 charset can be sent safely over a 7-bit-only channel like SMTP without requiring an extra transfer encoding syntax.
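
    As a small illustration of that last point, Python ships a `utf-7` codec. The sketch below (the helper name `to_utf7` and the sample string are illustrative assumptions) checks that the encoded bytes all fit in 7 bits and that the text survives a round trip.

```python
def to_utf7(text: str) -> bytes:
    """Encode text with the UTF-7 charset; the result needs no transfer encoding."""
    return text.encode("utf-7")


sample = "Καλημέρα, world"
encoded = to_utf7(sample)

# Every byte has its high bit clear, so a 7-bit channel can carry it as-is.
seven_bit_clean = all(byte < 0x80 for byte in encoded)
round_trip = encoded.decode("utf-7")

print(encoded)
print(seven_bit_clean, round_trip == sample)
```

    Here the charset itself does the work that Base64 or Quoted-Printable would otherwise do, so such a message would be labeled with a UTF-7 charset and a 7bit transfer encoding.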

    This archive was generated by hypermail 2.1.5 : Wed May 18 2005 - 10:30:59 CDT