Re: 8-bit MIME (was: Documenting in Tamil Computing)

From: Jungshik Shin (jshin@mailaps.org)
Date: Tue Dec 17 2002 - 10:50:26 EST

    On Tue, 17 Dec 2002, Stephane Bortzmeyer wrote:

    > On Tue, Dec 17, 2002 at 01:28:00PM +0100,
    > Otto Stolz <Otto.Stolz@uni-konstanz.de> wrote

    > > I have seen many messages, originally in ISO-8859-1-encoded French,
    > > that got the high-bit of every accented character chopped off, thus
    > > replacing "é" with "i", "î" with "n", and so forth.

      When was the last time you saw this?

    > Last time I saw such problems was something like ten years ago. It was
    > almost never the fault of the SMTP server, but of some programs on the
    > destination machine (or sometimes the faults of funny gateways like
    > X400 servers, something you cannot blame on the Internet).

      Although I agree that 8BITMIME is implemented and deployed very
    widely these days (it has been more than two years since I received
    emails garbled by a 7bit-only path, and I receive tens of emails in
    8bit encodings every day), I'm afraid your experience is unusual if
    the last time you received emails with the MSB stripped off was ten
    years ago. While trying to counter the exaggerated claims against the
    ability of Internet email to transport UTF-8, you may have gone to the
    other extreme. In 1992, sendmail 4.x/5.x transported more than half
    (if not more) of Internet email, and they were not 8bit clean. That's
    why RFC 1468 and RFC 1557 were written circa 1993 for Japanese and
    Korean email exchange in 7bit ISO-2022-JP and ISO-2022-KR,
    respectively. (In the case of ISO-2022-JP, there was another important
    reason: two major encodings are used for Japanese, Shift_JIS on
    DOS/Windows/Mac and EUC-JP on Unix.) As late as 1999, I did receive
    MSB-stripped emails that had not gone through any non-SMTP gateway
    (e.g. X.400). Back then, some mail servers still ran 7bit-only
    sendmail 4.x or 5.x (on old SunOS 4.x, AIX 3.x/4.x, HP-UX 8.x, IRIX,
    etc. machines), old versions of PMDF (on old VMS machines) and smail
    (on some Unix machines), even though 8bit-clean sendmail 8.6.x or
    later had been around since the mid-1990s.
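
    To illustrate the failure mode discussed above, here is a minimal
    Python sketch (not part of the original exchange) of what a 7bit-only
    relay effectively does to 8bit text. The helper name strip_msb and the
    sample string are made up for illustration; a real MTA could mangle a
    message in other ways as well.

```python
def strip_msb(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7bit-only relay effectively does."""
    return bytes(b & 0x7F for b in data)


# Hypothetical sample text, encoded as ISO-8859-1 before it hits the relay.
original = "été, île"
wire = original.encode("iso-8859-1")

# 0xE9 ("é") becomes 0x69 ("i"); 0xEE ("î") becomes 0x6E ("n").
garbled = strip_msb(wire).decode("ascii")
print(garbled)  # iti, nle
```

    The same stripping applied to UTF-8 is just as destructive: the two
    bytes C3 A9 that encode "é" come out as the ASCII characters "C)".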

    Besides, some email servers still don't abide by the ESMTP standard:
    they don't include '8BITMIME' in their response to 'EHLO' even
    though they support 8bit-clean transport (as you wrote).
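
    As a rough sketch of what "advertising 8BITMIME" means on the wire,
    the Python function below scans an EHLO reply for the keyword. The
    function name and the sample replies (including the host name
    mx.example.org) are invented for illustration, and the parsing is
    deliberately simplified.

```python
def advertises_8bitmime(ehlo_reply: str) -> bool:
    """Return True if a (multi-line) EHLO reply lists the 8BITMIME keyword."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD [params]"; the last line
        # uses a space instead of the hyphen: "250 KEYWORD [params]".
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "8BITMIME":
            return True
    return False


# Hypothetical replies from two servers.
compliant = "\r\n".join([
    "250-mx.example.org Hello",
    "250-SIZE 10240000",
    "250-8BITMIME",
    "250 HELP",
])
silent = "\r\n".join([
    "250-mx.example.org Hello",
    "250 HELP",
])

print(advertises_8bitmime(compliant))  # True
print(advertises_8bitmime(silent))     # False
```

    A server of the second kind may still pass 8bit data untouched; the
    check only tells you what was announced. With a live connection, the
    standard library's smtplib.SMTP provides ehlo() and has_extn() for
    the same purpose.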

    Nonetheless, I agree that these days most mail transport paths are 8bit
    clean. Even where they are not, Base64 and QP (I don't regard them as
    hacks, as you do) are well supported by most modern MUAs, so end users
    have little problem exchanging emails in UTF-8 (or in legacy 8bit
    encodings). Most of them don't have to care whether 8BITMIME is used in
    transit or which C-T-E is used: 8bit, QP, or Base64.
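
    For readers who have not looked at the two 7bit-safe transfer
    encodings, here is a small Python sketch using the standard quopri
    and base64 modules. The helper names to_qp and to_base64 and the
    sample text are assumptions made for this example only.

```python
import base64
import quopri


def to_qp(text: str) -> bytes:
    """UTF-8 encode, then apply quoted-printable."""
    return quopri.encodestring(text.encode("utf-8"))


def to_base64(text: str) -> bytes:
    """UTF-8 encode, then apply Base64."""
    return base64.b64encode(text.encode("utf-8"))


sample = "café 한글"  # hypothetical message body

for encoded in (to_qp(sample), to_base64(sample)):
    # Both results contain only 7bit bytes, so a relay that strips the
    # MSB cannot damage them.
    assert all(byte < 0x80 for byte in encoded)
    print(encoded)

# Decoding restores the original text exactly.
assert quopri.decodestring(to_qp(sample)).decode("utf-8") == sample
assert base64.b64decode(to_base64(sample)).decode("utf-8") == sample
```

    QP keeps mostly-ASCII text readable ("é" becomes "=C3=A9"), while
    Base64 is more compact for text that is mostly non-ASCII.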

    > > take the pains to transform 8-bit MIME to some transfer-encoding
    > > supported by the receiving server.
    >
    > Very bad idea, BTW, since it mangles the mail, which can be a problem
    > with applications like cryptographic signatures. I always turn it off
    > and it was never a problem. In practice (do note I refer to the real
    > world), all SMTP servers accept 8-bits EVEN IF THEY DO NOT ADVERTISE
    > IT PROPERLY with the 8BITMIME option.

      Doing this type of C-T-E change (from 8bit to QP/Base64)
    automatically at the MTA level may be a bad idea, but doing it in
    MUAs should not be a problem (that's what end users choose). With most
    modern MUAs supporting the MIME standard very well (the notable
    exceptions being Eudora and some popular web mail services), the
    8bit-cleanness of the transport path doesn't matter much for UTF-8
    email exchange, as I wrote above.
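
    The concern about signatures quoted above can be made concrete with
    a simplified Python sketch. A plain SHA-256 digest stands in for a
    real cryptographic signature here, which is an assumption of this
    example; the helper name digest and the sample body are likewise
    made up.

```python
import hashlib
import quopri


def digest(body: bytes) -> str:
    """Stand-in for a signature: a SHA-256 digest over the body bytes."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical body as the sender's MUA produced (and "signed") it.
body_8bit = "Le café est prêt.".encode("utf-8")

# What a relay would forward after re-encoding 8bit as quoted-printable.
body_qp = quopri.encodestring(body_8bit)

# The text is recoverable, but the bytes covered by the digest changed.
assert quopri.decodestring(body_qp) == body_8bit
print(digest(body_8bit) == digest(body_qp))  # False
```

    If the MUA picks QP or Base64 before signing, the signed bytes are
    already 7bit safe and no relay has a reason to touch them.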

         IMHO, the biggest obstacle to email exchange in UTF-8 is not
    7bit-only SMTP but the fact that people don't feel a strong need to
    switch, because they think legacy encodings work just fine for them
    (not many people need to exchange emails in languages other than their
    native ones, let alone multilingual emails that cross the boundaries of
    legacy encodings). Another obstacle is that popular web mail services
    don't support UTF-8 well, incorrectly assuming that there is a single
    invariant mapping between languages and MIME charsets/encodings (e.g.
    for French, use ISO-8859-15/1 or Windows-1252; for Japanese,
    ISO-2022-JP). Therefore, even though major MUAs have no problem with
    UTF-8 emails, some people are reluctant to send all their outgoing
    emails in UTF-8 for fear that their correspondents with web mail
    accounts won't be able to read them without some 'user intervention'.
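
    A short Python sketch of why a fixed language-to-charset mapping
    breaks down for multilingual mail. The helper can_encode and the
    sample text are invented for this example; the charset names are the
    ones Python's codec registry accepts.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical message mixing French, Japanese and Korean.
mixed = "été 日本語 한국어"

for charset in ("iso-8859-15", "iso2022_jp", "utf-8"):
    print(charset, can_encode(mixed, charset))
```

    Each legacy charset handles its own language (ISO-8859-15 encodes
    "été", ISO-2022-JP encodes "日本語"), but of the three only UTF-8
    can carry the whole message.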

         Jungshik Shin


