Re: UTF-7 is dead

From: Pete Resnick (presnick@qualcomm.com)
Date: Wed May 26 1999 - 18:58:01 EDT


On 5/26/99 at 3:21 PM -0700, Alain wrote:

>Many people (I have remarked this on several occasions) confuse MIME and QP
>or MIME and BASE64 (even some software pieces confuse them even if they are
>not synonymous).

Right on target.

>What is a flaw in MIME though, imho, is the lack of external MIME
>tag for non-7-bit headers

One of the big problems here is that there are still a number of SMTP
servers which do downright horrible things with 8-bit characters in
headers.

>I suggested to several software makers that the encoding of non-7-bit
>headers be assumed to be the same as the first text charset identified in
>a multi-part MIME message, but I was called a heretic by some very
>respected colleagues, because it would jeopardize the current goal of
>having any 8-bit encoding in headers be assumed to be UTF-8 in the future
>(given the absence of tags).

Actually, if I was one of the complainers, it wasn't because it would
jeopardize UTF-8. It's because it's a nasty layer violation.

First of all, trying to figure out what kind of data you're dealing
with by looking inside the data is grotesque. Imagine some day we can
have UTF-16 flying around on the net. Given your approach, headers
would have to be limited to 8-bit because you'd have no way to
interpret the charset parameter line without assuming it's ASCII.

Second, headers have a way of getting re-ordered. If an 8-bit line
comes through first, you'd have no way to know how to interpret it
until you got down to the field that told you the charset.
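
To make the two objections concrete, here is a minimal Python sketch of
the "look inside the data" approach. The function name and the sample
header lines are made up for illustration; this is not taken from any
real mail software. Note that the search for the charset parameter
itself has to assume the line is ASCII-compatible, and that any 8-bit
line arriving before that parameter has to sit in a buffer undecoded.

```python
import re

# Assumption: the charset parameter is written in ASCII-compatible bytes.
# This is exactly the circular assumption the first objection points at.
CHARSET_RE = re.compile(rb'charset="?([A-Za-z0-9._-]+)"?', re.IGNORECASE)


def decode_headers_by_sniffing(raw_lines):
    """Decode header lines using a charset found inside the headers.

    Returns (decoded_lines, waited, undecoded), where `waited` counts the
    lines that had to be held back until a charset turned up, and
    `undecoded` holds the raw lines left over if none ever did.
    """
    charset = None
    buffered = []  # lines seen before the charset was known
    decoded = []
    waited = 0
    for line in raw_lines:
        if charset is not None:
            decoded.append(line.decode(charset))
            continue
        match = CHARSET_RE.search(line)
        if match is None:
            buffered.append(line)  # cannot interpret this line yet
            continue
        charset = match.group(1).decode("ascii")
        waited = len(buffered)
        # Only now can the earlier lines be interpreted, in original order.
        decoded.extend(held.decode(charset) for held in buffered)
        buffered = []
        decoded.append(line.decode(charset))
    return decoded, waited, buffered
```

With the 8-bit Subject line ahead of the Content-Type line, `waited`
comes back as 1; with no charset field at all, nothing gets decoded.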

The correct solution here is to make an ESMTP extension which announces:

1. Character encoding size of the message header (7bit, 8bit, 16bit, etc.)
2. Charset interpretation for header fields.

That way, the information is discovered up front before any
interpretation needs to be done. Of course, there will also have to
be POP and IMAP extensions so this information can be passed down.
But storing data interpretation information in the data itself is a
bad idea. We've tried that sort of thing before and people inevitably
screwed it up.
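
As a rough Python sketch of the out-of-band idea: the receiver learns
the header width and charset from the envelope before it reads a single
header byte. The parameter names HDRWIDTH and HDRCHARSET are invented
here purely for illustration; no such ESMTP extension exists, and the
function names are hypothetical as well.

```python
def parse_mail_params(command):
    """Split 'MAIL FROM:<addr> KEY=VALUE ...' into (address, params)."""
    tokens = command.split()
    if (len(tokens) < 2 or tokens[0].upper() != "MAIL"
            or not tokens[1].upper().startswith("FROM:")):
        raise ValueError("not a MAIL FROM command")
    address = tokens[1][len("FROM:"):].strip("<>")
    params = {}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        params[key.upper()] = value
    return address, params


def decode_headers(raw_lines, params):
    """Decode header lines using what was announced up front.

    HDRWIDTH and HDRCHARSET are hypothetical parameter names. When nothing
    is announced, fall back to plain 7-bit US-ASCII.
    """
    width = params.get("HDRWIDTH", "7bit")
    charset = params.get("HDRCHARSET", "us-ascii")
    if width == "7bit":
        charset = "us-ascii"
    # No buffering and no peeking: every line can be decoded as it arrives,
    # whatever order the fields come in.
    return [line.decode(charset) for line in raw_lines]
```

For example, after "MAIL FROM:<alain@example.org> HDRWIDTH=8bit
HDRCHARSET=iso-8859-1" the very first header line can be decoded
immediately. A POP or IMAP extension would have to carry the same two
values down to the client.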

>Sorry if I still look heretic.

I don't think you're heretical; I think you're technically mistaken.
I am glad to join you in being a heretic in thinking that this stuff
needs to be marked up explicitly in the stream and not left as "It's
8bit, so it must be UTF-8." I think that's a lousy approach too.

pr

-- 
Pete Resnick <mailto:presnick@qualcomm.com>
Eudora Engineering - QUALCOMM Incorporated
Ph: (217)337-6377 or (619)651-4478, Fax: (619)651-1102


