Re: Communicator Unicode

From: Martin J. Dürst
Date: Tue Sep 30 1997 - 07:34:51 EDT

On Thu, 25 Sep 1997, Alain LaBonté SCT wrote:

> Again, what I propose is to overload an external tag already used for the
> text.
> Declaring that all of a sudden current 8-bit coding, untagged, is UTF-8
> (for which support I am all in favour, of course, if it is tagged), would
> disrupt current practice that works well and could easily work better when
> different encodings are used between the sender and the recipient. Again,
> the only coding unaffected by assuming that 8-bit data is UTF-8 would be
> 7-bit ASCII. To this I am opposed.

For backwards compatibility, it is easy to heuristically distinguish
UTF-8 from other encodings, because UTF-8 has a very clear and
distinctive structure. So existing practice can continue.
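
To make the "distinctive structure" concrete, here is a rough sketch of
such a heuristic in Python. The function name looks_like_utf8 is made up
for illustration, and the byte ranges follow the present-day 4-byte
definition of UTF-8, which is an assumption on top of this message. It
checks only that each lead byte is followed by the right number of
continuation bytes (10xxxxxx); it does not catch every overlong or
surrogate form, which a strict decoder would reject.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data has non-ASCII bytes and all of them
    form structurally well-formed UTF-8 sequences."""
    i, n = 0, len(data)
    saw_multibyte = False
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte, identical in UTF-8 and most 8-bit codings.
            i += 1
            continue
        # The lead byte announces how many continuation bytes follow.
        if 0xC2 <= b <= 0xDF:
            need = 1
        elif 0xE0 <= b <= 0xEF:
            need = 2
        elif 0xF0 <= b <= 0xF4:
            need = 3
        else:
            # Stray continuation byte or a byte that is never a valid lead.
            return False
        tail = data[i + 1:i + 1 + need]
        # Every continuation byte must have the bit pattern 10xxxxxx.
        if len(tail) != need or any((t & 0xC0) != 0x80 for t in tail):
            return False
        saw_multibyte = True
        i += 1 + need
    # Pure ASCII tells us nothing about the 8-bit encoding in use.
    return saw_multibyte
```

Text in a typical 8-bit encoding such as ISO 8859-1 almost never passes
this check by accident, because an accented letter would have to be
followed by exactly the right number of bytes in the 0x80-0xBF range.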

> To get it straight:
> 1. I am strongly in favour of tags, although they shall be external to
> text (MIME is that way, except when it is imbedded in text like as
> per RFC 1522).
> 2. I would like untagged header data to be aligned with the most
> likely coding, given by the first text character set tag
> encountered in a message.

This would just be official recognition of a practice that is currently
not allowed. It would introduce all kinds of kludgy dependencies that
would be difficult to program. It would reward those who didn't respect
the standards (for whatever reasons) and punish those who did. Most
importantly, it would block further development in a direction that
makes things even simpler (namely, using only UTF-8 in headers). There
is no need for labeling if there is only one encoding.

I have nothing against a mailer using heuristics on nonconformant
received mail to try to figure out what encoding the 8-bit data in the
headers might be. I would probably do this myself if I were working on
an internationalized mailer, and it might well turn out to look very
much like what you propose. But I am clearly opposed to any kind of
standardization that implies that it would be okay to send untagged
8-bit headers in anything other than UTF-8.
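
As an illustration of the receiving side only, a mailer's fallback might
look roughly like the following Python sketch. The function name
decode_untagged_header and the body_charset parameter are hypothetical,
and the final ISO 8859-1 fallback is an arbitrary choice that simply
never fails to decode. It tries UTF-8 first, then the charset of the
first tagged text part, and says nothing about what a sender may emit.

```python
from typing import Optional, Tuple


def decode_untagged_header(
    raw: bytes, body_charset: Optional[str] = None
) -> Tuple[str, str]:
    """Guess the encoding of untagged 8-bit header data.

    Returns the decoded text and the name of the encoding that was used.
    """
    # 1. UTF-8 is strict enough that a successful decode is a strong hint.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. Otherwise fall back to the charset tag found on the message body.
    if body_charset:
        try:
            return raw.decode(body_charset), body_charset
        except (UnicodeDecodeError, LookupError):
            # Unknown charset label, or bytes invalid in that charset.
            pass
    # 3. Last resort: ISO 8859-1 maps every byte to some character.
    return raw.decode("latin-1"), "latin-1"
```

For example, decode_untagged_header(b"D\xfcrst", "iso-8859-1") fails the
UTF-8 step and returns the text as decoded with the body charset.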

Regards, Martin.

