Re: Communicator Unicode

From: Alain LaBonté SCT
Date: Wed Oct 01 1997 - 10:21:30 EDT

At 03:34 97-09-30 -0700, Martin J. Dürst wrote:
>On Thu, 25 Sep 1997, Alain LaBonté SCT wrote:
>> Again, what I propose is to overload an external tag already used for the
>> text.
>> Declaring that all of a sudden current 8-bit coding, untagged, is UTF-8
>> (for which support I am all in favour, of course, if it is tagged), would
>> disrupt current practice that works well and could easily work better when
>> different encodings are used between the sender and the recipient. Again,
>> the only coding unaffected by assuming that 8-bit data is UTF-8 would be
>> 7-bit ASCII. To this I am opposed.
>For backwards compatibility, it is easy to heuristically distinguish
>UTF-8 and other encodings, because UTF-8 has a very clear and distinctive
>structure. So existing practice can continue.
>> To get it straight:
>> 1. I am strongly in favour of tags, although they shall be external to
>> text (MIME is that way, except when it is embedded in text, as
>> per RFC 1522).
>> 2. I would like untagged header data to be aligned with the most
>> likely coding, given by the first text character set tag
>> encountered in a message.

[Martin] :
>This would just be an official recognition of a currently not allowed
>practice. It would introduce all kinds of kludge dependencies that would
>be difficult to program. It would gratify those that didn't respect the
>standards (for whatever reasons), and punish those that did. It would
>also, and that is most important, block further development into a
>direction that makes things even simpler (namely using UTF-8 only in
>headers). There is no need for labeling if there is only one thing.
>I have nothing against a mailer actually using some heuristics to deal
>with nonconformant received mails to try to figure out what encoding
>the 8-bit data in the headers might be. I would probably do this myself
>if I was working on an internationalized mailer. This might well turn
>out to look very much like what you propose. But I am clearly opposed
>to any kind of standardization that implies that it would be
>okay to send untagged 8-bit headers other than UTF-8.
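
[Illustration] The heuristic discussed above can be sketched roughly as follows: try strict UTF-8 first, because its byte structure makes false positives unlikely, then fall back to the charset of the first tagged text part, as Alain proposes. This is only a sketch under stated assumptions, not what any mailer of the time did; the function name `decode_untagged_header`, the `body_charset` parameter, and the final ISO 8859-1 fallback are all hypothetical choices made for the example.

```python
from typing import Optional, Tuple


def decode_untagged_header(raw: bytes,
                           body_charset: Optional[str] = None) -> Tuple[str, str]:
    """Guess the encoding of untagged header bytes (hypothetical helper).

    Returns the decoded text and the name of the charset that was used.
    """
    # Pure 7-bit data is unaffected by the whole debate.
    if raw.isascii():
        return raw.decode("ascii"), "us-ascii"

    # Strict UTF-8 decoding rejects stray lead bytes, stray continuation
    # bytes and overlong forms, so most legacy 8-bit text fails here.
    try:
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Fall back to the charset of the first tagged text part, if known.
    if body_charset:
        try:
            return raw.decode(body_charset), body_charset
        except (UnicodeDecodeError, LookupError):
            pass

    # Last resort (an assumption of this sketch): ISO 8859-1 never fails,
    # since every byte value maps to a character.
    return raw.decode("latin-1"), "iso-8859-1"
```

For instance, `decode_untagged_header("Alain LaBonté".encode("iso-8859-1"), "iso-8859-1")` fails the UTF-8 test because the byte 0xE9 announces a multi-byte sequence that never arrives, so the body charset is used instead.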

[Alain] :
I am of course very sensitive to your argument saying that this would be
punitive to those who respected the standard, if it is really the case. But
is it? Right now recipients who follow the standard (or users who are
forced to respect a standard because they're stuck with a poor mailer) are
already punished anyway because they deprive themselves of an accurate
interpretation that would be easy to guess, as you yourself say that you
would do if you had the opportunity. So it is more than a standard; it is a
dogma that is really annoying everybody. Such standards should be
corrected, or at least guidelines should be given to adapt them softly, in
particular because the practice of using 8-bit characters in headers is
already widespread worldwide, simply because it makes sense (though it is
also done by end-users who have no idea that what they are doing is a sin
against "good engineering design"!).

It would be good to recommend the approach I suggested, since you
yourself say it makes sense to use heuristics for this. So why not at least
recommend it? Some would appreciate such a guideline if they have not
already had the good idea of implementing it.

Another approach would be to allow tagging a standard 8-bit character
set once, in front of a full string, which is apparently not allowed
today (those who use more efficient 8-bit coding are punished every day,
even if they also use old ISO standards [the ISO/IEC 8859 series began to be
adopted in 1987, at a time when even the "/IEC" was not part of the names of
the IT standards]!).

Even if I wanted today to write (I am slightly inventing the syntax here,
but it is just an illustration):
              (=?iso-8859-1?Alain LaBonté=)
or even:
              (=?UTF-8?Alain LaBonté=)

...I would not even be allowed to do it! That's a sin too!
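
[Illustration] For comparison with the invented syntax above, the encoded-word form that RFC 1522 (and its successor, RFC 2047) does define is "=?charset?encoding?encoded-text?=", where the encoding is Q or B and the 8-bit bytes themselves never appear raw. A minimal sketch using Python's standard `email.header` module shows what a conforming mailer would emit; the helper names `encode_name` and `roundtrip` are made up for this example.

```python
from email.header import Header, decode_header


def encode_name(name: str, charset: str) -> str:
    """Return the name as an RFC 2047 encoded-word in the given charset."""
    return Header(name, charset=charset).encode()


def roundtrip(encoded: str) -> str:
    """Decode a single encoded-word back to text (hypothetical helper)."""
    raw, charset = decode_header(encoded)[0]
    return raw.decode(charset)


name = "Alain LaBonté"
latin1_word = encode_name(name, "iso-8859-1")
utf8_word = encode_name(name, "utf-8")

# Both forms carry their charset tag inside the header text itself
# and consist only of 7-bit ASCII characters.
print(latin1_word)
print(utf8_word)
print(roundtrip(latin1_word), roundtrip(utf8_word))
```

The tag travels with the string, which is the property both sides of the thread want; the disagreement is over what a receiver may assume when the tag is absent.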

Alain LaBonté
