Re: Is there Unicode mail out there?

From: Gaute B Strokkenes (gs234@cam.ac.uk)
Date: Sat Jul 14 2001 - 07:57:27 EDT


On Sat, 14 Jul 2001, dstarner98@aasaa.ofe.org wrote:
> From: Gaute B Strokkenes <gs234@cam.ac.uk>
>> No way. Any mail client that is sufficiently clever to understand
>> UTF-8 should understand all valid and registered MIME-charsets.
>> After all, conversion libraries are both widely available and easy
>> to use.
>
> Do you know of any that actually do?

Actually do convert messages in arbitrary charsets to UTF-8 / Unicode,
you mean? Any reasonably modern mail client will. IIRC Microsoft
Outlook Express (OE) and friends do everything in Unicode internally
and only convert to other encodings when receiving or sending mail.
(Though OE is broken in so many other ways that I wouldn't recommend
it.) Gnus/Emacs does too (actually it uses the Emacs MULE encoding
internally, but from the user's perspective the effect is precisely
the same).

My argument rests on the fact that if you have put in the necessary
work to interpret UTF-8 messages, then it takes little extra effort
to interpret messages in other charsets: you simply run them through
a converter first. I postulate that libraries to perform this
function are both widely available and highly portable; if you do
not agree, I would be happy to point out concrete examples.
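As a rough illustration of how small the conversion step is, here is a minimal sketch in Python. The language and library are my choice for illustration, not what any particular mail client does. It assumes a simple single-part text message: it reads the declared MIME charset, undoes the transfer encoding, and hands the bytes to Python's built-in codec registry. The function name `body_to_unicode` is hypothetical.

```python
import email
from email import policy


def body_to_unicode(raw_message: bytes) -> str:
    """Return the body of a single-part text message as a Unicode str.

    Sketch only: assumes the message is not multipart and that its
    declared charset is one Python's codec registry knows about.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    if msg.is_multipart():
        raise ValueError("this sketch only handles single-part messages")
    # RFC 2045: a text part with no charset parameter defaults to US-ASCII.
    charset = msg.get_content_charset() or "us-ascii"
    # Undo quoted-printable/base64; 7bit and 8bit bodies come back unchanged.
    payload = msg.get_payload(decode=True)
    # The "converter": every charset the registry knows is handled the same way.
    return payload.decode(charset)
```

Fed an ISO-8859-1 quoted-printable body such as `bl=E5b=E6r`, this returns "blåbær"; fed a KOI8-R 8bit body, it returns the Cyrillic text. The calling code never has to care which legacy charset the sender picked.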

> How about just supporting these: ISO646-PT, ISO10646-UTF-1,
> NATS-SEFI and HP-DeskTop?

I'm not sure what you're trying to say here. Assuming these are
properly registered charsets, it seems like a very narrow range to
support. If they're not, then they have no place in email whatsoever
(and UTF-8 is clearly a better choice).

> I don't think anyone was suggesting that for all lists. However,
> here, on the Unicode list, everyone on the list should be able to
> handle Unicode, and those who can have sometimes been willing to cut
> and paste into a Unicode editor just to see what's up.

I don't think that holds. People on the Unicode list are not
necessarily Unicode boffins, although a lot of the active people are.
Some of us are just here because we have an interest in, say, i18n in
general and like to keep an eye on things. If we all had to upgrade
our software just to read the list, I think a lot of people wouldn't
bother. That way, everyone loses.

Note that I think it is appropriate to use UTF-8 when there's just no
common charset that can represent a given message.
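That rule (use a common charset when one can represent the message, and UTF-8 only when none can) is easy to mechanise at send time. The following is a minimal Python sketch. The candidate list and the name `pick_charset` are my own illustrative assumptions; a real client would order the list according to what its recipients are likely to support.

```python
# Hypothetical candidate list, tried in order. US-ASCII comes first because
# everything supports it; adjust the rest for the recipients you expect.
CANDIDATES = ("us-ascii", "iso-8859-1", "iso-8859-15", "koi8-r")


def pick_charset(text: str, candidates=CANDIDATES) -> str:
    """Return the first candidate charset that can encode all of `text`,
    falling back to UTF-8 when no candidate covers every character."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character is missing from this charset
        return charset
    return "utf-8"
```

With this list, "blåbær" gets iso-8859-1, "€" gets iso-8859-15, and "привет" gets koi8-r. Only a message mixing Norwegian and Russian, which no single candidate covers, falls through to UTF-8.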

> Legacy encodings should be used when you're communicating with
> people who use legacy encodings and legacy mail readers. Unicode
> people don't - after ASCII, UTF-8 is probably the closest thing we
> have to a common usable encoding.

It's the closest thing that we have to a common _universal_ charset.
For messages that do not require the `universal' property, there are
many charsets that are just as sensible and, more to the point, much
better supported.

-- 
Gaute Strokkenes                        http://www.srcf.ucam.org/~gs234/
Yow!  Am I in Milwaukee?



This archive was generated by hypermail 2.1.2 : Sat Jul 14 2001 - 09:27:29 EDT