RE: Communicator Unicode

From: Murray Sargent
Date: Thu Sep 25 1997 - 15:44:53 EDT

It's disturbing to think that people may use more than one charset in an
HTML document. We went this route with RTF, since Unicode wasn't
available back then to solve the multilingual problem. HTML has grown
up in a new world in which Unicode is clearly the best way to represent
multilingual documents. Hence for such documents (and others), all you
need is UTF-8. The HTML standards committees should resist allowing
charset attributes to be specified at a granularity finer than the body,
and browsers should refuse to display such documents correctly: even
if the spec says no, lax browsers may establish the practice anyhow.


> -----Original Message-----
> From: John Gardiner Myers
> Sent: Wednesday, September 24, 1997 11:58 AM
> To: Multiple Recipients of
> Subject: Re: Communicator Unicode
> Alain LaBonté - SCT wrote:
> > It seems to me that, even if it might appear heretical, one should
> > assume, in the absence of tags for headers, that the character set
> > used is the same as the message body's.
> Your use of the word "the" indicates that you're assuming there is
> exactly one charset tag in the body to choose from. Such is not the
> case.
> The body might be a multipart/mixed or some other type which does not
> have a charset. In the case of multipart/mixed, one might start
> parsing the multipart/mixed in search of a charset tag on some part at
> a lower level, but there might be more than one such tag and the tags
> might be different.
> Seems to me you're proposing violating an abstraction layer or two in
> order to fish for information which in the general case cannot be
> deterministically found. This is not good engineering.
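
To make the point above concrete, here is a minimal sketch using Python's
standard email package. The message text, its boundary string and the two
charsets are invented for illustration; nothing here comes from a real
message. It walks a multipart/mixed body and collects the charset label of
each leaf part, showing that the container itself has none and that the
parts can disagree.

```python
from email import message_from_string

# Hypothetical message (an assumption for illustration): a multipart/mixed
# body whose two text parts are labelled with different charsets.
RAW = """\
MIME-Version: 1.0
Subject: example
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="iso-8859-1"

first part
--XX
Content-Type: text/plain; charset="iso-2022-jp"

second part
--XX--
"""


def body_charsets(raw):
    """Return the charset label of every leaf part, in document order."""
    msg = message_from_string(raw)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers such as multipart/mixed carry no charset
        found.append(part.get_content_charset())
    return found


top_level = message_from_string(RAW).get_content_charset()
charsets = body_charsets(RAW)
print(top_level)  # None: the multipart/mixed container has no charset
print(charsets)   # two different labels, so there is no single "the" charset
```

With more than one distinct label in the list, any rule of the form "use
the body's charset for the headers" has to pick one arbitrarily.
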
> > With GUI presentation of messages it would not be a major problem,
> > as the message is never displayed before full reception anyway. For
> > non-GUIs, it would be just too bad, but certainly not worse than
> > today. Right now, because of a dogma, it is wrong in all environments
> > that do not cheat.
> Right now, it is right in environments which use MIME encoded-words.
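
For readers unfamiliar with MIME encoded-words (RFC 2047), the sketch below
shows why they sidestep the problem: each encoded-word in a header names
its own charset, so nothing has to be inferred from the body. The header
value is a made-up example, and Python's standard email.header module is
used only as a convenient decoder.

```python
from email.header import decode_header, make_header

# Hypothetical header value (an assumption for illustration). The
# encoded-word carries its charset (iso-8859-1) and its encoding (Q)
# inline, independent of any body part.
raw_subject = "=?iso-8859-1?Q?Caf=E9?="

# decode_header yields (bytes, charset) pairs, one per decoded run.
pairs = decode_header(raw_subject)

# make_header reassembles the pairs; str() gives the Unicode text.
decoded = str(make_header(pairs))

print(pairs)
print(decoded)
```
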
