RE: Communicator Unicode

From: Martin J. Dürst (mduerst@ifi.unizh.ch)
Date: Fri Sep 26 1997 - 07:27:56 EDT


On Thu, 25 Sep 1997, Murray Sargent wrote:

> It's disturbing to think that people may use more than one charset in an
> HTML document. We went this route with RTF, since Unicode wasn't
> available back then to solve the multilingual problem. HTML has grown
> up in a new world in which Unicode is clearly the best way to represent
> multilingual documents. Hence for such documents (and others), all you
> need is UTF8. The HTML standards committees should resist allowing
> charset attributes to be specified at a granularity finer than the body
> and browsers should fail to display such documents correctly, since even
> if the spec says no, lax browsers may establish the practice anyhow.

We have resisted quite strongly, and successfully. What Alain wrote
about is email, not HTML, and headers vs. bodies, and composite
bodies. For these, resisting is more difficult, as the standards
already allow more flexibility. In particular, it is possible
to mix words in different encodings in a single header line, with
RFC 2047 (formerly RFC 1522). For single bodies, only one encoding
can be used, but for multiparts, each part can have its own encoding.
Of course, there are many people who try to steer that towards UTF-8,
and any support in that direction helps.
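To make the header-vs-body distinction concrete, here is a minimal sketch using Python's standard `email` package. It assumes Python 3 with the default compat32 policy. It builds one message whose Subject line mixes an ISO-8859-1 and a KOI8-R encoded-word (RFC 2047) and whose multipart body gives each part its own charset parameter, then parses the wire form back. The helper names `build_message`, `decode_subject` and `decode_parts`, and the sample strings, are illustrative only and do not come from the thread.

```python
import email
from email.header import Header, decode_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message() -> str:
    """Build a message whose header and body parts use different charsets."""
    # RFC 2047: a single header line may carry several encoded-words,
    # each labelled with its own charset.
    subject = Header("Grüße", charset="iso-8859-1")
    subject.append("Привет", charset="koi8-r")

    # Multipart body: each part declares its own charset parameter.
    msg = MIMEMultipart()
    msg["Subject"] = subject
    msg.attach(MIMEText("Grüße aus Zürich", "plain", "iso-8859-1"))
    msg.attach(MIMEText("Привет из Москвы", "plain", "koi8-r"))
    # The UTF-8 part can hold all three scripts on its own.
    msg.attach(MIMEText("Grüße, Привет, こんにちは", "plain", "utf-8"))
    return msg.as_string()


def decode_subject(wire: str) -> list:
    """Return (charset, text) for each encoded-word run in the Subject."""
    msg = email.message_from_string(wire)
    return [(cs, raw.decode(cs)) for raw, cs in decode_header(msg["Subject"])]


def decode_parts(wire: str) -> list:
    """Return (charset, text) for each leaf body part, decoded with its own charset."""
    msg = email.message_from_string(wire)
    result = []
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue  # skip the container, keep only the leaf parts
        cs = part.get_content_charset()
        result.append((cs, part.get_payload(decode=True).decode(cs)))
    return result


if __name__ == "__main__":
    wire = build_message()
    print(decode_subject(wire))  # [('iso-8859-1', 'Grüße'), ('koi8-r', 'Привет')]
    for charset, text in decode_parts(wire):
        print(charset, text)
```

The same round trip also shows why steering toward UTF-8 is attractive: the last part carries all three scripts in one encoding, so neither the header nor the body needs to be split by charset.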

Regards, Martin.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:37 EDT