Re: emails: utf-8 vs. ls & ps

From: Erik van der Poel
Date: Mon Feb 15 1999 - 14:19:13 EST

> Line ends in Unicode may be unambiguously coded with LS (Line Separator,
> U+2028) and PS (Paragraph Separator, U+2029) characters, see TR 13.
> This means for emails in UTF-8, that they may not be "well-formed" because
> they may not contain CR (13) and/or LF (10) ASCII line ends.
> I believe there are (at least) three ways to deal with this, and I would
> like to know which one(s) is (are) recommended or used:
> 1) Disregard TR13 for emails and write only ASCII-style (LF, CR, CRLF)
> line ends.

We (Netscape) offer this as one option. I.e. plain text (non-HTML) UTF-8
or UTF-7 with CRLF for newline, as required by SMTP. The user can also
select HTML, or several other encodings (e.g. iso-8859-1).
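
A minimal sketch of option 1 in Python, assuming the sender simply maps LS, PS, and any bare CR or LF to the CRLF that SMTP expects before encoding as UTF-8. The function name `to_smtp_utf8` is hypothetical, and this is not a description of what Netscape's code does. Note that collapsing PS to a plain line break loses the line/paragraph distinction.

```python
import re

# Any of: CRLF, lone CR, lone LF, LS (U+2028), PS (U+2029).
_LINE_END = re.compile("\r\n|\r|\n|\u2028|\u2029")

def to_smtp_utf8(text: str) -> bytes:
    """Normalize every line end to CRLF, then encode as UTF-8."""
    return _LINE_END.sub("\r\n", text).encode("utf-8")

body = to_smtp_utf8("first line\u2028second line\u2029new paragraph\n")
```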

> 2) Write Unicode email bodies with a modified or new encoding that breaks
> lines with LF... that are not part of the Unicode text, and encode the
> text itself with:
> 2a) disregard the minimum-length rule for UTF-8 and encode U+0000 to
> U+001f with (otherwise UTF-8-compliant) two-byte codes

I don't understand this.
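
One possible reading of 2a, sketched in Python: control characters such as LF would be written as overlong two-byte sequences (for example U+000A as the bytes C0 8A), so that any raw CR or LF byte in the message could only be a transport line break. The helper `overlong_two_byte` is hypothetical. Such sequences violate the minimum-length rule, and a strict decoder such as Python's rejects them.

```python
def overlong_two_byte(cp: int) -> bytes:
    """Encode a code point below U+0080 as a non-conformant two-byte sequence."""
    if not 0 <= cp < 0x80:
        raise ValueError("only code points below U+0080 are handled here")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

lf = overlong_two_byte(0x0A)   # bytes C0 8A instead of the single byte 0A

try:
    lf.decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False           # strict decoders refuse overlong forms
```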

> 2b) binary/base64-encoded UTF-16

Certainly seems legal from the MIME standpoint, but probably won't be
popular for a while.
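
A rough sketch of 2b using only Python's `base64` module, assuming big-endian UTF-16 without a byte order mark. The line breaks in the base64 output are transport framing only; LS and PS travel inside the encoded payload. The MIME headers a real message would need are omitted, and the sample text is made up.

```python
import base64

text = "first line\u2028second line\u2029"

# encodebytes wraps its output into short lines, each ending in LF.
payload = base64.encodebytes(text.encode("utf-16-be"))

# The receiver reverses both steps and gets LS/PS back untouched.
decoded = base64.decodebytes(payload).decode("utf-16-be")
```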

> 2c) create an email-only variable-length encoding with 7 bits/email-byte

This already exists. It is called UTF-7.
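
As an illustration, Python's built-in `utf-7` codec can be used to check that LS and PS survive a 7-bit transport this way. The sample text is made up.

```python
text = "first line\u2028second line\u2029"

wire = text.encode("utf-7")                 # bytes as they would go on the wire
seven_bit = all(b < 0x80 for b in wire)     # every byte fits in 7 bits
round_trip = wire.decode("utf-7") == text   # LS and PS come back intact
```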

> 2d) ?
> 3) Do not use LS and PS but instead require Unicode email bodies to use
> HTML or similar, and use <br> and <p>;
> similar to (2), old-style line ends are inserted only for the sake of
> protocol-conformance and are not part of the displayed text

As I said, we offer this as an option. HTML may even be the default. (I
don't remember.)
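
Option 3 could look roughly like this in Python, assuming PS ends a paragraph and LS becomes `<br>`. The function name `to_html_body` is hypothetical, and a real mail client would generate a complete HTML document rather than bare paragraphs.

```python
import html

def to_html_body(text: str) -> str:
    """Map PS-delimited paragraphs to <p> and LS to <br>; escape the rest."""
    paragraphs = [p for p in text.split("\u2029") if p]
    parts = []
    for p in paragraphs:
        lines = [html.escape(line) for line in p.split("\u2028")]
        parts.append("<p>" + "<br>\r\n".join(lines) + "</p>")
    # The CRLFs here are only protocol framing, not displayed text.
    return "\r\n".join(parts) + "\r\n"

markup = to_html_body("a < b\u2028second line\u2029next paragraph\u2029")
```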

> I guess that (1) and (3) would be the most popular choices.

Currently, non-Unicode-based encodings are the most popular. And plain
text is probably still the most popular. Both of these may change,

