I am interested in a clarification of the following:
Line ends in Unicode may be unambiguously coded with LS (Line Separator,
U+2028) and PS (Paragraph Separator, U+2029) characters, see TR 13.
For emails in UTF-8 this means that a message may not be "well-formed",
because its text need not contain any CR (13) or LF (10) ASCII line ends.
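To make the problem concrete, here is a minimal Python sketch. The helper name has_ascii_line_end and the sample body are made up for illustration. It shows that LS and PS each become three non-ASCII bytes in UTF-8, so a body that uses only these separators carries no CR or LF byte at all.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def has_ascii_line_end(data: bytes) -> bool:
    """Return True if the byte string contains a CR (13) or LF (10) byte."""
    return b"\r" in data or b"\n" in data


# Hypothetical message body that breaks lines only with LS and PS.
body = "First line" + LS + "Second line" + PS + "Next paragraph"
encoded = body.encode("utf-8")

print(LS.encode("utf-8").hex())     # e280a8
print(PS.encode("utf-8").hex())     # e280a9
print(has_ascii_line_end(encoded))  # False
```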
I believe there are (at least) three ways to deal with this, and I would
like to know which one(s) is (are) recommended or used:
1) Disregard TR13 for emails and write only ASCII-style line ends (LF, CR, CRLF)
2) Write Unicode email bodies with a modified or new encoding that breaks
lines with LF...
that are not part of the Unicode text, and encode the text itself with:
2a) disregard the minimum-length rule for UTF-8 and encode U+0000 to U+001F
as (otherwise UTF-8-compliant) two-byte codes
2b) binary/base64-encoded UTF-16
2c) create an email-only variable-length encoding with 7 bits/email-byte
3) Do not use LS and PS but instead require Unicode email bodies to use
HTML or similar, and use <br> and <p>;
similar to (2), old-style line ends are inserted only for the sake of
protocol-conformance and are not part of the displayed text
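Options (2a) and (2b) can be sketched in Python as follows. This is only an illustration under stated assumptions, not an existing mail encoding: the function names are invented, big-endian UTF-16 without a byte order mark is an arbitrary choice, and the transport line ends are plain LF where a real mail transport would use CRLF.

```python
import base64


def encode_overlong_controls(text: str) -> bytes:
    """Option (2a): UTF-8, except U+0000..U+001F become two-byte codes.

    This deliberately breaks the minimum-length rule, so a strict
    UTF-8 decoder will reject the result.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F:
            out.append(0xC0 | (cp >> 6))    # lead byte, always 0xC0 here
            out.append(0x80 | (cp & 0x3F))  # trail byte carries the value
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)


def encode_utf16_base64(text: str) -> bytes:
    """Option (2b): UTF-16 text wrapped in base64.

    base64.encodebytes inserts an LF after every 76 output characters;
    those line ends belong to the transport, not to the text.
    """
    return base64.encodebytes(text.encode("utf-16-be"))


def decode_utf16_base64(data: bytes) -> str:
    """Reverse of encode_utf16_base64; the transport line ends are dropped."""
    return base64.decodebytes(data).decode("utf-16-be")


sample = "line one\u2028line two"
print(encode_overlong_controls("a\nb"))  # b'a\xc0\x8ab'
print(decode_utf16_base64(encode_utf16_base64(sample)) == sample)  # True
```

In both sketches any LF that appears in the transmitted bytes is there only for the protocol, so the receiver can strip it without touching the text.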
I guess that (1) and (3) would be the most popular choices.
Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
Unicode is here! --> http://www.unicode.org/