From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Tue Nov 23 2004 - 10:16:37 CST
Kefas,
you have written:
> I tried UTF-8 export to send an e-mail that contained
> several scattered unicode codepoints from the full
> 16-bit range from oooo to ffff from XP+Word to the
> university's Linux/Mozilla/OpenOffice/Kmail, enabled
> UTF-8 support. With very disappointing results.
For UTF-8 (or any other encoding except ISO 646 IRV
(aka ASCII)) to survive transport via e-mail
(RFC 2821), it must be tagged and "transfer-encoded"
according to RFC 2045 and RFC 2047. For examples, cf.
<http://www.systems.uni-konstanz.de/EMAIL/FAQ-SMTP.php#74>
(in German). It is the e-mail clients' responsibility
to do this tagging and encoding (on the sending side),
and the corresponding interpretation and decoding (on
the receiving side).
You have not mentioned which e-mail client program
you used, how it was configured, or what the
result looked like. Hence, the cause of your "very
disappointing results" cannot be determined (nor even
guessed at).
> 1. Do I expect too much assuming that UTF-8 just
> recodes the full 16-range in 8-bit but that
> text-programs with UTF-8 enabled should be able to
> reconstruct the full 16-bit range (as far as used)?
The Unicode range is much larger than 16 bits: code points
go up to U+10FFFF, which needs 21 bits per character
(though not all 21-bit values are used).
UTF-8 encodes every single character in 1 through 4 bytes;
cf. <http://www.unicode.org/faq/utf_bom.html> for more
details. I do not understand what you mean by
"reconstruct", but I guess your question is answered
in the cited WWW page.
Best wishes,
Otto Stolz
This archive was generated by hypermail 2.1.5 : Tue Nov 23 2004 - 10:21:57 CST