On Monday, June 11, 2001 4:14 AM, Vadim Snurnikov wrote:

> How can I read a text in Unicode (Russian) where every Russian letter
> is represented like that: D=B6 (or similar)? Unfortunately, all these
> four characters that stand for one Russian letter are of one byte each,
> so that I am getting 4 bytes for every Russian letter. (The e-mail got
> transferred to this format.)

The format of an e-mail message should be described in its MIME headers.
E.g., in a message containing the headers

    MIME-Version: 1.0
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: quoted-printable

the text is encoded twice: the Unicode characters are first encoded in
UTF-8 (which has 8-bit code units), and then the result is encoded in
MIME quoted-printable (which has 7-bit code units). Thus, the Russian
word "Ya " (meaning "I ") ends up in 7 bytes, viz. "=D0=AF ".

To interpret this, you have to undo the two encodings, in reverse order:

1. Undo the quoted-printable; from the above example, this will yield
   three bytes, viz. D0 AF 20.
2. Undo the UTF-8 encoding; in the example, you'll get U+042F U+0020
   (Cyrillic Capital Ya, Space).

Depending on the encodings chosen, the details may vary, of course. Note
that Cyrillic has several popular encodings (in addition to Unicode),
cf. . All of these would look superficially alike in MIME
quoted-printable encoding. The "=" is characteristic of quoted-printable;
however, UTF-8 followed by quoted-printable will yield 6 bytes for every
Cyrillic character (as "=D0=AF" in the example given) and 1 byte for
every ASCII character (as the space in the example given), whereas the
older Cyrillic encodings (cf. supra) followed by quoted-printable will
encode each Cyrillic character in 3 bytes (and again ASCII ones in 1 byte
each). I am not aware of any e-mail encoding scheme that will encode
Cyrillic characters in four bytes each, such as the "D=B6" sequence
originally quoted.
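For anybody who wants to try the two decoding steps by hand, here is a minimal sketch in Python, using only the standard `quopri` module. The function names `decode_qp_text` and `qp_size` are my own, chosen for illustration, and the sample body merely extends the "=D0=AF " example with "=3D I" (where "=3D" is the quoted-printable escape for "=" itself). The size estimate is deliberately rough: it ignores line wrapping and control characters.

```python
import quopri


def decode_qp_text(raw: bytes, charset: str = "utf-8") -> str:
    """Undo quoted-printable first, then the charset named in Content-Type."""
    octets = quopri.decodestring(raw)   # step 1: "=D0=AF" -> bytes D0 AF
    return octets.decode(charset)       # step 2: D0 AF -> U+042F


def qp_size(text: str, charset: str) -> int:
    """Rough quoted-printable size: 3 bytes per non-ASCII byte or "=", else 1."""
    return sum(3 if b >= 0x80 or b == 0x3D else 1 for b in text.encode(charset))


# Illustrative sample body (an assumption, not taken from a real message).
sample = b"=D0=AF =3D I"
octets = quopri.decodestring(sample)
print(octets.hex(" "))                  # d0 af 20 3d 20 49
print(decode_qp_text(sample))           # Я = I

# Byte counts per Cyrillic letter, as discussed above.
print(qp_size("\u042f", "utf-8"))       # 6
print(qp_size("\u042f", "koi8-r"))      # 3
```

The `charset` argument must match what the Content-Type header says; if the headers are lost, counting the "=XX" groups per letter (two for UTF-8, one for the 8-bit Cyrillic encodings) at least hints at which one to try.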

> Is there a tool to transfer this back into 2-byte-encoding or to any
> other readable form?

Every decent, contemporary e-mail client should do this automagically,
provided the headers have not been removed and the mail has not been
otherwise distorted. (Deplorably, all WWW-mail servers I have tested so
far remove the MIME headers before they have properly undone both
encodings, thus corrupting the message in one way or another.)

So my advice is:

- install Unicode fonts, comprising at least the WGL4 repertoire (cf. ),
- collect your mail from a POP3 or an IMAP server (not from an HTTP
  server via some mail-WWW interface),
- use the current version of your favourite e-mail client.

I have tested

· Messenger from Netscape 6.0, and it does it right, though it exhibited
  some teething troubles;
· Eudora 5.1, which cannot even display Cyrillic text from the
  Windows 98 clipboard; its documentation has nothing whatsoever on
  UTF-8, Unicode, or charsets, and its menus do not mention character
  encoding. Hence, it probably does not interpret UTF-8 encoded messages
  either (which I cannot test tonight).

I have not yet tested:

· Outlook from Internet Explorer 5 (which is promising, as its browser
  has the most thorough UTF-8 support I have seen so far).

Best wishes,
  Otto Stolz