Re: utf-8 and unicode fonts on LINUX

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Tue Nov 23 2004 - 10:16:37 CST


    Kefas,

    you have written:
    > I tried UTF-8 export to send an e-mail that contained
    > several scattered unicode codepoints from the full
    > 16-bit range from oooo to ffff from XP+Word to the
    > university's Linux/Mozilla/OpenOffice/Kmail, enabled
    > UTF-8 support. With very disappointing results.

    For UTF-8 (or any other encoding except ISO 646 IRV
    (aka ASCII)) to survive the transport via e-mail
    (RFC 2821), it must be tagged and "transfer-encoded"
    according to RFC 2045 and RFC 2047. For examples, cf.
    <http://www.systems.uni-konstanz.de/EMAIL/FAQ-SMTP.php#74>
    (in German). It is the e-mail client's responsibility
    to do this tagging and encoding (on the sending side),
    and the corresponding interpretation and decoding (on
    the receiving side).
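    To illustrate what this tagging looks like, here is a
    small Python sketch using the standard `email.header`
    module (the sample string is my own, not from the
    original message): a non-ASCII header value is wrapped
    as an RFC 2047 "encoded word", which the receiving
    client then decodes.

```python
from email.header import Header, decode_header

# Sending side: tag and transfer-encode a non-ASCII header
# value as an RFC 2047 encoded word (=?charset?encoding?data?=).
subject = Header("Grüße aus Konstanz", charset="utf-8")
encoded = subject.encode()
print(encoded)  # e.g. =?utf-8?b?...?= (base64 transfer-encoded)

# Receiving side: the client reverses the tagging and decoding.
(raw, charset), = decode_header(encoded)
print(raw.decode(charset))
```

    Without this round trip, the raw UTF-8 bytes pass
    through mail transports that only guarantee 7-bit
    safety, and the receiving client has no charset tag to
    interpret them by.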

    You have not mentioned which e-mail client program
    you used, how it was configured, or what the
    result looked like. Hence, the cause of your "very
    disappointing results" cannot be determined (or even
    guessed at).

    > 1. Do I expect too much assuming that UTF-8 just
    > recodes the full 16-range in 8-bit but that
    > text-programs with UTF-8 enabled should be able to
    > reconstruct the full 16-bit range (as far as used)?

    The Unicode range comprises much more than 16 bits
    (you need 21 bits per character, though not all 21-bit
    values are used). UTF-8 encodes every single character
    in 1 through 4 bytes; cf.
    <http://www.unicode.org/faq/utf_bom.html> for more
    details. I do not understand what you mean by
    "reconstruct", but I guess your question is answered
    in the cited WWW page.

    Best wishes,
       Otto Stolz



    This archive was generated by hypermail 2.1.5 : Tue Nov 23 2004 - 10:21:57 CST