Re: (Not really?) Unicode question

From: Stephane Bortzmeyer (bortzmeyer@nic.fr)
Date: Wed Sep 27 2006 - 01:53:50 CST


    On Tue, Sep 26, 2006 at 12:53:36PM -0700,
     Magda Danish (Unicode) <v-magdad@microsoft.com> wrote
     a message of 46 lines which said:

    > Below is an inquiry I received through the Unicode office's
    > reporting form.

    It does not really seem to be Unicode-related: if only "legacy"
    charsets existed, we would have the same problem. It would disappear
    only if everyone used Unicode *and* the same encoding.

    > Along with Spiro's questions below,

    The thread on the KDE bug tracking system seems comprehensive, and
    the KMail developers answered all his questions, I believe.

    > - Why is it that some emails in < any foreign script> display
    > correctly while others just appear as squares and interrogation
    > marks?

    Because, in a world where there are many character sets and many
    encodings, an email MUST be tagged with the proper charset (MIME calls
    "charset" what is actually an encoding). If it is tagged wrongly, it
    will not be displayed properly. If it is untagged, the result depends
    on some local default (ISO 8859-1 in my case, for instance).
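A minimal sketch of that decision, assuming Python's standard `email` package as a stand-in for an MUA: use the charset parameter when the message has one, otherwise fall back to a local default. The function name `body_text` and the `LOCAL_DEFAULT` value are hypothetical, chosen to match the ISO 8859-1 default mentioned above.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical local default used for untagged messages.
LOCAL_DEFAULT = "iso-8859-1"

def body_text(raw: bytes, local_default: str = LOCAL_DEFAULT) -> str:
    """Decode the body of a single-part text message for display."""
    msg = message_from_bytes(raw, policy=default)
    # get_content_charset() returns the charset parameter (lowercased),
    # or failobj when the Content-Type header carries no charset.
    charset = msg.get_content_charset(failobj=local_default)
    payload = msg.get_payload(decode=True)  # raw body bytes
    return payload.decode(charset, errors="replace")

body = "café".encode("utf-8")
tagged = b"Content-Type: text/plain; charset=utf-8\r\n\r\n" + body
untagged = b"Content-Type: text/plain\r\n\r\n" + body

print(body_text(tagged))    # café
print(body_text(untagged))  # cafÃ©  (UTF-8 bytes read as ISO 8859-1)
```

The bytes on the wire are identical in both cases; only the tag (or its absence) changes what the reader sees.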

    > - I usually change the encoding to Arabic-Windows in order to view
    > my Arabic emails, which seems to work almost all the time; But
    > sometimes doesn't. Why is that?

    I do not speak Arabic, but I assume that some messages are in
    Arabic-Windows (windows-1256) and some in Unicode. If they are
    untagged, changing the local default lets you read some messages
    but not all.
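To illustrate why forcing "Arabic-Windows" works only some of the time, the sketch below encodes the same Arabic word once in windows-1256 and once in UTF-8, then reads both under a windows-1256 override. The sample word and the helper name `view_as_windows_1256` are illustrative assumptions, not taken from the original report.

```python
# Sample Arabic text ("salaam"), used only for illustration.
text = "سلام"

cp1256_bytes = text.encode("windows-1256")  # one byte per letter
utf8_bytes = text.encode("utf-8")           # two bytes per letter

def view_as_windows_1256(raw: bytes) -> str:
    """Simulate a reader forcing the windows-1256 encoding on a message."""
    return raw.decode("windows-1256", errors="replace")

print(view_as_windows_1256(cp1256_bytes) == text)  # True
print(view_as_windows_1256(utf8_bytes) == text)    # False: mojibake
```

The override helps for messages that really were windows-1256 and garbles the UTF-8 ones, which matches the "almost all the time; but sometimes doesn't" symptom.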

    > - If I change the encoding of an email to view it and then forward
    > it to a friend, will my friend still be able to view it?

    Typically (though I assume it depends on the MUA), changing the
    encoding used for viewing does not change the message itself. But
    "forwarding" is something else: it is not well specified, and every
    MUA does it differently.
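One way an MUA can forward without touching the original's encoding is to attach the whole message as a `message/rfc822` part; inline forwarding, which copies and re-encodes the text, is where MUAs differ most. The sketch below uses Python's `email.message.EmailMessage` as an assumed stand-in for an MUA; the subjects and body text are hypothetical.

```python
from email.message import EmailMessage

# Original message, tagged as windows-1256 by the sender's MUA.
original = EmailMessage()
original["Subject"] = "greeting"
original.set_content("سلام", charset="windows-1256")

# Forward by attaching the original unchanged as a message/rfc822 part.
forward = EmailMessage()
forward["Subject"] = "Fwd: greeting"
forward.set_content("Forwarded message attached.")
forward.add_attachment(original)

# The embedded copy keeps its own headers, including the charset tag.
attached_part = forward.get_payload()[1]
embedded = attached_part.get_payload(0)
print(attached_part.get_content_type())  # message/rfc822
print(embedded.get_content_charset())    # windows-1256
# set_content() stores the body with a trailing newline.
print(embedded.get_content().rstrip("\n") == "سلام")  # True
```

Because the original travels as-is, the recipient's MUA sees the same charset tag the sender used.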

    > - What should my setup and that of my correspondent be if we want to
    > ensure proper display and communication of our non-English messages?

    The first thing is to use a charset and encoding that can represent
    Arabic characters (on this list, everyone will scream "Use Unicode!"
    :-)
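A quick way to check whether an encoding can represent a given text is simply to try encoding it. This sketch assumes Python's built-in codecs; the helper name `can_encode` and the sample strings are hypothetical. The mixed Arabic/Greek sample shows one reason people recommend Unicode: a single-script legacy charset covers only its own script.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

arabic = "سلام"
mixed = "سلام αβγ"  # Arabic plus Greek

for cs in ("iso-8859-1", "windows-1256", "utf-8"):
    print(cs, can_encode(arabic, cs), can_encode(mixed, cs))
# iso-8859-1 False False
# windows-1256 True False
# utf-8 True True
```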

    The second thing is to tag the message properly, as described in
    RFC 2046, section 4.1.2, "Charset Parameter":

    Content-type: text/plain; charset=utf-8

    or:

    Content-type: text/plain; charset=windows-1256

    (Of course, the MUA should do this automatically and correctly.)
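As a sketch of what "done automatically by the MUA" can look like, Python's `EmailMessage.set_content()` encodes the body with the requested charset and writes the matching `charset` parameter into `Content-Type`. Using Python here is an assumption for illustration; the helper `make_tagged_message` and the subject line are hypothetical.

```python
from email.message import EmailMessage

def make_tagged_message(body: str, charset: str) -> EmailMessage:
    """Build a text/plain message whose Content-Type carries `charset`."""
    msg = EmailMessage()
    msg["Subject"] = "test"  # hypothetical subject
    # set_content() encodes `body` with `charset` and sets the
    # charset parameter on the Content-Type header to match.
    msg.set_content(body, charset=charset)
    return msg

for cs in ("utf-8", "windows-1256"):
    msg = make_tagged_message("سلام", cs)
    print(msg.get_content_type(), msg.get_content_charset())
# text/plain utf-8
# text/plain windows-1256
```

Since the tag is derived from the same value used to encode the body, the two cannot drift apart, which is the failure mode behind most mis-tagged messages.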

    The third thing is for the recipient to have an MUA that handles this
    charset.
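On the receiving side, the first requirement is a decoder for the declared charset. The sketch below uses Python's `codecs.lookup()` as an assumed stand-in for "the MUA handles this charset"; real display also needs suitable fonts. The charset name `x-hypothetical-charset` is made up to show the failure case.

```python
import codecs

def has_codec(charset: str) -> bool:
    """Return True if a decoder for `charset` is available here."""
    try:
        codecs.lookup(charset)
    except LookupError:
        return False
    return True

for cs in ("utf-8", "windows-1256", "x-hypothetical-charset"):
    print(cs, has_codec(cs))
# utf-8 True
# windows-1256 True
# x-hypothetical-charset False
```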


