Re: Unicode support

From: Neil Harris (neil@tonal.clara.co.uk)
Date: Wed Jul 27 2005 - 10:54:43 CDT


    Tunga, Prasad wrote:

    >I have an application (written in 'C') which currently reads and manipulates ASCII strings. However, I would like to convert it so that it can read Unicode strings.
    >What are the basic things I should be looking at to make it compatible with Unicode?
    >
    There's a difference between Unicode itself and the transformation
    formats (encodings) of Unicode, such as UTF-8. For many purposes, just
    using UTF-8 and treating the text in the traditional way, as ordinary
    byte strings, works just fine. However, if you want to manipulate
    Unicode characters directly, you will need to use arrays of "wide"
    characters, and codecs that translate between concrete encoded
    representations and sequences of Unicode code points (which are what
    you store in the "wide" character data type). You will need to know the
    encodings used on input and expected on output to choose the correct
    codecs.
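
    To make the byte-string versus code-point distinction concrete, here
    is a minimal sketch of a hand-written UTF-8 codec (decode direction
    only). The names utf8_decode and utf8_to_code_points are hypothetical,
    and uint32_t stands in for the "wide" character type; a real program
    would more likely use a library codec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is malformed, truncated, or overlong. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (n < len) return 0;          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return len;
}

/* Decode a NUL-terminated UTF-8 string into out[] (capacity cap).
 * Returns the number of code points written, or (size_t)-1 on bad
 * input or insufficient space. */
size_t utf8_to_code_points(const char *utf8, uint32_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t n = strlen(utf8), count = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode(s, n, &cp);
        if (used == 0 || count == cap) return (size_t)-1;
        out[count++] = cp;
        s += used;
        n -= used;
    }
    return count;
}
```

    Note that strlen() on a UTF-8 string still counts bytes, which is all
    the "traditional" approach needs for copying, comparing and storing;
    the decoder only matters once you need to work with individual code
    points.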

    See
    http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html
    for some more information on wide characters in C. 4-byte wide characters
    are best, because that way you have a true 1:1 relationship between
    Unicode code points, including those outside the Basic
    Multilingual Plane, and the values stored in the wide characters.
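
    As a rough illustration of why the width matters, the sketch below
    encodes a code point into 16-bit units (UTF-16). The 16-bit case is
    my own contrast, not something discussed above, and utf16_encode is
    a hypothetical name. Anything outside the Basic Multilingual Plane
    needs two 16-bit units (a surrogate pair), so the 1:1 mapping is
    lost, whereas a 32-bit value holds any code point directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 units to out[] and returns how many were written,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* out of range or a lone surrogate */

    if (cp < 0x10000) {                 /* BMP: one unit, same value */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

    For example, U+20AC (the euro sign) fits in one unit, while U+1D11E
    (musical G clef) comes out as the pair D834 DD1E.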

    GNU libiconv http://www.gnu.org/software/libiconv/ is helpful, too.
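
    A minimal sketch of using the iconv interface as the codec, assuming
    the input is UTF-8 and that the implementation accepts the encoding
    names "UTF-8" and "UTF-32LE" (GNU libiconv and glibc do). The helper
    name utf8_to_utf32_iconv is hypothetical. On systems where iconv is
    not part of the C library you may need to link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to an array of code points.
 * Returns the number of code points written to out[] (capacity cap),
 * or (size_t)-1 on invalid input, insufficient space, or if the
 * conversion is not supported. */
size_t utf8_to_utf32_iconv(const char *utf8, uint32_t *out, size_t cap)
{
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8");   /* (to, from) */
    char *in = (char *)utf8;        /* iconv does not modify the input bytes */
    char *dst = (char *)out;
    size_t in_left = strlen(utf8);
    size_t out_left = cap * sizeof(uint32_t);
    size_t rc, count, i;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &in, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    count = (cap * sizeof(uint32_t) - out_left) / sizeof(uint32_t);

    /* The bytes are little-endian; assemble them explicitly so the
     * result is correct regardless of the host byte order. */
    for (i = 0; i < count; i++) {
        const unsigned char *b = (const unsigned char *)&out[i];
        out[i] = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    }
    return count;
}
```

    Swapping the two names passed to iconv_open gives the reverse
    direction, and other input encodings only require a different
    "from" name, which is why you need to know the encodings in advance.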

    -- Neil


