Re: Unicode support

From: Neil Harris
Date: Wed Jul 27 2005 - 10:54:43 CDT

    Tunga, Prasad wrote:

    >I have an application (written in 'C') which currently reads and manipulates ASCII strings. However, I would like to convert it so that it can read Unicode strings.
    >What are the basic things I should be looking at to make it compatible with Unicode?

    There's a difference between Unicode and the transformation formats of
    Unicode. For many purposes, just using UTF-8 and treating the text in
    the traditional way works just fine. However, if you want to manipulate
    Unicode characters directly, you will need to use arrays of "wide"
    characters, and have codecs which translate between concrete Unicode
    representations and sequences of Unicode code points (which are what you
    store in the "wide" character data type). You will need to know the
    encodings used/expected on input and output to choose the correct codecs.
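
As a rough illustration of what such a codec does, here is a minimal sketch of a UTF-8 decoder that fills an array of 32-bit code points. The function name utf8_to_code_points and its error convention (returning (size_t)-1) are made up for this example and are not from any particular library; a real program would more likely use an existing codec.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into an array
 * of Unicode code points. Returns the number of code points written, or
 * (size_t)-1 if the input is malformed or dst is too small. */
size_t utf8_to_code_points(const char *src, uint32_t *dst, size_t dst_cap)
{
    /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != 0) {
        uint32_t cp;
        int extra;   /* number of continuation bytes expected */
        int i;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        s++;

        for (i = 0; i < extra; i++, s++) {
            /* A truncated sequence hits the NUL here and is rejected too. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n >= dst_cap) return (size_t)-1;
        dst[n++] = cp;
    }
    return n;
}
```

Once the text is in this form, each array element is exactly one code point, so indexing and counting characters no longer depend on the byte-level encoding.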

    for some more information on wide characters in C. Four-byte wide
    characters are best, because that way you can have a true 1:1
    relationship between all possible Unicode code points, even those
    outside the Basic Multilingual Plane, and the values stored in the
    wide characters.
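
To see why narrower wide characters lose the 1:1 relationship, consider what happens when a code point has to fit into 16-bit units: anything outside the Basic Multilingual Plane needs two units (a surrogate pair, as in UTF-16). The sketch below assumes a made-up helper name, code_point_to_utf16, purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * not a valid Unicode scalar value. */
size_t code_point_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {           /* inside the BMP: one unit is enough */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With 16-bit units, U+20AC takes one array element but U+1D11E takes two, so element counts and character counts diverge. With a 32-bit type, every code point occupies exactly one element.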

    GNU libiconv is helpful, too.
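
For instance, a conversion from UTF-8 to an array of code points via the iconv interface might look roughly like the sketch below. The wrapper name iconv_utf8_to_code_points is invented for this example; iconv_open(), iconv(), and iconv_close() are the standard iconv calls. It assumes the implementation accepts the encoding names "UTF-8" and "UTF-32LE" (GNU libiconv and glibc both do), and that uint32_t is four bytes wide. On systems where iconv lives in a separate library you may need to link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: convert a NUL-terminated UTF-8 string to code
 * points using iconv. Returns the number of code points written, or
 * (size_t)-1 on any conversion error (including dst being too small). */
size_t iconv_utf8_to_code_points(const char *src, uint32_t *dst, size_t dst_cap)
{
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8");   /* (to, from) */
    char *in = (char *)src;       /* iconv() takes char **, input is not modified */
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = dst_cap * sizeof dst[0];
    const unsigned char *b = (const unsigned char *)dst;
    size_t rc, n, i;

    if (cd == (iconv_t)-1)
        return (size_t)-1;        /* encoding pair not supported */

    rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;        /* malformed input or output buffer full */

    /* The output bytes are little-endian; reassemble them in place so the
     * result is correct regardless of the host byte order. */
    n = dst_cap - out_left / sizeof dst[0];
    for (i = 0; i < n; i++, b += 4)
        dst[i] = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return n;
}
```

Swapping the "from" name passed to iconv_open() is all it takes to accept a different input encoding, which is where knowing the encodings used on input and output comes in.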

    -- Neil

