From: Neil Harris (email@example.com)
Date: Wed Jul 27 2005 - 10:54:43 CDT
Tunga, Prasad wrote:
>I have an application (written in 'C') which currently reads and manipulates ASCII strings. However, I would like to convert it so that it can read Unicode strings.
>What are the basic things I should be looking at to make it compatible with Unicode?
There's a difference between Unicode itself and the transformation
formats of Unicode, such as UTF-8. For many purposes, just using UTF-8
and treating the text in the traditional way, as byte strings, works
just fine. However, if you want to manipulate Unicode characters
directly, you will need to use arrays of "wide" characters, and have
codecs which translate between concrete Unicode encodings and sequences
of Unicode code points (which are what you store in the "wide"
character data type). You will need to know the encodings used on input
and expected on output to choose the correct codecs.
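As a rough illustration of what such a codec does, here is a minimal sketch of the UTF-8 to code point direction. The function names utf8_decode_one and utf8_to_code_points are made up for this example and do not come from any library, and uint32_t stands in for the "wide" character type. It rejects overlong forms, surrogates and values above U+10FFFF, but it is only a sketch and not a replacement for a tested library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (at most n bytes)
   into *cp. Returns the number of bytes consumed, or 0 if the input is
   malformed or truncated. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c, min;

    if (n == 0) return 0;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }              /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;          /* stray continuation byte or invalid lead byte */

    if (n < len) return 0;                      /* sequence is cut short */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;

    *cp = c;
    return len;
}

/* Hypothetical helper: decode n bytes of UTF-8 into out (capacity cap).
   Returns the number of code points written, or (size_t)-1 on error. */
size_t utf8_to_code_points(const char *in, size_t n, uint32_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t count = 0;

    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_decode_one(s, n, &cp);
        if (used == 0 || count == cap) return (size_t)-1;
        out[count++] = cp;
        s += used;
        n -= used;
    }
    return count;
}
```

Once the text is in this form, out[i] is the i-th code point, so indexing and comparison work on whole characters rather than on bytes.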
for some more information on wide characters in C. 4-byte wide characters
are best, because that way you can have a true 1:1 relationship between
all possible Unicode code points, even those outside the Basic
Multilingual Plane, and the values stored in the wide characters.
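To make the 1:1 point concrete, here is a small sketch of what happens with 2-byte units instead. The helper name utf16_encode is invented for this example. A code point inside the Basic Multilingual Plane fits in one 16-bit unit, but anything above U+FFFF has to be split into a surrogate pair, so array indices no longer correspond to characters. Note also that the width of wchar_t depends on the platform (32 bits with glibc, 16 bits on Windows), which is why the sketches here use fixed-width types from <stdint.h>; C11's char32_t from <uchar.h> is another option.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 into out[0..1].
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is
   not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

    if (cp < 0x10000) {             /* inside the BMP: one unit is enough */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                  /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With a 4-byte type the same code point is simply stored as its own value in a single array element.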
GNU libiconv http://www.gnu.org/software/libiconv/ is helpful, too.
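For example, a conversion from UTF-8 to 4-byte code points through the iconv interface might look roughly like the sketch below. iconv_open, iconv and iconv_close are the real API; the wrapper name iconv_utf8_to_utf32 is made up for this example, and the sketch assumes the whole input fits in the output buffer in one call. It asks for "UTF-32LE" so that no byte order mark is written, then assembles each value from its bytes so the result does not depend on the host byte order. With GNU libiconv on systems where iconv is not part of the C library, you will probably need to link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: convert n bytes of UTF-8 into code points using
   iconv. Returns the number of code points written to out (capacity cap),
   or (size_t)-1 on failure (invalid input, truncated input, or no room). */
size_t iconv_utf8_to_utf32(const char *in, size_t n, uint32_t *out, size_t cap)
{
    iconv_t cd = iconv_open("UTF-32LE", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1) return (size_t)-1;

    char *src = (char *)in;     /* iconv takes char **; input is not modified */
    char *dst = (char *)out;
    size_t in_left = n;
    size_t out_left = cap * sizeof(uint32_t);

    size_t rc = iconv(cd, &src, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) return (size_t)-1;

    /* The output bytes are little-endian; rebuild each value explicitly. */
    size_t count = cap - out_left / sizeof(uint32_t);
    for (size_t i = 0; i < count; i++) {
        const unsigned char *p = (const unsigned char *)&out[i];
        out[i] = (uint32_t)p[0] | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    }
    return count;
}
```

The same pattern works for other input encodings by changing the second argument of iconv_open, which is where knowing the encoding of your input matters.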