Re: FAQ !?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Dec 13 2000 - 12:38:46 EST


Nitin_Goel@i2.com wrote:
> I guess this should be a FAQ (but isn't). I need code to convert Unicode
> data between various encoding schemes (UTF16LE to UTF32BE etc.). Are there
> standard routines I can use? If so, where can I find them?

The CD for the Unicode book should have some of this; in any case, these transformations are fairly simple.
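To show roughly how simple, here is a minimal C sketch of one such transformation, UTF-16LE to UTF-32BE, working on raw byte buffers. The function name and the "return -1 on error" convention are hypothetical, chosen for illustration only. They are not taken from the Unicode CD or from any library. The code joins surrogate pairs into supplementary code points, rejects unpaired surrogates, and then writes each code point as four big-endian bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts UTF-16LE bytes to UTF-32BE bytes.
   Returns the number of bytes written to dst, or -1 on malformed input
   (odd length, unpaired surrogate) or if dst is too small.
   dst_cap is in bytes; at most 2 * src_len bytes are ever needed. */
long utf16le_to_utf32be(const unsigned char *src, size_t src_len,
                        unsigned char *dst, size_t dst_cap)
{
    size_t i = 0, out = 0;
    assert(src != NULL || src_len == 0);
    if (src_len % 2 != 0)
        return -1;                          /* UTF-16 uses 2-byte units */
    while (i < src_len) {
        /* Little-endian: low byte comes first. */
        uint32_t c = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
        i += 2;
        if (c >= 0xD800 && c <= 0xDBFF) {   /* lead surrogate */
            uint32_t trail;
            if (i >= src_len)
                return -1;                  /* truncated pair */
            trail = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
            if (trail < 0xDC00 || trail > 0xDFFF)
                return -1;                  /* lead without trail */
            i += 2;
            c = 0x10000 + ((c - 0xD800) << 10) + (trail - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1;                      /* trail without lead */
        }
        if (dst_cap - out < 4)
            return -1;                      /* output buffer full */
        /* Big-endian: most significant byte comes first. */
        dst[out++] = (unsigned char)(c >> 24);
        dst[out++] = (unsigned char)(c >> 16);
        dst[out++] = (unsigned char)(c >> 8);
        dst[out++] = (unsigned char)c;
    }
    return (long)out;
}
```

For example, the bytes `41 00 3D D8 00 DE` ("A" followed by U+1F600 as a surrogate pair) come out as `00 00 00 41 00 01 F6 00`. The other directions and byte orders follow the same pattern. Only the byte-assembly lines and the surrogate handling change.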

Unicode libraries have this functionality; see the list at http://www.unicode.org/unicode/onlinedat/products.html
For example, ICU (http://oss.software.ibm.com/icu/) has converters and UTF macros; see its documentation and source code, in particular icu/source/common/unicode/utf.h for the macros.

> As an aside, I have run into trouble porting a database application which
> stores UTF16LE data onto HP-UX and Sun machines. I can see that wchar_t
> there is defined as unsigned long. So most probably all wcs*() functions
> would expect UTF32 encoded data. Am I correct in my assumption? What do I
> do to be certain?

wchar_t is a very fuzzy type. Its width depends on the platform (8, 16, or 32 bits), and there is no general guarantee that it stores Unicode. Most older systems use it to hold scalar code points of a custom encoding designed to match the platform's char* encoding. So a 32-bit wchar_t does not by itself mean that the wcs*() functions expect UTF-32.
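One way to see what a particular compiler promises is to check the width of wchar_t together with the C99 predefined macro `__STDC_ISO_10646__`. An implementation defines this macro when wchar_t values are ISO 10646 (Unicode) code points. The sketch below assumes a C99 compiler, and its function names are hypothetical. Compilers that predate C99 may simply not define the macro. Its absence therefore means "no promise", not "definitely not Unicode".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Number of bits in wchar_t for this compiler and platform. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if wchar_t can hold every Unicode scalar value (up to
   U+10FFFF, i.e. 21 bits). Necessary, but NOT sufficient, for the
   wcs*() functions to treat the data as UTF-32. */
int wchar_fits_unicode(void)
{
    return wchar_bits() >= 21;
}

/* Nonzero only if the implementation promises, via the C99 macro
   __STDC_ISO_10646__, that wchar_t values are ISO 10646 code points. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A platform where `wchar_fits_unicode()` returns 1 but `wchar_is_iso10646()` returns 0 is exactly the uncertain case described above: the type is wide enough, but nothing says the values are Unicode.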

> What online information can I look through for more information on such
> a problem?

About wchar_t and Unicode, see "What size wchar_t do I need for Unicode?" at http://www-4.ibm.com/software/developer/library/uniwchar.html

To be certain, use typedefs that always mean what you want instead of relying on wchar_t. ICU and other libraries define types for string units and scalar code points that behave the same on all platforms, and they provide functions for working with such Unicode strings and characters.
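Here is a minimal sketch of that approach, assuming only C99 `<stdint.h>`. The typedef and function names are hypothetical; they are not ICU's. The string type is built from 16-bit units that mean UTF-16 on every platform. A length function plays the role of wcslen(), and a decoder returns one code point at a time, joining surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedefs: fixed width regardless of what wchar_t is. */
typedef uint16_t utf16_unit;   /* one UTF-16 code unit */
typedef int32_t  code_point;   /* one scalar code point */

/* Like wcslen(), but always counts 16-bit units up to a 0 terminator. */
size_t u16_len(const utf16_unit *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Decodes the code point at *ps and advances *ps past it.
   Call only while **ps != 0 (the string must be 0-terminated).
   In this sketch an unpaired surrogate is returned as-is, one unit. */
code_point u16_next(const utf16_unit **ps)
{
    const utf16_unit *s = *ps;
    code_point c = s[0];
    if (c >= 0xD800 && c <= 0xDBFF && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        /* lead + trail surrogate = one supplementary code point */
        c = 0x10000 + ((c - 0xD800) << 10) + (s[1] - 0xDC00);
        *ps = s + 2;
    } else {
        *ps = s + 1;
    }
    return c;
}
```

For the units `0041 D83D DE00 00E9`, `u16_len` reports 4 units, while `u16_next` yields three code points: U+0041, U+1F600, U+00E9. This difference between units and code points is what such library types and functions make explicit.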

Good luck,
markus



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT