RE: FAQ !?

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Dec 15 2000 - 11:18:19 EST

Maybe in reply to: Nitin_Goel@i2.com: "FAQ !?"

Markus,

Unfortunately, if I remember correctly, Sun is one of the platforms whose
wchar_t is not Unicode.

Carl

-----Original Message-----
From: Markus Scherer [mailto:markus.scherer@jtcsv.com]
Sent: Wednesday, December 13, 2000 9:17 AM
To: Unicode List
Subject: Re: FAQ !?

Nitin_Goel@i2.com wrote:
> I guess this should be a FAQ (but isn't). I need code to convert Unicode
> data between various encoding schemes (UTF16LE to UTF32BE etc.). Are there
> standard routines I can use? If so, where can I find them?

The CD for the Unicode book should have some of this - in any case, these
transformations are fairly simple.
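
To illustrate how simple, here is a minimal sketch of the UTF16LE to UTF32BE
case from the question. It is not taken from the CD or from any library; the
function name utf16le_to_utf32be and its error convention (returning
(size_t)-1) are made up for this example, and it assumes the caller supplies
an output buffer that is large enough.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-16LE bytes to UTF-32BE bytes.
 * Returns the number of bytes written to dst, or (size_t)-1 if the input
 * is malformed (odd length, unpaired surrogate) or dst is too small. */
size_t utf16le_to_utf32be(const unsigned char *src, size_t src_len,
                          unsigned char *dst, size_t dst_cap)
{
    size_t i = 0, out = 0;

    if (src_len % 2 != 0)
        return (size_t)-1;

    while (i < src_len) {
        /* Read one 16-bit unit, low byte first. */
        uint32_t cp = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo;
            if (i >= src_len)
                return (size_t)-1;
            lo = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* Low surrogate without a preceding high surrogate. */
            return (size_t)-1;
        }

        if (dst_cap - out < 4)
            return (size_t)-1;

        /* Write the code point as four bytes, high byte first. */
        dst[out++] = (unsigned char)(cp >> 24);
        dst[out++] = (unsigned char)((cp >> 16) & 0xFF);
        dst[out++] = (unsigned char)((cp >> 8) & 0xFF);
        dst[out++] = (unsigned char)(cp & 0xFF);
    }
    return out;
}
```

The sketch does not handle a byte order mark; if your data starts with one,
it is converted like any other character.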

Unicode libraries provide these conversions; see
http://www.unicode.org/unicode/onlinedat/products.html
For example, see ICU at http://oss.software.ibm.com/icu/ - its documentation
and source code cover converters, and the UTF macros are in
icu/source/common/unicode/utf.h

> As an aside. I have run into trouble porting a database application which
> stores UTF16LE data onto HPUX and SUN machines. I can see that wchar_t
> there is defined as unsigned long. So most probably all wcs*() functions
> would expect UTF32 encoded data. Am I correct in my assumption ? What do I
> do to be certain ?

wchar_t is a very fuzzy type. It may be 8, 16, or 32 bits depending on the
platform, and there is no general guarantee that it stores Unicode. Most
older systems use it for scalar character code points custom-built for the
char* encoding.
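
A quick way to see what a given compiler does is to ask it. The sketch below
is only a probe, with made-up function names (wchar_bits, wchar_is_iso10646).
It relies on the __STDC_ISO_10646__ macro from C99, which a conforming
implementation defines when wchar_t values are ISO 10646 code points. Older
compilers do not define it at all, so a result of 0 means "no promise", not
"definitely not Unicode".

```c
#include <limits.h>
#include <stddef.h>

/* Width of wchar_t in bits on the platform this is compiled for. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* 1 if the implementation states that wchar_t holds ISO 10646 values,
 * 0 if it makes no such statement. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

Even a 32-bit wchar_t with no such promise may hold locale-specific codes, so
the size alone does not tell you that the wcs*() functions expect UTF32.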

> What online information can I look through for more information on such a
> problem ?

About wchar_t and Unicode, see "What size wchar_t do I need for Unicode?" at
http://www-4.ibm.com/software/developer/library/uniwchar.html

To be certain, use typedefs whose width and meaning you control instead of
wchar_t. ICU and other libraries define types for string units and scalar
code points that work on all platforms, and they provide functions to work
with such Unicode strings and characters.
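
As a rough sketch of the typedef approach: the type names utf16_unit and
code_point and the function utf16_count_code_points below are hypothetical,
chosen for this example, and are not ICU's names or API. The sketch also
assumes a compiler that provides the fixed-width types in <stdint.h>; without
it you would pick suitable unsigned types per platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width types, independent of wchar_t. */
typedef uint16_t utf16_unit;   /* one UTF-16 code unit */
typedef uint32_t code_point;   /* one Unicode scalar value */

/* Count code points in a UTF-16 string of n units.
 * A well-formed surrogate pair counts once; an unpaired surrogate
 * is counted as one (malformed) code point. */
size_t utf16_count_code_points(const utf16_unit *s, size_t n)
{
    size_t i = 0, count = 0;

    while (i < n) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i += 2;   /* high + low surrogate */
        } else {
            i += 1;   /* BMP unit or unpaired surrogate */
        }
        count++;
    }
    return count;
}
```

Code written against such types behaves the same on every platform, whatever
wchar_t happens to be there.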

Good luck,
markus

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT