The answer is that wchar_t is not necessarily Unicode, and that Unicode is
not necessarily stored in 16-bit units.
ANSI C defines wchar_t as an abstract type for "wide" characters but does
not specify a concrete type nor a character set for it. On some platforms,
it is Unicode; on others, it is a scalar form of the platform default MBCS.
Unicode, on the other hand, is a character set standard that allows several
encoding forms. The most important ones are UTF-8, UTF-16, and UTF-32. They are stored
using 8-, 16-, or 32-bit integers (unsigned chars, shorts, and ints, or
longs where those are 32 bits wide).
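To make the point about code unit sizes concrete, here is a minimal sketch that
encodes one code point as UTF-16 and as UTF-8. The function names utf16_encode
and utf8_encode are made up for this illustration and do not come from any
library; the sketch also assumes a compiler that provides <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 (hypothetical helper).
 * Writes 1 or 2 code units to out and returns the count,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* one 16-bit unit is enough */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Encode one code point as UTF-8 (hypothetical helper).
 * Writes 1 to 4 bytes to out and returns the count, or 0 if invalid. */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

With these helpers, U+20AC takes one 16-bit unit in UTF-16 and three bytes in
UTF-8, while U+10400 takes two 16-bit units (a surrogate pair) and four bytes.
So even UTF-16 text is not always one 16-bit unit per character.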
Relying on wchar_t to be anything fixed across platforms will not work.
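One possible way around this, sketched here under some assumptions, is to keep
the data that goes into the file in an explicitly sized type instead of
wchar_t, and to write it in a fixed byte order. The names utf16_unit,
wchar_bits, and utf16_to_be_bytes are hypothetical, and the choice of
big-endian output is only an assumption for the example; use whatever the file
format actually requires.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a code unit that is 16 bits wide on every platform,
 * unlike wchar_t, whose width is chosen by the implementation. */
typedef uint16_t utf16_unit;

/* Width of wchar_t in bits on the current platform (for diagnostics). */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Serialize n UTF-16 code units to bytes, most significant byte first
 * (big-endian is an assumption of this sketch). dst must have room for
 * 2 * n bytes. Returns the number of bytes written. */
size_t utf16_to_be_bytes(const utf16_unit *src, size_t n, unsigned char *dst)
{
    size_t i;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        dst[2 * i]     = (unsigned char)(src[i] >> 8);
        dst[2 * i + 1] = (unsigned char)(src[i] & 0xFF);
    }
    return 2 * n;
}
```

The bytes produced this way are the same whether wchar_t is 16 or 32 bits on
the machine that wrote them. Converting from the platform's wchar_t or MBCS
representation into utf16_unit is a separate step and depends on what wchar_t
holds on that platform.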
"Magda Danish (Unicode)" <firstname.lastname@example.org> on 99-11-02 08:42:15
To: "Unicode List" <email@example.com>
Subject: FW: unicode character on Different Unix platforms ....
From: Shrinivas Kulkarni [mailto:Shrinivas_Kulkarni@i2.com]
Sent: Tuesday, November 02, 1999 6:14 AM
Subject: unicode character on Different Unix platforms ....
Here is a query on Unicode.
I am building an application, which reads a multi byte character string
The application converts this MBCS string to a Unicode string and writes it
to a dbf file.
The application has to work on Sun Solaris, HP-UX and AIX.
It works fine on AIX and NT.
I use wchar_t to define a wide char (Unicode) string.
On Sun Solaris, wchar_t is defined as unsigned long and on HP-UX it is
defined as unsigned int.
Is my basic assumption that a Unicode character is 2 bytes wide right?
If so, why do different OSs each have their own definition of wchar_t?
Waiting for your reply.