Re: FW: unicode character on Different Unix platforms ....

From: schererm@us.ibm.com
Date: Tue Nov 02 1999 - 13:11:30 EST

Next message: Valeriy E. Ushakov: "Re: FW: unicode character on Different Unix platforms ...."
Previous message: Chookij Vanatham: "RE: handwritten Arabic [was: arabic number in bidi algorithm]"
Maybe in reply to: Magda Danish (Unicode): "FW: unicode character on Different Unix platforms ...."
Next in thread: Valeriy E. Ushakov: "Re: FW: unicode character on Different Unix platforms ...."

The answer is that wchar_t is not necessarily Unicode, and that Unicode is
not necessarily stored in 16-bit units.

ANSI C defines wchar_t as an abstract type for "wide" characters but
specifies neither a concrete type nor a character set for it. On some
platforms it is Unicode; on others it is a scalar (one value per character)
form of the platform's default MBCS.

Unicode, on the other hand, is a character set standard that allows several
encodings. The most important ones are UTF-8, UTF-16, and UTF-32, which are
stored using 8-, 16-, and 32-bit integers respectively (unsigned chars,
shorts, and ints, or longs where those are 32 bits).

Relying on wchar_t to have a fixed size or encoding across platforms will
not work.

markus

"Magda Danish (Unicode)" <v-magdad@microsoft.com> on 99-11-02 08:42:15

To: "Unicode List" <unicode@unicode.org>
cc:
Subject: FW: unicode character on Different Unix platforms ....

-----Original Message-----
From: Shrinivas Kulkarni [mailto:Shrinivas_Kulkarni@i2.com]
Sent: Tuesday, November 02, 1999 6:14 AM
To: info@unicode.org
Subject: unicode character on Different Unix platforms ....

Hi,
Here is a query on Unicode.
I am building an application that reads a multibyte character string from a text file.
The application converts this MBCS string to a Unicode string and writes it into a dbf file.
The application has to work on Sun Solaris, HP-UX and AIX.
It works fine on AIX and NT.
I use wchar_t to define a wide-character (Unicode) string.
On Sun Solaris, wchar_t is defined as unsigned long, and on HP-UX it is defined as unsigned int.

Is my basic assumption right that a Unicode character is 2 bytes wide?
If so, how is it that different OSs have their own definitions of wchar_t?

Waiting for your reply.

Regards
Shrinivas


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:54 EDT