From: Rick Cameron (Rick.Cameron@businessobjects.com)
Date: Mon Mar 01 2004 - 16:59:06 EST
OK, I guess I need to be more precise in my question.
For each of the popular unices (Solaris, HP-UX, AIX, and - if possible -
linux), can anyone answer the following question:
Assuming that the locale is set to Unicode, what is in a wchar_t string? Is
it UTF-32 or pseudo-UTF-16 (i.e. UTF-16 code units, zero-extended to 32
bits)?
I'm not expecting that there's a single answer for all the unices of interest.
And I'm well aware that our application can store in a wchar_t [] whatever
it wants. I'm trying to find out what the O/S expects to be in a wchar_t
string.
The reason we want to know this is that we want to be able to write a
function that converts from UTF-8 (stored in a char []) to wchar_t []
properly. Obviously the function may need to behave differently on different
flavours of unix.
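As a rough illustration of what such a function might look like, here is a
minimal sketch in C. The names utf8_to_wchar, decode_utf8, WIDE_UTF32 and
WIDE_PSEUDO_UTF16 are made up for this example and do not come from any
library. The sketch assumes wchar_t is at least 32 bits wide and that the
caller already knows which form the platform expects, which is exactly the
open question. For U+10400 (UTF-8 bytes F0 90 90 80) the UTF-32 form stores
the single unit 0x00010400, while the pseudo-UTF-16 form stores the two units
0x0000D801 0x0000DC00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical selector for the two candidate wchar_t contents. */
enum wide_form { WIDE_UTF32, WIDE_PSEUDO_UTF16 };

/* Decode one UTF-8 sequence at s (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length (index = length). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    if (len > n) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (*cp < min_cp[len] || *cp > 0x10FFFF ||
        (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Convert NUL-terminated UTF-8 into a NUL-terminated wchar_t string.
   Returns the number of wchar_t units written (terminator excluded),
   or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_wchar(const char *src, wchar_t *dst, size_t cap,
                     enum wide_form form)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t left, out = 0;

    assert(src != NULL && dst != NULL);
    left = strlen(src);
    while (left > 0) {
        uint32_t cp = 0;
        size_t need, used = decode_utf8(s, left, &cp);

        if (used == 0) return (size_t)-1;
        need = (form == WIDE_PSEUDO_UTF16 && cp >= 0x10000) ? 2 : 1;
        if (out + need >= cap) return (size_t)-1;  /* keep room for L'\0' */
        if (need == 2) {
            /* Surrogate pair, each half zero-extended into one wchar_t. */
            cp -= 0x10000;
            dst[out++] = (wchar_t)(0xD800 + (cp >> 10));
            dst[out++] = (wchar_t)(0xDC00 + (cp & 0x3FF));
        } else {
            if (cp > (uint32_t)WCHAR_MAX) return (size_t)-1; /* narrow wchar_t */
            dst[out++] = (wchar_t)cp;
        }
        s += used;
        left -= used;
    }
    if (out >= cap) return (size_t)-1;
    dst[out] = L'\0';
    return out;
}
```

Keeping the form as a parameter means the per-platform difference is confined
to one argument, so the same decoder could serve every flavour once the answer
for that flavour is known.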
I am aware of the utility functions offered by TUC to perform conversions
between UTF-8, UTF-16 and UTF-32. These functions do not handle the case of
pseudo-UTF-16, which doesn't surprise me, since AFAIK it's not a conformant
encoding form. Nonetheless, I have a strong suspicion that some unices may
use it.
Cheers
- rick cameron
_____
From: Frank Yung-Fong Tang [mailto:ytang0648@aol.com]
Sent: March 1, 2004 12:48
To: Rick Cameron
Cc: unicode@unicode.org
Subject: Re: What's in a wchar_t string on unix?
Rick Cameron wrote on 3/1/2004, 2:13 PM:
Hi, all
This may be an FAQ, but I couldn't find the answer on unicode.org.
The reason is that there is "NO answer" to the question you ask.
It seems that most flavours of unix define wchar_t to be 4 bytes.
That depends on which UNIX and which version, and on how you define "most
flavours".
If the locale is set to be Unicode, what's in a wchar_t string?
There is no answer for that because:
1) The ANSI C standard does not define it (neither its size nor its content).
2) Several organizations try to establish standards for Unix. One of them is
"The Open Group", with its "Base Specifications", IEEE Std 1003.1, 2003. But
that does not define what wchar_t should hold either.
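Since neither standard fixes the size or the content, the most a program can
do is ask the compiler what it happens to provide. The sketch below, in C,
reports the width of wchar_t and checks the C99 predefined macro
__STDC_ISO_10646__, which an implementation defines (as a yyyymmL date) when
wchar_t values are ISO/IEC 10646 code points. The function names wchar_bits
and wchar_iso10646_version are made up for this example. Note the limits of
the probe: an older or pre-C99 compiler may not define the macro at all, and
its absence says nothing either way about what the platform stores.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits on this platform (not fixed by the standard). */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Returns the yyyymm value of __STDC_ISO_10646__ when the implementation
   defines it, meaning wchar_t holds ISO/IEC 10646 code points.
   Returns 0 when the macro is absent, which proves nothing either way. */
long wchar_iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

A 32-bit width together with a non-zero version would be consistent with
UTF-32 content; any other combination leaves the question open for that
platform.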
Is it UTF-32, or UTF-16 with the code units zero-extended to 4 bytes?
Cheers
- rick cameron
The more interesting question is why you need to know the answer to your
question. The ANSI C wchar_t model basically suggests that if you ask that
question, you are moving in the wrong direction....
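One reading of this point is that the C model treats wchar_t as opaque and
expects conversions to go through the library, so the program never needs to
know the internal form. A minimal sketch in C of that approach follows. The
wrapper name locale_to_wchar is made up for this example; mbstowcs is the
standard C function. The sketch assumes the caller has already selected a
UTF-8 locale, for instance with setlocale(LC_CTYPE, ""), and that such a
locale is installed, which varies from system to system.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a multibyte string in the current LC_CTYPE encoding to wchar_t,
   letting the C library decide what goes into each wchar_t.
   Returns the number of wide characters written (terminator excluded),
   or (size_t)-1 on an invalid sequence or insufficient space. */
size_t locale_to_wchar(const char *src, wchar_t *dst, size_t cap)
{
    size_t n;

    if (cap == 0) return (size_t)-1;
    n = mbstowcs(dst, src, cap);
    if (n == (size_t)-1) return (size_t)-1;  /* invalid multibyte sequence */
    if (n == cap) return (size_t)-1;         /* output was not terminated */
    return n;
}
```

Whatever the platform puts in the result is, by construction, what its own
wide-character functions expect, at the cost of depending on the process
locale.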