From: Frank Yung-Fong Tang (email@example.com)
Date: Tue Mar 02 2004 - 11:18:55 EST
> OK, I guess I need to be more precise in my question.
> For each of the popular unices (Solaris, HP-UX, AIX, and - if possible - linux), can anyone answer the following question:
> Assuming that the locale is set to Unicode, what is in a wchar_t string? Is it UTF-32 or pseudo-UTF-16 (i.e. UTF-16 code units, zero-extended to 32 bits)?
Basically, the answer is very simple: the value is something you "should not know". Why?
and also those functions listed in
size_t mbstowcs(wchar_t *, const char *, size_t);
int mbtowc(wchar_t *, const char *, size_t);
size_t wcstombs(char *, const wchar_t *, size_t);
int wctomb(char *, wchar_t);
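As a minimal sketch of how these functions are meant to be used, the C helper below (the name mb_to_wide is hypothetical, not from the message) converts a multibyte string in the current locale's encoding with mbstowcs and never inspects what the resulting wchar_t values are:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string (encoded according to the
   current LC_CTYPE locale) into a newly allocated wide string.
   Returns NULL on an invalid multibyte sequence or allocation failure.
   The caller frees the result. */
wchar_t *mb_to_wide(const char *src)
{
    /* A wide string never needs more elements than the source has bytes,
       so strlen(src) + 1 is always a large enough capacity. */
    size_t cap = strlen(src) + 1;
    wchar_t *dst = malloc(cap * sizeof *dst);
    if (dst == NULL)
        return NULL;

    size_t n = mbstowcs(dst, src, cap);
    if (n == (size_t)-1) {          /* invalid multibyte sequence */
        free(dst);
        return NULL;
    }

    /* Terminate defensively in case the buffer was filled exactly. */
    dst[n < cap ? n : cap - 1] = L'\0';
    return dst;
}
```

The result can be passed to other wide-character library functions such as wcslen without the application ever depending on the encoding inside wchar_t.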
> I'm not expecting that there's a single answer for all the unices of interest.
There is one single answer: "Developers, except those who write the compiler code and the C library, should NOT know what it is".
> And I'm well aware that our application can store in a wchar_t * whatever it wants.
NO, that is not true. An "application" cannot store whatever it wants in a wchar_t. The ANSI C standard basically says that the "compiler vendor", or the "OS vendor who also ships the compiler" (which converts L"" literals into wchar_t and implements the library functions above), can store whatever it wants in a wchar_t. That does not mean the "application developer" can do the same, because the application developer has no control over how L"String" is converted into wchar_t and no control over how those wchar_t functions are implemented.
> I'm trying to find out what the O/S expects to be in a wchar_t string.
The OS expects a wchar_t to hold the value generated by mbstowcs or mbtowc.
> The reason we want to know this is that we want to be able to write a function that converts from UTF-8 (stored in a char *) to wchar_t * properly. Obviously the function may need to behave differently on different flavours of unix.
1. save your current locale
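Only the first step of this list survives in the archived message. The sketch below is one assumed way to complete it (save the locale, switch LC_CTYPE to a UTF-8 locale, convert with mbstowcs, restore the locale). The helper name utf8_to_wide and the locale name passed in are hypothetical, and the names of installed UTF-8 locales differ between systems:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert UTF-8 text to wchar_t by letting the C library
   decode it. utf8_locale is an assumption supplied by the caller (for example
   "en_US.UTF-8"); which names exist varies by system.
   Returns a malloc'd wide string, or NULL on any failure. */
wchar_t *utf8_to_wide(const char *utf8, const char *utf8_locale)
{
    wchar_t *dst = NULL;

    /* 1. Save the current locale. The string returned by setlocale() may be
          overwritten by later calls, so keep a private copy. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return NULL;
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur);

    /* 2. Switch LC_CTYPE to the requested UTF-8 locale (assumed step). */
    if (setlocale(LC_CTYPE, utf8_locale) != NULL) {
        /* 3. Convert; the wide string never has more elements than bytes. */
        size_t cap = strlen(utf8) + 1;
        dst = malloc(cap * sizeof *dst);
        if (dst != NULL) {
            size_t n = mbstowcs(dst, utf8, cap);
            if (n == (size_t)-1) {      /* not valid in that encoding */
                free(dst);
                dst = NULL;
            } else {
                dst[n < cap ? n : cap - 1] = L'\0';
            }
        }
        /* 4. Restore the locale that was active on entry. */
        setlocale(LC_CTYPE, saved);
    }

    free(saved);
    return dst;
}
```

A call might look like utf8_to_wide(text, "en_US.UTF-8"), assuming that locale is installed. setlocale changes process-wide state, so this sketch is not safe while other threads depend on the locale.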
> I am aware of the utility functions offered by TUC to perform conversions between UTF-8, UTF-16 and UTF-32. These functions do not handle the case of pseudo-UTF-16, which doesn't surprise me, since AFAIK it's not a conformant encoding form. Nonetheless, I have a strong suspicion that some unices may use it.
> Cheers
> - rick cameron
From: Frank Yung-Fong Tang [mailto:firstname.lastname@example.org]
Sent: March 1, 2004 12:48
To: Rick Cameron
Subject: Re: What's in a wchar_t string on unix?
Rick Cameron wrote on 3/1/2004, 2:13 PM:
> This may be an FAQ, but I couldn't find the answer on unicode.org.
The reason is that there is "NO answer" to the question you ask.
> It seems that most flavours of unix define wchar_t to be 4 bytes.
It depends on which UNIX and which version, and on how you define "most flavours".
> If the locale is set to be Unicode, what's in a wchar_t string?
There is no answer for that, because:
1) The ANSI C standard does not define it (neither its size nor its content).
2) Several organizations try to establish standards for Unix. One of them is The Open Group's "Base Specifications", IEEE Std 1003.1, 2003. But that does not define what wchar_t should hold either.
> Is it UTF-32, or UTF-16 with the code units zero-extended to 4 bytes?
The more interesting question is why you need to know the answer to your question. The ANSI C wchar_t model basically suggests that if you ask that question, you are moving in the wrong direction....
> - rick cameron
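C99, which postdates the ANSI C standard referred to above, added an optional macro __STDC_ISO_10646__. Where an implementation defines it, the macro indicates that wchar_t values are ISO 10646 (Unicode) code points. Where it is absent, nothing is promised. A small probe, with hypothetical function names, for code that really has to check:

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if the implementation states that wchar_t values are
   ISO 10646 (Unicode) code points, 0 if it makes no such statement. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Width of wchar_t in bits on this platform. The C standard does not fix
   this value, so it has to be asked of the compiler. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}
```

Both results are properties of the compiler and C library in use, so they have to be checked on each target platform and cannot be assumed for "unix" in general.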
This archive was generated by hypermail 2.1.5 : Tue Mar 02 2004 - 11:59:04 EST