From: Rick Cameron (Rick.Cameron@businessobjects.com)
Date: Thu Mar 04 2004 - 12:56:31 EST
Woo-hoo! Finally, a real answer, rather than speculation.
Thanks very much, Ienup.
- rick
-----Original Message-----
From: Ienup Sung [mailto:is@mpkmail.eng.sun.com]
Sent: March 4, 2004 9:53
To: Rick Cameron
Cc: unicode@unicode.org
Subject: Re: What's in a wchar_t string on unix?
In Solaris Unicode/UTF-8 locales, wchar_t holds UTF-32, and we guarantee that
it has been and will stay that way.
Just in case, there is also a set of standard C APIs, such as mbtowc(),
mbstowcs(), mbrtowc(), wctomb(), wcstombs(), wcrtomb(), and so on, that will
convert between wide characters (UTF-32) and multibyte characters (UTF-8)
properly as long as you set the current locale to a Unicode/UTF-8 locale. If
you wish to use a conversion function that is not locale-sensitive, you could
use iconv() instead by opening the conversion descriptor with iconv_open(),
with "UTF-32" and "UTF-8" as fromcode and tocode (or vice versa). (By the way,
a sample program is available in the iconv(3C) man page on Solaris.)
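
As a minimal sketch of the locale-sensitive route, the following wraps
setlocale() and mbstowcs(). The helper name utf8_to_wcs() and its parameters
are hypothetical, chosen only for illustration, and the locale name the caller
passes (for example "en_US.UTF-8") is an assumption, since locale names differ
between systems.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of any library): convert a NUL-terminated
 * UTF-8 string to wchar_t with the locale-sensitive mbstowcs().
 * Returns the number of wide characters stored, excluding the terminator,
 * or (size_t)-1 on failure. locale_name is an assumption supplied by the
 * caller; a given system may not have that locale installed. */
size_t utf8_to_wcs(const char *locale_name, const char *utf8,
                   wchar_t *out, size_t out_cap)
{
    if (out_cap == 0)
        return (size_t)-1;              /* no room even for the terminator */

    /* mbstowcs() decodes bytes according to the LC_CTYPE category. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;              /* locale not available */

    /* Leave one slot free so the result can always be terminated. */
    size_t n = mbstowcs(out, utf8, out_cap - 1);
    if (n == (size_t)-1)
        return (size_t)-1;              /* invalid multibyte sequence */

    out[n] = L'\0';                     /* mbstowcs() omits it when full */
    return n;
}
```

Calling utf8_to_wcs("en_US.UTF-8", "\xE2\x82\xAC", buf, 8) on a system whose
wchar_t holds UTF-32 should store the single value 0x20AC in buf[0]. Note that
setlocale() changes process-wide state, so a real program would normally set
the locale once at startup rather than inside a helper.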
I'm also quite sure all major Unix/Linux systems support the functions that
I mentioned. (I also believe the majority will support UTF-32BE, UTF-32LE, and
similar variants in the iconv() code conversions.)
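
The locale-independent iconv() route could look roughly like the sketch
below. The helper name utf8_to_utf32() is hypothetical. It assumes the local
iconv implementation accepts the encoding name "UTF-32BE"; an explicit byte
order is used here so that no byte order mark is written and the result does
not depend on the host's endianness.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any library): convert len bytes of
 * UTF-8 to UTF-32 code points with iconv(), ignoring the current locale.
 * Returns the number of code points stored in out, or (size_t)-1 if the
 * input is invalid or incomplete, or if out is too small. */
size_t utf8_to_utf32(const char *utf8, size_t len,
                     uint32_t *out, size_t out_cap)
{
    /* iconv_open() takes the target encoding first, then the source.
     * "UTF-32BE" is assumed to be supported by the local iconv. */
    iconv_t cd = iconv_open("UTF-32BE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)utf8;            /* iconv() wants char **; input is only read */
    size_t in_left = len;
    char *dst = (char *)out;            /* fill out with raw big-endian bytes first */
    size_t dst_left = out_cap * sizeof(uint32_t);

    size_t rc = iconv(cd, &in, &in_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Reassemble each big-endian 4-byte group into a host-order value. */
    size_t count = (out_cap * sizeof(uint32_t) - dst_left) / sizeof(uint32_t);
    const unsigned char *bytes = (const unsigned char *)out;
    for (size_t i = 0; i < count; i++) {
        const unsigned char *p = bytes + 4 * i;
        uint32_t cp = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                      ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
        out[i] = cp;
    }
    return count;
}
```

With this sketch, a character outside the Basic Multilingual Plane comes out
as one 32-bit code point rather than a surrogate pair, which is the
distinction between UTF-32 and pseudo-UTF-16 raised in the original question.
On some systems iconv() lives in a separate library and needs an extra linker
flag such as -liconv.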
Additionally, since POSIX defines wchar_t as an opaque data type, we hope
that people are using the std C interfaces to do conversions between wchar_t
and multibyte characters if possible.
With regards,
Ienup
] From: Rick Cameron <Rick.Cameron@businessobjects.com>
] Subject: RE: What's in a wchar_t string on unix?
] Date: Mon, 1 Mar 2004 13:59:06 -0800
]
] OK, I guess I need to be more precise in my question.
]
] For each of the popular unices (Solaris, HP-UX, AIX, and - if possible -
] linux), can anyone answer the following question:
]
] Assuming that the locale is set to Unicode, what is in a wchar_t string? Is
] it UTF-32 or pseudo-UTF-16 (i.e. UTF-16 code units, zero-extended to 32
] bits)?
]
] I'm not expecting that there's a single answer for all the unices of
] interest. And I'm well aware that our application can store in a wchar_t []
] whatever it wants. I'm trying to find out what the O/S expects to be in a
] wchar_t string.
]
] The reason we want to know this is that we want to be able to write a
] function that converts from UTF-8 (stored in a char []) to wchar_t []
] properly. Obviously the function may need to behave differently on different
] flavours of unix.
]
] I am aware of the utility functions offered by TUC to perform conversions
] between UTF-8, UTF-16 and UTF-32. These functions do not handle the case of
] pseudo-UTF-16, which doesn't surprise me, since AFAIK it's not a conformant
] encoding form. Nonetheless, I have a strong suspicion that some unices may
] use it.
]
] Cheers
]
] - rick cameron