From: Philippe Verdy (email@example.com)
Date: Fri Jan 16 2004 - 07:38:01 EST
Instead of relying on your C/C++ platform's support for UTF-8 locales,
why not write your own function that wraps the calls to mbstowcs() and
similar functions on Unix, or to WideCharToMultiByte() on Windows (yes,
this works even on Windows 95, which supports few charsets beyond
conversions between the system default OEMCP and ACP code pages and
UTF-8), depending on the platform and without requiring you to adjust
If you really want to support only UTF-8, then don't use locale-related
functions to perform this job. Create your own wrappers to support the
string functions you need to work with this encoding. And make sure that all
your interfaces will perform the necessary conversion between the external
charsets and the internal UTF-8.
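
As an illustration of a locale-independent wrapper, here is a minimal
sketch of a UTF-8 decoder that never consults setlocale() or the C
library's multibyte functions. The name utf8_to_utf32() is hypothetical
and the strict error handling (throwing on any malformed input) is an
assumption; a real application might prefer to substitute U+FFFD.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): decode a UTF-8 byte
// string into Unicode code points without using any locale facility.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes;
    // anything smaller is an overlong encoding and is rejected.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                     { cp = b0;        len = 1; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { cp = b0 & 0x1F; len = 2; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { cp = b0 & 0x0F; len = 3; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Because the function depends only on the bytes it is given, it behaves
the same on Unix and on Windows whatever the current locale or code page.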
My opinion, however, is that it would be more convenient to use UTF-16 as
the internal encoding of your application, as it really simplifies things.
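
If you go the UTF-16 route, the conversion from code points is short.
The sketch below is only an illustration; the name utf32_to_utf16() is
hypothetical. Note that code points above U+FFFF take two 16-bit code
units (a surrogate pair), so UTF-16 is not strictly one unit per
character.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): encode Unicode code
// points as UTF-16 code units for use as an internal string format.
std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one 16-bit code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split the 20 remaining bits into a
            // high surrogate (top 10 bits) and a low surrogate (low 10).
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```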
Whenever you find "standard library" functions that are in fact
system-dependent, it's best to write your own simple wrappers to
encapsulate the portability logic and keep the system-dependent functions
out of your main application code. Using one consistent internal charset
will also simplify debugging and improve runtime performance. Trying to
cope with multiple charsets in the middle of your application will always
be tricky. So treat the standard library as a convenient gateway for
easily building your own wrappers for external interfaces, not as a
general-purpose tool around which to design your code.
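
A wrapper of this kind might look like the sketch below. The name
to_wide() is hypothetical. The Windows branch assumes the input is UTF-8
and uses MultiByteToWideChar(), the counterpart of WideCharToMultiByte();
the Unix branch uses mbstowcs(), so it still interprets the bytes
according to the LC_CTYPE locale the program has selected with
setlocale(). Calling mbstowcs() with a null destination to obtain the
length is POSIX behaviour, which is also an assumption here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>
#include <vector>
#endif

// Hypothetical wrapper (not from the original post): the rest of the
// application calls to_wide() and never touches the platform API.
std::wstring to_wide(const std::string& s) {
    if (s.empty()) return std::wstring();
#ifdef _WIN32
    // Windows: treat the input as UTF-8; wchar_t holds UTF-16 code units.
    const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                      static_cast<int>(s.size()), NULL, 0);
    if (n <= 0) throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring out(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(),
                        static_cast<int>(s.size()), &out[0], n);
    return out;
#else
    // Unix: mbstowcs() follows the current LC_CTYPE locale. With a null
    // destination it returns the number of wide characters needed (POSIX).
    // Input is read up to the first NUL byte, as with any C string.
    const std::size_t n = std::mbstowcs(NULL, s.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    std::vector<wchar_t> buf(n + 1);
    std::mbstowcs(buf.data(), s.c_str(), buf.size());
    return std::wstring(buf.data(), n);
#endif
}
```

Only this one function needs to change if a platform turns out to lack
a usable UTF-8 locale; for instance, the Unix branch could be replaced
by a hand-written decoder without touching any caller.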
In large projects, these string-handling functions are almost always
wrapped (this is true for Java too, except that the Java core library is
normally guaranteed to be natively portable because it already implements
the system-specific wrappers internally, so you can be confident that
Java Strings will always be UTF-16 encoded without requiring you to handle
multiple charsets for the internal string handling methods of your
----- Original Message -----
From: "Deepak Chand Rathore" <firstname.lastname@example.org>
Sent: Friday, January 16, 2004 11:37 AM
Subject: UTF8 locale & shell encoding
> i am dealing with utf-8 unicode , using functions mbstowcs( ),wcwidth(
> ),etc defined in wchar.h
> for converting wide char to utf8 & other things.
> For these functions to behave correctly , i need to set locale to
> As solaris has en_US.UTF8 (w/o installing any extra support) , there is
> i don't know about HP, AIX, DEC, other flavours of unix ?? (any good URL
> where i can get this information ??)
> in unix i can generate utf8 locales using localedef.
> But i am having problem especially in windows, as i can't find a locale
> supporting this.
> i tried changing windows code page to utf8 using _setmbcp(65001), but it
> didn't work
> as the functions i am using is locale dependent.
> in java, it's really easy, but i am coding in c++
> What shall i do now????
> I also want to know the shell encoding in different OS (windows &
> flavours of unix)
> Is the shell encoding same as the default locale encoding