Re: UTF8 locale & shell encoding

From: Edward H. Trager
Date: Fri Jan 16 2004 - 11:33:25 EST


    On Friday 2004.01.16 13:38:01 +0100, Philippe Verdy wrote:
    > Instead of relying on the support of UTF-8 locales by your C/C++ platform,
    > why don't you create your own function which would wrap the calls to
    > mbstowcs() and similar functions on Unix, or to WideCharToMultiByte() on
    > Windows, depending on the platform and without requiring you to adjust
    > locales? (Yes, this works even on Windows 95, which supports few charset
    > conversions other than those between the system default OEMCP and ACP
    > code pages and UTF-8.)
    > If you really want to support only UTF-8, then don't use locale-related
    > functions to perform this job. Create your own wrappers to support the
    > string functions you need to work with this encoding. And make sure that all
    > your interfaces will perform the necessary conversion between the external
    > charsets and the internal UTF-8.
    > My opinion however would be that it will be more convenient to use UTF-16 as
    > the internal encoding of your application, as it really simplifies things.

    I just wonder why you say that? I think it depends on the application. I have
    an application which originally only handled ASCII: to make it Unicode-enabled,
    UTF-8 is the obvious answer, as I only need to add or change things in a very
    few places to make it all work. As UTF-16 is also a variable-length encoding
    (anything beyond the Basic Multilingual Plane needs a surrogate pair), I don't
    see it as being "more convenient" than UTF-8. In fact, I see UTF-8 as being
    more convenient, since it is completely compatible with ASCII and the basic C
    string handling functions.
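
    A minimal sketch of both points, assuming a C++11 compiler. The helper names
    (utf8_length, utf16_units, utf16_surrogates) are made up for this
    illustration and are not from any library or from the application mentioned
    above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical helper: count code points in a valid, NUL-terminated UTF-8
// string by skipping continuation bytes (those of the form 10xxxxxx).
std::size_t utf8_length(const char* s) {
    std::size_t n = 0;
    for (; *s; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: number of 16-bit code units UTF-16 needs for one
// code point. Anything beyond the Basic Multilingual Plane needs two.
int utf16_units(std::uint32_t cp) {
    return cp >= 0x10000 ? 2 : 1;
}

// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its high and low surrogates.
std::pair<std::uint16_t, std::uint16_t> utf16_surrogates(std::uint32_t cp) {
    cp -= 0x10000;
    return std::make_pair(static_cast<std::uint16_t>(0xD800 + (cp >> 10)),
                          static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
}
```

    Byte-oriented functions keep working on UTF-8 because every byte of a
    multi-byte sequence is 0x80 or above: std::strlen("caf\xC3\xA9") is 5 (bytes)
    while utf8_length() gives 4 (code points), and std::strchr() can still find
    an ASCII delimiter such as '=' without landing inside a character. On the
    UTF-16 side, U+1D11E needs two code units, 0xD834 and 0xDD1E.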

    > Each time you identify "standard library" functions that are in fact
    > system-dependent, it's best to create your own simple wrappers to
    > encapsulate the portability logic and remove the system-dependent functions
    > from your main application code. Using a coherent internal charset will also
    > simplify debugging and improve runtime performance. Trying to cope with
    > multiple charsets in the middle of your application will always be tricky.
    > So consider the standard library as a convenient gateway for easily
    > creating your own wrappers for external interfaces, not as a
    > general-purpose tool used for the design of your code.
    > In large projects, these string handling functions are almost always wrapped
    > (this is true for Java too, except that the Java core library is normally
    > guaranteed to be natively portable, as it already implements the
    > system-specific wrappers internally, so you can be confident that Java
    > Strings will always be UTF-16 encoded without requiring you to handle
    > multiple charsets in the internal string handling methods of your
    > application).
    > ----- Original Message -----
    > From: "Deepak Chand Rathore" <>
    > To: <>
    > Sent: Friday, January 16, 2004 11:37 AM
    > Subject: UTF8 locale & shell encoding
    > > I am dealing with UTF-8 Unicode, using functions such as mbstowcs(),
    > > wcwidth(), etc., defined in wchar.h, for converting wide chars to UTF-8
    > > and other things.
    > > For these functions to behave correctly, I need to set the locale to
    > > xxx.UTF-8.
    > > As Solaris has en_US.UTF8 (without installing any extra support), there
    > > is no problem.
    > > I don't know about HP, AIX, DEC, or other flavours of Unix. (Is there a
    > > good URL where I can get this information?)
    > > On Unix I can generate UTF-8 locales using localedef.
    > > But I am having a problem on Windows especially, as I can't find a
    > > locale supporting this.
    > > I tried changing the Windows code page to UTF-8 using _setmbcp(65001),
    > > but it didn't work, as the functions I am using are locale dependent.
    > > In Java it's really easy, but I am coding in C++.
    > > What shall I do now?
    > >
    > > I also want to know the shell encoding in different OSes (Windows and
    > > different flavours of Unix).
    > > Is the shell encoding the same as the default locale encoding?
    > >
    > > Thanks
    > >
    > > DC
    > >
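
    For the original question quoted above, the dependence of mbstowcs() on an
    xxx.UTF-8 locale can be sidestepped by decoding UTF-8 by hand, in the spirit
    of the "write your own wrappers" advice earlier in the thread. What follows
    is only a sketch under stated assumptions: it assumes a C++11 compiler (for
    char32_t and std::u32string), and utf8_to_utf32 is a hypothetical name, not
    a standard or platform function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper: decode UTF-8 into UTF-32 without touching the C
// locale. Returns false on malformed input (bad lead byte, truncated or
// overlong sequence, surrogate code point, or value above U+10FFFF).
bool utf8_to_utf32(const std::string& in, std::u32string& out) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};

    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;

        // The lead byte gives the sequence length and the top payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else return false;  // stray continuation byte or invalid lead byte

        if (in.size() - i < len) return false;  // truncated sequence

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // outside Unicode

        out.push_back(cp);
        i += len;
    }
    return true;
}
```

    Because nothing here calls setlocale(), the same code should behave the
    same way on Windows and on the various Unix flavours, whatever locales
    happen to be installed. It does not replace wcwidth(), which would need its
    own table-driven wrapper.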
