Re: UTF8 locale & shell encoding

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jan 16 2004 - 07:38:01 EST


    Instead of relying on your C/C++ platform's support for UTF-8 locales,
    why don't you create your own function that wraps mbstowcs() and similar
    calls on Unix, or WideCharToMultiByte() (and MultiByteToWideChar() for the
    opposite direction) on Windows, depending on the platform, so that you
    don't need to adjust locales? (Yes, this works even on Windows 95, which
    does not support many charsets but does support conversions between UTF-8
    and the system default OEM and ANSI code pages, OEMCP and ACP.)
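    A minimal sketch of such a wrapper in C. The name utf8_to_wide is
    hypothetical. On Windows it calls MultiByteToWideChar() with CP_UTF8, so no
    locale is involved; on Unix it still goes through mbstowcs(), so it assumes
    the program has already selected a UTF-8 LC_CTYPE locale (for ASCII input
    the default "C" locale is enough). Note that wchar_t is 16 bits (UTF-16) on
    Windows but usually 32 bits on Unix.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical portable wrapper: converts a NUL-terminated UTF-8 string into
   a newly allocated wide string. Returns NULL on invalid input or allocation
   failure; the caller releases the result with free(). */
wchar_t *utf8_to_wide(const char *src)
{
#ifdef _WIN32
    /* With cbMultiByte = -1 the returned count includes the terminating NUL. */
    int n = MultiByteToWideChar(CP_UTF8, 0, src, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    wchar_t *dst = malloc((size_t)n * sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, 0, src, -1, dst, n) <= 0) {
        free(dst);
        return NULL;
    }
    return dst;
#else
    /* Assumption: a UTF-8 LC_CTYPE is already active, because mbstowcs()
       decodes according to the current locale. POSIX allows a null
       destination to query the required length (excluding the NUL). */
    size_t n = mbstowcs(NULL, src, 0);
    if (n == (size_t)-1)
        return NULL;
    wchar_t *dst = malloc((n + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;
    mbstowcs(dst, src, n + 1); /* room for n characters plus the NUL */
    return dst;
#endif
}
```

    The rest of the program only ever calls utf8_to_wide(), so the
    platform-specific decision stays in one place.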

    If you really want to support only UTF-8, then don't use locale-related
    functions for this job. Create your own wrappers for the string functions
    you need with this encoding, and make sure that all your interfaces
    perform the necessary conversion between the external charsets and the
    internal UTF-8.
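    As an example of such a locale-independent string function, here is a
    sketch that counts code points in a UTF-8 string (the name
    utf8_count_code_points is hypothetical). It relies only on the UTF-8 byte
    structure: every byte that is not a continuation byte (10xxxxxx) starts a
    new code point. It assumes the input was already validated at the
    interface where it entered the program.

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated string assumed to be valid UTF-8,
   without consulting the C locale. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80) /* not a continuation byte */
            ++count;
    }
    return count;
}
```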

    In my opinion, however, it is more convenient to use UTF-16 as the
    internal encoding of your application, as it really simplifies things.
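    If UTF-16 is the internal encoding, incoming UTF-8 must be decoded once at
    the boundary. The following is a sketch of a locale-independent UTF-8 to
    UTF-16 converter; the function name and its -1 error convention are
    hypothetical. It rejects malformed input (invalid lead or continuation
    bytes, truncated sequences, overlong forms, surrogate code points, values
    above U+10FFFF) and encodes supplementary characters as surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes `len` bytes of UTF-8 from `src` and writes at most `cap` UTF-16
   code units to `dst`. Returns the number of code units written, or -1 on
   malformed input or if `dst` is too small. */
long utf8_to_utf16(const unsigned char *src, size_t len,
                   uint16_t *dst, size_t cap)
{
    size_t i = 0, out = 0;
    while (i < len) {
        uint32_t cp;
        size_t extra;
        unsigned char b = src[i];
        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return -1;                  /* cannot start a sequence */
        if (len - i <= extra)
            return -1;                   /* truncated sequence */
        for (size_t k = 1; k <= extra; ++k) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return -1;               /* expected a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong 3-/4-byte forms, surrogates and out-of-range values. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;
        i += extra + 1;
        if (cp < 0x10000) {
            if (out >= cap)
                return -1;
            dst[out++] = (uint16_t)cp;
        } else {                         /* encode as a surrogate pair */
            if (cap - out < 2)
                return -1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return (long)out;
}
```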

    Each time you identify "standard library" functions that are in fact
    system-dependent, it's best to create your own simple wrappers that
    encapsulate the portability logic and keep the system-dependent functions
    out of your main application code. Using one coherent internal charset
    will also simplify debugging and improve runtime performance, whereas
    coping with multiple charsets in the middle of your application will
    always be tricky. So consider the standard library a convenient gateway
    for easily creating your own wrappers for external interfaces, not a
    general-purpose tool around which to design your code.
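    The outbound side of such an external interface could look like the
    following sketch, which converts internal UTF-16 to UTF-8 just before the
    text leaves the program (for example, before writing it to a file). The
    function name and error convention are hypothetical; unpaired surrogates
    are treated as errors rather than silently replaced.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes `len` UTF-16 code units from `src` as UTF-8 into `dst` (at most
   `cap` bytes). Returns the number of bytes written, or -1 on an unpaired
   surrogate or insufficient space. */
long utf16_to_utf8(const uint16_t *src, size_t len,
                   unsigned char *dst, size_t cap)
{
    size_t i = 0, out = 0;
    while (i < len) {
        uint32_t cp = src[i++];
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate */
            if (i >= len || src[i] < 0xDC00 || src[i] > 0xDFFF)
                return -1;                            /* no low surrogate follows */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(src[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                                /* stray low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cap - out < need)
            return -1;
        switch (need) {
        case 1:
            dst[out++] = (unsigned char)cp;
            break;
        case 2:
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return (long)out;
}
```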

    In large projects, these string-handling functions are almost always
    wrapped. This is also true for Java, except that the Java core library is
    normally guaranteed to be natively portable, because it already implements
    the system-specific wrappers internally. You can therefore be confident
    that Java Strings will always be UTF-16 encoded, without having to handle
    multiple charsets in your application's internal string-handling methods.

    ----- Original Message -----
    From: "Deepak Chand Rathore" <deepakr@aztec.soft.net>
    To: <unicode@unicode.org>
    Sent: Friday, January 16, 2004 11:37 AM
    Subject: UTF8 locale & shell encoding

    > I am dealing with UTF-8 Unicode, using functions such as mbstowcs() and
    > wcwidth() (declared in stdlib.h and wchar.h) for converting between wide
    > chars and UTF-8, and for other things.
    > For these functions to behave correctly, I need to set the locale to
    > xxx.UTF-8. As Solaris has en_US.UTF8 (without installing any extra
    > support), there is no problem there.
    > I don't know about HP, AIX, DEC, or other flavours of Unix. (Is there a
    > good URL where I can get this information?)
    > On Unix I can generate UTF-8 locales using localedef.
    > But I am having problems, especially on Windows, as I can't find a
    > locale supporting this.
    > I tried changing the Windows code page to UTF-8 using _setmbcp(65001),
    > but it didn't work, as the functions I am using are locale-dependent.
    > In Java it's really easy, but I am coding in C++.
    > What shall I do now?
    >
    > I also want to know the shell encoding on different OSes (Windows and
    > different flavours of Unix).
    > Is the shell encoding the same as the default locale encoding?
    >
    > Thanks
    >
    > DC
    >



    This archive was generated by hypermail 2.1.5 : Fri Jan 16 2004 - 08:11:37 EST