Re: FW: FW: unicode character on Different Unix platforms ....

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Wed Nov 03 1999 - 06:14:01 EST


"Hemant Ramnani" wrote on 1999-11-03 03:54 UTC:
> Can you please tell me on which platforms is wchar_t 1 byte.

None. This is just a theoretical possibility allowed by the standard,
but not anything implemented on a widely used platform.

The question should not be

  How large is wchar_t?

but instead

  Does this C implementation have suitable Unicode locale support?

If there is suitable Unicode locale support and you plan to use it, then
you can be sure that wchar_t is of suitable size. If there is no
suitable Unicode locale support, then there is absolutely no reason for
you to use wchar_t on this machine. After all, wchar_t is just a type
used to communicate with C's standard wide character library functions.
If you use these functions, you have no choice but to use wchar_t; if
you don't use them, there is no point in using wchar_t. The size of
wchar_t is mostly irrelevant to this decision.
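
As a rough illustration of asking the second question at run time, here
is a minimal sketch for a POSIX system. The function names
locale_is_utf8() and wchar_is_iso10646() are made up for this example.
It assumes that nl_langinfo(CODESET) is available and that a UTF-8
locale reports its codeset as "UTF-8" or "utf8", which is common but
not guaranteed everywhere.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the named locale can be selected
 * for LC_CTYPE and reports a UTF-8 codeset, 0 otherwise.
 * Note that it changes the process-wide LC_CTYPE locale. */
int locale_is_utf8(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;                      /* locale is not installed */

    const char *codeset = nl_langinfo(CODESET);
    /* Assumption: implementations spell the codeset one of these ways. */
    return strcmp(codeset, "UTF-8") == 0 || strcmp(codeset, "utf8") == 0;
}

/* Hypothetical helper: returns 1 if the implementation promises that
 * wchar_t values are ISO 10646 code points (C99 feature test macro). */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

Calling locale_is_utf8("") tests whatever locale the user has selected
through the environment (LANG, LC_ALL, LC_CTYPE).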

A recommendation to categorically not use wchar_t for Unicode would be
nonsense. The correct recommendation should be: Consider carefully
whether you want to use C's wide character functions to handle Unicode,
because at this time they might not yet be supporting Unicode on all
your target platforms. For instance, under Linux, we are still eagerly
waiting on the release of glibc 2.2, which is supposed to finally have
them fully supported, including a comprehensive set of Unicode/UTF-8
locales. Other systems, such as Solaris, have had them for quite some
time already.

Commonly you will find only the following variants for wchar_t:

  - 16 bit unsigned
  - 32 bit unsigned
  - 32 bit signed
  - 8 bit (for implementations without any wide character locale)

My personal strong preference is for 32-bit signed (as this covers all
of UCS and allows wchar_t = wint_t, avoiding signed/unsigned type
confusion), and I hope that this is what all quality C implementations
will finally converge to. It would be neat if ISO C formally
prescribed wchar_t to be at least a signed 32-bit type, but committee
politics around backwards compatibility will not allow this.
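
If you are curious which of the variants listed above your own
implementation uses, a small sketch like the following will tell you.
The function names are invented for this example, and the U+10FFFF
bound reflects today's Unicode code space, which is my assumption here
rather than something the C standard states.

```c
#include <limits.h>
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* Width of wchar_t in bits on this implementation. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Returns 1 if wchar_t is a signed type, 0 if it is unsigned. */
int wchar_is_signed(void)
{
    return (wchar_t)-1 < (wchar_t)0;
}

/* Returns 1 if every code point up to U+10FFFF fits into wchar_t
 * (assumed bound, see the note above). */
int wchar_covers_ucs(void)
{
    return WCHAR_MAX >= 0x10FFFFL;
}
```

A 32-bit signed wchar_t gives wchar_bits() == 32, wchar_is_signed() == 1
and wchar_covers_ucs() == 1.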

You shouldn't write wchar_t values directly to files and network
connections, for exactly the same reason you shouldn't do this with
int values: incompatibilities in size and endianness.
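
One possible alternative is to convert each character to a
byte-oriented encoding such as UTF-8 before writing it out. The sketch
below is only an illustration: encode_utf8() is a made-up name, and it
assumes that your wchar_t values are ISO 10646 code points (which holds
where __STDC_ISO_10646__ is defined) and that only code points up to
U+10FFFF need to be handled.

```c
#include <stddef.h>

/* Hypothetical helper: encodes one code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF. */
size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The resulting byte sequence is the same regardless of the size or byte
order of wchar_t on the writing machine. For example, U+20AC (the euro
sign) becomes the three bytes E2 82 AC.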

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


