Re: What's in a wchar_t string ...

From: Nelson H. F. Beebe (beebe@math.utah.edu)
Date: Wed Mar 03 2004 - 13:49:29 EST

    "Frank Yung-Fong Tang" <ytang0648@aol.com> asks on Wed, 3 Mar 2004 12:38:49
    -0500:

    >> Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is defined?
    >> or does it only mean wchar_t hold the character in ISO_10646
    >> (which mean it could be 2 bytes, 4 bytes or more than that?)

    Here is the exact text from

            INTERNATIONAL ISO/IEC STANDARD 9899
            Second edition
            1999-12-01
            Programming languages -- C

    >> ...
    >> __STDC_ISO_10646__
    >>     An integer constant of the form yyyymmL (for example, 199712L),
    >>     intended to indicate that values of type wchar_t are the coded
    >>     representations of the characters defined by ISO/IEC 10646, along
    >>     with all amendments and technical corrigenda as of the specified
    >>     year and month.
    >> ...

    It says nothing more about the size of wchar_t, or about which encoding
    is used: note the vague language "coded representations...". In effect,
    the implementation, not the Standard, decides both.
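
    Because the Standard leaves both points to the implementation, the
    only way to learn what a given compiler does is to ask it. A minimal
    sketch of such a probe follows; the helper names iso10646_version and
    wchar_report are hypothetical, chosen only for illustration, and a
    return value of 0 is my own convention for "macro not defined".

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: the value of __STDC_ISO_10646__,
   or 0 when the implementation does not define the macro. */
static long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: format a two-line report into buf.
   Returns whatever snprintf returns (number of characters needed). */
static int wchar_report(char *buf, size_t buflen)
{
    return snprintf(buf, buflen,
                    "__STDC_ISO_10646__ = %ld\nsizeof(wchar_t) = %lu\n",
                    iso10646_version(),
                    (unsigned long)sizeof(wchar_t));
}
```

    Calling wchar_report(buf, sizeof buf) and writing buf with fputs
    shows the two values side by side; neither one can be inferred from
    the other.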

    Very few current Unix C or C++ compilers even define the symbol
    __STDC_ISO_10646__; the C/C++ feature test package at

            ftp://ftp.math.utah.edu/pub/features
            http://www.math.utah.edu/pub/features

    probes that macro value, and many others.

    My logs of its runs in about 90 build environments show definitions
    with the value 200009 for GNU gcc versions 3.x (all platforms), Intel icc
    versions 7.x and 8.0 (Intel IA-32 and IA-64), and Portland Group pgcc
    versions 4.x and 5.x (Intel IA-32). On all of these, the package reports
    sizeof(wchar_t) = 4, but of course, that says nothing whatever about
    the encoding.
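
    Since the size alone settles nothing, a program that cares has to
    check the value range and the encoding separately. The sketch below
    is one way to do that, not what the feature test package does: the
    helper names are hypothetical, U+10FFFF is assumed as the highest
    code point of interest, and the euro sign (U+20AC) is an arbitrary
    sample character.

```c
#include <wchar.h>

/* Hypothetical helper: 1 if wchar_t can represent the value 0x10FFFF,
   0 otherwise. WCHAR_MAX comes from <wchar.h>. */
static int wchar_holds_all_ucs(void)
{
    return (unsigned long)WCHAR_MAX >= 0x10FFFFUL ? 1 : 0;
}

/* Hypothetical helper: spot-check the encoding of one wide character.
   Returns  1 if the macro is defined and the wide euro sign has the
               value 0x20AC,
            0 if the macro is defined but the value differs,
           -1 if the macro is not defined, so nothing is promised. */
static int euro_sign_is_ucs(void)
{
#ifdef __STDC_ISO_10646__
    return L'\u20AC' == 0x20AC ? 1 : 0;
#else
    return -1;
#endif
}
```

    A single sample character proves little by itself, but a mismatch
    on a compiler that defines the macro would be a clear warning sign.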

    -------------------------------------------------------------------------------
    - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
    - University of Utah                    FAX: +1 801 581 4148                  -
    - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
    - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
    - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
    -------------------------------------------------------------------------------


