Re: What's in a wchar_t string ...

From: Frank Yung-Fong Tang (ytang0648@aol.com)
Date: Wed Mar 03 2004 - 14:11:07 EST


    So does that mean that, even when __STDC_ISO_10646__ is defined,
    wchar_t may hold UCS-2 or UTF-16 rather than UCS-4?

    Nelson H. F. Beebe wrote on 3/3/2004, 1:49 PM:

    > "Frank Yung-Fong Tang" <ytang0648@aol.com> asks on Wed, 3 Mar 2004
    > 12:38:49 -0500:
    >
    > >> Does it also mean wchar_t is 4 bytes if __STDC_ISO_10646__ is
    > >> defined?
    > >> Or does it only mean that wchar_t holds characters from ISO 10646
    > >> (which means it could be 2 bytes, 4 bytes, or more than that)?
    >
    > Here is the exact text from
    >
    > INTERNATIONAL ISO/IEC STANDARD 9899
    > Second edition
    > 1999-12-01
    > Programming languages -- C
    >
    > >> ...
    > >> __STDC_ISO_10646__  An integer constant of the form yyyymmL (for
    > >>     example, 199712L), intended to indicate that values of type
    > >>     wchar_t are the coded representations of the characters
    > >>     defined by ISO/IEC 10646, along with all amendments and
    > >>     technical corrigenda as of the specified year and month.
    > >> ...
    >
    > It says nothing more about the size of wchar_t, or what encodings are
    > used: note the vague language "coded representations...". This means
    > effectively that the implementation, not the Standard, decides.
    >
    > Very few current Unix C or C++ compilers even define the symbol
    > __STDC_ISO_10646__; the C/C++ feature test package at
    >
    > ftp://ftp.math.utah.edu/pub/features
    > http://www.math.utah.edu/pub/features
    >
    > probes that macro value, and many others.
    >
    > My logs of its runs in about 90 build environments show definitions
    > with values 200009 for GNU gcc versions 3.x (all platforms), Intel icc
    > versions 7.x and 8.0 (Intel IA-32 and IA-64), and Portland Group pgcc
    > versions 4.x and 5.x (Intel IA-32). On all of these, it reports that
    > sizeof(wchar_t) = 4, but of course, that says nothing whatever about
    > the encoding.
    >
    > -------------------------------------------------------------------------------
    > - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
    > - University of Utah                    FAX: +1 801 581 4148                  -
    > - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
    > - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
    > - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
    > -------------------------------------------------------------------------------
    >


