Re: 32'nd bit & UTF-8

From: Clark Cox
Date: Fri Jan 21 2005 - 08:27:16 CST


    On Fri, 21 Jan 2005 08:42:51 -0000, Arcane Jill wrote:
    > -----Original Message-----
    > From: Hans Aberg
    > Sent: 20 January 2005 20:47
    > To: Antoine Leca
    > Subject: Re: 32'nd bit & UTF-8
    > > That already seems to have happened with GNU GCC, which fixes wchar_t to
    > > 32-bits.
    > and Microsoft Visual C++, which fixes wchar_t to SIXTEEN bits.
    > The existence of wchar_t does not imply UTF-32. It does not imply UTF-16.
    > It does not even imply Unicode. It's just a type.

    But, if __STDC_ISO_10646__ is defined, then it does imply that wchar_t
    can represent all of the Unicode/ISO-10646 characters. From the C
    standard:

    "__STDC_ISO_10646__ An integer constant of the form yyyymmL (for
    example, 199712L). If this symbol is defined, then every character in
    the "Unicode required set", when stored in an object of type
    wchar_t, has the same value as the short identifier of that
    character. The "Unicode required set" consists of all the characters
    that are defined by ISO/IEC 10646, along with all amendments and
    technical corrigenda, as of the specified year and month."
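
    As a rough sketch of how a program might act on that wording (this is
    not from the standard; the helper names iso10646_version,
    wchar_can_hold and report_wchar are made up for illustration), one can
    test the macro at compile time and compare a code point against
    WCHAR_MAX at run time:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: the value of __STDC_ISO_10646__ (yyyymmL),
   or 0 if the implementation does not define the macro. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: nonzero if a single wchar_t can hold the
   code point cp as its own value. */
int wchar_can_hold(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);   /* stay inside the Unicode code space */
    return cp <= (unsigned long)WCHAR_MAX;
}

/* Prints what this implementation promises about wchar_t. */
void report_wchar(void)
{
    long v = iso10646_version();

    if (v != 0)
        printf("__STDC_ISO_10646__ = %ldL\n", v);
    else
        printf("__STDC_ISO_10646__ is not defined\n");

    printf("U+FFFF fits in one wchar_t:   %d\n", wchar_can_hold(0xFFFFUL));
    printf("U+10FFFF fits in one wchar_t: %d\n", wchar_can_hold(0x10FFFFUL));
}
```

    Calling report_wchar() from main shows the result for a given
    compiler. The output depends on the implementation, so none is
    given here.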

    In addition, it seems that there is no way that a conforming C
    implementation can use wchar_t to represent UTF-16. If
    __STDC_ISO_10646__ is less than 200111, then UTF-16 didn't exist at
    the time, so wchar_t must be UCS-2 in that case, and if
    __STDC_ISO_10646__ is greater than or equal to 200111, then a single
    16-bit wchar_t is not large enough to contain a representation of any
    given character "defined by ISO/IEC 10646".
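
    To make the size argument concrete, here is a minimal sketch of the
    UTF-16 encoding arithmetic (utf16_encode is an illustrative name, not
    a library function). Any code point above U+FFFF comes out as two
    16-bit units, so no single 16-bit wchar_t can have "the same value as
    the short identifier" of such a character:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: encodes the code point cp as UTF-16 into out[].
   Returns the number of 16-bit code units written (1 or 2). */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);              /* inside the code space */
    assert(cp < 0xD800 || cp > 0xDFFF);  /* surrogates are not characters */

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one unit, equal to the code point. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low (trail) surrogate */
    return 2;
}
```

    For example, U+0041 encodes as the single unit 0x0041, while U+1D11E
    encodes as the pair 0xD834 0xDD1E. Neither unit of that pair equals
    0x1D11E.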

    Clark S. Cox III
