__STDC_ISO_10646__ [Was: 32'nd bit & UTF-8]

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Jan 21 2005 - 09:26:08 CST

    Clark Cox wrote, quoting C99 with TC2 applied:
    > "__STDC_ISO_10646__ An integer constant of the form yyyymmL (for
    > example, 199712L). If this symbol is defined, then every character
    > in the "Unicode required set", when stored in an object of type
    > wchar_t, has the same value as the short identifier of that
    > character. The "Unicode required set" consists of all the
    > characters that are defined by ISO/IEC 10646, along with all
    > amendments and technical corrigenda, as of the specified year and
    > month."
    > In addition, it seems that there is no way that a conforming C
    > implementation can use wchar_t to represent UTF-16. If
    > __STDC_ISO_10646__ is less than 200111, then UTF-16 didn't exist at
    > the time,

    Yes it did. It was introduced long ago, perhaps 1997.

    The key point of this definition is the repertoire. If your repertoire
    includes neither the SMP nor the SIP (nor the private-use planes, nor the
    plane 14 tags), you can freely restrict yourself to 16-bit characters, and
    then a 16-bit UTF-16 wchar_t is perfectly conforming (to C99).
    Of course, for practical purposes it would be the same as UCS-2.

    You are correct that an implementation with a 16-bit wchar_t cannot
    encode in it the full repertoire of Unicode 3.1 or later. Still, there is
    a lot of work to be done just to get 3.0 right.
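
    As a rough illustration of the point above, a program can test both
    conditions at once: whether __STDC_ISO_10646__ is defined (and how recent
    its value is), and whether wchar_t is wide enough to hold a given code
    point as a single value. This is only a sketch; the helper names
    wchar_is_iso10646_as_of and fits_in_wchar are made up for this example
    and are not part of any standard.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the implementation defines
   __STDC_ISO_10646__ with a value of at least yyyymm, i.e. it promises
   that wchar_t values equal the ISO/IEC 10646 short identifiers for the
   repertoire as of that year and month. Returns 0 otherwise. */
static int wchar_is_iso10646_as_of(long yyyymm)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__ >= yyyymm;
#else
    (void)yyyymm;   /* macro absent: no promise is made at all */
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if the code point cp can be stored in
   one wchar_t object without truncation on this implementation. */
static int fits_in_wchar(unsigned long cp)
{
    return cp <= (uintmax_t)WCHAR_MAX;
}

/* Returns 1 if a single wchar_t can hold any code point up to U+10FFFF,
   which is what the supplementary planes (SMP, SIP, ...) require. */
static int wchar_covers_supplementary_planes(void)
{
    return fits_in_wchar(0x10FFFFUL);
}
```

    With a 16-bit wchar_t, wchar_covers_supplementary_planes() returns 0, so
    such an implementation could only honestly advertise a repertoire that
    stays within the BMP, for instance by checking
    wchar_is_iso10646_as_of(200111L) against the threshold quoted above.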

