Re: What's in a wchar_t string on unix?

From: Clark Cox (clarkcox3@mac.com)
Date: Wed Mar 03 2004 - 16:33:27 EST


    On Mar 03, 2004, at 14:13, Frank Yung-Fong Tang wrote:

    >
    >
    > Clark Cox wrote on 3/3/2004, 1:28 PM:
    >
    >> From the C standard:
    >>
    >> __STDC_ISO_10646__ An integer constant of the form yyyymmL (for example,
    >> 199712L), intended to indicate that values of type wchar_t are the
    >> coded representations of the characters defined by ISO/IEC 10646, along
    >> with all amendments and technical corrigenda as of the specified year
    >> and month.
    >>
    >> This, to me, suggests that wchar_t would indeed be a 32-bit type (well,
    >> at least a 21-bit type) when this macro is defined. However, to be
    >> sure, I'd suggest posting to news:comp.std.c
    >
    > The language in the standard does not prevent someone from making it
    > 16 bits or 64 bits when that macro is defined, right?

    Not explicitly, but as I read it, when that macro is defined, wchar_t
    would have to be at least 21 bits wide, or else it couldn't be true that
    "values of type wchar_t are the coded representations of the characters
    defined by ISO/IEC 10646". That is, I would think that wchar_t would
    have to be able to represent values in the range [0, 0x10FFFF]. But my
    interpretation could be off, which is why I recommended asking on
    comp.std.c.
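
A minimal sketch of that reading, in C. The helper names (bits_needed, wchar_covers_unicode, check_interpretation) are hypothetical and not from the standard. Only WCHAR_MAX and __STDC_ISO_10646__ are standard names. Note that 0x10FFFF needs 21 bits, and that the assertion encodes the interpretation above, not a guarantee stated by the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: number of bits needed to represent max_value. */
unsigned bits_needed(unsigned long max_value)
{
    unsigned bits = 0;
    while (max_value != 0) {
        ++bits;
        max_value >>= 1;
    }
    return bits;
}

/* Hypothetical helper: 1 if wchar_t can hold every value in
   [0, 0x10FFFF], 0 otherwise. */
int wchar_covers_unicode(void)
{
    return (uintmax_t)WCHAR_MAX >= (uintmax_t)0x10FFFF;
}

/* Assumption being checked: if the implementation defines
   __STDC_ISO_10646__, wchar_t covers the whole range. The assert
   fires on an implementation where that reading does not hold. */
void check_interpretation(void)
{
#ifdef __STDC_ISO_10646__
    assert(wchar_covers_unicode());
#endif
}
```

Calling bits_needed(0x10FFFFUL) gives the minimum width directly, and check_interpretation() can be run once at startup on a platform of interest.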

    >
    > And what does the year and month mean?

    It indicates which version of ISO/IEC 10646 the implementation uses.
    In the above example, it indicates the standard, with all amendments
    and technical corrigenda, as of December 1997.
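
As an illustration, the yyyymm value can be split into its year and month parts. This is only a sketch: split_yyyymm and iso10646_version are hypothetical helper names, and returning 0 when the macro is not defined is a convention chosen here, not something the standard specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits a yyyymm value such as 199712L into
   year and month. Returns 1 on success, 0 if the value is not
   positive or the month part is outside 1..12. */
int split_yyyymm(long value, int *year, int *month)
{
    long m;

    assert(year != NULL && month != NULL);
    if (value <= 0) {
        return 0;
    }
    m = value % 100;
    if (m < 1 || m > 12) {
        return 0;
    }
    *year = (int)(value / 100);
    *month = (int)m;
    return 1;
}

/* Hypothetical helper: the value of __STDC_ISO_10646__, or 0 when
   the implementation does not define the macro. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

Passing iso10646_version() to split_yyyymm reports which edition an implementation claims; a return of 0 from iso10646_version means the macro is absent and nothing is claimed.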

    -- 
    Clark S. Cox III
    clarkcox3@mac.com
    http://homepage.mac.com/clarkcox3/
    http://homepage.mac.com/clarkcox3/blog/B1196589870/index.html
    
    




    This archive was generated by hypermail 2.1.5 : Wed Mar 03 2004 - 17:08:08 EST