Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Tim Greenwood (timg1952@aol.com)
Date: Thu Dec 11 2003 - 10:57:47 EST

    In my interpretation of the C standard (which I am reading from
    http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a
    valid wchar_t encoding if your execution character set contains
    characters outside the C0 controls and Basic Latin range, and UTF-16 is
    not a valid wchar_t encoding if your execution character set has
    characters outside the BMP. In other words, whatever you consider to be
    a character (which may be a combining character) must be encoded in a
    single wchar_t code unit.

    The relevant passage is:

    11 A wide character constant has type wchar_t, an integer type defined
    in the <stddef.h> header. The value of a wide character constant
    containing a single multibyte character that maps to a member of the
    extended execution character set is the wide character (code)
    corresponding to that multibyte character, as defined by the mbtowc
    function, with an implementation-defined current locale. The value of a
    wide character constant containing more than one multibyte character, or
    containing a multibyte character or escape sequence not represented in
    the extended execution character set, is implementation-defined.
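    To make the consequence concrete, here is a minimal sketch in C. It is
    only an illustration of my reading, not text from the standard. The
    helper names decode_one and fits_in_one_wchar are made up for this
    example. The first shows that mbtowc has a single wchar_t to write its
    result into; the second asks whether a given code point could be stored
    in one wchar_t on the implementation at hand, assuming the
    implementation uses the code point itself as the wide character value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first multibyte character of s using the
   current locale. Returns the number of bytes consumed, 0 if s is the empty
   string, or -1 if the bytes do not form a valid character in this locale.
   Note that mbtowc stores exactly one wchar_t, however many bytes it reads. */
int decode_one(const char *s, wchar_t *out)
{
    assert(s != NULL && out != NULL);
    mbtowc(NULL, NULL, 0);                 /* reset any shift state */
    return mbtowc(out, s, strlen(s) + 1);  /* + 1 so the terminator is seen */
}

/* Hypothetical helper: nonzero if code_point fits in one wchar_t here,
   assuming wide character values are the code points themselves. */
int fits_in_one_wchar(unsigned long code_point)
{
    return code_point <= (unsigned long)WCHAR_MAX;
}
```

    Where WCHAR_MAX is 0xFFFF, fits_in_one_wchar(0x10000UL) is zero, which
    is the UTF-16 case above: a character outside the BMP would need two
    code units, and mbtowc has nowhere to put the second one.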

    Tim


