RE: [OT?] The C standard library and UTF's (was RE: Text Editors and Canonical Equivalence (was Coloured diacritics))

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Fri Dec 12 2003 - 13:55:16 EST


    > Tim Greenwood wrote:
    > > In my interpretation of the C standard (which I am reading from
    > > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a
    > > valid wchar_t encoding if your execution character set contains
    > > characters outside the C0 controls and Basic Latin range, and
    > > UTF-16 is not a valid wchar_t encoding if your execution character
    > > set has characters outside the BMP. In other words whatever you
    > > consider to be a character (which may be a combining character)
    > > must be encoded in one wchar_t code unit.

    True. But there are well-known implementations that break this rule
    and use UTF-16 code units as wchar_t instead (something that
    upsets the C standardisation committee a bit).
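
    To make the BMP point concrete, here is a minimal sketch of UTF-16
    encoding. The function name utf16_encode is made up for illustration
    and is not part of any standard library. It shows that a character
    outside the BMP needs two 16-bit code units, so it cannot fit in a
    single 16-bit wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Encodes one Unicode scalar value as UTF-16 into out[] and returns the
 * number of code units written (1 or 2). Returns 0 if cp is not a scalar
 * value (a surrogate code point, or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {
        /* BMP character: a single code unit is enough. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary character: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

    For example, U+00E9 comes back as one code unit, while U+1D11E
    (MUSICAL SYMBOL G CLEF) comes back as the pair D834 DD1E.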

    There have been **suggestions** to add utf16_t and utf32_t types
    to standard C for the respective code units ("char" is judged good
    enough for UTF-8 code units), together with matching character
    (really code unit) and string literal syntaxes. But don't hold
    your breath...
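
    As a rough sketch of what such types could look like in user code
    today: the typedef names below are taken from the suggestion above
    and are assumptions, not standard C, and both helper functions are
    hypothetical. With no literal syntax available, strings have to be
    written as array initialisers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedefs mirroring the suggested names; not standard C. */
typedef uint_least16_t utf16_t;
typedef uint_least32_t utf32_t;

/* Combine a high and a low surrogate into one UTF-32 code unit.
 * The caller must pass a valid pair. */
utf32_t utf16_decode_pair(utf16_t hi, utf16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return (utf32_t)0x10000
         + (((utf32_t)hi - 0xD800) << 10)
         + ((utf32_t)lo - 0xDC00);
}

/* Count code points in a zero-terminated UTF-16 string.
 * A well-formed surrogate pair counts once; an unpaired surrogate
 * is counted as one item rather than rejected. */
size_t utf16_count_code_points(const utf16_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != 0) {
        if (*s >= 0xD800 && *s <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;  /* surrogate pair: two code units, one code point */
        else
            s += 1;
        n++;
    }
    return n;
}
```

    A string holding "A" followed by U+1D11E is three utf16_t code
    units but only two code points, which is exactly the mismatch
    that a 16-bit wchar_t runs into.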

                    /kent k



    This archive was generated by hypermail 2.1.5 : Fri Dec 12 2003 - 14:38:32 EST