From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Fri Dec 12 2003 - 13:55:16 EST
> Tim Greenwood wrote:
> > In my interpretation of the C standard (which I am reading from
> > http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a
> > valid wchar_t encoding if your execution character set contains
> > characters outside the C0 controls and Basic Latin range, and
> > UTF-16 is not a valid wchar_t encoding if your execution character
> > set has characters outside the BMP. In other words whatever you
> > consider to be a character (which may be a combining character)
> > must be encoded in one wchar_t code unit.
True. But there are well-known implementations that break that rule
and use UTF-16 code units as wchar_t instead (something that
upsets the C standardisation committee a bit).
There have been **suggestions** to put utf16_t and utf32_t types
for the respective code units into standard C ("char" is judged
good enough for UTF-8 code units), together with syntax for the
corresponding character (really code unit) and string literals.
But don't hold your breath...
/kent k