Re: Corrigendum #9 clarifies noncharacter usage in Unicode

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 21 Feb 2013 19:06:33 +0000

On Wed, 20 Feb 2013 12:49:39 -0800
announcements_at_unicode.org wrote:

> They should be supported by APIs, components, and
> applications that handle (i.e., either process or pass through) all
> Unicode strings, such as a text editor or string class. Where an
> application does make internal use of a noncharacter, it should take
> some measures to sanitize input text from unknown sources.

Does this mean that a general-purpose application written in C that uses
Microsoft's 16-bit wchar_t and reads little-endian UTF-16 input with
the fgetwc() function should be regarded as broken? The problem is
that a return value of 0xFFFF there means end of file (WEOF), not the
noncharacter U+FFFF!
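
To make the concern concrete, here is a minimal sketch of one way to avoid the clash: read the file as bytes in binary mode and assemble the code units by hand, so that the end-of-file indicator and the code unit never share a 16-bit type. The helper name read_utf16le_unit and its return convention are my own invention for illustration. An application that stays with fgetwc() would presumably have to call feof() and ferror() after every WEOF return instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one UTF-16LE code unit from a stream opened in
   binary mode. Returns 1 on success, 0 on a clean end of file, and -1 on a
   read error or a file that ends in the middle of a code unit. */
int read_utf16le_unit(FILE *fp, uint16_t *out)
{
    assert(fp != NULL && out != NULL);

    int lo = fgetc(fp);          /* int return keeps EOF distinct from 0..255 */
    if (lo == EOF)
        return ferror(fp) ? -1 : 0;

    int hi = fgetc(fp);
    if (hi == EOF)
        return -1;               /* truncated code unit or read error */

    /* Little-endian: low byte first, high byte second. */
    *out = (uint16_t)((unsigned)lo | ((unsigned)hi << 8));
    return 1;
}
```

With this shape, a file containing the bytes FF FF is reported as one code unit with the value 0xFFFF, and only the following call reports end of file.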

U+FFFE at the start of a UTF-16 file should also cause some headaches!
Doesn't Microsoft Windows still interpret this as a byte-order mark
without first checking whether a byte-order mark is expected at all?
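
The ambiguity arises because U+FFFE serialised as little-endian UTF-16 is the byte pair FE FF, which is exactly what a big-endian byte-order mark looks like. As a rough sketch of the kind of check I have in mind, the function below only sniffs for a byte-order mark when the caller has no prior knowledge of the byte order. The names utf16_order and utf16_pick_order are hypothetical, and this is not a description of what Windows actually does.

```c
#include <assert.h>
#include <stddef.h>

enum utf16_order { UTF16_UNKNOWN, UTF16_LE, UTF16_BE };

/* Hypothetical helper: choose the byte order for a UTF-16 buffer.
   `declared` is what the caller already knows about the byte order
   (UTF16_UNKNOWN if nothing). *skip receives the number of leading
   bytes to drop because they were consumed as a byte-order mark. */
enum utf16_order utf16_pick_order(const unsigned char *buf, size_t len,
                                  enum utf16_order declared, size_t *skip)
{
    assert(skip != NULL && (buf != NULL || len == 0));

    *skip = 0;
    if (declared != UTF16_UNKNOWN)
        return declared;         /* order is known: leading bytes are data */

    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) { *skip = 2; return UTF16_LE; }
        if (buf[0] == 0xFE && buf[1] == 0xFF) { *skip = 2; return UTF16_BE; }
    }
    return UTF16_UNKNOWN;        /* no mark found; caller must decide */
}
```

Given the bytes FE FF 41 00, a caller that declares UTF16_LE keeps all four bytes and sees U+FFFE followed by U+0041, while a caller that passes UTF16_UNKNOWN gets UTF16_BE back with two bytes skipped.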

Richard.
Received on Thu Feb 21 2013 - 13:10:03 CST
