Corrigendum #9 clarifies noncharacter usage in Unicode

From: <>
Date: Wed, 20 Feb 2013 12:49:39 -0800

There has been confusion about whether noncharacters are permitted in
Unicode text. The new Corrigendum #9: Clarification About Noncharacters
makes it clear that noncharacters are permissible even in open
interchange, although their intended semantics may not be interpretable
in such contexts. The UTF-8, UTF-16, UTF-32 & BOM FAQ has also been
updated for clarity, and other informative text about noncharacters,
including the Core Specification, will be revised over time.

Background. There are 66 noncharacters permanently reserved for internal
use, typically for some sort of control function or sentinel value.
They should be supported by APIs, components, and applications that
handle (i.e., either process or pass through) all Unicode strings, such
as a text editor or string class. Where an application does make
internal use of a noncharacter, it should take some measures to sanitize
input text from unknown sources. The best practice is to replace that
particular noncharacter on input with U+FFFD. (The noncharacter should not
simply be deleted, since deletion can introduce security problems. For more
information, see Section 3.5, Deletion of Code Points, in UTR #36,
Unicode Security Guidelines.)
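
As a rough illustration of the practice described above, the following
Python sketch identifies the 66 noncharacters (U+FDD0..U+FDEF plus the
last two code points of each of the 17 planes) and replaces one
internally used noncharacter with U+FFFD on input. The names
is_noncharacter, sanitize_input, and SENTINEL are hypothetical, and the
choice of U+FFFF as the internal sentinel is only an assumption made for
the example; none of this comes from the corrigendum itself.

```python
REPLACEMENT = "\uFFFD"   # U+FFFD REPLACEMENT CHARACTER
SENTINEL = "\uFFFF"      # assumption: the one noncharacter this app uses internally


def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of the 66 noncharacters."""
    # U+FDD0..U+FDEF is a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF in each of the 17 planes give the other 34.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def sanitize_input(text: str, reserved: str = SENTINEL) -> str:
    """Replace the internally used noncharacter with U+FFFD.

    The code point is replaced rather than deleted, and all other
    noncharacters are passed through unchanged.
    """
    if not is_noncharacter(ord(reserved)):
        raise ValueError("reserved must be a single noncharacter")
    return text.replace(reserved, REPLACEMENT)


if __name__ == "__main__":
    untrusted = "abc\uFFFFdef\uFDD0"
    cleaned = sanitize_input(untrusted)
    print([f"U+{ord(ch):04X}" for ch in cleaned])
```

Only the sentinel is rewritten here; a component that merely passes
strings through would not need to call sanitize_input at all.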

Received on Wed Feb 20 2013 - 14:57:04 CST

This archive was generated by hypermail 2.2.0 : Wed Feb 20 2013 - 14:57:05 CST