L2/01-355

ISO/IEC JTC1/SC2/WG2 2369

Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation internationale de normalisation

Doc Type: Working Group Document
Title: Request to allow FFFF, FFFE in UTF-8 in the text of ISO/IEC 10646 
Source:  Unicode Technical Committee 
Status:  Liaison Statement
Action: For adoption by JTC1/SC2/WG2
Date: 2001-09-26


The Unicode Technical Committee requests that WG2 change its definition of UTF-8 to allow the representation of the code points U+FFFF and U+FFFE. ISO/IEC 10646 currently disallows these two code points in UTF-8, but the restriction is clearly an anomaly: other non-characters (U+1FFFE, U+1FFFF, etc.), as well as the new non-characters U+FDD0..U+FDEF, are allowed.

Moreover, all of these code points are legal in HTML; see the SGML declaration for HTML 4
(http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html).

The 10646 definition of UTF-8 should be amended as soon as possible to allow all non-characters to be represented in UTF-8.
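
For illustration only, the following minimal Python sketch shows the byte sequences that the generic UTF-8 bit pattern yields for the non-characters discussed above when none of them is excluded. The function names encode_utf8 and is_noncharacter are hypothetical and are not taken from ISO/IEC 10646 or the Unicode Standard. The sketch assumes a code point range of 0..10FFFF and rejects surrogate code points; both are assumptions of this example rather than part of the request.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def encode_utf8(cp: int) -> bytes:
    """Encode one code point using the generic UTF-8 bit pattern.

    Non-characters are deliberately NOT rejected, which is the behavior
    requested for U+FFFE and U+FFFF.
    """
    if not 0 <= cp <= 0x10FFFF:          # assumed upper limit for this sketch
        raise ValueError("code point out of range")
    if 0xD800 <= cp <= 0xDFFF:           # assumption: surrogates are not encodable
        raise ValueError("surrogate code point")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0xFDD0, 0xFDEF):
        octets = " ".join(f"{b:02X}" for b in encode_utf8(cp))
        print(f"U+{cp:04X} noncharacter={is_noncharacter(cp)} UTF-8: {octets}")
```

Under this pattern U+FFFE and U+FFFF encode as EF BF BE and EF BF BF, in the same three-octet form as U+FDD0 (EF B7 90), so permitting them requires no new encoding form, only the removal of the exclusion.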