L2/02-262

Source: Markus Scherer
Date: July 11, 2002

Proposal to add another conformance clause to UTS #6 SCSU
(http://www.unicode.org/reports/tr6/#Conformance)

Additional text:

Conformant encoders must remain in Single Byte Mode at least until the first code point is encountered that is not U+0000 (NUL), U+0009 (HT), U+000A (LF), U+000D (CR), or U+0020..U+00FF (Latin-1).

Rationale:

This restriction makes SCSU documents with internal encoding declarations possible, especially in XML and HTML. Such documents can be parsed assuming ASCII-compatible encodings up to the encoding declaration if the SCSU encoder does not switch to Unicode Mode until then.

The process emitting the document should place the encoding declaration at the earliest possible place, before any non-Latin-1 characters. (This is possible in HTML and required in XML.)

Without this restriction, a trivial encoder could just switch to Unicode Mode immediately.

markus
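
Illustration only, not part of the proposed text: the set of code points named in the additional clause can be written as a simple predicate. The Python sketch below assumes the input is already a sequence of Unicode code points. The function names keeps_single_byte_mode and earliest_unicode_mode_index are hypothetical and do not come from UTS #6. It does not encode SCSU. It only reports the earliest position at which an encoder following the clause would be allowed to leave Single Byte Mode.

```python
def keeps_single_byte_mode(cp: int) -> bool:
    """Return True if cp is in the set listed in the proposed clause:
    NUL, HT, LF, CR, or U+0020..U+00FF."""
    return cp in (0x0000, 0x0009, 0x000A, 0x000D) or 0x0020 <= cp <= 0x00FF


def earliest_unicode_mode_index(text: str) -> int:
    """Return the index of the first code point outside the listed set.

    Under the proposed clause an encoder must stay in Single Byte Mode for
    every code point before this index. If no such code point exists, the
    result is len(text).
    """
    for i, ch in enumerate(text):
        if not keeps_single_byte_mode(ord(ch)):
            return i
    return len(text)


if __name__ == "__main__":
    # Hypothetical XML document with an internal encoding declaration.
    doc = '<?xml version="1.0" encoding="SCSU"?>\n<p>\u3042</p>'
    print(earliest_unicode_mode_index(doc))
```

For a document like the one in the example, the reported index lies after the encoding declaration, which is the property the rationale relies on.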