From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Oct 22 2002 - 13:36:04 EDT
Doug Ewell wrote:
>> [92-C23] Consensus: Add a definition of "XML Suitable" and a
>> recommendation that SCSU encoders should be "XML Suitable".
>> [L2/02-262]
>>
>> [92-A46] Action Item for Markus Scherer, Editorial Committee:
>> Post a proposed update to Unicode Technical Standard #6 A
>> Standard Compression Scheme for Unicode, adding the update on
>> "XML Suitable". [L2/02-262]
>
> OK, so what the heck does "XML Suitable" mean? How can I determine
> whether my SCSU encoder is "XML Suitable," or fix it so it will be?
> (Yes, I do know what XML is.)
The idea is that an SCSU encoder should stay in single-byte mode at least for as long as the initial part of the text fits into Latin-1 (code points <= U+00FF). The one exception, of course, is the SQU needed to quote a possible signature (U+FEFF).
This ensures that if a document begins with <?xml version="1.0" encoding="SCSU"?> then that part will be encoded the same as with US-ASCII, so the XML parser has a chance to read the encoding declaration. The same applies to HTML.
Without such a clause, an encoder could immediately switch into Unicode mode or do other funny things that destroy the ASCII readability of the encoding declaration.
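To make this concrete, here is a minimal Python sketch of the single-byte-mode behavior described above. It is not a full SCSU encoder and not taken from UTS #6. The helper name encode_latin1_prefix and its signature flag are made up for illustration, and it assumes the encoder never leaves the initial state (single-byte mode, default dynamic window 0 at U+0080), so it only covers code points <= U+00FF.

```python
SQ0 = 0x01  # single-byte mode tag: quote one character from window 0
SQU = 0x0E  # single-byte mode tag: quote one UTF-16 code unit

# C0 controls that single-byte mode passes through unchanged.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def encode_latin1_prefix(text, signature=False):
    """Hypothetical helper: encode Latin-1 text without leaving SCSU's initial state."""
    out = bytearray()
    if signature:
        # U+FEFF is outside Latin-1, so it is quoted with SQU: 0E FE FF.
        out += bytes([SQU, 0xFE, 0xFF])
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            raise ValueError("sketch only covers code points <= U+00FF")
        if cp < 0x20 and cp not in PASS_THROUGH_CONTROLS:
            # Other C0 controls collide with SCSU tag bytes and must be quoted.
            out.append(SQ0)
        # 0x20..0x7F map to themselves; 0x80..0xFF map to themselves as well,
        # because the default active dynamic window starts at U+0080.
        out.append(cp)
    return bytes(out)


decl = '<?xml version="1.0" encoding="SCSU"?>'
# The declaration comes out byte-for-byte identical to its US-ASCII form.
assert encode_latin1_prefix(decl) == decl.encode("ascii")
```

An encoder that behaves like this for the Latin-1 prefix of a document, and only afterwards starts switching windows or modes, leaves the encoding declaration readable to a parser that so far knows only ASCII.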
Note that any reasonable and not purely trivial (SCU + UTF-16BE) encoder does this anyway.
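For contrast, a sketch of the purely trivial encoder mentioned above, in Python. The function name trivial_scsu_encode is made up for illustration, and the sketch assumes the text contains no code units whose high byte falls in 0xE0..0xF2, since those would need extra quoting in Unicode mode.

```python
SCU = 0x0F  # single-byte mode tag: switch to Unicode mode


def trivial_scsu_encode(text):
    """Hypothetical trivial encoder: SCU followed by the text as UTF-16BE.

    Assumes no code unit has a high byte in 0xE0..0xF2 (Unicode-mode tags).
    """
    return bytes([SCU]) + text.encode("utf-16-be")


decl = '<?xml version="1.0" encoding="SCSU"?>'
encoded = trivial_scsu_encode(decl)
# The output no longer starts with the ASCII bytes of the declaration.
assert not encoded.startswith(b"<?xml")
assert encoded[:5] == b"\x0f\x00<\x00?"
```

The output is still decodable as SCSU, but a parser looking for the ASCII bytes of "<?xml" will not find them.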
markus