Re: XML Suitable (was: Meeting minutes for UTC 92 in August)

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Oct 22 2002 - 13:36:04 EDT


    Doug Ewell wrote:

    >> [92-C23] Consensus: Add a definition of "XML Suitable" and a
    >> recommendation that SCSU encoders should be "XML Suitable".
    >> [L2/02-262]
    >>
    >> [92-A46] Action Item for Markus Scherer, Editorial Committee:
    >> Post a proposed update to Unicode Technical Standard #6, "A
    >> Standard Compression Scheme for Unicode", adding the update on
    >> "XML Suitable". [L2/02-262]
    >
    > OK, so what the heck does "XML Suitable" mean? How can I determine
    > whether my SCSU encoder is "XML Suitable," or fix it so it will be?
    > (Yes, I do know what XML is.)

    The idea is that an SCSU encoder should stay in single-byte mode at least for as long as the initial part of the text fits into Latin-1 (code points <= U+00FF). The one exception, of course, is the SQU used to encode a possible signature (U+FEFF).
    This ensures that if a document begins with <?xml version="1.0" encoding="SCSU"?> then that part will be encoded the same as with US-ASCII, and the XML parser has a chance to read the encoding declaration. The same goes for HTML.
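
    As a rough illustration (a minimal sketch, not the wording of the proposed update), here is how an encoder could handle the leading Latin-1 run while staying in the initial single-byte state. The function name encode_latin1_prefix and its signature parameter are made up for this example. It assumes the SCSU initial state: single-byte mode, with dynamic window 0 at U+0080 active, so bytes 0x80-0xFF map straight to U+0080-U+00FF, and C0 controls other than NUL, TAB, LF and CR need an SQ0 quote.

```python
SQ0 = 0x01  # single-byte mode tag: quote one character from window 0
SQU = 0x0E  # single-byte mode tag: quote one UTF-16 code unit

# C0 controls that pass through unquoted in single-byte mode.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def encode_latin1_prefix(text, signature=False):
    """Encode the leading run of code points <= U+00FF without leaving
    the initial single-byte mode (hypothetical helper for illustration).

    Returns (encoded bytes, number of characters consumed). A real
    encoder would continue from there with whatever strategy it likes.
    """
    out = bytearray()
    if signature:
        # The only tag allowed before the text: SQU + U+FEFF.
        out += bytes([SQU, 0xFE, 0xFF])
    consumed = 0
    for ch in text:
        cp = ord(ch)
        if cp > 0xFF:
            break  # end of the Latin-1 prefix
        if cp < 0x20 and cp not in PASS_THROUGH_CONTROLS:
            out.append(SQ0)  # other C0 controls collide with tag bytes
        out.append(cp)
        consumed += 1
    return bytes(out), consumed


decl = '<?xml version="1.0" encoding="SCSU"?>'
data, n = encode_latin1_prefix(decl + "\u0416")
assert data == decl.encode("ascii")  # byte-for-byte the same as US-ASCII
assert n == len(decl)                # stopped before the Cyrillic letter
```

    With signature=True the output starts with 0E FE FF and the declaration follows unchanged, which is the exception mentioned above.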
    Without such a clause, an encoder could immediately switch into Unicode mode or do other funny things that destroy the ASCII readability of the encoding declaration.
    Note that any reasonable encoder, that is, anything beyond the purely trivial one (SCU + UTF-16BE), does this anyway.
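
    For contrast, here is a sketch of that trivial encoder, to show why its output is not "XML Suitable". The name encode_trivial is made up for this example. One assumption beyond plain "SCU + UTF-16BE": code units whose high byte falls in the Unicode-mode tag range 0xE0-0xF2 are quoted with UQU so that the output stays decodable.

```python
SCU = 0x0F  # single-byte mode tag: switch to Unicode mode
UQU = 0xF0  # Unicode mode tag: quote one UTF-16 code unit


def encode_trivial(text):
    """Switch to Unicode mode at once and emit UTF-16BE code units
    (hypothetical helper for illustration)."""
    out = bytearray([SCU])
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        hi, lo = utf16[i], utf16[i + 1]
        if 0xE0 <= hi <= 0xF2:
            out.append(UQU)  # high byte would otherwise read as a tag
        out += bytes([hi, lo])
    return bytes(out)


decl = '<?xml version="1.0" encoding="SCSU"?>'
data = encode_trivial(decl)
assert not data.startswith(b"<?xml")  # declaration is no longer ASCII-readable
assert data[:3] == b"\x0f\x00<"       # SCU, then '<' as two bytes
```

    A parser sniffing for the ASCII bytes of "<?xml" sees 0F 00 3C 00 3F ... instead, so it never gets to the encoding declaration.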

    markus


