Re: character entities in UTF-8 files

From: Eric Muller (emuller@adobe.com)
Date: Wed Jul 13 2005 - 10:35:19 CDT


    Elliotte Harold wrote:

    >
    > <![CDATA[foo]]&gt;bar]]> is parsed as the string "foo]]&gt;bar", not
    > "foo]]>bar". There is no way to represent the three character sequence
    > ]]> inside a CDATA section. You have to close the CDATA section, emit
    > a > character, and open a new CDATA section.

    You are right, I was misled by XML 1.0, third edition
    (<http://www.w3.org/TR/2004/REC-xml-20040204/>) section 2.4, end of 3rd
    paragraph:

        The right angle bracket (>) /MAY/ be represented using the string
        "|&gt;|", and /MUST/, for compatibility
        <http://www.w3.org/TR/2004/REC-xml-20040204/#dt-compat>, be escaped
        using either "|&gt;|" or a character reference when it appears in
        the string "|]]>|" in content, when that string is not marking the
        end of a CDATA section
        <http://www.w3.org/TR/2004/REC-xml-20040204/#dt-cdsection>.

    which could be improved by adding "... and is not in a CDATA section."
    and a sentence like "The string "]]>" cannot appear within a single CDATA
    section. ("<![CDATA[...]]>]]&gt;<![CDATA[...]]>" is a possible pattern for
    the content "...]]>..." that overcomes this limitation.)"
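
To make the point concrete, here is a minimal Python sketch of the "close
and reopen" approach. It is not taken from the spec: the helper name
wrap_in_cdata and the wrapper element <r> are made up for illustration, and
I am assuming the standard xml.etree.ElementTree parser. This variant puts
the ">" at the start of the next CDATA section; the escaped "]]&gt;"
between two sections should be equivalent.

```python
import xml.etree.ElementTree as ET

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections.

    Hypothetical helper: "]]" ends one section and ">" begins the next,
    so no single section ever contains the terminator "]]>".
    """
    safe = text.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


def parsed_text(fragment: str) -> str:
    """Parse the fragment inside a throwaway <r> element and return its text."""
    return ET.fromstring("<r>" + fragment + "</r>").text


# Entity references are not expanded inside a CDATA section.
inside_cdata = parsed_text("<![CDATA[foo]]&gt;bar]]>")

# Two sections with an escaped "]]&gt;" in ordinary content between them.
escaped_between = parsed_text("<![CDATA[foo]]>]]&gt;<![CDATA[bar]]>")

# The helper's output for the same target content.
split_sections = wrap_in_cdata("foo]]>bar")
round_trip = parsed_text(split_sections)
```

With these inputs, inside_cdata should come back with the "&gt;" still
spelled out, while escaped_between and round_trip should both yield the
intended content containing the literal "]]>".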

    Eric.


