Re: character entities in UTF-8 files

From: Andy Heninger (andyh@jtcsv.com)
Date: Thu Jul 14 2005 - 01:13:47 CDT

    Gregg Reynolds wrote:

    > an XML parser will
    > first *replace* character entities, before passing the data to the
    > consuming application. When that happens in relation to parsing (i.e.
    > checking for well-formedness) is implementation-dependent,

    It's implementation-dependent only because so many implementations get
    it wrong. XML's rules for entity replacement, and for constructing the
    text that a parser delivers to the application, are astoundingly,
    mind-bogglingly complicated. SGML heritage is largely to blame, I've
    been told.

    See http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-entexpand
    for the official story.
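
    For a quick feel of what "replaced before delivery" means in practice,
    here is a minimal sketch. It assumes Python's standard
    xml.etree.ElementTree module (my choice for illustration, nothing in
    this thread depends on it), and the helper name delivered_text is made
    up. It only exercises character references and the predefined
    entities, not the harder cases in the spec section above.

```python
import xml.etree.ElementTree as ET


def delivered_text(xml_source):
    """Return the text content of the root element as the application sees it.

    Hypothetical helper: it parses the document and hands back root.text,
    i.e. the character data after the parser has expanded references.
    """
    root = ET.fromstring(xml_source)
    return root.text


# A numeric character reference (&#233;), predefined entities (&lt; &gt; &amp;)
# and a hex character reference for "<" (&#x3C;).
source = "<msg>caf&#233; &lt;b&gt; &amp; &#x3C;</msg>"
text = delivered_text(source)

# The application never sees the references, only the characters they stand for.
print(text)

# The expanded "<" characters are plain data: no <b> child element was created.
child_count = len(list(ET.fromstring(source)))
print(child_count)

# A literal "<" in content is a well-formedness error, which is why it
# has to be escaped in the source in the first place.
try:
    ET.fromstring("<msg>a < b</msg>")
    raw_less_than_accepted = True
except ET.ParseError:
    raw_less_than_accepted = False
print(raw_less_than_accepted)
```

    The expanded "<" is handed over as ordinary character data and is not
    rescanned as markup, so the consuming application cannot tell whether
    the source file held the character itself or a reference to it.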

    > if I'm not
    > mistaken. I find the XML spec a little fuzzy on that point (I can't
    > wait for the English translation); it talks about at least < and some
    > other char entities being "escaped".
    >

    -- 
       -- Andy Heninger
          heninger@us.ibm.com
    

