Re: character entities in UTF-8 files

From: Andy Heninger
Date: Thu Jul 14 2005 - 01:13:47 CDT

In reply to: Gregg Reynolds: "Re: character entities in UTF-8 files"

Gregg Reynolds wrote:

> an XML parser will
> first *replace* character entities, before passing the data to the
> consuming application. When that happens in relation to parsing (i.e.
> checking for well-formedness) is implementation-dependent,

It's implementation-dependent only because so many implementations get
it wrong. XML's rules for entity replacement and for constructing the
text that a parser delivers to the application are astoundingly,
mind-bogglingly complicated. SGML heritage is largely to blame, I've
been told.

See http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-entexpand
for the official story.

> if I'm not
> mistaken. I find the XML spec a little fuzzy on that point (I can't
> wait for the English translation); it talks about at least &lt; and some
> other char entities being "escaped".
>
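
To illustrate the replacement described above, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The choice of Python and of this parser is an assumption; the thread names neither. The sketch covers only the predefined entities (&lt;, &amp;) and numeric character references, not DTD-declared general entities, whose expansion rules are the genuinely complicated part. The element name p, the attribute note, and the helpers delivered and reserialized are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The same content spelled two ways (hypothetical sample documents):
# WITH_REFS uses numeric character references for the non-ASCII characters,
# WITH_LITERALS writes those characters directly, as a UTF-8 file could.
# Both use &lt; and &amp;, which must be escaped in either spelling.
WITH_REFS = '<p note="a &amp; b">x &lt; y, caf&#233; &#x263A;</p>'
WITH_LITERALS = '<p note="a &amp; b">x &lt; y, café ☺</p>'


def delivered(source: str) -> tuple[str, str]:
    """Parse `source` and return (text, note attribute) as the application
    receives them. References have already been replaced by characters."""
    root = ET.fromstring(source)
    return root.text, root.get("note")


def reserialized(source: str) -> str:
    """Parse, then serialize back to a str. ElementTree re-escapes markup
    characters such as '<' and '&', but it does not remember that the input
    used &#233;: the character itself is written out."""
    return ET.tostring(ET.fromstring(source), encoding="unicode")


if __name__ == "__main__":
    text, note = delivered(WITH_REFS)
    print(repr(text))  # 'x < y, café ☺'
    print(repr(note))  # 'a & b'
    # Once parsed, the two spellings cannot be told apart.
    print(delivered(WITH_REFS) == delivered(WITH_LITERALS))  # True
    # '<' comes back as &lt;, while &#233; comes back as a literal é.
    print(reserialized(WITH_REFS))
```

Because the application only ever sees characters, a UTF-8 file can carry é directly or as &#233; with the same result. Characters that are significant to markup, such as < and &, must still be escaped.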

-- 
   -- Andy Heninger
