From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Jul 12 2005 - 17:46:54 CDT
At 10:44 AM 7/12/2005, Avraham Shapiro wrote:
>** Low Priority **
>
>We have an XML-based application that specifies UTF-8 files as input.
>Occasionally users will include numeric character entities, for example
>&#233; for e acute, instead of the UTF-8 equivalent of C3 A9. My question
>is: Is this legal UTF-8? And are numeric or symbolic character entities
>valid for 7-bit ASCII characters such as "<"? My guess is the first one is
>not legal, and the second one is application defined, i.e. Unicode says
>nothing about it. Am I right?
Your message seems to imply that you are talking about XML files that are
encoded in UTF-8, but you don't state that explicitly. Assuming that is
what you meant, it is XML that defines whether &#233; is legal and how it
is interpreted. All the UTF-8 format can tell you is that each of the
characters in the sequence & # 2 3 3 ; will be represented by a single
ASCII byte in the UTF-8 file.
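
To make the two layers concrete, here is a minimal Python sketch (not part of the original exchange). It assumes the standard library's xml.etree.ElementTree as the XML parser, and the sample document and variable names are made up for illustration. It shows that the reference is six plain ASCII bytes at the UTF-8 level, and that it is the XML parser that turns it into the same character as the bytes C3 A9.

```python
import xml.etree.ElementTree as ET

# At the UTF-8 level the character reference is just six ASCII characters,
# each encoded as a single byte below 0x80.
reference_bytes = "&#233;".encode("utf-8")
reference_hex = reference_bytes.hex(" ")

# The literal character U+00E9 (e acute) is the two-byte sequence C3 A9.
literal_bytes = "\u00e9".encode("utf-8")

# Hypothetical sample document: one reference and one literal e acute.
doc = b'<?xml version="1.0" encoding="UTF-8"?><p>&#233; and \xc3\xa9</p>'

# Only the XML parser gives the reference a meaning; after parsing,
# both spellings yield the same character.
parsed_text = ET.fromstring(doc).text

print(reference_hex)        # bytes of the reference as stored in the file
print(literal_bytes.hex(" "))
print(parsed_text)
```

Either spelling is valid UTF-8; whether the reference is also well-formed and what it means is a question for the XML layer.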
If your application can also read plain text files (e.g. extension .txt and
not .xml, and no XML header), then inside those, neither Unicode nor XML
defines any special interpretation. XML does not, since we assume the file
is not XML, and Unicode does not, because an & is an & and not an escape
character in Unicode.
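
A small sketch of the plain-text case, again in Python and again only an illustration: decoding the bytes as UTF-8 leaves the reference untouched, and expanding it is a separate, application-level decision. Using html.unescape for that step is an assumption made here for the example (it applies HTML rules, not XML rules); the sample data is hypothetical.

```python
import html

# Hypothetical contents of a plain .txt file, stored as UTF-8.
data = b"caf&#233; & cr\xc3\xa8me"

# UTF-8 decoding maps bytes to characters and nothing more:
# the six characters & # 2 3 3 ; come through exactly as written.
text = data.decode("utf-8")

# If the application chooses to honor references in plain text, it has to
# do so itself. html.unescape is one stdlib option; this is a choice made
# by the application, not something Unicode or UTF-8 requires.
expanded = html.unescape(text)

print(text)
print(expanded)
```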
A./