Actually, the XML spec is very clear on this: a UTF-16 document must begin
with a BOM, which tells the parser that it is reading UTF-16 text.
If there is no BOM and no encoding declaration, UTF-8 is assumed. The
encoding declaration is only required for encodings other than UTF-8 and
UTF-16, and parsers are not required to support any of those others.
In other words, every conforming parser supports UTF-16 and UTF-8. If it
does not, it is not an XML parser.
You can see
for more details.
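
To make the BOM check concrete, here is a rough Python sketch of the kind of
sniffing a parser can do on the first few bytes, along the lines of the
autodetection appendix in the XML spec. The function name sniff_xml_encoding
is made up for this example, and the sketch deliberately ignores UTF-32,
EBCDIC and any encoding named in the declaration itself, so treat it as an
illustration rather than what any particular parser does.

```python
import codecs


def sniff_xml_encoding(data: bytes) -> str:
    """Guess a Python codec name for an XML byte stream from its first bytes.

    Hypothetical helper: covers only UTF-8 and UTF-16.
    """
    # A BOM settles the question before any markup is read.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM when decoding
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order

    # No BOM: a declaration has to start with "<?", so the position of the
    # zero bytes around those two characters reveals UTF-16 and its order.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"

    # Otherwise fall back to the default, UTF-8.
    return "utf-8"
```

Once the encoding family is known, the parser can decode enough of the
stream to read the declaration, which is how the chicken-and-egg problem
goes away: data.decode(sniff_xml_encoding(data)) gives readable text for
the cases covered above.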
----- Original Message -----
From: "Paul Deuter" <Paul.Deuter@plumtree.com>
To: "Unicode List" <firstname.lastname@example.org>
Sent: Thursday, July 13, 2000 8:47 AM
Subject: Using Unicode in XML
> I know that XML can contain Unicode by using the declaration
> <?xml version="1.0" encoding="ISO-10646-UCS-2"?>
> But there seems to be a chicken and egg dilemma here. If
> I encode my whole XML stream as Unicode, then the parser
> will need to know that the stream is Unicode in order to be able
> to parse the declaration which tells it that it is Unicode.
> If the parser cannot figure out that the stream is Unicode, then
> it won't be able to read the declaration. But if it can recognize
> the Unicode, then the declaration would seem to be superfluous.
> How do systems handle this?