Re: Using Unicode in XML

From: Michael (michka) Kaplan
Date: Thu Jul 13 2000 - 12:17:20 EDT

Actually, the XML spec is very clear on this: it is handled through the use
of a BOM, to help the parser know that it is UTF-16 text.

If there is no BOM, then UTF-8 is assumed, unless the encoding declaration
is present. However, the encoding declaration is not required, and parsers
are not required to support any encoding other than UTF-8 and UTF-16.

In other words, a conforming parser supports both UTF-16 and UTF-8. If it
does not, it is not an XML parser.
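
To make the detection order above concrete, here is a rough sketch in
Python. It is only an illustration, not the algorithm from the spec: the
function name sniff_xml_encoding is made up for this example, and it
deliberately ignores UTF-32 and BOM-less UTF-16, which a real parser would
also have to consider.

```python
import codecs
import re

# Matches an encoding declaration in an ASCII-compatible byte stream.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML byte stream (hypothetical helper)."""
    # 1. A UTF-16 BOM answers the question before any markup is read.
    #    Simplification: a UTF-32 LE BOM also starts with FF FE and is
    #    not distinguished here.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. A UTF-8 BOM is optional but unambiguous.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # 3. No BOM: the declaration itself uses only ASCII characters, so it
    #    can be read byte by byte without knowing the final encoding yet.
    match = _ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. No BOM and no encoding declaration: UTF-8 is assumed.
    return "utf-8"
```

This is also why the declaration is not quite superfluous: the BOM only
gets the parser far enough to read the first line, and for other
ASCII-compatible encodings the declaration is the only in-band hint
there is.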

You can see

for more details.


----- Original Message -----
From: "Paul Deuter"
To: "Unicode List"
Sent: Thursday, July 13, 2000 8:47 AM
Subject: Using Unicode in XML

> I know that XML can contain Unicode by using the declaration
> <?xml version="1.0" encoding="ISO-10646-UCS-2"?>
> But there seems to be a chicken and egg dilemma here. If
> I encode my whole XML stream as Unicode, then the parser
> will need to know that the stream is Unicode in order to be able
> to parse the declaration which tells it that it is Unicode.
> If the parser cannot figure out that the stream is Unicode, then
> it won't be able to read the declaration. But if it can recognize
> the Unicode, then the declaration would seem to be superfluous.
> How do systems handle this?
> Thanks,
> Paul
