Re: Using Unicode in XML

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Thu Jul 13 2000 - 12:17:20 EDT


Actually, the XML spec is very clear on this: it is handled through the use
of a BOM (byte order mark), which tells the parser that the text is UTF-16.

If there is no BOM, then UTF-8 is assumed, unless an encoding declaration is
present. However, the encoding declaration is not required for UTF-8 or
UTF-16 documents, and parsers are not required to support any other encoding.

In other words, a valid parser supports UTF-16 and UTF-8. If it does not, it
is not an XML parser.

You can see

http://www.w3.org/TR/REC-xml#charencoding

for more details.
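
As a rough illustration of the detection order described above, here is a
minimal Python sketch. The function name sniff_xml_encoding is made up for
this example and is not part of any XML library. It assumes the whole
document is available as bytes, ignores UTF-32 and the BOM-less UTF-16 case
that the spec's appendix on autodetection also covers, and only reads the
declaration when it is in ASCII-compatible bytes.

```python
import codecs
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML byte stream (hypothetical helper)."""
    # A UTF-16 BOM settles both the encoding and the byte order.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # A UTF-8 BOM is optional, but unambiguous when present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # No BOM: the declaration itself uses only ASCII characters, so it
    # can be read before the real encoding is known.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Neither a BOM nor an encoding declaration: assume UTF-8.
    return "utf-8"
```

This is also why the chicken-and-egg problem does not arise in practice: the
parser only needs the first few bytes to pick a decoder, and the declaration
then confirms or refines that choice.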

michka

----- Original Message -----
From: "Paul Deuter" <Paul.Deuter@plumtree.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Thursday, July 13, 2000 8:47 AM
Subject: Using Unicode in XML

> I know that XML can contain Unicode by using the declaration
>
> <?xml version="1.0" encoding="ISO-10646-UCS-2"?>
>
> But there seems to be a chicken and egg dilemma here. If
> I encode my whole XML stream as Unicode, then the parser
> will need to know that the stream is Unicode in order to be able
> to parse the declaration which tells it that it is Unicode.
>
> If the parser cannot figure out that the stream is Unicode, then
> it won't be able to read the declaration. But if it can recognize
> the Unicode, then the declaration would seem to be superfluous.
>
> How do systems handle this?
>
> Thanks,
> Paul


