From: John Cowan (jcowan@reutershealth.com)
Date: Tue Jul 08 2003 - 10:40:51 EDT
Philippe Verdy scripsit:
> - UTF-32: with a recommended byte order mark (00,00,FE,FF or FF,FE,00,00)
UTF-32 requires an XML declaration with an encoding declaration (always
assuming there is no MIME header in scope), even though it is easy to
autodetect.
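To illustrate how easy the BOM case is, here is a minimal Python sketch. The table `BOMS` and the function `sniff_bom` are hypothetical names, not part of any XML library. It tests the UTF-32 marks before the UTF-16 ones because FF FE 00 00 begins with the UTF-16LE mark FF FE. Reading those four bytes as UTF-16LE would mean the document starts with U+0000, which XML forbids, so the UTF-32LE reading is safe. Detecting the scheme this way does not lift the requirement for the encoding declaration.

```python
from typing import Optional

# Byte order marks. FF FE 00 00 (UTF-32LE) must be tested before FF FE
# (UTF-16LE), because the latter is a prefix of the former.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding scheme implied by a leading BOM, or None if absent."""
    for bom, scheme in BOMS:
        if data.startswith(bom):
            return scheme
    return None


# U+FEFF encoded with an explicit byte order serves as the BOM.
print(sniff_bom("\ufeff<doc/>".encode("utf-32-le")))  # UTF-32LE
print(sniff_bom(b"<doc/>"))                            # None
```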
> With UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, the encoding scheme can
> be ambiguous with legal UTF-8!
In fact no, because all of these schemes require a 0x00 byte somewhere
in the first four bytes: the first character in an XML document must be
below U+0100 (specifically, either < or whitespace), so its high-order
byte or bytes are zero. In UTF-8, a 0x00 byte can only represent U+0000,
since every byte of a multi-byte UTF-8 sequence is 0x80 or above, and
U+0000 is a character which cannot occur in well-formed XML. No ambiguity
is possible, but the XML Rec makes this a well-formedness error anyway.
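A minimal Python sketch of the argument, assuming BOM-less input whose first character is < or XML whitespace. The function `sniff_bomless` and the sample documents are hypothetical. The function works in the spirit of, but is not copied from, the non-normative autodetection table in Appendix F of the XML 1.0 Recommendation, and it only separates the four schemes named above from UTF-8 (or another ASCII-compatible encoding) by looking at where the zero bytes fall.

```python
from typing import Optional


def sniff_bomless(data: bytes) -> Optional[str]:
    """Guess the scheme from the zero bytes among the first four bytes.

    Assumes no BOM and a first character below U+0100 (< or whitespace).
    """
    b = data[:4]
    if len(b) < 4:
        return None
    if b[:3] == b"\x00\x00\x00" and b[3] != 0:
        return "UTF-32BE"
    if b[0] != 0 and b[1:] == b"\x00\x00\x00":
        return "UTF-32LE"
    if b[0] == 0 and b[1] != 0:
        return "UTF-16BE"
    if b[0] != 0 and b[1] == 0:
        return "UTF-16LE"
    if 0 not in b:
        return "UTF-8"  # or some other ASCII-compatible encoding
    return None


SAMPLES = ["<doc/>", " <doc/>", "\t<doc/>", "\r\n<doc/>", "<\u65e5\u672c/>"]
CODECS = {"UTF-16BE": "utf-16-be", "UTF-16LE": "utf-16-le",
          "UTF-32BE": "utf-32-be", "UTF-32LE": "utf-32-le"}

for text in SAMPLES:
    # The samples contain no U+0000, so their UTF-8 form has no 0x00 byte.
    assert 0 not in text.encode("utf-8")[:4]
    assert sniff_bomless(text.encode("utf-8")) == "UTF-8"
    for scheme, codec in CODECS.items():
        head = text.encode(codec)[:4]
        assert 0 in head  # every one of the four schemes has a zero byte up front
        assert sniff_bomless(head) == scheme
```

The last sample starts its element name with non-Latin characters, which shows that only the first character needs to be below U+0100 for the zero-byte pattern to hold.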
> However the last two planes 0x0F and 0x10 are
> private, and should not be used in XML,
It is not inappropriate to use the Private Use planes in XML, provided
you have an agreement in place with the recipient as to their meaning.
Not all XML documents are meant to be interchanged blind. Far from it, as
the private said when he missed the target and hit the gunnery instructor.
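For reference, the Char production of XML 1.0 admits private-use code points, so nothing in the grammar stops you; the constraint is purely one of agreement between the parties. The minimal Python sketch below uses hypothetical function names. It encodes that production and recognizes private-use code points by their Unicode general category (Co), as reported by Python's `unicodedata.category`.

```python
import unicodedata

# Private Use Areas: the BMP block, plane 15 (0x0F), and plane 16 (0x10).
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_private_use(cp: int) -> bool:
    """True if cp has Unicode general category Co (Private Use)."""
    return unicodedata.category(chr(cp)) == "Co"


print(is_xml10_char(0xF0000), is_private_use(0xF0000))  # True True
print(is_xml10_char(0xFFFE))                             # False
```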
> Most Unicode-compliant software, however, stores and manages strings
> directly in its UTF-16 encoding form
There is plenty of software that uses UTF-8 internally as well.
-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
I am he that buries his friends alive and drowns them and draws them alive
again from the water. I came from the end of a bag, but no bag went over me.
I am the friend of bears and the guest of eagles. I am Ringwinner and
Luckwearer; and I am Barrel-rider. --Bilbo to Smaug
This archive was generated by hypermail 2.1.5 : Tue Jul 08 2003 - 11:44:22 EDT