Re: UTF-8 to UTF-16LE

From: John Cowan
Date: Tue Jul 08 2003 - 10:17:05 EDT

    Jon Hanna scripsit:

    > Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or*
    > UTF-16BE; it's trivial to tell which of these an XML document is in by
    > looking at the first few bytes, as described in Appendix F of the XML Spec.
    > You MUST accept all of these to comply with the XML spec.

    Ahem. The names "UTF-16LE" and "UTF-16BE" refer to BOMless versions of the
    UTF-16 encoding, and may *not* be used in XML documents without an XML
    declaration. Nor are all XML parsers required to support them.

    XML parsers MUST support UTF-16, with a BOM and in either order, and UTF-8.
    All other encodings MUST be properly declared.

    (Bogusly IMHO, an HTTP Content-Type: header overrides this rule.)
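
    To make the BOM rule above concrete, here is a minimal sketch in Python
    of the first-bytes check a parser might do before it reads any
    declaration. The function name sniff_bom is made up for illustration,
    and the returned strings are Python codec names, not XML encoding
    labels. It looks at the BOM only: it does not read the XML declaration
    and knows nothing about an HTTP Content-Type header.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name if `data` starts with a BOM, else None.

    None means "no BOM": the entity is then UTF-8 unless an encoding
    declaration says otherwise. (Hypothetical helper, not a real parser API.)
    """
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    # Checked before UTF-16 only because FF FE 00 00 also begins with the
    # UTF-16 little-endian BOM. Parsers are not required to support this.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Python's "utf-16" codec reads the BOM to pick the byte order,
        # so one name covers both orders.
        return "utf-16"
    return None


if __name__ == "__main__":
    big_endian = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")
    print(sniff_bom(big_endian))
    print(big_endian.decode(sniff_bom(big_endian)))
```

    Note that a BOMless UTF-16LE or UTF-16BE document falls through to None
    here, which is the point: without a BOM it has to be declared.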

    "In my last lifetime,                           John Cowan
    I believed in reincarnation;          
    in this lifetime,                     
    I don't."  --Thiagi                   
