Re: UTF-8 to UTF-16LE

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Jul 08 2003 - 10:17:05 EDT

    Jon Hanna scripsit:

    > Not strictly true. The default encoding scheme is UTF-8 *or* UTF-16LE *or*
    > UTF-16BE; it's trivial to tell which of these an XML document is in by
    > looking at the first few bytes, as described in Appendix F of the XML Spec
    > <http://www.w3.org/TR/REC-xml#sec-guessing>. You MUST accept all of these to
    > comply with the XML spec.

    Ahem. The names "UTF-16LE" and "UTF-16BE" refer to BOMless versions of the
    UTF-16 encoding, and may *not* be used in XML documents without an XML
    declaration. Nor are all XML parsers required to support them.
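
    To make the naming distinction concrete, here is a small sketch using
    Python's standard codecs (Python is my choice for illustration, not
    something from the thread). It only shows how those codecs treat the BOM;
    it says nothing about what any particular XML parser does.

```python
import codecs

text = "<a/>"

# The "utf-16-le" and "utf-16-be" codecs never write a BOM.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")
assert not le.startswith(codecs.BOM_UTF16_LE)
assert not be.startswith(codecs.BOM_UTF16_BE)

# Plain "utf-16" prepends a BOM, in the platform's native byte order.
with_bom = text.encode("utf-16")
has_bom = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Decoding as "utf-16" consumes the BOM; decoding as "utf-16-le" does not,
# so the BOM survives as a leading U+FEFF character.
marked = codecs.BOM_UTF16_LE + le
decoded_generic = marked.decode("utf-16")
decoded_le = marked.decode("utf-16-le")
```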

    XML parsers MUST support UTF-16, with a BOM and in either order, and UTF-8.
    All other encodings MUST be properly declared.

    (Bogusly IMHO, an HTTP Content-Type: header overrides this rule.)
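
    The BOM-or-declaration rule above can be sketched roughly as follows. This
    is a simplified reading of Appendix F, not a conforming implementation: the
    function name sniff_xml_encoding and its return labels are made up for
    illustration, EBCDIC and BOMless UCS-4 are ignored, and the HTTP
    Content-Type override is not modelled at all.

```python
import codecs
import re

# Matches an ASCII-compatible XML declaration that carries an encoding name.
_DECL = re.compile(
    rb"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes (hypothetical helper)."""
    # UTF-32 BOMs go first: the UTF-32LE BOM starts with the UTF-16LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    # UTF-16 with a BOM, in either byte order, needs no declaration.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # No BOM, but "<?" in 16-bit units: only acceptable with a declaration,
    # which the caller would have to decode and read.
    if data.startswith((b"<\x00?\x00", b"\x00<\x00?")):
        return "bomless utf-16"
    # ASCII-compatible declaration: trust the declared name.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing else to go on: the default is UTF-8.
    return "utf-8"
```

    Note that BOMless UTF-16 with no XML declaration falls through to the
    UTF-8 default here, so a parser built this way would reject it rather
    than guess.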

    -- 
    "In my last lifetime,                           John Cowan
    I believed in reincarnation;                    http://www.ccil.org/~cowan
    in this lifetime,                               jcowan@reutershealth.com
    I don't."  --Thiagi                             http://www.reutershealth.com
    

