Re: UTF-8 and UTF-16 issues

From: Tony Graham (tgraham@mulberrytech.com)
Date: Tue Jun 20 2000 - 12:20:41 EDT

Next message: Michael Kaplan (Trigeminal Inc.): "RE: UTF-8N?"
Previous message: Antoine Leca: "Re: UTF-8N?"
Maybe in reply to: OLeary, Sean (NJ): "UTF-8 and UTF-16 issues"
Next in thread: John Cowan: "Re: UTF-8 and UTF-16 issues"

At 19 Jun 2000 19:03 -0800, Tony Graham wrote:
> According to Appendix F, Autodetection of Character Encodings
> (Non-Normative), beginning a parsed entity with the UTF-8 BOM counts
> as:
>
> other: UTF-8 without an encoding declaration, or else the data
> stream is corrupt, fragmentary, or enclosed in a wrapper of some
> kind

Oops. Erratum E44 to the XML Recommendation, at
http://www.w3.org/XML/xml-19980210-errata#E44, changes the list of
significant byte patterns to include:

With a Byte Order Mark:
00 00 FE FF: UCS-4, big-endian machine (1234 order)
FF FE 00 00: UCS-4, little-endian machine (4321 order)
FE FF 00 ##: UTF-16, big-endian
FF FE ## 00: UTF-16, little-endian
EF BB BF: UTF-8
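
For illustration only, the byte patterns listed above could be checked
roughly as in the sketch below. This is not taken from the XML
Recommendation or from any particular XML processor. The function name
sniff_bom and the returned labels are hypothetical, and "##" is assumed
to mean "any byte value".

```python
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess the encoding family from a leading Byte Order Mark, if any."""
    head = data[:4]
    # UCS-4 is tested first: FF FE 00 00 also begins with the UTF-16 BOM FF FE.
    if head == b"\x00\x00\xfe\xff":
        return "UCS-4, big-endian"
    if head == b"\xff\xfe\x00\x00":
        return "UCS-4, little-endian"
    # "##" in the list is treated here as "any byte" (an assumption).
    if len(head) == 4 and head[:3] == b"\xfe\xff\x00":
        return "UTF-16, big-endian"
    if len(head) == 4 and head[:2] == b"\xff\xfe" and head[3] == 0:
        return "UTF-16, little-endian"
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8"
    return None  # no BOM pattern from the list matched
```

The order of the tests matters: the little-endian UCS-4 pattern has to
be tried before the little-endian UTF-16 one, because both start with
FF FE. A result of None only means that none of the BOM patterns
matched; the entity may still be UTF-8 without a BOM or something else
entirely.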

UTF-8 with the BOM is (non-normatively) okay according to the XML
Recommendation. Success with XML processors may vary, however, since
this wasn't decided until May 1999 and, it seems, wasn't added to the
published errata until January of this year.

Regards,

Tony Graham
======================================================================
Tony Graham mailto:tgraham@mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9632
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT