Well, Gary, if only all were that simple.
In the ISO 10646 view, there is no need for any "BOM" (or "signature",
as it is called in an informative annex to 10646) at all. UCS-2,
UCS-4, and UTF-16, *when* serialised into bytes, *must* all be
serialised in big-endian order. That would be the end of the story
if it weren't for that wretched annex (and Unicode...). The annex
*allows* the use of "signatures" for all of the encoding forms
(UCS-4, UCS-2, UTF-16, *and* UTF-8). It also says that the "signature"
("BOM" in Unicode terminology) *can* be used to correct an
erroneous/non-conforming byte order (but you are in no way required to;
terminating with an error message is quite OK when detecting a
non-big-endian order). In 10646 the "signature" is not at all intended
to be about byte order; it is only meant to give a strong
hint about which encoding form is used (all of them big-endian).
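A minimal sketch of a reader in this strict 10646 spirit, assuming Python 3 and raw bytes as input: the signature is used only as a hint about the encoding form, and a byte-swapped signature is reported as an error rather than silently corrected. The names `strict_sniff` and `ByteOrderError` are hypothetical, made up for illustration.

```python
from typing import Optional, Tuple


class ByteOrderError(ValueError):
    """Hypothetical error: a signature shows the data is not big-endian."""


# U+FEFF serialised big-endian in each encoding form. UCS-2 and UTF-16
# share the same two-byte signature, so it cannot tell them apart.
_BIG_ENDIAN_SIGNATURES = [
    (b"\x00\x00\xFE\xFF", "UCS-4"),
    (b"\xFE\xFF", "UCS-2 or UTF-16"),
    (b"\xEF\xBB\xBF", "UTF-8"),
]


def strict_sniff(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding-form hint, data with the signature removed).

    The signature is never used to repair byte order. A leading FF FE is
    U+FEFF byte-swapped (for UCS-2/UTF-16 and for UCS-4 alike), so it is
    rejected with an error instead.
    """
    if data.startswith(b"\xFF\xFE"):
        raise ByteOrderError("signature indicates non-big-endian order")
    for signature, form in _BIG_ENDIAN_SIGNATURES:
        if data.startswith(signature):
            return form, data[len(signature):]
    return None, data  # no signature: no hint about the encoding form
```

Raising instead of swapping is the "terminating with an error message" option; a reader that repaired the order would be making use of the annex's permission, not obeying a requirement.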
And then there is Unicode, which allows both big- and little-endian
byte serialisation, as well as some applications that put a
"signature", in full conformity with 10646, on UTF-8 encoded text too.
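By contrast, a Unicode-style reader accepts either byte order and also tolerates a UTF-8 signature. Below is a minimal sketch that uses the BOM constants from Python's standard `codecs` module. The function name `lenient_decode` and its `default` parameter (the encoding assumed when no BOM is present) are hypothetical choices made for illustration.

```python
import codecs

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE
# BOM (FF FE), so the four-byte forms are tested before the two-byte ones.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def lenient_decode(data: bytes, default: str = "utf-8") -> str:
    """Decode data, honouring a leading BOM in either byte order.

    The BOM itself is stripped; without one, `default` is used.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    return data.decode(default)
```

One cost of this leniency: UTF-16LE text whose first character after the BOM is U+0000 starts with FF FE 00 00 and is misread as UTF-32LE. The strict big-endian-only rule does not run into that ambiguity.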
> -----Original Message-----
> From: Gary L. Wade [mailto:firstname.lastname@example.org]
> Sent: Thursday, June 22, 2000 6:08 PM
> To: Unicode List
> Subject: UTF-8 BOM Nonsense
> After hundreds of e-mails on this topic, let it die!
> The BOM is only useful with UTF-16 or UCS-4 characters.
> There is no such thing as byte ordering when each character is a byte or
> a multibyte sequence with a well-documented ordering denoting how to
> interpret this! For further reference, turn to page 20 in the Unicode
> 3.0 book and let us get back to more important things, such as how to
> represent the price of tea in China! ;-)
> Gary L. Wade
> Product Development Consultant
> DesiSoft Systems | Voice: 214-642-6883
> 9619 E. Valley Ranch Parkway | Fax: 972-506-7478
> Suite 2125 | E-Mail: email@example.com
> Irving, TX 75063 |