From: Tex Texin (tex@i18nguy.com)
Date: Sat Nov 02 2002 - 17:24:17 EST
Thanks Doug. I had looked at the standard, not at the appendix.
I think that (non-normative) appendix is unfortunate. It seems to imply
(to my mind) that if other character sets define BOMs, it is OK to
use them as XML signatures.
My reasoning is that the standard itself only says that UTF-16 must have
a signature and that everything else except UTF-8 must declare its
encoding. The standard doesn't say whether other encodings should or
should not be allowed to use signatures. Appendix F, by defining the
other Unicode signatures, implies they are acceptable (without
specifically stating so).
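To make the point concrete, the signatures tabulated in Appendix F.1 could
be sniffed roughly as below. This is only a minimal Python sketch, not
anything taken from the standard or from a real parser: the names
BOM_SIGNATURES and sniff_bom are hypothetical, and the byte patterns assume
the "with a Byte Order Mark" table in Appendix F.1.

```python
# Hypothetical table of the byte order marks listed in XML 1.0 Appendix F.1.
# Four-byte UCS-4 patterns come first so that FF FE 00 00 is not mistaken
# for a UTF-16 little-endian BOM followed by two zero bytes.
BOM_SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UCS-4 (big-endian, 1234)"),
    (b"\xff\xfe\x00\x00", "UCS-4 (little-endian, 4321)"),
    (b"\x00\x00\xff\xfe", "UCS-4 (unusual order, 2143)"),
    (b"\xfe\xff\x00\x00", "UCS-4 (unusual order, 3412)"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16 (big-endian)"),
    (b"\xff\xfe", "UTF-16 (little-endian)"),
]


def sniff_bom(prefix: bytes):
    """Return the encoding family implied by a leading BOM, or None."""
    for signature, name in BOM_SIGNATURES:
        if prefix.startswith(signature):
            return name
    # No listed signature: the entity would have to rely on its encoding
    # declaration or on external information instead.
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbf<?xml version='1.0'?>"))
    print(sniff_bom(b"<?xml version='1.0'?>"))
```

Note that the table is a closed list: a vendor-defined signature for some
other code page would simply fall through to None here, which is exactly
the behaviour the standard leaves unstated.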
The text of the standard, however, doesn't even suggest that UCS-4 would
use a signature: it doesn't include it with UTF-16 when speaking
about requiring a BOM, and it specifically gives the name to use for
UCS-4 in the declaration, as with other encodings.
However, that leaves open the question whether only the Unicode
transform signatures are acceptable or other signatures are also
allowed. So if a vendor defines a code page, and defines a signature
(perhaps mapping BOM/ZWNBSP specifically to some code point or byte
string) does that then become acceptable?
Of course we hope not, and I am sure the authors did not intend so, but
without a statement about which signatures are allowed or not allowed
beyond UTF-16, I think the can of worms is opened.
OK, having raised the issue, I'll take it up with the W3C i18n group to
get their understanding and then with the XML group if needed.
tex
Doug Ewell wrote:
>
> Tex Texin <tex at i18nguy dot com> wrote:
>
> > I didn't think the XML standard allowed for utf-8 files to have a BOM.
> > The standard is quite clear about requiring 0xFEFF for utf-16.
> > I would have thought a proper parser would reject a non-utf-16 file
> > beginning with something other than "<".
>
> The standard explicitly allows UCS-4, UTF-16, and UTF-8 files to begin
> with a BOM. See Appendix F.1, "Detection Without External Encoding
> Information":
>
> http://www.w3.org/TR/REC-xml#sec-guessing
>
> -Doug Ewell
> Fullerton, California
--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------