From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sat Nov 02 2002 - 20:13:39 EST
You are mistaken about this -- XML originally stated that the BOM was valid
in UTF-8 files but was not required.
The notion that XML parsers would update to handle a new encoding form to
strip off three bytes but would not conditionally strip those three bytes if
they were the first three bytes of the file is an unrealistic one.
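
A minimal sketch of the conditional strip described above, in Python. The
names UTF8_BOM and strip_utf8_bom are hypothetical and are not taken from any
particular XML parser; the only fact assumed is that the UTF-8 signature is
the three bytes EF BB BF (U+FEFF encoded as UTF-8).

```python
# Hypothetical helper, for illustration only.
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    # Only the first three bytes of the input are examined; the same byte
    # sequence appearing later in the data is left untouched.
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


with_bom = UTF8_BOM + b"<?xml version='1.0'?><doc/>"
without_bom = b"<?xml version='1.0'?><doc/>"

# Both inputs yield the same bytes for the parser to consume.
assert strip_utf8_bom(with_bom) == without_bom
assert strip_utf8_bom(without_bom) == without_bom
```

The check is a prefix comparison, so a file that does not start with the
signature passes through unchanged.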
MichKa
----- Original Message -----
From: "Tex Texin" <tex@i18nguy.com>
To: "Michael (michka) Kaplan" <michka@trigeminal.com>
Cc: "Mark Davis" <mark.davis@jtcsv.com>; <unicode@unicode.org>
Sent: Saturday, November 02, 2002 11:08 AM
Subject: Re: Names for UTF-8 with and without BOM
> "Michael (michka) Kaplan" wrote:
> > > .xml UTF-8N Some XML processors may not cope with BOM
> >
> > Maybe they need to upgrade? Since people often edit the files in notepad,
> > many files are going to have it. A parser that cannot accept this reality
> > is not going to make it very long.
>
> I didn't think the XML standard allowed for utf-8 files to have a BOM.
> The standard is quite clear about requiring 0xFEFF for utf-16.
> I would have thought a proper parser would reject a non-utf-16 file
> beginning with something other than "<".
>
> (The fact that notepad puts it there should be irrelevant.)
>
> Am I wrong about XML and the utf-8 signature?
>
> tex
>
>
> --
> -------------------------------------------------------------
> Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
> Xen Master http://www.i18nGuy.com
>
> XenCraft http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
>
>