Re: Names for UTF-8 with and without BOM

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sat Nov 02 2002 - 20:13:39 EST


    You are mistaken about this -- XML originally stated that the UTF-8 BOM was
    valid but not required.

    The notion that XML parsers would update to handle a new encoding form to
    strip off three bytes but would not conditionally strip those three bytes if
    they were the first three bytes of the file is an unrealistic one.
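
    A minimal sketch of that conditional strip, in Python. The function name
    strip_utf8_signature is hypothetical and is not taken from any XML parser;
    it only illustrates removing the three signature bytes (EF BB BF) when,
    and only when, they are the first three bytes of the input.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No signature at the start: leave the bytes untouched.
    return data


# A file saved with the signature and one saved without it
# yield the same bytes for the parser to consume.
with_bom = codecs.BOM_UTF8 + b'<?xml version="1.0"?><root/>'
without_bom = b'<?xml version="1.0"?><root/>'
assert strip_utf8_signature(with_bom) == strip_utf8_signature(without_bom)
```

    Python's built-in "utf-8-sig" codec applies the same rule when decoding:
    it drops the signature if it is present and otherwise behaves like plain
    UTF-8.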

    MichKa

    ----- Original Message -----
    From: "Tex Texin" <tex@i18nguy.com>
    To: "Michael (michka) Kaplan" <michka@trigeminal.com>
    Cc: "Mark Davis" <mark.davis@jtcsv.com>; <unicode@unicode.org>
    Sent: Saturday, November 02, 2002 11:08 AM
    Subject: Re: Names for UTF-8 with and without BOM

    > "Michael (michka) Kaplan" wrote:
    > > > .xml UTF-8N Some XML processors may not cope with BOM
    > >
    > > Maybe they need to upgrade? Since people often edit the files in notepad,
    > > many files are going to have it. A parser that cannot accept this reality is
    > > not going to make it very long.
    >
    > I didn't think the XML standard allowed for utf-8 files to have a BOM.
    > The standard is quite clear about requiring 0xFEFF for utf-16.
    > I would have thought a proper parser would reject a non-utf-16 file
    > beginning with something other than "<".
    >
    > (The fact that notepad puts it there should be irrelevant.)
    >
    > Am I wrong about XML and the utf-8 signature?
    >
    > tex
    >
    >
    > --
    > -------------------------------------------------------------
    > Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com
    > Xen Master http://www.i18nGuy.com
    >
    > XenCraft http://www.XenCraft.com
    > Making e-Business Work Around the World
    > -------------------------------------------------------------
    >
    >


