Re: Names for UTF-8 with and without BOM

From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun Nov 03 2002 - 15:29:32 EST


    I don't know what you are trying to say. Perhaps you could explain it at the
    meeting next week.

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "Michael (michka) Kaplan" <michka@trigeminal.com>
    To: "Mark Davis" <mark.davis@jtcsv.com>; "Murray Sargent"
    <murrays@exchange.microsoft.com>; "Joseph Boyle" <Boyle@siebel.com>
    Cc: <unicode@unicode.org>
    Sent: Saturday, November 02, 2002 04:18
    Subject: Re: Names for UTF-8 with and without BOM

    > From: "Mark Davis" <mark.davis@jtcsv.com>
    >
    > > That is not sufficient. The first three bytes could represent a real
    > > content character, ZWNBSP, or they could be a BOM. The label doesn't
    > > tell you.
    >
    > There are several problems with this supposition, most notably the fact
    > that there are cases that specifically say this usage is not recommended
    > and that U+2060 (WORD JOINER) is preferred.
    >
    > > This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE
    > > 0xFF represents a BOM, and is not part of the content. In the second
    > > case, it does *not* represent a BOM; it represents a ZWNBSP, and must
    > > not be stripped. The difference here is that the encoding name tells
    > > you exactly what the situation is.
    >
    > I do not see this as a realistic scenario. I would argue that if the BOM
    > matches the encoding scheme, perhaps this was an intentional effort to
    > make sure that applications which may not understand the higher-level
    > protocol can also see what the encoding scheme is.
    >
    > But even if we assume that someone has gone to the trouble of labeling
    > something UTF-16BE and has put 0xFE 0xFF at the beginning of the file,
    > what kind of content *is* such a code point that this is even worth
    > calling out as a special case?
    >
    > If the goal is clear and unambiguous text, then the best way would be to
    > simplify ALL of this. It was previously decided to always call it a BOM;
    > why not stick with that?
    >
    > MichKa

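    The label-versus-content distinction argued over in the quoted thread can be
    seen in a small sketch. It is not from the original thread; it assumes
    Python's standard codecs as a stand-in for "the encoding name". There,
    "utf-8-sig" and "utf-16" treat a leading U+FEFF as a BOM and strip it, while
    "utf-8" and "utf-16-be" keep it as a content ZWNBSP.

```python
# Sketch only: the same leading U+FEFF bytes read under different codec labels.
# Python's built-in codecs are used purely for illustration.

UTF8_DATA = b"\xef\xbb\xbfA"   # EF BB BF is U+FEFF in UTF-8, followed by "A"
UTF16_DATA = b"\xfe\xff\x00A"  # FE FF is U+FEFF in big-endian UTF-16, then "A"


def decode_all(data, labels):
    """Return a dict mapping each codec label to the decoded string."""
    return {label: data.decode(label) for label in labels}


utf8_results = decode_all(UTF8_DATA, ["utf-8", "utf-8-sig"])
utf16_results = decode_all(UTF16_DATA, ["utf-16", "utf-16-be"])

for label, text in {**utf8_results, **utf16_results}.items():
    # ascii() makes a surviving U+FEFF visible as the escape \ufeff
    print(f"{label:10} -> {ascii(text)}")
```

    With the plain "utf-8" and "utf-16-be" labels the decoder keeps U+FEFF as
    part of the text; only the "utf-8-sig" and "utf-16" labels strip it as a
    BOM. In this sketch, as in the UTF-16 vs UTF-16BE case described above,
    the caller's choice of label is what decides the interpretation.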


    This archive was generated by hypermail 2.1.5 : Sun Nov 03 2002 - 15:59:53 EST