Re: Names for UTF-8 with and without BOM

From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun Nov 03 2002 - 19:41:38 EST


    > So even if it were in there, who cares? I mean, can anyone explain why it
    > would make a difference?

    I personally wouldn't care if every instance of "Michael Kaplan" at the
    start of a file were deleted. Not the point.

    The actual point is that currently, as defined -- not as you would wish
    it to be -- FEFF is an actual character, and in circumstances where it is
    not clearly defined for use as a BOM, it cannot be removed without altering
    the content of the text.
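
    To make that concrete, here is a minimal sketch in Python (my choice of
    language for illustration; nothing in this thread depends on it). It
    assumes the standard "utf-8" and "utf-8-sig" codecs, where the first keeps
    a leading FEFF as a character and the second treats it as a signature and
    drops it. The variable names are hypothetical.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain decoder treats the leading U+FEFF as an ordinary character.
kept = raw.decode("utf-8")

# The signature-aware decoder silently drops a leading U+FEFF.
stripped = raw.decode("utf-8-sig")

# The two results are different strings: one code point was removed.
print(len(kept), len(stripped))
print(kept == stripped)

# Only the text that kept U+FEFF round-trips to the original bytes.
print(kept.encode("utf-8") == raw)
print(stripped.encode("utf-8") == raw)
```

    Whether the shorter string counts as "the same text" is exactly the
    question under dispute; the sketch only shows that the bytes no longer
    round-trip once the FEFF is dropped.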

    As I said in another message, the UTC could change this situation by
    completely deprecating the use of FEFF as anything but BOM. But it hasn't
    done it yet.

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "Michael (michka) Kaplan" <michka@trigeminal.com>
    To: "Mark Davis" <mark.davis@jtcsv.com>; "Unicode Mailing List"
    <unicode@unicode.org>
    Sent: Sunday, November 03, 2002 13:02
    Subject: Re: Names for UTF-8 with and without BOM

    > From: "Mark Davis" <mark.davis@jtcsv.com>
    >
    > Ironic that for the purpose of dealing with THREE bytes that so many bytes
    > are being wasted. :-)
    >
    > > Little probability that right double quote would appear at the start of a
    > > document either. Doesn't mean that you are free to delete it (*and* say that
    > > you are not modifying the contents).
    >
    > Interesting strawman there, Mark -- but there is a huge difference there.
    > But even if we leave in the notion of it as a character and just deprecate
    > its usage and people ignore that, then we are talking about a ZERO WIDTH
    > NO-BREAK SPACE. This character has the job of:
    >
    > 1) being invisible
    > 2) not breaking text with it
    >
    > So even if it were in there, who cares? I mean, can anyone explain why it
    > would make a difference?
    >
    > The one thing that no one has ever come up with is a reasonable case where
    > it would be at the beginning of the document *yet* it was not a BOM.
    >
    > So we have a clear semantic for it at the beginning of a file -- it's a BOM.
    > Period.
    >
    > If there is a higher level protocol as well and the protocol and the BOM
    > both match, then that is great! Considering how much redundancy there is in
    > the Unicode standard about some definitions, a redundant marker for a file
    > seems a very trivial issue.
    >
    > If there is a higher level protocol as well and they do not match, then we
    > are in fantasy land bizarro world, inventing edge cases because we have
    > nothing better to do. :-) But for the sake of argument, let's pretend it's a
    > real scenario -- in which case we treat it the same way as if your higher
    > level protocol claims it's ISO-8859-1 and the BOM says it's UTF-32. It's an
    > error.
    >
    > Problem solved!
    >
    > > I agree that when the UTC decides that a BOM is *only* to be used as a
    > > signature, and that it would be ok to delete it anywhere in a document (like
    > > a non-character), then we are in much better shape. This was, as a matter of
    > > fact proposed for 3.2, but not approved. If we did that for 4.0, then there
    > > would be much less reason to distinguish UTF-8 'withBOM' from UTF-8
    > > 'withoutBOM'.
    >
    > There is no reason to worry about this case and no need to delete anything.
    > This is a ZERO WIDTH NO-BREAK SPACE we are talking about. The burden is on
    > the people who think this is a scenario to bring proof that anyone is doing
    > anything as unrealistic as this.
    >
    > There is an easy, clear, and unambiguous plan that can be used here which
    > will always work. For once, let's not opt to complicate it without reason.
    >
    > MichKa
    >
    >
    >


