Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

From: Edward H Trager (ehtrager@umich.edu)
Date: Mon Nov 04 2002 - 12:19:25 EST

    Hi, everyone,

    It's almost unbelievable to me how many email postings are wasted on
    discussions such as this UTF-8 BOM issue ... I guess it means that there
    is a lot of BADLY WRITTEN software out there in the world ;-)

    With regard to READING incoming UTF-8 text streams, surely any good
    software designer will do exactly as Michael Michka has suggested here:

    > INCOMING TEXT: Trivial to simply check. I say (once again) its THREE
    > BYTES.
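
    For what it's worth, here is a minimal sketch of that three-byte check
    in Python. The helper names strip_utf8_bom and decode_utf8 are made up
    for illustration; codecs.BOM_UTF8 is the standard library's constant for
    the bytes EF BB BF.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    # Only a BOM at the very start of the stream is treated as a signature.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def decode_utf8(data: bytes) -> str:
    """Decode incoming UTF-8 bytes, accepting input with or without a BOM."""
    return strip_utf8_bom(data).decode("utf-8")
```

    The reader never needs to be told in advance whether the BOM is there;
    it simply copes with both cases.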

    With regard to EMITTING outgoing UTF-8 text streams, IMHO the default
    should be to do what is simplest, which is *not* to output the BOM. It is
    superfluous to have it on UTF-8 streams. There's no harm in having a
    global option to turn BOM outputting on for the benefit of BRAIN-DEAD
    programs that are going to read the text:

    > EMITTING: They could simply choose globally whether to emit the BOM or not.
    > If they wanted to get "fancy" they could have a command line option which
    > said whether to emit the bytes or not. But that is optional.
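
    A rough sketch of such a global option, again in Python. EMIT_BOM and
    encode_utf8 are hypothetical names, and the default of False reflects
    my own preference above, not any standard requirement.

```python
import codecs

# Hypothetical global switch: the default is to emit no BOM.
EMIT_BOM = False

def encode_utf8(text, emit_bom=None):
    """Encode text as UTF-8, prepending the BOM only when asked to."""
    # A per-call argument overrides the global setting when given.
    if emit_bom is None:
        emit_bom = EMIT_BOM
    payload = text.encode("utf-8")
    return codecs.BOM_UTF8 + payload if emit_bom else payload
```

    A command-line flag, if one is wanted, would only need to set EMIT_BOM
    once at startup.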

    The whole issue is analogous to the CR/LF issue in ASCII texts across
    different platforms. Well-written software is able to READ the text
    properly regardless of whether lines end in CR, LF, or CR/LF.
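
    The same tolerant-reader idea for line endings can be sketched in a few
    lines of Python. split_lines is a hypothetical helper; the only thing
    that matters is that the two-byte CR LF sequence is tried before a lone
    CR or LF, so it counts as a single line break.

```python
import re

# CR LF must come first in the alternation so it is matched as one terminator.
_LINE_END = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Split text into lines whether they end in CR, LF, or CR LF."""
    # Note: text ending in a terminator yields a trailing empty string.
    return _LINE_END.split(text)
```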


