From: Edward H Trager (ehtrager@umich.edu)
Date: Mon Nov 04 2002 - 12:19:25 EST
Hi, everyone,
It's almost unbelievable to me how many email postings are wasted on
discussions such as this UTF-8 BOM issue ... I guess it means that there
is a lot of BADLY WRITTEN software out there in the world ;-)
With regard to READING incoming UTF-8 text streams, surely any good
software designer will do exactly as Michael Michka has suggested here:
> INCOMING TEXT: Trivial to simply check. I say (once again) its THREE
> BYTES.
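
As a minimal sketch of the three-byte check on incoming text, assuming the data is already in memory as a Python bytes object (the function name strip_utf8_bom is made up for illustration and is not from the original discussion):

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No BOM: hand the bytes back unchanged.
    return data
```

A reader written this way accepts both kinds of input, so the presence or absence of the BOM stops mattering to the rest of the program.
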
With regard to EMITTING outgoing UTF-8 text streams, IMHO the default
should be to do what is simplest, which is to *not* output the BOM. It is
superfluous to have it on UTF-8 streams. There's no harm in having a
global option to turn BOM outputting on for the benefit of BRAIN-DEAD
programs that are going to read the text:
> EMITTING: They could simply choose globally whether to emit the BOM or not.
> If they wanted to get "fancy" they could have a command line option which
> said whether to emit the bytes or not. But that is optional.
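
One way the emitting side might look, as a sketch only: the BOM is off by default and a single flag turns it on. The names encode_utf8 and emit_bom are hypothetical, chosen here for illustration; a real program would wire the flag to its own global setting or command line option.

```python
import codecs

def encode_utf8(text: str, emit_bom: bool = False) -> bytes:
    """Encode text as UTF-8, prepending the BOM only when asked to."""
    payload = text.encode("utf-8")
    if emit_bom:
        # Opt-in only, for consumers that insist on seeing EF BB BF first.
        return codecs.BOM_UTF8 + payload
    return payload
```
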
The whole issue is analogous to the CR/LF issue in ASCII texts across
different platforms. Well-written software is able to READ the text
properly regardless of whether lines end in CR, LF, or CR+LF.
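
To make the line-ending analogy concrete, here is a small sketch of a tolerant reader, assuming the text has already been decoded to a Python string (split_lines_any_eol is an illustrative name, not an existing API):

```python
def split_lines_any_eol(text: str) -> list:
    """Split text into lines, accepting CR, LF, or CR+LF as the terminator."""
    # Collapse the two-character form first so it is not counted twice,
    # then treat any remaining lone CR as a line break as well.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.split("\n")
```
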