Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

From: Edward H Trager (ehtrager@umich.edu)
Date: Mon Nov 04 2002 - 12:19:25 EST

Next message: Otto Stolz: "Re: In defense of Plane 14 language tags (long)"

Previous message: John Cowan: "Re: PRODUCING and DESCRIBING UTF-8 with and without BOM"
In reply to: Michael (michka) Kaplan: "Re: PRODUCING and DESCRIBING UTF-8 with and without BOM"
Next in thread: Doug Ewell: "Re: PRODUCING and DESCRIBING UTF-8 with and without BOM"

Hi, everyone,

It's almost unbelievable to me how many email postings are wasted on
discussions such as this UTF-8 BOM issue ... I guess it means that there
is a lot of BADLY WRITTEN software out there in the world ;-)

With regard to READING incoming UTF-8 text streams, surely any good
software designer will do exactly as Michael (michka) Kaplan has
suggested here:

> INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE
> BYTES.
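
A minimal sketch of that check in Python, assuming the whole stream is
already in memory as bytes. The helper names strip_utf8_bom and
decode_utf8 are made up for illustration; codecs.BOM_UTF8 is the
standard library constant for the three bytes EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    # codecs.BOM_UTF8 == b"\xef\xbb\xbf": the "three bytes" to check for.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 input the same way with or without a BOM."""
    return strip_utf8_bom(data).decode("utf-8")
```

Python's built-in "utf-8-sig" codec does the same thing on decode, so
data.decode("utf-8-sig") would be an equivalent one-liner.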

With regard to EMITTING outgoing UTF-8 text streams, IMHO the default
should be to do what is simplest, which is to *not* output the BOM. It
is superfluous on UTF-8 streams, since UTF-8 is a byte sequence with no
byte-order ambiguity to signal. There's no harm in having a global
option to turn BOM output on for the benefit of BRAIN-DEAD programs
that are going to read the text:

> EMITTING: They could simply choose globally whether to emit the BOM or not.
> If they wanted to get "fancy" they could have a command line option which
> said whether to emit the bytes or not. But that is optional.
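
One way the emitting side could look, as a rough Python sketch. The
function encode_utf8, its emit_bom parameter, and the "--bom" flag are
all hypothetical names invented for this example; the default is no
BOM, with a single switch to turn it on.

```python
import argparse
import codecs


def encode_utf8(text: str, emit_bom: bool = False) -> bytes:
    """Encode text as UTF-8, prefixing the BOM only when asked to."""
    payload = text.encode("utf-8")
    return codecs.BOM_UTF8 + payload if emit_bom else payload


def build_parser() -> argparse.ArgumentParser:
    # "--bom" is a hypothetical flag name chosen for this sketch.
    parser = argparse.ArgumentParser(description="Emit UTF-8 text")
    parser.add_argument("--bom", action="store_true",
                        help="prefix output with the UTF-8 BOM (off by default)")
    return parser
```

A caller would then write encode_utf8(text, emit_bom=args.bom) after
parsing the command line, so the choice is made in exactly one place.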

The whole issue is analogous to the CR/LF line-ending issue in ASCII
texts across different platforms. Well-written software is able to READ
the text properly regardless of whether lines end in CR, LF, or CRLF.
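
The same tolerant-reader idea for line endings might be sketched like
this in Python. split_lines is a hypothetical helper, and it assumes
the text has already been decoded to a str; it treats CRLF, lone CR,
and lone LF as equivalent line terminators.

```python
from typing import List


def split_lines(text: str) -> List[str]:
    """Split text into lines, accepting CR, LF, or CRLF endings."""
    # Replace CRLF first so it is not counted as two separate line breaks.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.split("\n")
```

The built-in str.splitlines() is a near equivalent, but it also breaks
on several other separator characters, so the explicit version above
stays closer to the three cases named here.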


This archive was generated by hypermail 2.1.5 : Mon Nov 04 2002 - 12:49:44 EST