Re: illegal UTF-8 sequences and mbtowc()

From: John Cowan
Date: Fri Dec 10 1999 - 10:14:05 EST

"Martin J. Duerst" wrote:

> I'm late to reply to this, but I think it is a very
> dangerous proposal. It has a well-known acronym:
> GIGO (garbage in, garbage out). The more data is
> exchanged between all kinds of components of the
> Internet and Web infrastructure without human intervention,
> the higher the danger that it will be impossible
> to figure out where the data came from, what it
> was supposed to be, and where the error happened.

I recognize that this comes from the "HTML nightmare" experience
that the Web has been undergoing in the last few years. However,
I cannot agree that the appropriate thing for an editor or other
text-processing tool, on finding an encoding error, is simply
to refuse to process the text at all.
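
To make the alternative concrete, here is a minimal sketch in C of a
lenient decoder: it substitutes U+FFFD for each malformed byte, keeps
going, and reports how many errors it saw and where the first one was,
so the damage can still be traced. The names decode_lenient, decode_one
and struct decode_report are hypothetical, invented for this illustration.
The sketch decodes UTF-8 by hand instead of calling mbtowc(), so that it
does not depend on the current locale, and it assumes the four-byte form
of UTF-8 limited to U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical report type: how many malformed bytes were replaced,
   and the byte offset of the first one (meaningful only if errors > 0). */
struct decode_report {
    size_t errors;
    size_t first_error;
};

/* Decode one code point from s[0..n), n >= 1.
   Returns the number of bytes consumed, or 0 if the sequence is malformed
   (bad lead byte, truncation, bad continuation byte, overlong form,
   surrogate, or value above U+10FFFF). */
static size_t decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    unsigned char b = s[0];
    uint32_t c;
    size_t len, i;

    if (b < 0x80) { *cp = b; return 1; }
    if ((b & 0xE0) == 0xC0)      { len = 2; c = b & 0x1Fu; }
    else if ((b & 0xF0) == 0xE0) { len = 3; c = b & 0x0Fu; }
    else if ((b & 0xF8) == 0xF0) { len = 4; c = b & 0x07u; }
    else return 0;                      /* stray continuation or invalid lead byte */

    if (len > n) return 0;              /* sequence runs past the end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3Fu);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                       /* overlong, out of range, or surrogate */
    *cp = c;
    return len;
}

/* Decode in[0..n) into out[0..out_cap). Each malformed byte becomes U+FFFD
   and is counted in *rep; decoding never aborts on bad input.
   Returns the number of code points written. */
size_t decode_lenient(const unsigned char *in, size_t n,
                      uint32_t *out, size_t out_cap,
                      struct decode_report *rep)
{
    size_t pos = 0, written = 0;

    assert(in != NULL && out != NULL && rep != NULL);
    rep->errors = 0;
    rep->first_error = 0;

    while (pos < n && written < out_cap) {
        uint32_t cp;
        size_t used = decode_one(in + pos, n - pos, &cp);
        if (used == 0) {
            if (rep->errors == 0) rep->first_error = pos;
            rep->errors++;
            cp = REPLACEMENT_CHAR;
            used = 1;                   /* skip one byte and resynchronize */
        }
        out[written++] = cp;
        pos += used;
    }
    return written;
}
```

A tool built this way can still open the file, show the user where the
replacement characters are, and refuse only to write the result back
silently, which seems to me to answer the traceability worry without
locking the user out of the text.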


Schlingt dreifach einen Kreis vom dies! || John Cowan
Schliesst euer Aug vor heiliger Schau,  ||
Denn er genoss vom Honig-Tau,           ||
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
