Re: illegal UTF-8 sequences and mbtowc()

From: Martin J. Duerst ([email protected])
Date: Wed Dec 08 1999 - 15:48:58 EST

Next message: Deborah Goldsmith: "Re: Unicode support on Macintosh?"
Previous message: Yung-Fong Tang: "Re: Unicode support on Macintosh?"
Next in thread: John Cowan: "Re: illegal UTF-8 sequences and mbtowc()"
Reply: John Cowan: "Re: illegal UTF-8 sequences and mbtowc()"

I'm late to reply to this, but I think the proposal to have
mbtowc() silently map malformed UTF-8 to a replacement
character is very dangerous. The problem it invites has a
well-known acronym: GIGO (garbage in, garbage out). The more
data is exchanged between all kinds of components of the
Internet and Web infrastructure without human intervention,
the greater the danger that it will be impossible
to figure out where the data came from, what it
was supposed to be, and where the error happened.

Therefore, early error detection is very important!
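To illustrate what early error detection looks like in a decoder, here is a minimal sketch in C of a strict UTF-8 decoder that rejects malformed input and reports the byte offset of the first error, in the same spirit as the standard mbtowc() returning -1. The names utf8_decode_strict() and utf8_first_error() are hypothetical and not part of any C library. The validity rules are an assumption: they follow the later RFC 3629 (at most four bytes, no surrogates, nothing above U+10FFFF), which is stricter than the RFC 2279 definition current in 1999.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s[0..n).
 * On success, stores the code point in *out and returns the number of
 * bytes consumed (1-4). On malformed input (invalid lead byte, missing
 * or bad continuation byte, truncated sequence, overlong form,
 * surrogate, or value above U+10FFFF), returns -1 and leaves *out
 * untouched, so the caller learns about the error where it occurs. */
int utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *out)
{
    uint32_t cp, min;
    int len;

    if (n == 0)
        return -1;
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return -1;                  /* stray continuation byte or invalid lead */
    if ((size_t)len > n)
        return -1;                  /* truncated sequence */
    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* expected a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min)
        return -1;                  /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;                  /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return -1;                  /* beyond the Unicode code space */
    *out = cp;
    return len;
}

/* Validate a whole buffer. Returns the byte offset of the first
 * malformed sequence, or -1 if the buffer is well-formed UTF-8. */
long utf8_first_error(const unsigned char *s, size_t n)
{
    size_t i = 0;
    uint32_t cp;

    while (i < n) {
        int k = utf8_decode_strict(s + i, n - i, &cp);
        if (k < 0)
            return (long)i;
        i += (size_t)k;
    }
    return -1;
}
```

Reporting an offset rather than substituting a character keeps the information needed to answer the questions above: where the bad data is and what it was.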

Regards, Martin.

At 11:48 1999/10/29 -0700, John Cowan wrote:
> Markus Kuhn wrote:
>
> > There is however a simple way out of this:
> >
> > The C library could implement the mbtowc() UTF-8 decoder such that it
> > *NEVER* returns -1 to signal that it encountered a malformed sequence.
> > It could by convention treat every malformed (and overlong) UTF-8
> > sequence just like a valid encoding of the REPLACEMENT CHARACTER.
>
> This is almost exactly what the Plan 9 implementation does, except that it uses
> a different character, on the grounds that an encoding error is not the same as
> an unrepresentable character (the higher-level recovery strategy, if any,
> is different). The implementers' specific choice was the (basically)
> unused control character U+0080.
>
> --
>
> John Cowan http://www.reutershealth.com [email protected]
> Schlingt dreifach einen Kreis vom dies / Schliess eurer Aug vor heiliger Schau
> Den er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
> -- Coleridge (tr. Politzer)
>
>
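For comparison, the never-fail policy discussed in the quoted messages can be sketched as a lenient decoder that takes the substitute character as a parameter: U+FFFD for Markus Kuhn's proposal, U+0080 for the Plan 9 choice John Cowan describes. This is a hypothetical illustration, not the Plan 9 code or any C library's mbtowc(); the function and macro names are made up, and substituting one character per bad byte (then resynchronizing at the next byte) is an assumed policy, since the quoted proposal speaks of one replacement per malformed sequence without defining sequence boundaries.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHARACTER 0xFFFDu /* hypothetical name: Kuhn's choice */
#define PLAN9_ERROR_CHAR      0x0080u /* hypothetical name: Plan 9's choice */

/* Returns the length (1-4) of a well-formed UTF-8 sequence at s[0..n)
 * and stores its code point in *cp_out; returns 0 if malformed.
 * Requires n >= 1. Rules assumed: RFC 3629. */
static size_t decode_one(const unsigned char *s, size_t n, uint32_t *cp_out)
{
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len;
    uint32_t cp;

    if (s[0] < 0x80) { *cp_out = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;                              /* invalid lead byte */
    if (len > n) return 0;                      /* truncated */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* bad continuation */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_for_len[len]) return 0;        /* overlong */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate */
    if (cp > 0x10FFFF) return 0;                /* out of range */
    *cp_out = cp;
    return len;
}

/* Never reports an error: a malformed byte decodes as `bad` and is
 * skipped on its own. Returns bytes consumed, or 0 only when n == 0. */
size_t utf8_decode_lenient(const unsigned char *s, size_t n,
                           uint32_t bad, uint32_t *out)
{
    if (n == 0)
        return 0;
    size_t len = decode_one(s, n, out);
    if (len == 0) {
        *out = bad;   /* substitute instead of returning -1 */
        return 1;
    }
    return len;
}

/* Decodes up to cap code points into dst; returns how many were written. */
size_t utf8_decode_all(const unsigned char *s, size_t n, uint32_t bad,
                       uint32_t *dst, size_t cap)
{
    size_t count = 0;
    while (n > 0 && count < cap) {
        size_t k = utf8_decode_lenient(s, n, bad, &dst[count++]);
        s += k;
        n -= k;
    }
    return count;
}
```

With U+FFFD as the substitute, a caller can no longer distinguish a U+FFFD that was genuinely in the input from one produced by an encoding error, and the byte offset of the error is lost either way. Choosing the otherwise unused U+0080, as Plan 9 does, keeps the two cases apart but still discards where the error happened.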

#-#-# Martin J. Dürst, World Wide Web Consortium
#-#-# mailto:[email protected] http://www.w3.org


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT