Re: cp1252 decoder implementation

From: <martin_at_v.loewis.de>
Date: Sat, 17 Nov 2012 01:13:15 +0100

Quoting Buck Golemon <buck_at_yelp.com>:

> cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
> This leaves five bytes with undefined semantics.
>
> Currently the python cp1252 decoder allows us to ignore/replace/error on
> these bytes, but there's no facility for allowing these unknown bytes to
> round-trip through the codec, as the latin1 codec does.

That's not true: there are actually *two* facilities that allow exactly that.
1. You can write a new codec that round-trips these bytes through
    some characters,
    or
2. You can write an error handler that does such round-tripping. The
    surrogateescape error handler was specifically designed to allow such
    round-tripping (not just for this codec, but for any codec); see
    http://www.python.org/dev/peps/pep-0383/

Regards,
Martin
Received on Fri Nov 16 2012 - 18:14:54 CST
