Re: cp1252 decoder implementation from Doug Ewell on 2012-11-16 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Fri, 16 Nov 2012 17:11:12 -0700

Buck Golemon wrote:

> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
> to map it to the equally-non-semantic U+81 ?
>
> This would allow systems that follow the html5 standard and use cp1252
> in place of latin1 to continue to be binary-faithful and reversible.

This isn't quite as black-and-white as the question about Latin-1. If
you are targeting HTML5, you are probably safe in treating an incoming
0x81 (for example) as either U+0081 or U+FFFD, or throwing some kind of
error. HTML5 insists that you treat 8859-1 as if it were CP1252, so it
no longer matters what the byte is in 8859-1.
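
For what it's worth, the three options above (U+0081, U+FFFD, or an error) can be sketched in Python, whose built-in cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. This is only a rough illustration under that assumption, not a reference decoder. The names UNDEFINED, "c1-passthrough" and decode_cp1252 are made up for the example.

```python
import codecs

# Bytes that Python's cp1252 codec treats as undefined.
UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})

def _c1_passthrough(err):
    """Hypothetical error handler: byte 0xNN <-> U+00NN for undefined bytes."""
    if isinstance(err, UnicodeDecodeError):
        bad = err.object[err.start:err.end]
        if all(b in UNDEFINED for b in bad):
            # Map each undefined byte to the C1 control with the same value.
            return "".join(chr(b) for b in bad), err.end
    elif isinstance(err, UnicodeEncodeError):
        bad = err.object[err.start:err.end]
        if all(ord(c) in UNDEFINED for c in bad):
            # Reverse direction: emit the raw byte so the round trip is lossless.
            return bytes(ord(c) for c in bad), err.end
    raise err

codecs.register_error("c1-passthrough", _c1_passthrough)

def decode_cp1252(data: bytes, policy: str = "strict") -> str:
    """Decode cp1252 with one of three policies for undefined bytes.

    "strict"  -> raise UnicodeDecodeError
    "replace" -> substitute U+FFFD
    "c1"      -> map to the same-valued C1 control (e.g. 0x81 -> U+0081)
    """
    handlers = {"strict": "strict", "replace": "replace", "c1": "c1-passthrough"}
    return data.decode("cp1252", errors=handlers[policy])
```

With the "c1" policy, decoding all 256 byte values and re-encoding with the same handler gives back the original bytes, which is the reversibility asked about. The "replace" policy does not round-trip, since every undefined byte collapses to U+FFFD.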

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Received on Fri Nov 16 2012 - 18:12:18 CST
