RE: cp1252 decoder implementation from Shawn Steele on 2012-11-16 (Unicode Mail List Archive)

From: Shawn Steele <Shawn.Steele_at_microsoft.com>
Date: Sat, 17 Nov 2012 00:28:02 +0000

People really should be using UTF-8 or something else :) IMO these are legacy encodings and should be deprecated.

-Shawn

-----Original Message-----
From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of Doug Ewell
Sent: Friday, November 16, 2012 4:11 PM
To: Buck Golemon; unicode
Subject: Re: cp1252 decoder implementation

Buck Golemon wrote:

> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
> to map it to the equally-non-semantic U+81 ?
>
> This would allow systems that follow the html5 standard and use cp1252
> in place of latin1 to continue to be binary-faithful and reversible.

This isn't quite as black-and-white as the question about Latin-1. If you are targeting HTML5, you are probably safe in treating an incoming 0x81 (for example) as either U+0081 or U+FFFD, or throwing some kind of error. HTML5 insists that you treat 8859-1 as if it were CP1252, so it no longer matters what the byte is in 8859-1.
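For illustration only, here is a rough Python sketch of the three treatments of 0x81 mentioned above: raise an error, substitute U+FFFD, or pass the byte through as U+0081. It relies on Python's built-in cp1252 codec leaving 0x81 undefined. The handler name "c1-passthrough" and the function c1_passthrough are made up for this example and are not part of any standard or library.

```python
import codecs

def c1_passthrough(err):
    """Hypothetical error handler: map bytes that cp1252 leaves undefined
    to the C1 control code point with the same value, and back again."""
    if isinstance(err, UnicodeDecodeError):
        # e.g. byte 0x81 -> U+0081
        return err.object[err.start:err.end].decode("latin-1"), err.end
    if isinstance(err, UnicodeEncodeError):
        bad = err.object[err.start:err.end]
        # Only reverse the mapping for C1 controls; anything else stays an error.
        if all(0x80 <= ord(ch) <= 0x9F for ch in bad):
            return bad.encode("latin-1"), err.end
    raise err

codecs.register_error("c1-passthrough", c1_passthrough)

data = b"caf\xe9 \x81 \x80"

# Option 1: throw some kind of error (the codec's default behaviour).
try:
    data.decode("cp1252")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Option 2: substitute U+FFFD (lossy, so not reversible).
replaced = data.decode("cp1252", errors="replace")

# Option 3: map 0x81 to U+0081, which keeps the round trip byte-faithful.
passthrough = data.decode("cp1252", errors="c1-passthrough")
round_trip = passthrough.encode("cp1252", errors="c1-passthrough")
```

Only the third option is reversible, which is the property Buck was asking about. The second loses the original byte value.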

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell

Received on Fri Nov 16 2012 - 18:30:04 CST
