What does one do if the encoding is unknown and all you have is a sequence of bytes?

From: Costello, Roger L. <costello_at_mitre.org>
Date: Fri, 19 Jul 2013 17:51:07 +0000

Hi Folks,

Suppose that these hex bytes:

        C3 83 C2 B1

show up in a message and the message contains no hint what its encoding is.

Perhaps it is ISO-8859-1, in which case the message consists of four 1-byte characters:

C3 = Ã
83 = the “no break here” control character (U+0083)
C2 = Â
B1 = ±

Perhaps it is UTF-8, in which case the message consists of two 2-byte characters:

C3 83 = Ã (U+00C3)
C2 B1 = ± (U+00B1)
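The competing interpretations can be checked mechanically. What follows is only a minimal sketch, assuming Python 3 and its standard codecs; the helper name `code_points` is made up for illustration. It decodes the same four bytes three ways and lists the resulting code points. Note that a strict UTF-8 decode gives U+00C3 and U+00B1; U+C383 and U+C2B1 are what the bytes give when read as big-endian UTF-16.

```python
# The four bytes from the message.
data = bytes.fromhex("C3 83 C2 B1")


def code_points(text):
    """Return the code points of a string in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Decode the same bytes under three candidate encodings.
as_latin1 = data.decode("iso-8859-1")   # one character per byte
as_utf8 = data.decode("utf-8")          # two 2-byte sequences
as_utf16be = data.decode("utf-16-be")   # two 16-bit code units

for label, text in [("ISO-8859-1", as_latin1),
                    ("UTF-8", as_utf8),
                    ("UTF-16BE", as_utf16be)]:
    print(f"{label:11} {code_points(text)}")
```

All three decodes succeed without error, so the bytes alone do not settle which encoding was intended.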

Or, perhaps it is some other encoding.

What does one do in such a situation?

/Roger
Received on Fri Jul 19 2013 - 12:56:13 CDT
