Re: Aw: Re: Re: Do you know a tool to decode "UTF-8 twice"

From: Frédéric Grosshans <frederic.grosshans_at_gmail.com>
Date: Wed, 30 Oct 2013 15:53:27 +0100

On 30/10/2013 15:34, Frédéric Grosshans wrote:
> On 29/10/2013 17:15, "Jörg Knappen" wrote:
>> After running this script, a few more things were there:
>> Non-normalised accents and some really strange
>> encodings I could not really explain but rather guess their meanings,
>> like
>> s/Ãœ/Ü/g
>> s/Ã‰/É/g
>> s/AÌ€/À/g
>> s/aÌ€/à/g
>> s/EÌ€/È/g
>> s/eÌ€/è/g
>> s/â€ž/„/g
>> s/â€œ/“/g
>> s/ÃŸ/ß/g
>> s/â€™/’/g
>> s/Ã„/Æ/g
>
> It was probably not UTF-8 read as Latin-1 and re-encoded as UTF-8, but
> UTF-8 read as Windows-1252 (
> http://en.wikipedia.org/wiki/Windows-1252 ) and re-encoded as UTF-8.
> Each of the combinations above contains a character absent from Latin-1
> (œ‰€žŸ™„), and some of these (‰™„) are present only in Windows-1252 and
> not in ISO 8859-15 (Latin-9), the other possible mistake.
>
> I've checked that this is consistent with Ü, É and ß, but not with your
> Æ. This double encoding would give Ä:
> Ã„ = Win1252(C3 84) = 110.00011 10.000100 = UTF8(00011 000100) = Unicode
> 00C4 = Ä (and not Æ)
>
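The byte arithmetic above can be reproduced with a minimal Python sketch.
It assumes Python's built-in "cp1252" codec as the Windows-1252 table; the
variable names are only illustrative:

```python
# The garbled pair is two Windows-1252 characters whose bytes (C3 84)
# together form a single two-byte UTF-8 sequence.
garbled = "\u00c3\u201e"          # 'Ã' followed by '„'
raw = garbled.encode("cp1252")    # recover the original bytes: C3 84

b0, b1 = raw                      # 0xC3 = 110.00011, 0x84 = 10.000100
# Drop the UTF-8 marker bits (110 and 10) and join the payload bits.
code_point = ((b0 & 0b00011111) << 6) | (b1 & 0b00111111)

manual = chr(code_point)          # decoded by hand
builtin = raw.decode("utf-8")     # decoded by the UTF-8 codec

print(raw.hex(), hex(code_point), manual, builtin)
```

Both routes end at U+00C4 (Ä), not U+00C6 (Æ).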
I've also checked the other combinations, including Ì€ = U+0300
COMBINING GRAVE ACCENT, and everything is consistent with Windows-1252,
except your Æ, which should be Ä.
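
For anyone who wants to repeat the check, here is a rough Python sketch
rather than a finished tool. The function names undo_double_utf8 and
encodable are made up for this example, and the garbled strings are written
as escapes so that the mail transport cannot mangle them again. It assumes
the text contains only characters that Windows-1252 can encode; Python's
"cp1252" codec leaves bytes 81, 8D, 8F, 90 and 9D undefined, so text that
went through those bytes will raise an error instead of being repaired.

```python
def undo_double_utf8(text: str) -> str:
    """Reverse 'UTF-8 read as Windows-1252, then re-encoded as UTF-8'."""
    return text.encode("cp1252").decode("utf-8")


def encodable(ch: str, codec: str) -> bool:
    """Return True if the character exists in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# (garbled, expected) pairs taken from the substitutions quoted above.
pairs = [
    ("\u00c3\u0153", "\u00dc"),          # Ãœ  -> Ü
    ("\u00c3\u2030", "\u00c9"),          # Ã‰  -> É
    ("A\u00cc\u20ac", "A\u0300"),        # AÌ€ -> A + combining grave
    ("\u00e2\u20ac\u017e", "\u201e"),    # â€ž -> „
    ("\u00e2\u20ac\u0153", "\u201c"),    # â€œ -> “
    ("\u00c3\u0178", "\u00df"),          # ÃŸ  -> ß
    ("\u00e2\u20ac\u2122", "\u2019"),    # â€™ -> ’
    ("\u00c3\u201e", "\u00c4"),          # Ã„  -> Ä (not Æ)
]
all_consistent = all(undo_double_utf8(g) == e for g, e in pairs)

# The telltale characters: none is in Latin-1, and three of them are not
# in ISO 8859-15 either.
telltale = "\u0153\u2030\u20ac\u017e\u0178\u2122\u201e"   # œ‰€žŸ™„
in_latin1 = [ch for ch in telltale if encodable(ch, "latin-1")]
only_1252 = [ch for ch in telltale if not encodable(ch, "iso8859_15")]

print(all_consistent, in_latin1, only_1252)
```

The accented letters come back as base letter plus U+0300, so a
normalisation step (for instance unicodedata.normalize("NFC", ...)) would
still be needed to get the precomposed À, à, È and è.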

     Frédéric
Received on Wed Oct 30 2013 - 09:55:31 CDT
