Re: Character set conversion question

From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Tue Jun 16 2009 - 17:38:20 CDT

Next message: Charlie Ruland: "Re: [unicode] Unihan database: kCangjie field"

Previous message: Charlie Ruland: "Re: [unicode] Unihan database: kCangjie field"
In reply to: Leo Broukhis: "Character set conversion question"
Next in thread: Leo Broukhis: "Re: Character set conversion question"
Reply: Leo Broukhis: "Re: Character set conversion question"

* Leo Broukhis wrote:
>What would be a way to find out what character set conversions were
>applied to the text?

Where the brute force approach fails and you have not misanalyzed the
byte stream (copy and paste from a mail program may be unreliable), it
is likely that you either have not tried enough encodings, or the
encoding is the result of function composition. For example, the text
might have been ISO-8859-X, then interpreted as ISO-8859-Y and then
encoded using ISO-8859-Z by some process; a popular example is UTF-8
encoded data re-interpreted as ISO-8859-1 and re-encoded as UTF-8.
Your brute force search then has to include such compositions as well.
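
A brute force search over such compositions could look roughly like the
following Python sketch. It assumes you know a fragment of the intended
text; the function name find_compositions, the CANDIDATES list and the
max_steps limit are made up for illustration and are not part of any
library. Each step undoes one "decoded with the wrong charset" mistake
by encoding the text back to bytes and decoding it with another charset.

```python
from itertools import product

# Hypothetical candidate list; extend it with whatever encodings are plausible.
CANDIDATES = ["utf-8", "iso-8859-1", "iso-8859-2", "iso-8859-15", "cp1252"]


def find_compositions(garbled, expected, encodings=CANDIDATES, max_steps=2):
    """Return chains of (misread_as, really_was) pairs that turn `garbled`
    back into text containing `expected`, trying up to `max_steps` steps."""
    hits = []
    frontier = [(garbled, ())]
    for _ in range(max_steps):
        next_frontier = []
        for text, chain in frontier:
            for misread_as, really_was in product(encodings, repeat=2):
                if misread_as == really_was:
                    continue  # identity step, undoes nothing
                try:
                    # Undo one step: recover the bytes, then decode them
                    # with the charset they were presumably written in.
                    repaired = text.encode(misread_as).decode(really_was)
                except (UnicodeEncodeError, UnicodeDecodeError):
                    continue  # this pair cannot have produced the text
                if repaired == text:
                    continue  # no change, not worth exploring further
                step = chain + ((misread_as, really_was),)
                if expected in repaired:
                    hits.append(step)
                else:
                    next_frontier.append((repaired, step))
        frontier = next_frontier
    return hits
```

Calling find_compositions(garbled, "Björn") on a string that went through
the UTF-8 / ISO-8859-1 / UTF-8 round trip should report, among others,
the chain (("iso-8859-1", "utf-8"),). Several chains may match, because
many 8-bit charsets agree on the bytes involved, so the result narrows
the possibilities down rather than proving which conversions were
applied.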

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

This archive was generated by hypermail 2.1.5 : Tue Jun 16 2009 - 17:40:56 CDT