Character set conversion question

From: Leo Broukhis
Date: Tue Jun 16 2009 - 17:04:14 CDT


    This may not be so much a (not "az") Unicode question as a general
    computational math question:

    Some time ago I wrote, in UTF-8 encoded Russian, to an old
    acquaintance of mine.
    His response, still in Russian, was labeled as iso-8859-1 and
    contained an illegible combination of characters
    that stumped all online Cyrillic decoders. These decoders usually try
    several conversions among utf-8, iso-8859-1, koi8-r,
    cp1251, cp866, mac cyrillic, and iso-8859-5, but they failed in my case.
    Indeed, the set of byte values in the message did not fit into the
    value range occupied by the Cyrillic letters in any single encoding:
    the word "Привет" ("Hello", U+041F 0440 0438 0432 0435 0442) became
    "è•Ë’ÂÚ" (E8 95 CB 92 C2 DA).

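    A minimal sketch of the kind of brute-force search such decoders
    presumably perform, written in Python. It assumes a three-step chain
    (text encoded as A, misread as B, re-encoded as C) and checks every
    combination against a known word and its observed bytes. The names
    find_chains and CODECS are hypothetical, and the codec list is just
    the seven encodings mentioned above. For the sample bytes it may well
    print an empty list, which would be consistent with the decoders
    failing.

```python
from itertools import product

# Assumption: Python codec names for the seven encodings listed in the message.
CODECS = ["utf-8", "latin-1", "koi8-r", "cp1251", "cp866",
          "mac-cyrillic", "iso8859-5"]


def find_chains(original, observed, codecs=CODECS):
    """Return all (a, b, c) where `original`, encoded as a, misread as b,
    and re-encoded as c, yields exactly the `observed` bytes."""
    hits = []
    for a, b, c in product(codecs, repeat=3):
        if a == b:
            continue  # decoding with the same codec is not a misreading
        try:
            if original.encode(a).decode(b).encode(c) == observed:
                hits.append((a, b, c))
        except UnicodeError:
            # This combination cannot represent the text at some step.
            continue
    return hits


# The known word and the bytes it turned into, as given in the message.
sample = bytes.fromhex("E8 95 CB 92 C2 DA")
print(find_chains("Привет", sample))
```

    A longer chain, or a codec outside this list, would need a wider
    search; the sketch only shows the shape of the test.
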
    Luckily, he quoted my original message, and I was able to decrypt his
    response by simple search-and-replace, letter by letter, without
    resorting to letter-frequency cryptanalysis.
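
    The search-and-replace step can be sketched in Python as a
    known-plaintext substitution table: pair each garbled byte with the
    letter at the same position in the quoted original, then apply the
    table to the rest of the reply. The function names build_table and
    apply_table are hypothetical, and the sketch assumes a strict
    one-byte-per-letter correspondence, which is what the sample word
    suggests but does not prove.

```python
def build_table(garbled, original):
    """Map each garbled byte to the original character at the same position."""
    if len(garbled) != len(original):
        raise ValueError("garbled and original must align one byte per letter")
    table = {}
    for byte, char in zip(garbled, original):
        # A byte that maps to two different letters means the
        # one-to-one assumption does not hold for this text.
        if table.setdefault(byte, char) != char:
            raise ValueError(f"byte {byte:#04x} maps to more than one letter")
    return table


def apply_table(garbled, table, unknown="?"):
    """Decode garbled bytes, marking bytes not yet in the table."""
    return "".join(table.get(byte, unknown) for byte in garbled)


# The one pair given in the message: "Привет" became E8 95 CB 92 C2 DA.
table = build_table(bytes.fromhex("E8 95 CB 92 C2 DA"), "Привет")
print(apply_table(bytes.fromhex("C2 DA 95"), table))  # prints "етр"
```

    Each further quoted word extends the table, so the "?" marks shrink
    as more of the original is aligned with the reply.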

    What would be a way to find out what character set conversions were
    applied to the text?


