From: Doug Ewell (firstname.lastname@example.org)
Date: Sat Feb 10 2007 - 02:55:25 CST
I'm looking for tips on automatically detecting text data in MS-DOS
CP437 (or 850, etc.) versus Latin-1 or Windows CP1252. The solution
doesn't have to be perfect, just pretty good.
One problem is detecting text with the MS-DOS box-drawing characters,
many of which occupy the same code points as Latin-1 accented letters.
This means that simple range-checking often doesn't work.
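To illustrate the kind of heuristic that might go beyond plain range-checking, here is a rough sketch in Python. It is not a tested solution, only one possible starting point. It looks at the neighbors of each high byte: CP437 box-drawing and shade bytes (0xB0-0xDF) tend to appear in runs, while Latin-1/CP1252 accented letters tend to sit next to ASCII letters. It also treats the five bytes that CP1252 leaves undefined as a sign of DOS text. The function name, the score weights, and the "unknown" result are all assumptions made for this example.

```python
# Hypothetical heuristic; names and weights are illustrative assumptions.
BOX_RANGE = range(0xB0, 0xE0)                    # CP437 shade and box-drawing bytes
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}  # unassigned in CP1252


def _is_ascii_letter(b: int) -> bool:
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def guess_codepage(data: bytes) -> str:
    """Guess 'cp437' or 'cp1252'; return 'unknown' when the evidence is tied."""
    dos_score = 0
    win_score = 0
    for i, b in enumerate(data):
        if b < 0x80:
            continue  # ASCII is identical in both code pages
        # Treat the start and end of the data as if padded with spaces.
        prev = data[i - 1] if i > 0 else 0x20
        nxt = data[i + 1] if i + 1 < len(data) else 0x20
        beside_letter = _is_ascii_letter(prev) or _is_ascii_letter(nxt)
        if b in CP1252_UNDEFINED:
            # e.g. 0x81 is u-umlaut in CP437 but nothing in CP1252
            dos_score += 2
        elif b in BOX_RANGE:
            if prev in BOX_RANGE or nxt in BOX_RANGE:
                dos_score += 1   # a run of line-drawing bytes
            elif beside_letter:
                win_score += 1   # more likely an accented capital inside a word
        elif b >= 0xE0 and beside_letter:
            # Lowercase accented letters in CP1252, Greek/math symbols in CP437
            win_score += 1
    if dos_score == win_score:
        return "unknown"
    return "cp437" if dos_score > win_score else "cp1252"
```

Once a guess is made, the data could be converted with something like data.decode(guess).encode("utf-8"). A sketch like this will misjudge some inputs (for example, Latin-1 text with two adjacent accented capitals), so spot-checking the results would still be wise.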
Please send replies off-list unless you feel they would interest the
list. Please don't tell me this is anachronistic; I know it is. I'm
trying to migrate a lot of that anachronistic data to UTF-8, as
automatically as possible.
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages