RE: Detecting encoding in Plain text

From: Chris Pratley (chrispr@Exchange.Microsoft.com)
Date: Thu Jan 08 2004 - 16:45:47 EST

    If you are on the Windows platform, look at mlang.dll and at the
    IMultiLanguage2 and IMultiLanguage3 APIs, which provide this service. As
    others have noted, you will get false detections with too little or
    ambiguous data, but you may be quite surprised at just how accurate this
    detection is (sometimes it needs just one character outside of the
    "ASCII" repertoire), since it uses language frequency data as well as
    encoding rules.

    Chris
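
For readers not on Windows, the BOM check from the original question plus a
strict UTF-8 validity test can be sketched as follows. This is a minimal
Python illustration, not the mlang.dll algorithm: `guess_encoding` and its
`fallback` parameter are hypothetical names, and falling back to cp1252
(Windows ANSI, Western) is an assumption. It uses no language frequency data,
so it cannot tell apart legacy encodings such as Mac Roman, ISO-8859-1, or
Shift-JIS.

```python
import codecs

# Order matters: the UTF-32 LE BOM starts with the same bytes as the
# UTF-16 LE BOM, so the UTF-32 entries must be tested first.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return a Python codec name guessed from the raw bytes of a text file."""
    # 1. A BOM, when present, identifies the Unicode encoding form.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Bytes that decode as strict UTF-8 are very likely UTF-8
    #    (pure ASCII also lands here, which is harmless).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 3. Assumption: anything else is treated as the fallback code page.
        return fallback
```

The returned name can be passed straight to `bytes.decode`; the `utf-8-sig`,
`utf-16`, and `utf-32` codecs consume the BOM themselves. Decoding with the
fallback can still fail or produce the wrong characters, which is the gap that
frequency-based detection is meant to close.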

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    Behalf Of Brijesh Sharma
    Sent: January 8, 2004 3:08 AM
    To: Unicode Mailing List
    Subject: Detecting encoding in Plain text

    Hi All,
    I am new to Unicode.
    I am writing a small tool to load text from a .txt file into an edit box.
    This .txt file could be in any encoding, e.g. UTF-8, UTF-16, Mac Roman,
    Windows ANSI, Western (ISO-8859-1), JIS, Shift-JIS, etc.
    I can distinguish between UTF-8 and UTF-16 using the BOM, but how do I
    auto-detect the others?
    Any kind of help will be appreciated.
     

    Regards
    Brijesh Sharma

    "You're not obligated to win. You're obligated to keep trying to do the
    best you can every day."


