Re: encoding checker

From: Ben Dougall (bend@freenet.co.uk)
Date: Tue May 13 2003 - 08:19:29 EDT


    > All I need is one thing:
    >
    > What I actually look for is a way to check files about the encoding
    > they are encoded in. Is there a SW that just tells me: This text is
    > encoded in UTF8, ASCII, UCS2 or whatever?

    i would also like to do the same thing, so if you find any more useful
    info than what i point to here, i'd really appreciate it if you could
    let me know about it.

    have a look at the very recent thread on this list, in the archives:
    "suggestions for strategy on dealing with plain text in potentially any
    (unspecified) encoding?" there's a lot of useful stuff in that.

    basically, nearly all text just goes ahead and uses its encoding
    without first stating "i'm 7bit ascii" or whatever (even unicode, when
    it doesn't use a bom). so, often the required info simply isn't there.
    some html, most (maybe all) xml, some unicode (via a bom) and most
    (maybe all) emails carry information as to which encoding is being
    used.
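
    for the bom case at least, the check is mechanical. here is a rough
    python sketch, not a definitive tool: the name sniff_bom is made up
    for illustration, and it only reports what a leading bom claims (a
    file without a bom just gives None). one known wrinkle: a utf-16 le
    file whose first character after the bom is U+0000 would be taken for
    utf-32 le.

```python
import codecs

# BOMs are checked longest-first, because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

    reading the first four bytes of a file and passing them to sniff_bom
    is enough, since no bom is longer than that.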

    so it seems that if anything is going to tell you explicitly which
    encoding is being used, it's going to be the text format rather than
    the encoding itself (apart from unicode and its boms). if neither the
    text format nor the encoding itself specifies the encoding, i don't
    think there is any absolute, sure way to find out. but there are
    various methods to make good, educated guesses (see the thread i
    mentioned).
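
    to give an idea of what an educated guess can look like for text with
    no bom and no declaration, here is a small python sketch. it is my own
    illustration, not something taken from that thread, and the name
    guess_encoding and its return labels are made up. it relies on two
    facts: pure 7-bit data is valid ascii, and bytes from a legacy 8-bit
    encoding rarely happen to form valid utf-8. anything else it gives up
    on rather than pretending to know.

```python
def guess_encoding(data: bytes) -> str:
    """Guess the encoding of BOM-less bytes. A heuristic, not a proof."""
    # Only bytes 0x00-0x7F present: consistent with ASCII (and therefore
    # also with UTF-8 and most 8-bit code pages, which extend ASCII).
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # Strict UTF-8 decoding fails on most non-UTF-8 data, so success is
    # strong (but not conclusive) evidence of UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8 (probably)"
    except UnicodeDecodeError:
        pass

    # High bytes that are not valid UTF-8: some legacy encoding, but the
    # bytes alone do not say which one.
    return "unknown 8-bit encoding"
```

    telling the 8-bit code pages apart from each other needs statistics
    about the language of the text, which is where the guessing gets much
    less certain.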

    also someone on this list pointed me to this which you might find
    useful:
    <http://www.mlmassociates.cc/dl-win32.htm>
    Dcpcmd is a command line program that illustrates using the Windows
    IMultiLanguage interface to detect a code page. Several sample text
    files are provided.


