Re: Parsing Unicode strings

From: Petite Abeille (petite.abeille@gmail.com)
Date: Wed May 28 2008 - 16:52:30 CDT


    On May 28, 2008, at 10:49 PM, Peter Johansson wrote:

    > Is the Unicode-encoded character string self-descriptive?

    No.

    > That is, do I need a priori knowledge that it is encoded as, for
    > example, UTF-8 rather than UTF-32?

    Yes.

    > Or, by examining the first byte (or first few bytes) can I determine
    > the encoding?

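    Not reliably. A byte order mark, when present, is a strong hint, and
    heuristic detectors can guess from the byte statistics: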

    "Encoding Detector"
    http://chardet.feedparser.org/docs/faq.html

    "A composite approach to language/encoding detection"
    http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
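
To make the "first few bytes" idea concrete, here is a minimal sketch that checks only for a byte order mark (BOM). The function name `sniff_bom` and the table `_BOMS` are made up for illustration; the BOM constants come from Python's standard `codecs` module. This is a heuristic, not a guarantee: most text carries no BOM at all, and a UTF-16 little-endian stream whose first character after the BOM is U+0000 would be misread as UTF-32.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM (FF FE 00 00) starts with
# the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: the encoding must come from outside knowledge or a
    # statistical detector such as the ones linked above.
    return None, 0
```

When `sniff_bom` returns `(None, 0)`, which is the common case, you are back to needing a priori knowledge or a statistical guess.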

    --
    PA.
    http://alt.textdrive.com/nanoki/
    

