detecting ill-formed UTF-8 (was RE: [question] UTF-8 issue)

From: Chris Weber (
Date: Sat Oct 10 2009 - 14:29:37 CDT


    We have a runtime Web-application security testing and auditing tool called
    Watcher, available at It includes a
    check that detects ill-formed UTF-8 in HTTP/S-based Web applications. In my
    experience it's a rare occurrence, but when a Web app does emit ill-formed
    UTF-8, it usually points to an interesting bug or root cause.


    After reading some of the responses here, I need to revisit this check and
    make sure it's detecting encoded surrogates (code points U+D800 through
    U+DFFF, which are ill-formed in UTF-8). It's open source, so if anyone
    happens to take a look and notices an error, please let me know!
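
To illustrate what a check like this involves, here is a minimal Python sketch.
It is not Watcher's code; it assumes the well-formed byte ranges given in
Table 3-7 of the Unicode Standard, and the function name
first_ill_formed_offset is hypothetical. The 0xED branch is the one that
rejects encoded surrogates.

```python
from typing import Optional

CONT = (0x80, 0xBF)  # ordinary continuation byte range


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first ill-formed UTF-8 sequence, or None.

    Illustrative sketch based on the well-formed byte ranges in
    Table 3-7 of the Unicode Standard; not taken from any existing tool.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:                       # ASCII, single byte
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:               # 2 bytes (C0/C1 would be overlong)
            trail = [CONT]
        elif b0 == 0xE0:                     # 3 bytes; 2nd byte < A0 is overlong
            trail = [(0xA0, 0xBF), CONT]
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            trail = [CONT, CONT]
        elif b0 == 0xED:                     # 2nd byte A0..BF encodes a surrogate
            trail = [(0x80, 0x9F), CONT]
        elif b0 == 0xF0:                     # 4 bytes; 2nd byte < 90 is overlong
            trail = [(0x90, 0xBF), CONT, CONT]
        elif 0xF1 <= b0 <= 0xF3:
            trail = [CONT, CONT, CONT]
        elif b0 == 0xF4:                     # 2nd byte > 8F exceeds U+10FFFF
            trail = [(0x80, 0x8F), CONT, CONT]
        else:                                # stray continuation byte, C0, C1, F5..FF
            return i
        for k, (lo, hi) in enumerate(trail, start=1):
            # A truncated sequence or an out-of-range trailing byte is ill-formed.
            if i + k >= n or not lo <= data[i + k] <= hi:
                return i
        i += 1 + len(trail)
    return None
```

For example, first_ill_formed_offset(b"a\xed\xa0\x80") returns 1, because
ED A0 80 is the three-byte form of the surrogate U+D800. Python's own strict
decoder, bytes.decode("utf-8"), rejects the same input with
UnicodeDecodeError, so it can serve as a cross-check for the sketch.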


    - Chris Weber



