Re: HTML5 encodings

From: verdy_p (
Date: Sun Dec 27 2009 - 21:27:33 CST

    "Asmus Freytag" wrote:
    > Incidentally, null-terminated anything (even ASCII or UTF-8) is one of
    > those cases where you need to read from the start to make sure you
    > haven't overrun the terminator.

    Null-terminated strings are a legacy of the past that has caused too many problems: avoid using them or depending
    on them in any secure code.

    But given that nulls should not exist in any conforming plain-text or HTML or XML, you can avoid that problem by
    making sure that all your buffers are always properly null-padded up to their maximum size (any null in your buffer
    will then act as a guard, provided that your buffer is properly aligned with the byte stream transporting it, if it
    uses 16-bit or 32-bit code units).
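
    As a rough illustration of the null-padding idea, here is a minimal C++ sketch for 8-bit code units. The names
    PaddedBuffer, kCapacity and store are hypothetical, and the 16-byte capacity is an arbitrary choice; the point is
    only that every unused byte, plus one reserved guard byte, is always zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed capacity, chosen only for illustration.
constexpr std::size_t kCapacity = 16;

struct PaddedBuffer {
    // One extra byte is reserved so that at least one null guard always exists.
    char data[kCapacity + 1] = {};  // every byte starts as '\0'
};

// Copies at most kCapacity bytes from src, then null-pads the rest of the buffer.
// Returns the number of bytes actually stored (longer input is truncated).
std::size_t store(PaddedBuffer& buf, const char* src, std::size_t src_len) {
    const std::size_t n = src_len < kCapacity ? src_len : kCapacity;
    std::memcpy(buf.data, src, n);
    std::memset(buf.data + n, 0, sizeof buf.data - n);  // pad up to the full size
    assert(buf.data[kCapacity] == '\0');                // the guard byte is never overwritten
    return n;
}
```

    With this layout a scan that starts anywhere inside the buffer meets a null before leaving it, as long as the
    stored text itself contains no nulls.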

    But the buffer length always needs to be tracked (this remains true whether you rely on null-termination or keep a
    separate counter for the effective length in your buffers).
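
    One possible shape for such tracking, sketched in C++ under the assumption that the caller owns the storage:
    CountedBuffer and append are made-up names, not an existing API. The effective length travels with the buffer and
    every write is checked against the remaining capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical counted buffer: the effective length is stored next to the data.
struct CountedBuffer {
    char*       data;      // caller-owned storage
    std::size_t capacity;  // total bytes available in data
    std::size_t length;    // bytes currently in use (no terminator required)
};

// Appends n bytes if they fit; otherwise leaves the buffer untouched and returns false.
bool append(CountedBuffer& buf, const char* src, std::size_t n) {
    assert(buf.length <= buf.capacity);   // invariant of the structure
    if (n > buf.capacity - buf.length) {  // the subtraction cannot wrap, given the invariant
        return false;
    }
    std::memcpy(buf.data + buf.length, src, n);
    buf.length += n;
    return true;
}
```

    Comparing n against the remaining space, rather than testing length + n against capacity, avoids an integer
    overflow when n is very large.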

    Some C++ libraries (e.g. in MS Visual C++) provide "safe" equivalents for legacy C APIs (using custom macros or
    inline function overloads that add the buffer-length tracking and rename the APIs to their safe versions,
    including for scanf/printf-like APIs), or provide debugging alerts for programmers when such tracking cannot be
    deduced from the source code (due to insufficient datatyping of parameters). This works for most existing source
    code, or can be adopted with minimal effort.
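
    The overload mechanism can be sketched as follows. This is not the vendor's implementation: copy_checked is a
    hypothetical name, and the sketch only shows how a C++ template can deduce the destination size when the argument
    is a real array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Length-checked copy with an explicit destination size.
// On failure the destination is set to an empty string and false is returned.
bool copy_checked(char* dest, std::size_t dest_size, const char* src) {
    assert(dest != nullptr && src != nullptr);
    const std::size_t n = std::strlen(src);
    if (dest_size == 0 || n >= dest_size) {  // no room for the text plus its terminator
        if (dest_size != 0) dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, n + 1);  // copy the text and its terminator
    return true;
}

// Overload that deduces the size N at compile time when dest is an array.
template <std::size_t N>
bool copy_checked(char (&dest)[N], const char* src) {
    return copy_checked(dest, N, src);
}
```

    A call such as copy_checked(buf, text) compiles only when buf is declared as an array. With a plain char* the size
    cannot be deduced and the compiler rejects the two-argument call, which is the "insufficient datatyping" case where
    the length has to be passed by hand.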

    It is much more difficult with existing legacy C code (unless you recompile it in C++ mode, with some modifications,
    and retest all the modifications you needed to make).

    Newer languages (notably those for VM environments, like .Net, Java, Parrot, ... or Lisp-like languages) either
    require the use of buffer-length tracking or implicitly delimit character streams with some
    end-of-file/end-of-stream/end-of-list mechanism: you normally cannot avoid this check in those languages (except
    when you explicitly write and use potentially unsafe mechanisms like JNI in Java, and only if the VM allows you to
    link with an external library that can't be deployed from within the language itself without starting a new VM
    instance with less strict

    But because your code may now raise exceptions, its API will also need to be updated explicitly (a new contract
    will be made), and you'll need to test this contract to make sure that all code paths (including those after
    exceptions) are covered by your tests and won't cause your app to fail completely with uncaught exceptions
    (notably in server applications, where such an exception in a worker thread may affect your whole process,
    opening the way to easy DoS attacks).
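
    A minimal C++ sketch of one way to keep a worker alive when a single task throws. WorkerStats and run_worker are
    hypothetical names, and the task list stands in for whatever queue a real server would use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical counters describing what a worker did.
struct WorkerStats {
    std::size_t ok = 0;
    std::size_t failed = 0;
};

// Runs every task; an exception thrown by one task is contained and counted,
// so it neither stops the remaining tasks nor escapes from the worker.
WorkerStats run_worker(const std::vector<std::function<void()>>& tasks) noexcept {
    WorkerStats stats;
    for (const auto& task : tasks) {
        try {
            task();
            ++stats.ok;
        } catch (...) {  // catch everything at the worker boundary
            ++stats.failed;
        }
    }
    assert(stats.ok + stats.failed == tasks.size());  // every task is accounted for
    return stats;
}
```

    In C++ this boundary matters because an exception that escapes the entry function of a std::thread calls
    std::terminate and ends the whole process, not just that thread.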

    Uncaught exceptions must also be tested and handled gracefully without revealing secure/secret data elements in
    their displayed messages/results (such messages, with detailed information useful for debugging, should
    preferably be logged securely in a server file or to an administration console over a secured connection, and not
    presented directly to unauthorized clients or visitors of your service).
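
    The separation between what is logged and what is shown could look like the sketch below. InternalLog and respond
    are hypothetical names, and the in-memory vector is only a stand-in for a protected log file or administration
    console.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <string>
#include <vector>

// Stand-in for a secured server-side log (an assumption for this sketch).
struct InternalLog {
    std::vector<std::string> entries;
};

// Runs the handler and returns its reply. On failure the detailed message goes
// to the internal log only, and the client receives a fixed generic text.
std::string respond(const std::function<std::string()>& handler, InternalLog& log) {
    static const std::string kGeneric = "Internal error";
    try {
        return handler();
    } catch (const std::exception& e) {
        log.entries.push_back(e.what());  // details stay on the server side
    } catch (...) {
        log.entries.push_back("unknown exception");
    }
    assert(!log.entries.empty());  // a failure always leaves a log entry
    return kGeneric;
}
```

    The reply on the failure path never contains text taken from the exception, so nothing the exception message
    happens to include can reach the client.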

    PHP is midway: it is not completely safe in many of its supported legacy libraries (there's often no check at all in
    the OS-binding APIs, such as filesystem I/O, or in its SQL integration client libraries, which still allow SQL
    requests to be modified when some parameters include a null: input parameter validation is required before
    processing or interfacing with external or underlying OSes and SQL engines). PHP still depends heavily on the legacy
    string/stdio libraries for its basic functions (that's why only a small, slowly growing, tested part of its
    libraries is accessible when running it in "safe mode").
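
    The kind of input validation meant here can be sketched in C++ (not in PHP): a length-counted string is rejected
    when it contains an embedded null, before it is handed to any API that expects a null-terminated string. The
    function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// True if value contains no embedded null, i.e. a null-terminated API will
// see exactly the same text as the length-counted string holds.
bool is_c_string_safe(const std::string& value) {
    return value.find('\0') == std::string::npos;
}

// Hypothetical gatekeeper: returns nullptr instead of a pointer when the value
// would be silently truncated by a C-string-based API.
const char* to_c_api_argument(const std::string& value) {
    if (!is_c_string_safe(value)) {
        return nullptr;
    }
    assert(std::strlen(value.c_str()) == value.size());  // both views agree
    return value.c_str();
}
```

    Without such a check, a 15-byte value like "report.txt", a null, then ".php" would pass a test on its last four
    characters while the underlying C API would open "report.txt".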


    This archive was generated by hypermail 2.1.5 : Sun Dec 27 2009 - 21:31:07 CST