Re: HTML5 encodings

From: verdy_p (verdy_p@wanadoo.fr)
Date: Sun Dec 27 2009 - 21:27:33 CST

Next message: Asmus Freytag: "Re: Filtering and displaying untrusted UTF-8"

Previous message: verdy_p: "re: Filtering and displaying untrusted UTF-8"
In reply to: Asmus Freytag: "Re: HTML5 encodings"
Next in thread: Doug Ewell: "Re: HTML5 encodings"

"Asmus Freytag" wrote:
> Incidentally, null-terminated anything (even ASCII or UTF-8) is one of
> those cases where you need to read from the start to make sure you
> haven't overrun the terminator.

Null-terminated strings are a legacy of the past that has caused too many problems: avoid using them or depending on
them in any secure code.

But given that nulls should not exist in any conforming plain text, HTML or XML, you can avoid that problem by
making sure that all your buffers are always properly null-padded up to their maximum size (any null in your buffer
will then act as a guard, provided that, when 16-bit or 32-bit code units are used, your buffer is properly aligned
with the byte stream transporting it).

But the buffer length always needs to be tracked (this remains true whether you rely on null-termination or use a
separate counter for the effective length in your buffers).
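
A minimal C++ sketch of the two ideas above, assuming 8-bit code units: a fixed-capacity buffer that is zero-filled up
to its maximum size and that also carries an explicit length counter. The names `PaddedBuffer`, `assign` and
`kCapacity` are hypothetical, chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity text buffer (8-bit code units assumed).
// Invariant: every byte from `length` to the end of `data` is 0, so a scan
// that runs past the text meets a null guard before leaving the buffer.
struct PaddedBuffer {
    static constexpr std::size_t kCapacity = 16;
    std::array<char, kCapacity + 1> data{};  // {} zero-fills; +1 keeps a guard even when full
    std::size_t length = 0;                  // effective length, tracked separately

    // Copies `n` bytes from `src`. Refuses input that does not fit or that
    // contains an embedded null; on refusal the buffer is left unchanged.
    bool assign(const char* src, std::size_t n) {
        assert(src != nullptr || n == 0);
        if (n > kCapacity) return false;
        if (n > 0 && std::memchr(src, '\0', n) != nullptr) return false;
        data.fill('\0');                     // restore the padding first
        if (n > 0) std::memcpy(data.data(), src, n);
        length = n;
        return true;
    }
};
```

Both mechanisms are kept on purpose: the counter is what the code relies on, and the padding only limits the damage
if some other code scans the buffer without consulting the counter.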

Some C++ libraries (e.g. in MS Visual C++) provide "safe" equivalents for legacy C APIs (using custom macros or
inline function overloads that add the buffer-length tracking and rename the APIs to their safe versions, including
for scanf/printf-like APIs), or provide debugging alerts for programmers when such tracking cannot be deduced from
the source code (due to insufficient datatyping of parameters). This works for most existing source code, or can be
used with minimal effort.
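
The overload technique can be sketched in portable C++ as follows. This is not the Visual C++ implementation:
`checked_copy` is a hypothetical name, and the sketch only shows how a template can deduce the destination size from
an array type, so that the call keeps the shape of a `strcpy` call while a length check is added behind it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Explicit-size form: copies the null-terminated `src` into `dst` only if
// the text and its terminator fit in `dst_size` bytes. On failure `dst` is
// set to the empty string and false is returned; nothing is truncated.
inline bool checked_copy(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr);
    if (dst_size == 0) return false;
    const std::size_t len = std::strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return false;
    }
    std::memcpy(dst, src, len + 1);  // +1 copies the terminator as well
    return true;
}

// Array form: the compiler deduces N from the type of `dst`, so the caller
// writes checked_copy(buf, text) and the size is tracked automatically.
template <std::size_t N>
bool checked_copy(char (&dst)[N], const char* src) {
    return checked_copy(dst, N, src);
}
```

With a plain `char*` destination the two-argument call does not compile, because no size can be deduced. In this
sketch, that compile error plays the role of the alert raised when the tracking cannot be inferred from the
parameter types.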

It is much more difficult with existing legacy C code (unless you recompile it in C++ mode, with some modifications,
and retest all the modifications you needed).

Newer languages (notably those for VM environments, like .NET, Java, Parrot, ... or Lisp-like languages) require the
use of buffer-length tracking, or implicitly delimit the character streams with some end-of-file/end-of-stream/
end-of-list mechanism: you normally cannot avoid this check with those languages (except when you explicitly write
and use potentially unsafe mechanisms like JNI in Java, and only if the VM allows you to link with an external
library that can't be deployed from within the language itself without starting a new VM instance with less strict
security).

But because your code may now raise exceptions, its API will also need to be updated explicitly (a new contract
will be made), and you'll need to test this contract to make sure that all code paths (including those taken after
exceptions) are covered by your tests and won't cause your app to fail completely with uncaught exceptions (notably
in server applications, where such an exception in a worker thread may affect your whole process, opening the way
to easy DoS attacks).
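
One way to state such a contract is to catch everything at the worker boundary and turn it into a return value. In
C++, for instance, an exception that escapes the top-level function of a `std::thread` calls `std::terminate` and
ends the whole process. The sketch below is only an illustration: `Outcome`, `run_contained` and `failing_request`
are hypothetical names, and no thread is actually started, since the wrapper is meant to be the outermost call inside
a worker's entry function.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

enum class Outcome { Ok, Failed };

// Runs `task` and never lets an exception escape. The failure detail is
// written to `log_detail`, which is intended for the server log only.
inline Outcome run_contained(const std::function<void()>& task, std::string& log_detail) {
    assert(static_cast<bool>(task));  // an empty task is a programming error
    try {
        task();
        return Outcome::Ok;
    } catch (const std::exception& e) {
        log_detail = e.what();
    } catch (...) {
        log_detail = "unknown exception";  // thrown object was not a std::exception
    }
    return Outcome::Failed;
}

// Hypothetical request handler that always fails, standing in for real work.
inline void failing_request() {
    throw std::runtime_error("backend unavailable");
}
```

Calling `run_contained(failing_request, detail)` returns `Outcome::Failed` and leaves "backend unavailable" in
`detail`. Both the `std::exception` path and the catch-all path need their own test, since either one can be the
code path taken after a failure.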

Uncaught exceptions must also be tested and handled gracefully, without revealing sensitive or secret data in
their displayed messages/results (such messages, with detailed information useful for debugging, should preferably
be logged securely in a server file or to an administration console with a secured connection, and not presented
directly to unauthorized clients or visitors of your service).
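
A small sketch of that separation, under stated assumptions: `SecureLog` is a hypothetical in-memory stand-in for a
secured server-side log, and `report_failure` is an illustrative name. The detailed text goes to the log only, and the
client receives a fixed message plus an opaque reference that an administrator can look up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a secured server-side log.
struct SecureLog {
    std::vector<std::string> entries;
};

// What the client is allowed to see.
struct ClientError {
    std::string message;    // fixed text, never derived from `detail`
    std::size_t reference;  // index of the matching log entry
};

// Stores the full detail server-side and builds a detail-free client message.
inline ClientError report_failure(SecureLog& log, const std::string& detail) {
    assert(!detail.empty());  // an empty entry would be useless for debugging
    log.entries.push_back(detail);
    const std::size_t ref = log.entries.size() - 1;
    return ClientError{"Internal error. Reference: " + std::to_string(ref), ref};
}
```

The design choice here is that the client message is built without reading `detail` at all, so no code path can copy
a secret from the exception text into the response.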

PHP is midway: it is not completely safe in many of its supported legacy libraries (there's often no check at all in
the OS-binding APIs, such as filesystem I/O, or in its SQL integration client libraries, which still allow an SQL
request to be modified when some parameter contains a null: input parameters must be validated before processing
or interfacing with external or underlying OSes and SQL engines). PHP still depends heavily on the legacy
string/stdio libraries for its basic functions (that's why only a small, slowly growing, tested part of its
libraries is accessible when running it in "safe mode").
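
The mismatch behind this problem is not specific to PHP and can be shown in C++, where `std::string` tracks its
length but `c_str()` hands a null-terminated view to C APIs. The sketch below uses hypothetical helper names
(`has_embedded_null`, `length_seen_by_c_api`, `is_safe_for_c_api`) and only illustrates the validation step, not any
PHP internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// True if `s` holds a null character anywhere inside its tracked length.
inline bool has_embedded_null(const std::string& s) {
    return s.find('\0') != std::string::npos;
}

// What a null-terminated API would see of `s`: only the bytes before the
// first null, whatever the tracked length says.
inline std::size_t length_seen_by_c_api(const std::string& s) {
    return std::strlen(s.c_str());
}

// Gate to apply before passing `s` to a filesystem or SQL client call that
// takes a null-terminated string.
inline bool is_safe_for_c_api(const std::string& s) {
    const bool safe = !has_embedded_null(s);
    // When the input is accepted, both sides must agree on its length.
    assert(!safe || length_seen_by_c_api(s) == s.size());
    return safe;
}
```

For example, a 15-byte string built as `std::string("report.php\0.jpg", 15)` passes a check that only looks at the
".jpg" suffix, yet a null-terminated API sees just the 10 bytes of "report.php"; `is_safe_for_c_api` rejects it.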

Philippe.

This archive was generated by hypermail 2.1.5 : Sun Dec 27 2009 - 21:31:07 CST