RE: When to validate?

From: Lars Kristan (lars.kristan@hermes.si)
Date: Sat Dec 11 2004 - 04:17:07 CST

    Andy Heninger wrote:
    >
    > Some important things in designing a function API are
    >
    > o Fully define what the behavior is. With a function like
    > tolower(), you could leave malformed sequences unaltered;
    > you could replace them with some substitution character;
    > you could return or not return a separate error indication;
    > or you can do anything else you can think of.
    Another important thing in designing an API is to split functionality,
    except in cases where you cannot. An example of the latter is strcpy, which
    should not be split into strlen plus memcpy. Ideally, strcpy would return
    the string length, since that costs nothing and can be useful.
    Unfortunately, it returns the target pointer instead, which is an input
    parameter and thus useless. But back to splitting functionality. Making API
    functions too smart can be a benefit in the short term, because you don't
    need to bother with many parameters or with calling the functions in the
    proper sequence. But it also degrades performance, because checks and
    validations are overdone. And it prevents the basic functionality from being
    used on its own. As a rule, a need for that arises sooner or later, leading
    to the creation of 'Ex' functions. If the API is extended as soon as the
    need arises, things remain manageable. If not, users of the API start using
    workarounds, writing their own functions and so on. The consequences are
    many and can be severe.
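
    To make the strcpy point concrete, here is a minimal sketch of a copy
    routine that hands back the length it has already computed as a side effect
    of copying. The name copy_and_measure is hypothetical and not part of any
    standard library; the caller is assumed to supply a large enough buffer.

```c
#include <stddef.h>

/* Hypothetical helper (not a standard function): copies the NUL-terminated
 * string src into dst, as strcpy does, but returns the number of characters
 * copied (excluding the terminator) instead of returning dst.
 * Assumption: dst has room for at least strlen(src) + 1 bytes. */
size_t copy_and_measure(char *dst, const char *src)
{
    const char *start = src;

    /* Copy one byte at a time, including the terminating NUL. */
    while ((*dst++ = *src) != '\0') {
        src++;
    }

    /* src now points at the terminator, so the difference is the length. */
    return (size_t)(src - start);
}
```

    With the length returned, appending needs no second pass over the data:
    the next copy can simply start at buf + copy_and_measure(buf, first).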

    >
    > Just don't choose "the behavior is undefined". And don't crash.
    Undefined behavior is generally bad. But defining it at a point where you
    don't have enough information is also not good. There are cases where a
    function is able to process 'invalid' data and return 'invalid' data
    without crashing. An example is the conversion of a surrogate code point
    from UTF-32 to UTF-8. Letting the function do that is not wrong. In fact,
    it can be desirable. It can prove useful in a case you did not foresee when
    you wrote the function. Actually, it often happens that you don't even
    think about it, but the typical implementations of the algorithms are such
    that it works anyway. In this particular example, I would say: do define
    the behavior, and the behavior should be to convert the invalid data, not
    to validate and drop it.
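
    As an illustration of the surrogate example, here is a rough sketch of an
    encoder that applies the UTF-8 bit patterns mechanically, with validation
    kept in a separate function that callers may or may not use. Both names
    (encode_utf8_raw, is_unicode_scalar) are hypothetical, and the choice to
    return 0 for values that do not fit in four bytes is an assumption of this
    sketch, not something prescribed by any standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical low-level encoder: writes the UTF-8 bit pattern for cp into
 * out (which must hold at least 4 bytes) and returns the byte count.
 * It deliberately does NOT reject surrogates (U+D800..U+DFFF): they get the
 * ordinary three-byte pattern. Assumption of this sketch: values that do
 * not fit the four-byte pattern (above 0x1FFFFF) yield 0. */
size_t encode_utf8_raw(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x200000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Separate, optional check: nonzero if cp is a Unicode scalar value,
 * that is, at most U+10FFFF and not a surrogate. */
int is_unicode_scalar(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

    A caller that must emit strictly valid UTF-8 checks is_unicode_scalar
    first; a caller that needs to carry unpaired surrogates through unchanged
    skips the check and still gets a defined result (U+D800 becomes the bytes
    ED A0 80).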

    > An application as a whole needs to validate external input that is
    > alleged to be in some format, and ensure that any output that is
    > promised to be in some format is indeed completely in that
    > format. But this doesn't say anything at all about what individual
    > library functions do or don't do.
    But let's ask ourselves what an application is, and what a function is. To
    a developer in a team, an application can be a program that can be run. But
    what if that program is not run by a user, but rather by other programs in
    a complex product? The programmer will be tempted to validate all input and
    output, causing the same problems we identified with functions: performance
    degradation and potential problems with extending the functionality.

    And it goes beyond that. This complex product may itself be just a brick in
    a LAN, a WAN, or the Web. Is it now clearer what I meant by "don't know
    where to start and where to end"?

    Lars


