RE: When to validate?

From: Lars Kristan (lars.kristan@hermes.si)
Date: Sat Dec 11 2004 - 04:17:07 CST

Next message: Marcin 'Qrczak' Kowalczyk: "Re: Nicest UTF"

Previous message: Clark Cox: "Re: US-ASCII (was: Re: Invalid UTF-8 sequences)"
Maybe in reply to: Arcane Jill: "When to validate?"
Next in thread: Lars Kristan: "RE: When to validate?"

Andy Heninger wrote:
>
> Some important things in designing a function API are
>
> o Fully define what the behavior is. With a function like
> tolower(), you could leave malformed sequences unaltered;
> you could replace them with some substitution character;
> you could return or not return a separate error indication;
> or you can do anything else you can think of.
Another important thing in designing an API is to split functionality, except
in cases where you cannot. An example of the latter is strcpy, which should
not be split into strlen plus memcpy. Ideally, strcpy would return the length
of the copied string, since that costs nothing and can be useful. Instead,
strcpy returns the target pointer, which is an input parameter and thus
useless. But back to splitting functionality. Making API functions too smart
can be a benefit in the short term, because you don't need to bother with
many parameters or with the proper sequence of function calls. But it also
degrades performance, because checks and validations are overdone. And it
prevents the basic functionality from being used on its own. As a rule, a
need for that arises sooner or later, leading to the creation of 'Ex'
functions. If the API is extended as soon as the need arises, things remain
manageable. If not, users of the API start using workarounds, writing their
own functions and so on. The consequences are many and can be severe.

>
> Just don't choose "the behavior is undefined". And don't crash.
Undefined behavior is generally bad. But defining it at a point where you
don't have enough information is also not good. There are cases where a
function is able to process 'invalid' data and return 'invalid' data without
crashing. An example is the conversion of a surrogate code point from UTF-32
to UTF-8. Letting the function do that is not wrong. In fact, it can be
desirable: it can prove useful in a case you did not foresee when you wrote
the function. Actually, it often happens that you don't even think about it,
but typical implementations of the algorithm are such that it works. In this
particular example, I would say: do define the behavior, and the behavior
should be to convert the invalid data, not to validate it and drop it.
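
As an illustration of the surrogate example, here is a sketch in C of an
encoder that converts whatever it is given and leaves the surrogate check to
a separate function. Both names (encode_utf8_lenient, is_surrogate) are
hypothetical, and the choice to return 0 for values above 0x10FFFF is an
assumption of this sketch. Note that the three bytes produced for a surrogate
are not well-formed UTF-8 according to the Unicode Standard; that is exactly
the 'invalid in, invalid out' case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder. Writes the UTF-8 style byte sequence for cp into
 * out (which must have room for 4 bytes) and returns the number of bytes
 * written, or 0 if cp is above 0x10FFFF.
 * Surrogates (0xD800..0xDFFF) are deliberately NOT rejected: they fall
 * into the three-byte branch like any other value below 0x10000. */
size_t encode_utf8_lenient(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; /* outside the code point range: nothing sensible to emit */
}

/* Validation kept as a separate step, so the caller decides whether and
 * where to apply it. Returns 1 for a surrogate code point, 0 otherwise. */
int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

A caller that must emit strictly valid UTF-8 checks is_surrogate first; a
caller that needs to carry the data through unchanged simply does not.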

> An application as a whole needs to validate external input that is
> alleged to be in some format, and ensure that any output that is
> promised to be in some format is indeed completely in that format.
> But this doesn't say anything at all about what individual library
> functions do or don't do.
But let's ask ourselves: what is an application, and what is a function? To
a developer in a team, an application can be a program that can be run. But
what if that program is not run by a user, but rather by other programs in a
complex product? The programmer will be tempted to validate all input and
output, causing the same problems we identified with functions: performance
degradation and potential problems with extending the functionality.

And it goes beyond that. This complex product may itself be just a brick in
a LAN, a WAN, or the Web. Is it now clearer what I meant by "don't know where
to start and where to end"?

Lars

