Title: Professionalism in Programming #9
Author: Pete Goodliffe
Date: Mon, 06 August 2001 13:15:46 +01:00
Summary:
Defensive Programming
Body:
There is a huge difference between working code, correct code and good code. M. A. Jackson wrote "The beginning of wisdom for a software engineer is to recognise the difference between getting a program to work, and getting it right." [Jackson] You can write code that 'works' most of the time. You feed it the usual set of inputs, it gives the usual set of outputs. But give it something surprising and it might just fall over. 'Correct' code won't fall over. For all possible sets of inputs the output will be correct[1]. However, not all correct code is 'good' code - the logic may be hard to follow, the code may be contrived, it may be practically impossible to maintain.
Good code is what we should aim for. It is robust, efficient enough, and of course correct. Industrial strength code will not crash or produce incorrect results when given unusual inputs. It will also satisfy other requirements such as thread safety, timing constraints and re-entrancy.
There are many methodologies for producing code: various object-oriented approaches, component-based models, structured design, VDM and so on. Defensive programming is an approach that can be applied to any of these. It's not so much a formal methodology as a set of rules of thumb. It is one way to ensure that our code is good.
As the name suggests, defensive programming is careful, guarded programming. When we write code it is all too easy to make a set of assumptions about how it should run, how it will be called, etc. We then carry on our work using this set of assumptions for months, as they fade and distort in our minds.
When we program defensively, we don't make any assumptions. We never assume "it can't happen" (I won't ever be called like that, this piece of code will always work...). Experience tells us that the only thing you can be certain about is that your code will somehow, one day, go wrong. Murphy's law says, "if it can go wrong, it will". Murphy was a wise man. So defensive programming prevents accidents by foreseeing them, or at least foreguessing them - figuring out what can go wrong at each stage in the code.
Is this paranoid? Well, it doesn't hurt to be a little paranoid. In fact, it makes a lot of sense. You will forget the set of assumptions you make as your code evolves (real code does evolve). Other programmers won't have any knowledge of the assumptions in your head, or will just make invalid assumptions about what your code can do. Software evolution exposes weaknesses. Code growth hides original simple assumptions. A little paranoia at the outset can make code a lot more robust in the long run.
Add to this the fact that things can go wrong that neither you nor your users have any control over: disks get full, networks fail, computers crash. Remember, it's never actually the software that 'fails' - the software always does what you told it to. It's the algorithms or the client code that introduce faults into the system. If we write code defensively we attempt to prevent, or at the very least observe, our code being called in such a way as to exhibit incorrect behaviour.
There are a number of techniques we can follow to apply the defensive programming approach to our code. Bear in mind these are all much easier to apply as you start writing code than to retrofit onto existing code. The first set we'll think about relate less to the assumptions we make and more to plain good practice for writing robust code. They are ways of avoiding the possibility of future problems.
-
Employ a good coding style and sound design
We can prevent most coding mistakes by adopting a good coding style in the first place. Simple things like choosing meaningful variable names and the judicious use of parentheses can aid clarity greatly. [Goodliffe1]
Similarly, having a sound design in place before ploughing into the code is key [Goodliffe6]. "The best documentation of a computer program is a clean structure" said Kernighan and Plauger [EoPS]. Starting off with the correct APIs to implement can avoid a lot of headaches further down the line.
-
Don't code in a hurry
Writing a quick, short function, running it through the compiler to check it compiles, running it once to see if it works and then moving on is all too common. Think about each line as you write it. What errors could arise? Maybe slow, methodical programming seems mundane - but it really does cut down on the number of faults introduced. A particular C/C++ gotcha that snares hasty programmers is mistyping == as just =. With an unhelpful compiler (or with warnings switched off) there will be no indication that the intended comparison is in fact an assignment. We'll see some more of this later.
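A minimal sketch of the trap (the variable name is hypothetical):

    #include <cstdio>

    int main() {
        int errorCode = 0;
        // The = assigns 1 to errorCode; the condition is then the assigned
        // value, which is non-zero, so this branch always runs.
        if (errorCode = 1) {        // intended: errorCode == 1
            std::printf("handling error %d\n", errorCode);
        }
        return 0;
    }

A compiler with warnings enabled will flag the suspicious assignment, which is one more reason to follow the warnings advice below.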
-
Don't use complicated programming tricks that only gurus understand
You'll come back to the code later and be confused as to the original intent. Maintenance programmers may not understand advanced idioms or a convoluted chain of operators. If it can't be maintained, the code is not safe. In extreme cases, overly complex expressions can cause the compiler to generate incorrect code - many compiler optimisation errors come to light this way.
-
Split up complicated expressions into a series of simpler calculations
In many ways this is related to the above. Just because you can write an entire for loop inside the for(...) construct with judicious use of commas doesn't mean you have to. Write code for clarity, not brevity.
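A sketch of the same calculation written both ways (a contrived sum, purely for illustration):

    // Terse: the whole loop lives in the for(...) header.
    int sumTerse() {
        int i, total;
        for (i = 0, total = 0; i < 10; total += i, ++i)
            ;
        return total;
    }

    // Clearer: one step per statement, easy to read and to modify.
    int sumClear() {
        int total = 0;
        for (int i = 0; i < 10; ++i)
            total += i;
        return total;
    }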
-
Compile with all warnings switched on
And then if your code generates any warnings, remove them immediately. Never be satisfied with code that won't compile completely quietly when all warnings are enabled. The warnings are there for a reason. Even if there is a particular warning you think 'doesn't matter', by leaving it in you run the risk of one day hiding one that does.
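With GCC, for example, -Wall and -Wextra enable a broad set of warnings, and -Werror turns warnings into errors so they cannot be ignored (these flag names are GCC's; other compilers have equivalents):

    g++ -Wall -Wextra -Werror program.cpp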
-
Initialise all variables at their point of declaration
Another clarity issue. If you initialise each variable at its declaration, its intent and initial state are explicit. Relying on rules of thumb such as 'if I don't initialise it, I don't care about the initial value' is simply not safe. The code will evolve. The uninitialised value may turn into a problem further down the line.
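A tiny sketch of the idea (names hypothetical):

    #include <cstdio>

    void startingState() {
        int count = 0;                // counting starts from nothing - and says so
        double total = 0.0;           // the initial state is documented in the code
        const char *name = "unset";   // never a garbage pointer
        std::printf("%s: %d, %f\n", name, count, total);
    }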
-
Declare variables as late as possible
By doing this you place the variable as close as possible to its use - preventing it from cluttering other parts of the code, and clarifying the code that uses it by putting the declaration nearby.
A variable should not be global if it doesn't need to be. It should not be in file scope if it doesn't need to be. It should not be function local if it can be loop local. These issues are subtly different in pre-C99 C than C++. In C++ (and the new C standard) we can declare variables as late into the function as we choose, rather than placing them at the top of the function. This is generally acknowledged to greatly improve code readability.
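A sketch of the narrowest-scope style:

    #include <cstdio>

    void printSquares() {
        // C++/C99 style: declare at first use, in the narrowest scope.
        for (int i = 0; i < 10; ++i) {   // i is loop local
            int square = i * i;          // lives only where it is needed
            std::printf("%d\n", square);
        }
        // i and square are out of scope here, so they cannot be misused.
    }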
-
Check function returns
If a function returns a value it does so for a reason. Check the return value. If it is an error code, you should inspect it. This goes for user functions as well as standard library ones. Most of the insidious bugs you'll find arise when a programmer omits to check a return value. Don't forget that some functions may return errors through a different mechanism (e.g. the standard C library's errno). In a similar vein, we should catch and handle appropriate exceptions at the appropriate level.
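A minimal sketch with the standard fopen, whose failure is reported through its return value and errno (the file name is illustrative):

    #include <cstdio>
    #include <cerrno>
    #include <cstring>

    int main() {
        FILE *f = std::fopen("data.txt", "r");
        if (f == 0) {
            // the return value signals failure; errno says why
            std::fprintf(stderr, "open failed: %s\n", std::strerror(errno));
            return 1;
        }
        /* ... use the file ... */
        std::fclose(f);
        return 0;
    }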
-
Always have a default case
Your switch statements should always document what happens in the default case. If the default case is an error, make that explicit in the code[2]. If nothing should happen, make it explicit in the code - that way the maintenance programmer will understand. Similarly, always have a final else block if an if-then-else construct could have a 'default' case.
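A sketch with a hypothetical Colour enumeration, making the error case explicit (though see footnote [2] for when omitting the default can be preferable):

    #include <cassert>

    enum Colour { Red, Green, Blue };

    const char *name(Colour c) {
        switch (c) {
        case Red:    return "red";
        case Green:  return "green";
        case Blue:   return "blue";
        default:
            assert(!"unknown Colour value");   // the error case, made explicit
            return "";
        }
    }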
-
Cast carefully
If you need to use a cast, think carefully about it. It is good practice not to rely on the compiler's implicit conversions - state the conversion explicitly. Do not presume that int and long are the same size (and therefore interchangeable), even if they are on your platform. In C++ use the correct form of cast - it is important to understand the differences between const_cast, reinterpret_cast, et al.
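A sketch of the C++ cast forms in action:

    void casts() {
        double d = 3.7;
        int n = static_cast<int>(d);      // explicit, searchable conversion

        long big = static_cast<long>(n);  // don't silently assume int == long

        const int limit = 10;
        // const_cast strips constness - rare, dangerous, and now easy to spot
        int *p = const_cast<int *>(&limit);
        (void)big; (void)p;               // silence unused warnings in this sketch
    }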
-
Follow the error handling strategy
The code base you are working on should have a consistent error handling strategy. What happens when an error is encountered? It is not acceptable for one section of the code to print error messages to stderr and then abort, whilst another uses perror, whilst another ignores the error, whilst another laboriously tries to recover and keep quiet. When in Rome, handle errors like the Romans do.
-
Write loops carefully
Incorrect loop conditions are a source of many errors. For example, don't terminate a for loop with an equality test (looping while i != 5); use an inequality (looping while i < 5). Why? Firstly, it is the natural idiom of the language - for understandable code it is important to follow the idioms of the language. Secondly, what would happen if later on someone put an i += 2 anywhere in the loop body?
Note
(However, this advice is not applicable when using C++ iterators. FG)
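A sketch of the difference (and, as the note says, not applicable to C++ iterator loops, where != is the idiom):

    void fragile() {
        for (int i = 0; i != 5; ++i) {   // terminates only when i is exactly 5
            /* ... */
            // If a later edit adds i += 2 here, i steps 0, 3, 6, ... -
            // it skips 5 and the loop never ends.
        }
    }

    void robust() {
        for (int i = 0; i < 5; ++i) {    // an inequality terminates regardless
            /* ... */
        }
    }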
-
Don't let anyone tinker with stuff they oughtn't
For example, in C++ prevent access to class data members by making them private, or use the Cheshire Cat/pimpl idiom. In C don't declare variables globally when they needn't be. These are examples of code just asking for unnecessary trouble. Don't spoil for a fight.
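A minimal sketch of the Cheshire Cat/pimpl idiom (the Widget class is hypothetical):

    // widget.h - clients see no data members at all
    class Widget {
    public:
        Widget();
        ~Widget();
        void draw();
    private:
        struct Impl;    // defined only in widget.cpp
        Impl *pimpl;    // all state hides behind this pointer,
    };                  // so clients cannot tinker with it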
-
Use standard C/C++
Unless mandated by your project (and there had better be a good reason) don't rely on compiler weirdness or any non-standard extensions to the language. If an area of the language has undefined or implementation-defined behaviour, don't rely on what your particular compiler happens to do (e.g. don't rely on your compiler treating char as a signed value - other compilers may not). Doing so leads to very brittle code. What happens when you update the compiler? What happens when a new programmer joins the team who doesn't understand the extensions? Relying on a particular compiler's implementation of undefined behaviour leads to really subtle bugs later in the code's life.
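The char example, sketched:

    void charSignedness() {
        char c = '\x80';   // bit pattern 1000 0000
        int  i = c;        // -128 where char is signed, 128 where it's unsigned:
                           // implementation-defined, so never rely on it
        (void)i;
    }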
-
When doing complex maths, think about numerical limits
The standard libraries provide mechanisms for determining the capacity of standard types - use them.
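For example, <climits> provides the C macros and <limits> the C++ traits:

    #include <cstdio>
    #include <climits>
    #include <limits>

    int main() {
        // The library reports each type's capacity - don't guess it.
        std::printf("int max (C macro):   %d\n", INT_MAX);
        std::printf("int max (C++ trait): %d\n", std::numeric_limits<int>::max());
        std::printf("double precision:    %d digits\n",
                    std::numeric_limits<double>::digits10);
        return 0;
    }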
-
Prototype functions in a header
It seems like common sense, but people still don't do this. It can avoid parameter mismatching and prevents maintenance nightmares. In older C-style build systems linkers can't check function parameter and return types - this is a common cause of odd behaviour in executing code.
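A sketch with a hypothetical header:

    /* geometry.h */
    double circleArea(double radius);   /* the one true signature */

    /* geometry.c and every caller #include "geometry.h", so any
       parameter or return-type mismatch is caught at compile time */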
-
Use a good diagnostic logging facility
When you first write code, it often contains a lot of diagnostics to check validity - should these really be removed once the code works? If you can leave them in, disabled, at a low logging level, you will be glad of them when you have to revisit the code later. There are a number of diagnostic logging systems available, and their use is very valuable. They can be arranged so that diagnostics have no overhead when not needed - they can be conditionally compiled out.
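A minimal sketch of a compile-away diagnostic (the macro name is illustrative, not from any particular library):

    #include <cstdio>

    #ifdef NDEBUG
    #define LOG_DEBUG(msg) ((void)0)   /* compiles away to nothing */
    #else
    #define LOG_DEBUG(msg) std::fprintf(stderr, "debug: %s\n", (msg))
    #endif

    int main() {
        LOG_DEBUG("validating inputs");   // free in a production build
        return 0;
    }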
-
Handle memory (and other precious resources) carefully
If you claim a resource, be thorough in releasing it. Memory is the most often cited example, but it is not the only one. Files and locks are other precious resources that must be used carefully. Don't neglect to close files or release memory because you think the OS will clean up after you when your program exits. You don't know how long your code will be left running, eating up all the file handles or consuming all the memory. You can't even be sure the OS will cleanly release your resources - for portability you can't assume it; some OSes don't.
A school of thought says don't free memory until you know your program works in the first place, and only then add all the relevant releases. I think this is a ludicrously dangerous practice. It will likely hide many, many other errors in the memory usage, and you will be more likely to forget to free memory in some places. Just say no.
Running a static code checker (e.g. lint) as mentioned in the previous column can uncover a number of these problems, but not the whole lot. There is no substitute for careful programming.
We've talked about the set of assumptions we make as we program. How can we incorporate this set of assumptions in our code? We can write a little extra code to check these conditions. This code acts as documentation for the assumptions, making them explicit rather than implicit[3]. In doing this we're codifying the constraints on program functionality and behaviour. What do we want the program to do if a constraint is broken? Since this kind of constraint will be more than a simple detectable and correctable run-time error (we should already be checking for and handling those), it must be a flaw in the program logic. There are a few possibilities:
-
Turn a blind eye to the problem, and hope that nothing else will go wrong as a consequence,
-
Give it an on-the-spot fine and allow the program to continue (e.g. print a diagnostic warning, log the error),
-
Go directly to jail. Do not pass go. Do not collect £200 (e.g. abort the program on the spot).
For example, if it is invalid to call void a(foo *ptr) with a null ptr because ptr will be dereferenced almost immediately, the last two are the most plausible candidates. It's probably best to abort the program completely, since dereferencing a null pointer can lead to all sorts of catastrophes on unprotected operating systems.
There are a number of different scenarios in which constraints are used:
-
Preconditions. These are conditions that must hold true before a section of code is entered.
-
Postconditions. These must hold true after a code block is left.
-
Invariants. These are conditions that hold true between loop passes, across method calls, etc.
-
Assertions. Finally, any other statement about a program's state at a given point in time.
The first two are frustrating to implement in C/C++ - with multiple exit points in a function, things can get messy when you incorporate a postcondition. Eiffel is much better in this respect, with pre- and postconditions supported in the core language. It can also check that a constraint's conditional code does not have any side effects.
However tedious they may be to write, good constraints expressed inline make the code clearer and more maintainable. This technique is also known as design by contract.
There are a number of different problems you can guard against with such constraints. For example, you can:
-
Check array accesses are within bounds
-
Assert pointers are not 0 before dereferencing them
-
Ensure function parameters are not invalid
-
Validate function results before returning them[4]
-
Prove an object's state is consistent before operating on it
-
Guard any place in the code where you'd write the comment "we should never get here"
The first two of these examples are particularly C/C++ focused. Java has its own ways of avoiding some of these pitfalls, as do Pascal and Ada.
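A sketch pulling several checks from the list above together (the function is hypothetical):

    #include <cassert>
    #include <cstddef>

    double average(const double *values, std::size_t count) {
        assert(values != 0);     // pointer checked before dereferencing
        assert(count > 0);       // parameter checked before use
        double sum = 0.0;
        for (std::size_t i = 0; i < count; ++i)
            sum += values[i];    // i < count keeps the access within bounds
        return sum / count;
    }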
There is the perfectly valid question: just how much constraint checking should you do? Placing a check on every other line might be a bit extreme. As with many such things, the correct balance becomes clearer as the programmer gains experience. Better too much than too little? Too many constraint checks could obfuscate the logic in the code. "Readability is the best single criterion of program quality: if a program is easy to read, it is probably a good program; if it is hard to read, it probably isn't good." [ST]
Realistically, putting pre- and post-conditions in major functions plus invariants in the key loops is more than sufficient.
Now this kind of constraint checking is usually only required during the development and debugging stages of program construction. Once we have used the constraints to convince ourselves (wrongly or rightly) that the program logic is correct, we would ideally remove them so as not to incur an unnecessary runtime overhead.
The standard C and C++ libraries provide us with a single mechanism to implement these constraints - assert. assert acts as a procedural firewall, testing the logic in its parameter. It is provided as an alarm for the developer to signal incorrect program behaviour, and should never be allowed to trigger in customer-facing code. If the assertion's constraint is satisfied, execution continues; otherwise the program aborts. If the assertion is triggered, the program will output to standard error a message something like this (taken from gcc) before exiting:
a.out: bugged.cpp:10: int main (): Assertion "1 == 0" failed.
assert is implemented as a preprocessor macro, which means it sits more naturally in a C environment than a C++ one[5]. To use assert you must #include <assert.h> (<cassert> in C++). Then you can write in your function something like assert(ptr != 0); Preprocessor magic allows us to remove all assertions in a "production build" by defining NDEBUG when compiling. All asserts will be removed, and their condition expressions will not be evaluated. This means that in production builds asserts have no overhead at all.
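A minimal sketch of the mechanism, echoing the a(foo *ptr) example above with an int pointer for brevity:

    #include <cassert>   // or <assert.h> in C

    void a(int *ptr) {
        assert(ptr != 0);   // the constraint, stated right where it matters
        *ptr = 42;          // past the firewall, the dereference is safe
    }
    // A normal build aborts with a message if a(0) is called.
    // Compiled with NDEBUG defined (e.g. -DNDEBUG), the assert expands
    // to nothing: no test, no overhead.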
However, whether assertions should be removed completely or just made "non-fatal" is a debatable issue. One school of thought says that by removing them you are now testing a completely different piece of code[6]. Others say that the overhead of assertions is not acceptable in a release build. One thing we must definitely be wary of is that our assertions must not have any side effects. For example, suppose you mistakenly wrote:
int i = 5;
assert(i = 6);   // hmm - should have typed more carefully!
printf("i is %d\n", i);
The assertion will clearly never trigger in a debug build; the expression's value is 6 (near enough 'true' for C). However, in a release build the assert line will be removed completely and the printf will produce different output. This can be the cause of subtle problems late in product development. It's quite hard to guard against bugs in the bug-checking code! It's not hard to envision situations where assertions have better-hidden side effects. For example, if you assert(invariants()); and the invariants() function has a side effect, it is not so easy to spot in the code.
Since the assertions can be removed for production code, it is also vital that only logical constraint testing is done with assert and no real error condition testing. You wouldn't want to compile that out of your code!
Whilst debugging your program, when you discover and fix a fault, it is good practice to slip an assertion in where the fault was fixed, then you can ensure that you won't be bitten twice. If nothing else, this would act as a warning sign to people maintaining the code in the future.
A common C++ technique for writing constraints when testing classes is to add a single member function to each class called bool invariant(). (Naturally this function should have no side effects.) Now an assert calling this invariant can be put at the beginning and end of each member function. (The exceptions to this rule are no assertion at the beginning of a constructor or at the end of the destructor, for obvious reasons.) For example, in your circle class the invariant may check that radius != 0, since a zero radius would be invalid object state and cause later calculations to fail (perhaps with a divide by zero).
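A sketch of the technique for that circle class (the class itself is hypothetical):

    #include <cassert>

    class Circle {
    public:
        explicit Circle(double r) : radius(r) {
            assert(invariant());   // only at the end of the constructor
        }
        double circumference() const {
            assert(invariant());   // on entry
            double c = 2.0 * 3.141592653589793 * radius;
            assert(invariant());   // on exit
            return c;
        }
    private:
        bool invariant() const { return radius != 0; }   // no side effects
        double radius;
    };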
Whilst writing this the thought occurred to me, what is the opposite of defensive programming? It is offensive programming of course! Now, there are a number of programmers I know who you could call 'offensive programmers,' but maybe there's more to it than just swearing whilst writing code?
It stands to reason that this 'offensive' logical opposite would be trying to break things in the code rather than guarding against problems. That is, actively attacking the code rather than securing it. I'd call that testing.
That means we should all be offensive programmers.
It is important to generate code that is not just correct, but is also good. It needs to be clear, documenting all the assumptions made. This way it will be easier to maintain with fewer bugs. Defensive programming is a method of expecting the worst and being prepared for it.
Many companies say that they employ defensive programming. Look at their code bases - do they really?
[Goodliffe1] Pete Goodliffe. Professionalism in Programming #1. C Vu, Volume 12 No 2. ISSN: 1354-3164.
[Goodliffe6] Pete Goodliffe. Professionalism in Programming #6: Good design. C Vu, Volume 13 No 1. ISSN: 1354-3164.
[1] Usually the set of 'all possible inputs' is ridiculously large and hard to test.
[2] A possible exception is switch statements for closed sets of enumerated values. Leaving the default case out allows the compiler to warn if you miss one of the enum's values.
[3] They don't replace writing good documentation, though.
[4] This assertion is often overlooked, yet can diagnose a lot of problems early.
[5] A new and more flexible C++ approach is likely to appear in Overload shortly.
[6] In practice, more may change between development and release builds of software - compiler optimisation levels and the inclusion of debugging symbols, for example. These can both make subtle differences to execution speed and may obscure the manifestation of other faults. Even during the earliest stages of development, testing should be performed equally on development and release builds of the code.