Title: On the Defensive
Author: Martin Moene
Date: Mon, 06 March 2017 18:24:50 +00:00
Summary: Pete Goodliffe demonstrates defensive programming techniques for robust code.
Body:
We have to distrust each other. It’s our only defense against betrayal.
~ Tennessee Williams
It seems an age ago now. When my daughter was only 10 months old, she liked playing with wooden bricks. Well, she liked playing with wooden bricks and me. I’d build a tower as high as I could, and then with a gentle nudge of the bottom brick, she’d topple the whole thing and let out a little whoop of delight. I didn’t build these towers for their strength – it would have been pointless if I did. If I had really wanted a sturdy tower, then I’d have built it in a very different way. I’d have shored up a foundation and started with a wide base, rather than just quickly stacking blocks upon each other and building as high as possible.
Too many programmers write their code like flimsy towers of bricks; a gentle unexpected prod to the base and the whole thing falls over. Code builds up in layers, and we need to use techniques that ensure that each layer is sound so that we can build upon it.
Towards good code
There is a huge difference between code that seems to work, correct code, and good code. M. A. Jackson wrote, “The beginning of wisdom for a software engineer is to recognize the difference between getting a program to work, and getting it right.” There is a difference:
- It is easy to write code that works most of the time. You feed it the usual set of inputs, it gives the usual set of outputs. But give it something surprising, and it might just fall over.
- Correct code won’t fall over. For all possible sets of input, the output will be correct. But usually the set of all possible inputs is ridiculously large and hard to test.
- However, not all correct code is good code – the logic may be hard to follow, the code may be contrived, and it may be practically impossible to maintain.
By these definitions, good code is what we should aim for. It is robust, efficient enough and, of course, correct. Industrial strength code will not crash or produce incorrect results when given unusual inputs. It will also satisfy all other requirements, including thread safety, timing constraints, and re-entrancy.
It’s one thing to write this good code in the comfort of your own home, a carefully controlled environment. It’s an entirely different prospect to do so in the heat of the software factory, where the world is changing around you, the codebase is rapidly evolving, and you’re constantly being faced with grotesque legacy code – archaic programs written by code monkeys that are now long gone. Try writing good code when the world is conspiring to stop you!
In this torturous environment, how do you ensure that your code is industrial strength? Defensive programming helps.
While there are many ways to construct code (object-orientated approaches, component based models, structured design, Extreme Programming, etc.), defensive programming is an approach that can be applied universally. It’s not so much a formal methodology as an informal set of basic guidelines. Defensive programming is not a magical cure-all, but a practical way to prevent a pile of potential coding problems.
Assume the worst
When you write code, it’s all too easy to make a set of assumptions about how it should run, how it will be called, what the valid inputs are, and so on. You won’t even realize that you’ve assumed anything, because it all seems obvious to you. You’ll spend months happily crafting code, as these assumptions fade and distort in your mind.
Or you might pick up some old code to make a vital last-minute fix when the product’s going out the door in 10 minutes. With only enough time for a brief glance at its structure, you’ll make assumptions about how the code works. There’s no time to perform full literary criticism, and until you get a chance to prove the code is actually doing what you think it’s doing, assumptions are all you have.
Assumptions cause us to write flawed software. It’s easy to assume:
- The function won’t ever be called like that. I will always be passed valid parameters only.
- This piece of code will always work; it will never generate an error.
- No one will ever try to access this variable if I document it ‘For internal use only’.
When we program defensively, we shouldn’t make any assumptions. We should never assume that it can’t happen. We should never assume that the world works as we’d expect it to work.
Experience tells us that the only thing you can be certain about is this: Your code will somehow, someday, go wrong. Someone will do a dumb thing. Murphy’s law puts it this way: “If it can be used incorrectly, it will.” Listen to that man – he spoke from experience [1]. Defensive programming prevents these accidents by foreseeing them, or at least fore-guessing them – figuring out what might go wrong at each stage in the code, and guarding against it.
Is this paranoid? Perhaps. But it doesn’t hurt to be a little paranoid. In fact, it makes a lot of sense. As your code evolves, you will forget the original set of assumptions you made (and real code does evolve). Other programmers won’t have any knowledge of the assumptions in your head, or else they will just make their own invalid assumptions about what your code can do. Software evolution exposes weaknesses, and code growth hides original simple assumptions. A little paranoia at the outset can make code a lot more robust in the long run.
Assume nothing. Unwritten assumptions continually cause faults, particularly as code grows.
Add to this the fact that things neither you nor your users have any control over can go wrong: disks fill up, networks fail, and computers crash. Bad things happen. Remember, it’s never actually your program that fails – the software always does what you told it to. The actual algorithms, or perhaps the client code, are what introduce faults into the system.
As you write more code, and as you work through it faster and faster, the likelihood of making mistakes grows and grows. Without adequate time to verify each assumption, you can’t write robust code. Unfortunately, on the programming front line, there’s rarely any opportunity to slow down, take stock, and linger over a piece of code. The world is just moving too fast, and programmers need to keep up. Therefore, we should grasp every opportunity to reduce errors, and defensive practices are one of our main weapons.
What is defensive programming?
As the name suggests, defensive programming is careful, guarded programming. To construct reliable software, we design every component in the system so that it protects itself as much as possible. We smash unwritten assumptions by explicitly checking for them in the code. This is an attempt to prevent, or at least observe, when our code is called in a way that will exhibit incorrect behaviour.
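As a minimal sketch of what explicitly checking an assumption can look like, the function below uses the standard assert macro to state two things it would otherwise silently take for granted. The function name and its parameters are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the arithmetic mean of `count` integers.
// The asserts turn two unwritten assumptions into explicit, checked ones.
double average(const int* values, std::size_t count)
{
    assert(values != nullptr && "average: values must not be null");
    assert(count > 0 && "average: count must be positive");

    long long sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    return static_cast<double>(sum) / static_cast<double>(count);
}
```

A caller that breaks either assumption is stopped at the point of misuse in a debug build, rather than producing a division by zero or a wild read somewhere later. Bear in mind that assert compiles away when NDEBUG is defined, so it observes misuse during development rather than guarding release code.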
Defensive programming enables us to detect minor problems early on, rather than get bitten by them later when they’ve escalated into major disasters. All too often, you’ll see ‘professional’ developers rush out code without thinking. Tinker with the code–run it–crash. Tinker–run–crash. Tinker–run–crash.
They are continually tripped up by the incorrect assumptions that they never took the time to validate. Hardly a promotion for modern day software engineering, but it’s happening all the time. Defensive programming helps us to write correct software from the start and move away from the code-it, try-it, code-it, try-it… cycle.
Okay, defensive programming won’t remove program failures altogether. Far from it. But problems will become less of a hassle and easier to fix. Defensive programmers catch falling snowflakes rather than drown under an avalanche of errors.
Defensive programming is a method of prevention, rather than a form of cure. Compare this to debugging – the act of removing bugs after they’ve bitten. Debugging is all about finding a cure.
Is defensive programming really worth the hassle? There are arguments for and against.
The case against:
- Defensive programming consumes resources, both yours and the computer’s.
- It eats into the efficiency of your code; even a little extra code requires a little extra execution. For a single function or class, this might not matter, but when you have a system made up of 100,000 functions, you may have more of a problem.
- Each defensive practice requires some extra work. Why should you follow any of them? You have enough to do already, right? Well, then just make sure people use your code correctly. If they don’t, then any problems are their own fault.
What Defensive Programming Isn't
There are a few common misconceptions about defensive programming. Defensive programming is not:
- Error checking: If there are error conditions that might arise in your code, you should be checking for them anyway. This is not defensive code. It’s just plain good practice – a part of writing correct code.
- Testing: Testing your code is not defensive. It’s another normal part of our development work. Test harnesses aren’t defensive; they can prove the code is correct now, but won’t prove that it will stand up to future modification. Even with the best test suite in the world, anyone can make a change and slip it past untested.
- Debugging: You might add some defensive code during a spell of debugging, but debugging is something you do after your program has failed. Defensive programming is something you do to prevent your program from failing in the first place (or to detect failures early before they manifest in incomprehensible ways, demanding all-night debugging sessions).
The case for:
- Defensive programming saves you literally hours of debugging and lets you do more fun stuff instead. Remember Murphy: If your code can be used incorrectly, it will be.
- Working code that runs properly, but ever-so-slightly slower, is far superior to code that works most of the time but occasionally collapses in a shower of brightly coloured sparks.
- We can design some defensive code to be physically removed in release builds, circumventing the performance issue. The majority of the items we’ll consider here don’t have any significant overhead, anyway.
- Defensive programming avoids a large number of security problems – a serious issue in modern software development. More on this below.
As the market demands software that’s built faster and cheaper, we need to focus on techniques that deliver results. Don’t skip the bit of extra work up-front that will prevent a whole world of pain and delay later.
The big bad world
Someone once said, “Never ascribe to malice that which is adequately explained by stupidity.” [2] Most of the time we are defending against stupidity, against invalid and unchecked assumptions. However, there are malicious users, and they will try to bend and break your code to suit their vicious purposes.
Defensive programming helps with program security, guarding against this kind of wilful misuse. Crackers and virus writers routinely exploit sloppy code to gain control of an application and then weave whatever wicked schemes they desire. This is a serious threat in the modern world of software development; it has huge implications in terms of the loss of productivity, money, and privacy.
Software abusers range from the opportunistic user exploiting a small program quirk to the hardcore cracker who spends his time deliberately trying to gain illicit access to your systems. Too many unwitting programmers leave gaping holes for these people to walk through. With the rise of the networked computer, the consequences of sloppiness become more and more significant.
Many large development corporations are finally waking up to this threat and are beginning to take the problem seriously, investing time and resources into serious defensive code work. In reality, it’s hard to graft in defences after an attack.
Techniques for defensive programming
So what does all this mean to programmers working in the software factory?
There are a number of common sense rules under the defensive programming umbrella. People usually think of assertions when they think of defensive programming, and rightly so. We’ll talk about those later. But there’s also a pile of simple programming habits that will immeasurably improve the safety of your code.
Despite seeming common sense, these rules are often ignored – hence the low standard of most software at large in the world. Tighter security and reliable development can be achieved surprisingly easily, as long as programmers are alert and well informed.
The next few sections list the rules of defensive programming. We’ll start off by painting with broad strokes, looking at high-level defensive techniques, processes, and procedures. As we progress, we’ll fill in finer detail, looking more deeply at individual code statements. Some of these defensive techniques are language-specific. This is natural – you have to put on bulletproof shoes if your language lets you shoot yourself in the foot.
As you read this list, evaluate yourself. How many of these rules do you currently follow? Which ones will you now adopt?
Employ a good coding style and sound design
We can prevent most coding mistakes by adopting a good coding style. Simple things like choosing meaningful variable names and using parentheses judiciously can increase clarity and reduce the likelihood of faults slipping past unnoticed.
Similarly, considering the larger scale design before ploughing into the code is key. “The best documentation of a computer program is a clean structure.” [3] Starting off with a set of clear APIs to implement, a logical system structure, and well-defined component roles and responsibilities will avoid headaches further down the line.
Don't code in a hurry
It’s all too common to see hit-and-run programming. Programmers quickly hack out a function, shove it through the compiler to check syntax, run it once to see if it works, and then move on to the next task. This approach is fraught with peril.
Instead, think about each line as you write it. What errors could arise? Have you considered every logical twist that might occur? Slow, methodical programming seems mundane – but it really does cut down on the number of faults introduced.
More haste, less speed. Always think carefully about what you’re typing as you type it.
A particular C-family gotcha that snares speedy programmers is mistyping == as just =. The former is a test for equality, the latter a variable assignment. With an unhelpful compiler (or with warnings switched off) there will be no indication that the program behaviour is not what was intended.
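To make the slip concrete, here is a small sketch. The status code and function name are invented for illustration. The mistyped form appears only in a comment; the parameter is declared const so that the same slip would be a compile error rather than a silent assignment. GCC and Clang will normally also warn about an assignment used as a condition when -Wall is enabled.

```cpp
// Hypothetical status code, used only for illustration.
const int kReady = 1;

// The parameter is const, so a mistyped `status = kReady` in this function
// would be rejected by the compiler instead of silently assigning.
bool is_ready(const int status)
{
    // Intended test. The slip `if (status = kReady)` would assign 1 to
    // status, and the condition would then always be true.
    if (status == kReady) {
        return true;
    }
    return false;
}
```

Qualifying values as const wherever they shouldn’t change is a cheap habit that lets the compiler catch this class of typo even when warnings are switched off.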
Always do all of the tasks involved in completing a code section before rushing on. For example, if you decide to write the main flow first and the error checking/handling second, you must be sure you have the discipline to do both. Be very wary of deferring the error checking and moving straight on to the main flow of three more code sections. Your intention to return later may be sincere, but later can easily become much later, by which time you will have forgotten much of the context, making it take longer and be more of a chore. (And of course, by then there will be some artificially urgent deadline.)
Discipline is a habit that needs to be learned and reinforced. Every time you don’t do the right thing now, you become more likely to continue not doing the right thing in the future. Do it now, don’t leave it for a rainy day in the Sahara. Doing it later actually requires more discipline than doing it now!
Trust no-one
Your mother told you never to talk to strangers. Unfortunately, good software development requires even more cynicism and less faith in human nature. Even well-intentioned code users could cause problems in your program; being defensive means you can’t trust anybody.
You might suffer problems because of:
- Genuine users accidentally giving bogus input or operating the program incorrectly.
- Malicious users trying to consciously provoke bad program behaviour.
- Client code calling your function with the wrong parameters or supplying inconsistent input.
- The operating environment failing to provide adequate service to the program.
- External libraries behaving badly and failing to honour interface contracts that you rely on.
- You might even make a silly coding mistake in one function or forget how some three-year-old code is supposed to work and then use it badly.
Don’t assume that all will go well or that all code will operate correctly. Put safety checks in place throughout your work. Constantly watch for weak spots, and guard against them with extra defensive code.
Trust no one. Absolutely anyone – including yourself – can introduce flaws into your program logic. Treat all inputs and all results with suspicion until you can prove that they are valid.
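Treating an input with suspicion might look like the sketch below, which assumes a hypothetical routine that is handed an index by its caller. Nothing about the index is trusted until both ends of the range have been checked.

```cpp
#include <cstddef>
#include <string>

// Hypothetical routine: fetch the character at a caller-supplied position.
// The index is treated as untrusted and is range-checked before use.
// Returns true and writes to `result` on success; returns false otherwise.
bool char_at(const std::string& text, long index, char& result)
{
    if (index < 0) {
        return false;                       // negative positions are never valid
    }
    if (static_cast<std::size_t>(index) >= text.size()) {
        return false;                       // past the end of the string
    }
    result = text[static_cast<std::size_t>(index)];
    return true;
}
```

Reporting failure through the return value is just one option; throwing an exception would serve equally well. What matters is that the bad value never reaches the subscript operator.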
Write code for clarity, not brevity
Whenever you can choose between concise (but potentially confusing) code and clear (but potentially tedious) code, use code that reads as intended, even if it’s less elegant. For example, split complex arithmetic operations into a series of separate statements to make the logic clearer.
Think about who might read your code. It might require maintenance work by a junior coder, and if he can’t understand the logic, then he’s bound to make mistakes. Complicated constructs or unusual language tricks might prove your encyclopedic knowledge of operator precedence, but they really butcher code maintainability. Keep it simple.
If it can’t be maintained, your code is not safe. In really extreme cases, overly complex expressions can cause the compiler to generate incorrect code – many compiler optimization errors come to light this way.
Simplicity is a virtue. Never make code more complex than necessary.
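As a small illustration of splitting arithmetic into separate statements, compare the terse one-liner in the comment with the function beneath it. The function and variable names are made up for this sketch, and it assumes the caller never passes zero seconds.

```cpp
// Terse version, correct but hard to read at a glance:
//     return m / 1000.0 / (s / 3600.0);

// Clearer version: one named step per statement.
// Assumes seconds is non-zero; the names are illustrative.
double average_speed_kmh(double metres, double seconds)
{
    const double kilometres = metres / 1000.0;
    const double hours = seconds / 3600.0;
    const double speed_kmh = kilometres / hours;
    return speed_kmh;
}
```

The longer form costs nothing at run time with any reasonable optimizer, and each intermediate value now has a name you can inspect in a debugger.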
Say "when"
When do you program defensively? Do you start when things go wrong? Or when you pick up some code you don’t understand? No, these defensive programming techniques should be used all the time. They should be second nature. Mature programmers have learned from experience – they’ve been bitten enough times that they know to put sensible safeguards in place. Defensive strategies are much easier to apply as you start writing code, rather than retrofitting them into existent code. You can’t be thorough and accurate if you try to shoehorn in this stuff late in the day. If you start adding defensive code once something has gone wrong, you are essentially debugging – being reactive, not preventative and proactive. However, during the course of debugging, or even when adding new functionality, you’ll discover conditions that you’d like to verify. It’s always a good time to add defensive code.
Don’t let anyone tinker with stuff they shouldn’t
Things that are internal should stay on the inside. Things that are private should be kept under lock and key. Don’t display your code’s dirty laundry in public. No matter how politely you ask, people will fiddle with your data when you’re not looking if given half a chance, and they will try to call ‘implementation only’ routines for their own reasons. Don’t let them.
- In object-oriented languages, prevent access to internal class data by making it private. In C++, consider the Cheshire cat/pimpl idiom.
- In procedural languages, you can still employ object-oriented (OO) packaging concepts, by wrapping private data behind opaque types and providing well-defined public operations on them.
- Keep all variables in the tightest scope necessary; don’t declare variables globally when you don’t have to. Don’t put them at file scope when they can be function-local. Don’t place them at function scope when they can be loop-local.
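The first of these points can be sketched as follows. The class is hypothetical: it keeps its one data member private, so the only way to change it is through an operation that enforces the valid range.

```cpp
// Hypothetical class: a volume setting that must stay within 0..100.
class Volume
{
public:
    // Returns false and leaves the value unchanged if percent is out of range.
    bool set(int percent)
    {
        if (percent < 0 || percent > 100) {
            return false;
        }
        percent_ = percent;
        return true;
    }

    int get() const { return percent_; }

private:
    int percent_ = 50;   // internal state: only set() can change it
};
```

Had percent_ been public, every piece of client code would be free to store 250 in it, and every reader of the value would have to defend against that.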
Compile with all warnings switched on
Most languages’ compilers draw on a vast selection of error messages when you hurt their feelings. They will also spit out various warnings when they encounter potentially flawed code, like the use of a C or C++ variable before its assignment [4]. These warnings can usually be selectively enabled and disabled.
If your code is full of dangerous constructs, you’ll get pages and pages of warnings. Sadly, the common responses are to disable compiler warnings or just ignore the messages. Don’t do either.
Always enable your compiler’s warnings. And if your code generates any warnings, fix the code immediately to silence the compiler’s screams. Never be satisfied with code that doesn’t compile quietly when warnings are enabled. The warnings are there for a reason. Even if there’s a particular warning you think doesn’t matter, don’t leave it in, or one day it will obscure one that does matter.
Compiler warnings catch many silly coding errors. Always enable them. Make sure your code compiles silently.
Use static analysis tools
Compiler warnings are the result of a limited static analysis of your code, a code inspection performed before the program is run.
There are many separate static analysis tools available, like lint (and its more modern derivatives) for C and FxCop for .NET assemblies. Your daily programming routine should include use of these tools to check your code. They will pick up many more errors than your compiler alone.
Use safe data structures
Or failing that, use dangerous data structures safely.
Perhaps the most common security vulnerability results from buffer overrun. This is triggered by the careless use of fixed-size data structures. If your code writes into a buffer without checking its size first, then there is always potential for writing past the end of the buffer.
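It’s frighteningly easy to do, as this small snippet of C-style C++ code demonstrates:
#include <cstring>

// Copies source into a fixed 10-byte buffer with no length check.
char *unsafe_copy(const char *source)
{
    char *buffer = new char[10];
    strcpy(buffer, source);   // overruns buffer if source needs more than 10 bytes
    return buffer;
}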
If the data in source is 10 characters or longer (strcpy also copies the terminating NUL), its copy will extend beyond the end of buffer’s reserved memory. Then anything could happen. In the best case, the result would be data corruption – some other data structure’s contents will be overwritten. In the worst case, a malicious user could exploit this kind of error to inject executable code and run his own arbitrary program, effectively hijacking the computer. These kinds of flaw are regularly exploited by system crackers – serious stuff.
It’s easy to avoid being bitten by these vulnerabilities: don’t write such bad code! Use safer data structures that don’t allow you to corrupt the program – use a managed buffer like C++’s string class. Or systematically use safe operations on unsafe data types. The code above can be secured by swapping strcpy for strncpy, a size-limited string copy operation. Note that strncpy does not add a terminating NUL when it fills the buffer, so the code must add one itself:
#include <cstring>

char *safer_copy(const char *source)
{
    char *buffer = new char[10];
    strncpy(buffer, source, 9);   // copy at most 9 characters
    buffer[9] = '\0';             // strncpy may not terminate; do it explicitly
    return buffer;
}
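For comparison, here is a sketch of the managed-buffer route using the standard std::string class. The function name is hypothetical, and treating a null pointer as empty input is an assumption of this sketch rather than anything the original code specified.

```cpp
#include <string>

// A sketch of the managed-buffer alternative: std::string sizes its own
// storage, so there is no fixed-size buffer to overrun.
std::string managed_copy(const char* source)
{
    if (source == nullptr) {
        return std::string();   // a null pointer is treated as empty input
    }
    return std::string(source);
}
```

There is no magic number 10 left to get wrong, nothing is truncated, and the caller no longer has to remember to delete the result.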
Check EVERY return value
If a function returns a value, it does so for a reason. Check that return value. If it is an error code, you must inspect it and handle any failure. Don’t let errors silently invade your program; swallowing an error can lead to unpredictable behaviour.
This applies to user-defined functions as well as standard library ones. Most of the insidious bugs you’ll find arise when a programmer fails to check a return value. Don’t forget that some functions may return errors through a different mechanism (e.g., the standard C library’s errno). Always catch and handle appropriate exceptions at the appropriate level.
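The standard strtol function is a good example of a routine that reports failure through more than one channel: its return value, its end pointer, and errno. The wrapper below is a hypothetical sketch that checks all three before believing the result.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper around the standard strtol. It checks every way the
// conversion can report failure instead of trusting the returned number.
bool parse_long(const char* text, long& result)
{
    if (text == nullptr) {
        return false;
    }
    char* end = nullptr;
    errno = 0;                                // strtol only sets errno on failure
    const long value = std::strtol(text, &end, 10);

    if (end == text) {
        return false;                         // no digits were consumed
    }
    if (*end != '\0') {
        return false;                         // trailing junk such as "12x"
    }
    if (errno == ERANGE) {
        return false;                         // value does not fit in a long
    }
    result = value;
    return true;
}
```

Code that simply writes `long n = strtol(text, 0, 10);` will happily treat "abc" as zero and an overlong number as LONG_MAX, which is exactly the kind of swallowed error described above.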
Handle memory (and other precious resources) carefully
Be thorough and release any resource that you acquire during execution. Memory is the example of this cited most often, but it is not the only one. Files and thread locks are other precious resources that we must use carefully. Be a good steward.
Don’t neglect to close files or release memory because you think that the OS will clean up your program when it exits. You really don’t know how long your code will be left running, eating up all file handles or consuming all the memory. You can’t even be sure that the OS will cleanly release your resources – some OSs don’t.
There is a school of thought that says, “Don’t worry about freeing memory until you know your program works in the first place; only then add all the relevant releases.” Just say no. This is a ludicrously dangerous practice. It will lead to many, many errors in your memory usage; you will inevitably forget to free memory in some places.
Treat all scarce resources with respect. Manage their acquisition and release carefully.
Java and .NET employ a garbage collector to do all this tedious tidying up for you, so you can just ‘forget’ about freeing resources. Let them drop to the floor, since the runtime sweeps up every now and then. It’s a nice luxury, but don’t be lulled into a false sense of security. You still have to think. You have to explicitly drop references to objects you no longer care about or they won’t be cleaned up; don’t accidentally hold on to an object reference. Less advanced garbage collectors are also easily fooled by circular references (e.g., A refers to B, and B refers to A, but no one else cares about them). This could cause objects to never be swept up; a subtle form of memory leak.
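In C++, which has no garbage collector, one common way to manage acquisition and release carefully is to tie a resource to an object’s lifetime. The sketch below wraps a C stdio file in a std::unique_ptr with a custom deleter; the type and function names are hypothetical.

```cpp
#include <cstdio>
#include <memory>

// Deleter that closes a C stdio stream when the owning pointer goes away.
struct FileCloser
{
    void operator()(std::FILE* file) const
    {
        if (file != nullptr) {
            std::fclose(file);
        }
    }
};

using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: counts the bytes in a file, or returns -1 on failure.
// The file is closed on every return path without an explicit fclose call.
long count_bytes(const char* path)
{
    FilePtr file(std::fopen(path, "rb"));
    if (!file) {
        return -1;                            // open failed; nothing to release
    }
    long count = 0;
    while (std::fgetc(file.get()) != EOF) {
        ++count;
    }
    if (std::ferror(file.get())) {
        return -1;                            // read error; file still gets closed
    }
    return count;
}
```

Because the release lives in the deleter, adding a new early return later on cannot leak the file handle. The same pattern applies to locks, sockets, and anything else that must be handed back.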
Initialize all variables at their points of declaration
This is a clarity issue. The intent of each variable is explicit if you initialize it. It’s not safe to rely on rules of thumb like: If I don’t initialize it, I don’t care about the initial value. The code will evolve. The uninitialized value may turn into a problem further down the line.
C and C++ compound this issue. If you accidentally use a variable without having initialized it, you’ll get different results each time your program runs, depending on what garbage was in memory at the time. Declaring a variable in one place, assigning it later on, and then using it even later opens up a window for errors. If the assignment is ever skipped, you’ll spend ages hunting down random behaviour. Close the window by initializing every variable as you declare it; even if the value’s wrong, the behaviour will at least be predictably wrong.
Safer languages (like Java and C#) sidestep this pitfall by defining an initial value for all variables. It’s still good practice to initialize a variable as you declare it, which improves code clarity.
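In C++ the habit might look like the following sketch. The record type and its field names are invented for illustration; the point is that no field and no local is ever left holding whatever happened to be in memory.

```cpp
#include <string>

// Hypothetical record type. Default member initializers give every field
// a defined, predictable value from the moment an object is created.
struct Reading
{
    int sensor_id = -1;          // -1 marks "no sensor assigned yet"
    double value = 0.0;
    bool valid = false;
    std::string label;           // std::string default-constructs to empty
};

// Local variables get the same treatment: initialized where declared.
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

If a later change forgets to fill in a Reading, the program sees sensor_id == -1 every time, which is predictably wrong and therefore easy to spot.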
Declare variables as late as possible
By doing this, you place the variable as close as possible to its use, preventing it from confusing other parts of the code. It also clarifies the code using the variable. You don’t have to hunt around to find the variable’s type and initialization; a nearby declaration makes it obvious.
Don’t reuse the same temporary variable in a number of places, even if each use is in a logically separate area. It makes later reworking of the code awfully complicated. Create a new variable each time – the compiler will sort out any efficiency concerns.
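A short sketch of both pieces of advice, using a hypothetical helper: each variable appears at its first point of use, in the narrowest scope that works, and no temporary is recycled for a second purpose.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the total length of all non-empty lines.
// Each variable is declared at first use, in the narrowest scope that works.
std::size_t total_length(const std::vector<std::string>& lines)
{
    std::size_t total = 0;                          // needed across the loop

    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string& line = lines[i];         // loop-local, read-only
        if (line.empty()) {
            continue;
        }
        const std::size_t length = line.size();     // exists only where used
        total += length;
    }
    return total;
}
```

Nothing outside the loop can accidentally read a stale line or length, because neither name exists there.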
Use standard language facilities
C and C++ are nightmares in this respect. They suffer from many different revisions of their specifications, with more obscure cases left as implementation-defined or undefined behaviour. Today there are many compilers, each with subtly different behaviour. They are mostly compatible, but there is still plenty of rope to hang yourself with.
Clearly define which language version you are using. Unless mandated by your project (and there had better be a good reason), don’t rely on compiler weirdness or any non-standard extensions to the language. If there is an area of the language that is undefined, don’t rely on the behaviour of your particular compiler (e.g., don’t rely on your C compiler treating char as a signed value – others won’t). Doing so leads to very brittle code. What happens when you update the compiler? What happens when a new programmer joins the team who doesn’t understand the extensions? Relying on a particular compiler’s odd behaviour leads to really subtle bugs later in life.
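The char example can be sketched like this. The helper is hypothetical; it counts bytes above the 7-bit ASCII range and says explicitly which signedness it wants instead of inheriting the compiler’s choice.

```cpp
#include <cstddef>

// Hypothetical helper: counts bytes outside the 7-bit ASCII range.
// Plain char may be signed or unsigned depending on the compiler, so the
// byte is converted to unsigned char before it is compared. Writing
// `*p >= 0x80` directly is never true where plain char is signed.
std::size_t count_non_ascii(const char* text)
{
    if (text == nullptr) {
        return 0;
    }
    std::size_t count = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        const unsigned char byte = static_cast<unsigned char>(*p);
        if (byte >= 0x80) {
            ++count;
        }
    }
    return count;
}
```

With the conversion in place, the function gives the same answer on every conforming compiler, whichever way plain char happens to be defined.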
Use a good diagnostic logging facility
When you write some new code, you’ll often include a lot of diagnostics to check what’s going on. Should these really be removed after the event? Leaving them in will make life easier when you have to revisit the code, especially if they can be selectively disabled in the meantime.
There are a number of diagnostic logging systems available to facilitate this. Many can be used in such a way that diagnostics have no overhead if not needed; they can be conditionally compiled out.
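You don’t need a full logging library to see the idea. The sketch below uses a hypothetical macro, DIAG_LOG, and a hypothetical build flag, ENABLE_DIAGNOSTICS; real logging systems offer far more, but the conditional compilation works the same way.

```cpp
#include <iostream>

// Hypothetical macro name and build flag. Defining ENABLE_DIAGNOSTICS at
// compile time switches the messages on; otherwise they compile to nothing.
#ifdef ENABLE_DIAGNOSTICS
#define DIAG_LOG(message) \
    (std::cerr << __FILE__ << ':' << __LINE__ << ": " << message << '\n')
#else
#define DIAG_LOG(message) ((void)0)
#endif

int clamp_percent(int value)
{
    if (value < 0) {
        DIAG_LOG("clamping " << value << " up to 0");
        return 0;
    }
    if (value > 100) {
        DIAG_LOG("clamping " << value << " down to 100");
        return 100;
    }
    return value;
}
```

The diagnostics stay in the source as a permanent record of what the author thought was worth watching, yet a build without the flag pays nothing for them. One caution: because the argument disappears entirely in such a build, it must never contain side effects that the program relies on.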
Cast carefully
Most languages allow you to cast (or convert) data from one type to another. This operation is sometimes more successful than others. If you try to convert a 64-bit integer into a smaller 8-bit data type, what will happen to the other 56 bits? Your execution environment might suddenly throw an exception or silently degrade your data’s integrity. Many programmers don’t think about this kind of thing, and so their programs behave in unnatural ways.
If you really want to use a cast, think carefully about it. What you’re saying to the compiler is, “Forget your type checking: I know what this variable is, you don’t.” You’re ripping a big hole into the type system and walking straight through it. It’s unstable ground; if you make any kind of mistake, the compiler will just sit there quietly and mutter, “I told you so,” under its breath. If you’re lucky (e.g. using Java or C#) the runtime might throw an exception to let you know, but this depends on exactly what you’re trying to convert.
C and C++ are particularly vague about the precision of data types, so don’t make assumptions about data type interchangeability. Don’t presume that int and long are the same size and can be assigned to one another, even if you can get away with it on your platform. Code migrates platforms, but bad code migrates badly.
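One way to cast carefully is to check that the value fits before converting it. The helper below is a hypothetical sketch of the 64-bit to 8-bit case, followed by a compile-time check of a size assumption.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a 64-bit value to 8 bits only when it fits.
// Returns false, leaving `result` untouched, if the value would be truncated.
bool narrow_to_int8(std::int64_t value, std::int8_t& result)
{
    if (value < std::numeric_limits<std::int8_t>::min() ||
        value > std::numeric_limits<std::int8_t>::max()) {
        return false;
    }
    result = static_cast<std::int8_t>(value);
    return true;
}

// Size assumptions can be checked at compile time rather than presumed.
static_assert(sizeof(long) >= sizeof(int), "long must be at least as wide as int");
```

The cast is still there, but it now sits behind a test that proves it is harmless. If your code genuinely depends on a stronger assumption about type sizes, a static_assert that states it will fail loudly on the day the code migrates to a platform where it no longer holds.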
Conclusion
It is important to craft code that is not just correct but is also good. It needs to document all the assumptions made. This will make it easier to maintain, and it will harbour fewer bugs. Defensive programming is a method of expecting the worst and being prepared for it. It’s a technique that prevents simple faults from becoming elusive bugs.
The use of codified constraints alongside defensive code will make your software far more robust. Like many other good coding practices (unit testing, for example), defensive programming is about spending a little extra time wisely (and early) in order to save much more time, effort, and cost later. Believe me, this can save an entire project from ruin.
Questions
- Can you have too much defensive programming?
- Should you add an assertion to your code for every bug you find and fix?
- Should assertions conditionally compile away to nothing in production builds? If not, which assertions should remain in release builds?
- Are exceptions a better form of defensive barrier than C-style assertions?
- How carefully do you consider each statement that you type? Do you relentlessly check every function return code, even if you’re sure a function will not return an error?
Notes and references
[1] Edward Murphy, Jr., was a US Air Force Engineer. He coined this infamous law after discovering a technician had systematically connected a whole row of devices upside down. Symmetric connectors permitted this avoidable mistake; afterwards, he chose a different connector design.
[2] Some historians attribute that quote to Napoleon Bonaparte. Now there’s a guy who knew something about defence.
[3] The Elements of Programming Style. B.W. Kernighan, P.J. Plauger. 1978.
[4] Many languages (like Java and C#) class this as an error.