This article seeks to examine the use of two mechanisms to handle 'errors' in code. Two types of error are identified - runtime and design-time (see later for how these are defined) - both manifesting themselves when a program is run. Two mechanisms are identified for dealing with errors - program logic flow and assertions. I suggest that runtime errors should only be handled with logic flow (usually the if keyword) and that design-time errors should not. The latter should be handled by assertions or other design-time mechanisms.
I also put forward that if a function accepts a pointer to an object as an argument, then by default, the design decision should be that that pointer must not be NULL.
I know that some are going to have objections to what I say here, so I present this as my point of view, not industry best practice. Having said that, I have found in real projects that the technique put forward here helps to produce cleaner and more robust code, particularly not having silent failures, and not having to constantly check what parameters are valid, and which are not. I welcome any counter arguments, either as letters to the editor, or indeed another article.
First a proviso: my experience is based in the application development world, not safety-critical or embedded systems where continuity might be an issue (for example, an aircraft system would want to keep the plane flying in the right direction even if there was a bug in the toilet light module). The things I have to say may well apply there, with some modifications, but for the article, think of a desktop application, console application, or perhaps a web-based application. In these circumstances, I think it is better to fail fast - that is bail out altogether - than process things incorrectly.
This is a discussion relevant to C++ in particular. It may well be relevant to other languages. Indeed, C#, Java and VB.Net all have alias concepts which allow NULL, so the general principle will apply there.
Options for handling errors
We have at least two options for handling errors - program logic flow using a conditional statement (if, for example) and assertions. However, I usually favour design-time mechanisms to avoid error possibilities. This is briefly discussed later in the context of whether or not to allow NULL pointers into functions.
These, in particular if, act as a fork for control flow. In the context of this discussion, I am talking about detecting error conditions, and doing something about it. That might be to silently ignore by returning early, throwing an exception, doing some extra work, missing out some work, diagnostic output, etc. The point is that it is the intention of the program to do something about it.
Assertion refers to the general category of functions/macros which a programmer puts into code to catch specific 'errors' when a program runs, and alert the user (hopefully the developer acting as a user) in a drastic way. The results of this might be to terminate the program, or to pop up a nasty-looking error dialog offering the chance to debug or abort. The C++ standard only mentions assert in the context of the headers <assert.h> and <cassert>, which provides the assert() function on my system. Also defined in various implementations are _ASSERT, ASSERT and others. I'll deal with these collectively as 'assertions'.
They usually take an expression as an argument which can be converted to a Boolean value. In the case it evaluates to false, the drastic behaviour occurs.
Not orthogonal concepts
The crux of the matter is that a given condition should either be handled by a conditional statement or an assertion. The former should handle the error; the latter should bail out of the program (or do whatever drastic behaviour the particular assertion being used performs). Under no circumstances, should one see both in action for the same check.
But one does see that. Why? The answer usually comes in the form of 'belt and braces'. This is to me a pejorative term, referring to a style in which multiple attempts to 'handle errors' occur, just in case it is not handled properly by the first. The term comes from the real world scenario of the two ways to hold up your trousers. Of course in the real world, either your belt or your braces might wear out, break, or just come undone.
But in the software world, code does not rot once written (see sidebar - Bit Rot (or not?)). If your software belt works in a given environment, it will always work, ditto with the software braces. You have to decide which you really need, depending on the circumstances. That can be the hard part, and the reason why it is not done rigorously.
Bit Rot (or not?) |
Bit rot is a recognized concept, actually one that I hadn't come across until this article was reviewed! Wikipedia refers in the most part to decay of storage media, and that the suggestion that code itself can rot is facetious. My position is that this is correct. Assuming that the bits of a program's code do not get corrupted, it will always execute the same way in response to the same input. Input, however, is not as simple as what the user types into the search box, or where a mouse click occurs in a UI. It must include memory state, available resources, and the like. Thus, there is the common phenomenon of seeminglyrandom errors in release builds of programs, where code is using memory which has (by incorrect coding) not been initialised. The input in this case changes if the memory address referred to happens to contain different values in different invocations of the program. Particularly with regard to operating system changes, a program's environment can change. This may also lead to previously-unseen bugs appearing. In this case, we could include the environment in the term 'input'. For example, as mentioned elsewhere, if your code was passing a NULL pointer to strlen(), then from one version of Windows to the next, your code will suddenly start to crash. This is not bit rot, but a combination of incorrect coding and a change in input. |
It boils down to distinguishing between two types of error, and using the appropriate mechanism for each.
Two types of error
What I call runtime or external errors are things which are beyond the influence of your program. These are things it must take into account in order to behave sensibly in 'expected' circumstances. It may be the case that for your program to fulfil its purpose, certain external factors need to be in place, but as long as it cannot control them, it should be able to handle the absence of these factors in a purposeful way. That may be as simple as outputting a message to say that the program's installation is faulty, please reinstall, or it might prompt the user to go off and find something which it can't.
It should not crash, cause any inconsistencies in data, or do any other kind of harm to the system.
Let's look at some examples. It might be looking for a particular file, registry key, or database connection. These can fail to be there for a number of reasons. The installation may have failed, the network might be down, the user may have tinkered with something, there might not be enough memory available, or another program may have interfered. It doesn't matter what happened, just that you identify these possible scenarios. In programming, it doesn't matter how likely they are to occur, you have to program as if it will occur every time. It's all or nothing - you either check the condition or you don't. Whether the external error actually occurs when the program is run is obviously not knowable. You hope that it won't occur, but should not need to pray.
In addition, don't forget that just because you check something once, it still might go wrong later, a file could be deleted at any time, for example. This means that in principle, every use of that file should have a check first. However, especially considering concurrency, it becomes very difficult to guarantee absolutely that an external factor will be available, even if your code checks for it. Such a discussion is beyond the scope of the current article, though.
This is a totally different kettle of fish. These are things which are fundamentally about the program's design. When they go wrong, it means that you have made an assumption, an incorrect design decision, or just coded something incorrectly. It needs to be fixed. It will only be a matter of time, or particular data input, which will cause the thing to blow up.
This kind of situation calls for an assertion in the code. This states that a given condition is expected to be true by design. It is a strong statement, one you have to make with confidence, because if it turns out not be true, things can go badly wrong with your program.
Assertions often get omitted from release versions of code, for reasons of performance. This is the source of much debate. It is arguable that leaving the assertions in the release build will give you better testing, as well as give good diagnostics should something go wrong in the field. Of course, a big hairy message box saying that the program has misbehaved is something you don't want your customer to see. But, more importantly, you don't want them to get misbehaving software in the first place. Certainly for some classes of application, it is better to bail out, than, for example to buy a lot of something you don't need, at the wrong price, or process only half the data, etc.
Why not have them together?
So, we have a point in the code. We have been passed a pointer, and we want to access that object. Sometimes you will see one of the options in Listing 1.
void MyFunc( Thing* p ) { ASSERT( p != NULL ); if ( p == NULL ) return; DoSomethingUseful( p ); } or void MyFunc( Thing* p ) { ASSERT( p != NULL ); if ( p != NULL ) { DoSomethingUseful( p ); } DoABitMore(); } |
Listing 1 |
Apart from causing the compiler to complain sometimes (because the ASSERT macro can expand to give a redundant check for p), I maintain that this is bad practice. At first sight, it might look very diligent. The careful programmer has covered all the options. The problem is that this lack of clarity blurs the issue of what we can expect from p, it blurs the issue of the design of the function, the whole class even.
Coming to this code from the outside, if you saw this in a function which you had cause to alter, what could you deduce about p? Suppose the person who checked it into source control is not available, or doesn't remember anymore.
You have to check in some code and you have to live with the consequences if it blows up. More than likely, you will opt to assume that p might be NULL. You might consider taking out the assertion in that case, but be careful, because we might really want to know if p is NULL here. You just don't know.
In one project, I spent time backtracking up the function-call trees to see where values were coming from. More often than not, it turned out that there was no way we could have got a NULL pointer, so I changed the code. It was quite common for certain members of the team to check collections of pointers for NULL before using them in a loop, a practice I tried to put a stop to. Simply don't put NULL pointers into the container in the first place!
In the ideal scenario, you would have 100% code coverage in your unit tests, so could simply run them and check. In the real world, you probably don't work in such an organised place.
Justification for putting them together
'Yes, but you just don't know what is going to happen in the future of that code. What if somebody does eventually pass a NULL pointer? Somebody might, and it might only happen in a release build where the assertions have been omitted.'
My answer here is that firstly, we put the assertion in as soon as the function is written, so we never have to shut the stable door after the horse has bolted. Secondly, this is just an argument propping up poor development practices. Are you saying that you can't write the code surrounding a function properly to honour its requirements?
Let's take an example. You have a UI showing objects, which the user can select. The user selects one and chooses to delete it. We end up at a call to RemoveObject( Object* p ). Code that ends up here with the possibility of passing a NULL pointer is going to have other problems, because it is not in control of things. The UI layers should be monitoring the selection, and only invoking actions which make sense. I have seen the result of the silent return early code. People don't realise when it happens, their broken code gets checked in, built upon, and creates a mess. Enter the onion.
The onion, a place for assertions
Let's take a step back, to the kitchen. If you cook much, you will probably be familiar with onions. They are comprised of a number of layers, each one shielding the one inside it. Often a bad onion is only bad in one or two layers, the rot stopping at the layer boundary.
Software can be like an onion, with the emphasis on the layering, rather than it making you cry (although that is often the case). Most software could be considered to be comprised of modules, there being many ways to define a module. However you choose to think of a module, think of it as an onion. The outer layer to that onion, or module, is the key one. It's where external factors call your code, where you can choose your policies for handling input. And that must include policies for handling input errors. See CodeCraft [Goodliffe] by Pete Goodliffe for a good discussion of different error handling strategies.
There are numerous policies you could follow. Doing nothing would be the most simplistic. Sometimes this is referred to as garbage in-garbage out, it being your own fault if you pass silly values.
Silently returning is also unfortunately common. Returning early with error codes is a viable option, requiring documentation of what the error codes mean.
In a language which supports it, throwing exceptions could be used for this purpose, as input errors ought to be exceptional.
Be aggressive with assertions inside the onion
Whatever your choice, once the control path gets inside the first layer, you can be justified in making assumptions about design choices. So, you can be very aggressive about checking and stating conditions with assertions.
That function above becomes cleaner:
void MyFunc( Thing* p ) { ASSERT( p != NULL ); DoSomethingUseful( p ); }
That's better. You ought to do this from the outset of the project, though. Once code has been written and run with no assertions, you run the risk that come code elsewhere has gone through there and 'got away' with bad behaviour. Having a test suite somewhat mitigates this, as does taking a firm stance as soon as possible.
Documenting non-null pointers with the signature
Some say that references are the way to state non-nullability of an alias parameter. Personally, I disagree. I have only ever had bad experiences when null pointers were allowed. If you haven't got an object, don't call a function expecting one. You wouldn't try to call a member function on a NULL pointer, would you? Suffice it to say that assertions support the firm stance on parameters, if used once we are inside the outer layer of that 'module' we talked about.
As an aside, you could document your function's requirements on its pointer by making it take a smart pointer, whose policies dictate that nulls are not allowed [Alexandrescu]. However, personally, I don't think it makes sense to allow null pointers into a function. It is only there to alleviate the calling code from the responsibility. I'd favour a mechanism where the default is that pointer parameters are valid, and to document/police it explicitly if otherwise. Then everyone would know what to expect.
I remember a time when in the Windows API, the strlen() function was changed (implementation, that is). Previously, it used to accept a NULL parameter, and return 0. Later versions didn't and crashed. Fair enough, I say. What's the point of measuring the length of a string object which doesn't exist? Of course, it might make some surrounding code have to do an extra check. I contend that by choosing a strategy of only using valid objects, the whole codebase can become cleaner.
The ambiguity of NULL pointers
Of course, certain things are going to be non-deterministic. For example, the user enters a search term, but none of your objects match that. So, your find function returns a NULL pointer? No, a better design would be to return an empty collection. The program design here is making the statement that the results of this find operation is a collection, and if you just specify a simple container like vector, then it can have nothing in it, no problem.
But what if you have a function which is only interested in one object? What might that be? One often finds interfaces of the kind: FindFirstObject(), FindNextObject(). This is quite common in the Win32 API, where the interface is a C interface. The failure to find any objects results in FindFirstObject() returning a NULL pointer. OK, but in C++ we have iterators and collections as standard, so I suggest that where possible, we construct interfaces to explicitly tell us when we have found nothing. An empty collection is one way, or return an iterator equal to end(). Note that in both these examples, we have an extra level of indirection introduced, which nicely solves the problem of duality of pointers (that duality being either valid object, or no object in one variable, a kind of union really). I note that pointers might be considered a type of iterator. However, idiomatic use of iterators is different to idiomatic use of raw pointers.
One of the most used functions which accepts this ambiguity is the delete operator. It is specified to be safe if given a NULL pointer, there being nothing done in that case. Whilst it means that code calling it does not have to make the NULL check first, I think it blurs the responsibility of the function, and is therefore a poor design. With a multitude of smart pointers to choose from, most code should not be calling delete directly anyway, and those places where it does should be limited to core functionality, so it would not hurt. In addition, consider that silently allowing NULL might be hiding errors where objects are not initialised properly.
Conclusion
There are two kinds of error in code - runtime and design-time. The former needs to be handled with logic flow in the code; the latter is a good candidate for assertions. Do not mix these two concepts; it should always be a clear choice.
Once inside the boundaries of a software module, making aggressive choices about the values you pass to functions can make code cleaner. It goes hand-in-hand with the practice of creating cleaner interfaces, which automatically remove ambiguity.
Acknowledgments
Thanks to Richard Blundell and Paul Thomas for comments and suggestions. Also, in my last article, I forgot to put the acknowledgements in. So, retrospective thanks to Phil Bass, Richard Blundell, Alan Griffiths and Anthony Williams.
References
[Goodliffe] Pete Goodliffe, CodeCraft, No Starch Press (see also http://nostarchpress.com/frameset.php?startat=codecraft).
[Alexandrescu] Andrei Alexandrescu, Modern C++ Design: Applied Generic and Design Patterns, Addison Wesley.
Overload Journal #78 - Apr 2007 + Programming Topics + Design of applications and programs
Browse in : |
All
> Journals
> Overload
> 78
(8)
All > Topics > Programming (877) All > Topics > Design (236) Any of these categories - All of these categories |