Title: Expect the Unexpected (Part 2)
Author: Bob Schmidt
Date: 09 July 2020 17:15:53 +01:00
Summary: Pete Goodliffe continues to deal with the inevitable.
Body:
In the previous instalment of this mini-series, we looked at the landscape of ‘error conditions’ in our code. We investigated what errors are, what causes them, how we detect them, and how we report them.
Now, let’s look at the best strategies to handle error conditions in our code, to ensure our application logic recovers well. Or as well as can be expected. This is where we get practical.
Handling errors
Love truth, and pardon error.
~ Voltaire
Errors happen. We’ve seen how to discover them and when to do so. The question now is: What do you do about them? This is the hard part. The answer largely depends on circumstance and the gravity of an error – whether it’s possible to rectify the problem and retry the operation or to carry on regardless. Often there is no such luxury; the error may even herald the beginning of the end. The best you can do is clean up and exit sharply, before anything else goes wrong.
To make this kind of decision, you must be informed. You need to know a few key pieces of information about the error:
- Where it came from
This is quite distinct from where it’s going to be handled. Is the source a core system component or a peripheral module? This information may be encoded in the error report; if not, you can figure it out manually.
- What you were trying to do
What provoked the error? This may give a clue toward any remedial action. Error reporting seldom contains this kind of information, but you can figure out which function was called from the context.
- Why it went wrong
What is the nature of the problem? You need to know exactly what happened, not just a general class of error. How much of the erroneous operation completed? All or none are nice answers, but generally, the program will be in some indeterminate state between the two.
- When it happened
This is the locality of the error in time. Has the system only just failed, or is a problem two hours old finally being felt?
- The severity of the error
Some problems are more serious than others, but when detected, one error is equivalent to another – you can’t continue without understanding and managing the problem. Error severity is usually determined by the caller, based on how easy it will be to recover or work around the error.
- How to fix it
This may be obvious (e.g., insert a floppy disk and retry) or not (e.g., you need to modify the function parameters so they are consistent). More often than not, you have to infer this knowledge from the other information you have.
Given this depth of information, you can formulate a strategy to handle each error. Forgetting to insert a handler for any potential error will lead to a bug, and it might turn out to be a bug that is hard to exercise and hard to track down – so think about every error condition carefully.
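One way to keep these pieces of information together is to carry them in a single record. The following is only a minimal C++ sketch: the names `Severity`, `ErrorReport` and `makeReport` are hypothetical and do not come from any particular library.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

// Hypothetical names throughout; this is an illustration, not a standard API.
enum class Severity { Recoverable, Degraded, Fatal };

struct ErrorReport {
    std::string source;     // where it came from (component or module)
    std::string operation;  // what the caller was trying to do
    std::string reason;     // why it went wrong, as specifically as possible
    std::chrono::system_clock::time_point when;  // when it was detected
    Severity severity;      // how serious the caller judges it to be
    std::string remedy;     // how to fix it, if known (may be empty)
};

// Builds a report at the point of detection, stamping the current time.
inline ErrorReport makeReport(std::string source, std::string operation,
                              std::string reason, Severity severity,
                              std::string remedy = {}) {
    assert(!source.empty() && "a report must name its source");
    return ErrorReport{std::move(source), std::move(operation),
                       std::move(reason), std::chrono::system_clock::now(),
                       severity, std::move(remedy)};
}
```

A handler that receives such a record can log every field and still base its abort-or-continue decision on `severity` alone.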
When to deal with errors
When should you handle each error? This can be separate from when it’s detected. There are two schools of thought.
- As soon as possible
Handle each error as you detect it. Since the error is handled near to its cause, you retain important contextual information, making the error-handling code clearer. This is a well-known self-documenting code technique. Managing each error near its source means that control passes through less code in an invalid state.
This is usually the best option for functions that return error codes.
- As late as possible
Alternatively, you could defer error handling for as long as possible. This recognizes that code detecting an error rarely knows what to do about it. It often depends on the context in which it is used: A missing file error may be reported to the user when loading a document but silently swallowed when hunting for a preferences file.
Exceptions are ideal for this; you can pass an exception through each level until you know how to deal with the error. This separation of detection and handling may be clearer, but it can make code more complex. It’s not obvious that you are deliberately deferring error handling, and it’s not clear where an error came from when you do finally handle it.
In theory, it’s nice to separate ‘business logic’ from error handling. But often you can’t, as clean-up is necessarily entwined with that business logic, and it can be more tortuous to write the two separately. However, centralized error-handling code has advantages: You know where to look for it, and you can put the abort/continue policy in one place rather than scatter it through many functions.
Thomas Jefferson once declared, “Delay is preferable to error.” There is truth there; the actual existence of error handling is far more important than when an error is handled. Nevertheless, choose a compromise that’s close enough to prevent obscure and out-of-context error handling, while being far enough away to not cloud normal code with roundabout paths and error-handling dead ends.
Handle each error in the most appropriate context, as soon as you know enough about it to deal with it correctly.
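The missing-file case mentioned earlier can be sketched in C++ as follows. The helpers `readWholeFile`, `loadPreferences` and `loadDocument`, and the default preferences text, are assumptions made up for this illustration. The detecting code stays ignorant of policy; each caller decides how much the failure matters.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical low-level helper: detects the error but does not decide policy.
std::string readWholeFile(const std::string& path) {
    assert(!path.empty());  // an empty path would be a programming error
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open file: " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Context 1: a missing preferences file is unremarkable, so fall back quietly.
std::string loadPreferences(const std::string& path) {
    try {
        return readWholeFile(path);
    } catch (const std::runtime_error&) {
        return "theme=default\n";  // deliberate: built-in defaults are enough
    }
}

// Context 2: a missing document matters, so the exception is left to travel
// up to a caller that can report it to the user.
std::string loadDocument(const std::string& path) {
    return readWholeFile(path);
}
```

The same failure is swallowed in one context and deferred in the other, and in both cases the decision is visible in the code.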
Possible reactions
You’ve caught an error. You’re poised to handle it. What are you going to do now? Hopefully, whatever is required for correct program operation. While we can’t list every recovery technique under the sun, here are the common reactions to consider.
- Logging
Any reasonably large project should already be employing a logging facility. It allows you to collect important trace information, and is an entry point for the investigation of nasty problems.
The log exists to record interesting events in the life of the program, to allow you to delve into its inner workings and reconstruct paths of execution. For this reason, all errors you encounter should be detailed in the program log; they are some of the most interesting and telling events of all. Aim to capture all pertinent information – as much of the previous list as you can.
For really obscure errors that predict catastrophic disaster, it may be a good idea to get the program to ‘phone home’ – to transmit either a snapshot of itself or a copy of the error log to the developers for further investigation.
What you do after logging is another matter.
- Reporting
A program should only report an error to the user when there’s nothing left to do. The user does not need to be bombarded by a thousand small nuggets of useless information or badgered by a raft of pointless questions. Save the interaction for when it’s really vital. Don’t report when you encounter a recoverable situation. By all means, log the event, but keep quiet about it. Provide a mechanism that enables users to read the event log if you think one day they might care.
There are some problems that only the user can fix. For these, it is good practice to report the problem immediately, in order to allow the user the best chance to resolve the situation or else decide how to continue.
Of course, this kind of reporting depends on whether or not the program is interactive. Deeply embedded systems are expected to cope on their own; it’s hard to pop up a dialog box on a washing machine.
- Recovery
Sometimes your only course of action is to stop immediately. But not all errors spell doom. If your program saves a file, one day the disk will fill up and the save operation will fail. The user expects your program to continue happily, so be prepared.
If your code encounters an error and doesn’t know what to do about it, pass the error upwards. It’s more than likely your caller will have the ability to recover.
- Ignore
I only include this for completeness. Hopefully by now you’ve learned to scorn the very suggestion of ignoring an error. If you choose to forget all about handling it and to just continue with your fingers crossed, good luck. This is where most of the bugs in any software package will come from. Ignoring an error whose occurrence may cause the system to misbehave inevitably leads to hours of debugging.
Ignoring errors does not save time. You’ll spend far longer working out the cause of bad program behaviour than you ever would have spent writing the error handler.
You can, however, write code that allows you to do nothing when an error crops up. Is that a blatant contradiction? No. It is possible to write code that copes with an inconsistent world, that can carry on correctly in the face of an error – but it often gets quite convoluted. If you adopt this approach, you must make it obvious in the code. Don’t risk having it misinterpreted as ignorant and incorrect.
- Propagate
When a subordinate function call fails, you probably can’t carry on, but you might not know what else to do. The only option is to clean up and propagate the error report upwards. There are two ways to propagate an error:
- Export the same error information you were fed (return the same reason code or propagate exceptions)
- Reinterpret the information, sending a more meaningful message to the next level up (return a different reason code or catch and wrap up exceptions)
Ask yourself this question: Does the error relate to a concept exposed through the module interface? If so, it’s okay to propagate that same error. Otherwise, recast it in the appropriate light, choosing an error report that makes sense in the context of your module’s interface. This is a good self-documenting code technique.
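As a hedged illustration of recasting an error, the C++ sketch below uses a hypothetical `AddressBook` class and `UnknownContact` exception. The only real library behaviour it relies on is that `std::map::at` throws `std::out_of_range` for a missing key. That exception says nothing about contacts, so the module translates it into one that matches its own interface.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical error type that belongs to the module's public interface.
struct UnknownContact : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical module; the map is an implementation detail callers never see.
class AddressBook {
public:
    void add(const std::string& name, const std::string& phone) {
        assert(!name.empty());  // a nameless contact is a programming error
        entries_[name] = phone;
    }

    const std::string& phoneFor(const std::string& name) const {
        try {
            return entries_.at(name);  // throws std::out_of_range if absent
        } catch (const std::out_of_range&) {
            // Recast: callers think in terms of contacts, not map lookups.
            throw UnknownContact("no contact named '" + name + "'");
        }
    }

private:
    std::map<std::string, std::string> entries_;
};
```

If the storage were later changed from a map to a database, callers would still see `UnknownContact`, which is the point of recasting at the module boundary.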
Code implications
Show me the code! Let’s spend some time investigating the implications of error handling in our code. As we’ll see, it is not easy to write good error handling that doesn’t twist and warp the underlying program logic.
The first piece of code we’ll look at is a common error-handling structure. Yet it isn’t a particularly intelligent approach for writing error-tolerant code. The aim is to call three functions sequentially – each of which may fail – and perform some intermediate calculations along the way. Spot the problems with Listing 1.
    void nastyErrorHandling()
    {
        if (operationOne())
        {
            ... do something ...
            if (operationTwo())
            {
                ... do something else ...
                if (operationThree())
                {
                    ... do more ...
                }
            }
        }
    }

Listing 1
Syntactically it’s fine; the code will work. Practically, it’s an unpleasant style to maintain. The more operations you need to perform, the more deeply nested the code gets and the harder it is to read. This kind of error handling quickly leads to a rat’s nest of conditional statements. It doesn’t reflect the actions of the code very well; each intermediate calculation could be considered the same level of importance, yet they are nested at different levels.
Can we avoid these problems? Yes – there are a few alternatives. The first variant (see Listing 2) flattens the nesting. It is semantically equivalent, but it introduces some new complexity, since flow control is now dependent on the value of a new status variable, ok.
    void flattenedErrorHandling()
    {
        bool ok = operationOne();
        if (ok)
        {
            ... do something ...
            ok = operationTwo();
        }
        if (ok)
        {
            ... do something else ...
            ok = operationThree();
        }
        if (ok)
        {
            ... do more ...
        }
        if (!ok)
        {
            ... clean up after errors ...
        }
    }

Listing 2
We’ve also added an opportunity to clean up after any errors. Is that sufficient to mop up all failures? Probably not; the necessary clean-up may depend on how far we got through the function before lightning struck. There are two clean-up approaches:
- Perform a little clean-up after each operation that may fail, then return early. This inevitably leads to duplication of clean-up code. The more work you’ve done, the more you have to clean up, so each exit point will need to do gradually more unpicking.
If each operation in our example allocates some memory, each early exit point will have to release all allocations made to date. The further in, the more releases. That will lead to some quite dense and repetitive error-handling code, which makes the function far larger and far harder to understand.
- Write the clean-up code once, at the end of the function, but write it in such a way as to only clean up what’s dirty. This is neater, but if you inadvertently insert an early return in the middle of the function, the clean-up code will be bypassed.
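The second approach might look something like the C++ sketch below. It assumes hypothetical `acquireBuffer` and `releaseBuffer` helpers, plus a `liveBuffers` counter that exists only to make leaks observable. Because the buffers here are temporaries, the single clean-up point runs on success as well as on failure. Releasing a null pointer is a no-op, so only what is "dirty" is cleaned up.

```cpp
#include <cassert>

int liveBuffers = 0;  // hypothetical counter, present only to expose leaks

// Hypothetical acquisition that can be told to fail, for demonstration.
char* acquireBuffer(bool succeed) {
    if (!succeed) return nullptr;
    ++liveBuffers;
    return new char[64];
}

void releaseBuffer(char* buffer) {
    if (buffer == nullptr) return;  // never acquired, so nothing to undo
    delete[] buffer;
    --liveBuffers;
}

bool cleanUpOnce(bool secondSucceeds) {
    char* first = nullptr;   // null means "not yet acquired"
    char* second = nullptr;

    first = acquireBuffer(true);
    bool ok = (first != nullptr);
    if (ok) {
        second = acquireBuffer(secondSucceeds);
        ok = (second != nullptr);
    }
    if (ok) {
        // ... do the real work with both buffers ...
    }

    // The one clean-up point, reached however far the function got.
    releaseBuffer(second);
    releaseBuffer(first);
    assert(liveBuffers == 0);
    return ok;
}
```

The weakness described above still applies: an early `return` added between the acquisitions and the release calls would skip the clean-up.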
If you’re not overly concerned about writing Single Entry, Single Exit (SESE) functions, the next example removes the reliance on a separate control flow variable. (Although this clearly isn’t SESE, I contend that the previous example isn’t, either. There is only one exit point, at the end, but the contrived control flow is simulating early exit – it might as well have multiple exits. This is a good example of how being bound by a rule like SESE can lead to bad code, unless you think carefully about what you’re doing.) We do lose the clean-up code again, though. Simplicity renders Listing 3 a better description of the actual intent.
    void shortCircuitErrorHandling()
    {
        if (!operationOne()) return;
        ... do something ...
        if (!operationTwo()) return;
        ... do something else ...
        if (!operationThree()) return;
        ... do more ...
    }

Listing 3
A combination of this short circuit exit with the requirement for clean-up leads to the approach in Listing 4, especially seen in low-level systems code. Some people advocate it as the only valid use for the maligned goto. I’m still not convinced.
    void gotoHell()
    {
        if (!operationOne()) goto error;
        ... do something ...
        if (!operationTwo()) goto error;
        ... do something else ...
        if (!operationThree()) goto error;
        ... do more ...
        return;

    error:
        ... clean up after errors ...
    }

Listing 4
You can avoid such monstrous code in C++ using Resource Acquisition Is Initialization (RAII) techniques like smart pointers [1]. This has the bonus of providing exception safety – when an exception terminates your function prematurely, resources are automatically deallocated. These techniques avoid a lot of the problems we’ve seen above, moving complexity to a separate flow of control.
The same example using exceptions would look like this (in C++, Java, and C#), presuming that all subordinate functions do not return error codes but instead throw exceptions (see Listing 5).
    void exceptionalHandling()
    {
        try
        {
            operationOne();
            ... do something ...
            operationTwo();
            ... do something else ...
            operationThree();
            ... do more ...
        }
        catch (...)
        {
            ... clean up after errors ...
        }
    }

Listing 5
This is only a basic exception example, but it shows just how neat exceptions can be. A sound code design might not need the try/catch block at all if it ensures that no resource is leaked and leaves error handling to a higher level. But alas, writing good code in the face of exceptions requires an understanding of principles beyond the scope of this chapter.
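To illustrate the point about not needing a try/catch block, here is a small C++ sketch using `std::unique_ptr`. The `Widget` type, the `liveWidgets` counter and `operationThatThrows` are hypothetical stand-ins invented for the example. The function contains no error-handling code at all, yet nothing leaks when an exception passes through it.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int liveWidgets = 0;  // hypothetical counter, present only to expose leaks

// Hypothetical resource whose lifetime we want to observe.
struct Widget {
    Widget() { ++liveWidgets; }
    ~Widget() { --liveWidgets; }
};

// Stand-in for a subordinate operation that fails by throwing.
void operationThatThrows() { throw std::runtime_error("operation failed"); }

void raiiHandling(bool fail) {
    auto first = std::make_unique<Widget>();
    // ... do something ...
    auto second = std::make_unique<Widget>();
    assert(liveWidgets >= 2);

    if (fail) operationThatThrows();  // leaves the function via an exception
    // ... do more ...
}  // both widgets are destroyed here, on normal return or during unwinding
```

The decision about what to do with the failure is left entirely to whichever caller eventually catches the exception.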
Raising hell
We’ve put up with other people’s errors for long enough. It’s time to turn the tables and play the bad guy: Let’s raise some errors. When writing a function, erroneous things will happen that you’ll need to signal to your caller. Make sure you do – don’t silently swallow any failure. Even if you’re sure that the caller won’t know what to do in the face of the problem, it must remain informed. Don’t write code that lies and pretends to be doing something it’s not.
Which reporting mechanism should you use? It’s largely an architectural choice; obey the project conventions and the common language idioms. In languages with the facility, it is common to favour exceptions, but only use them if the rest of the project does. Java and C# really leave you with no choice; exceptions are buried deep in their execution run times. A C++ architecture may choose to forgo this facility to achieve portability with platforms that have no exception support or to interface with older C code.
We’ve already seen strategies for propagating errors from subordinate function calls. Our main concern here is reporting fresh problems encountered during execution. How you determine these errors is your own business, but when reporting them, consider the following:
- Have you cleaned up appropriately first? Reliable code doesn’t leak resources or leave the world in an inconsistent state, even when an error occurs, unless it’s really unavoidable. If you do either of these things, it must be documented carefully. Consider what will happen the next time your code is called if this error has manifested. Ensure it will still work.
- Don’t leak inappropriate information to the outside world in your error reports. Only return useful information that the caller understands and can act on.
- Use exceptions correctly. Don’t throw an exception for unusual return values – the rare but not erroneous cases. Only use exceptions to signal circumstances where a function is not able to meet its contract. Don’t use them non-idiomatically (i.e., for flow control).
- Consider using assertions if you’re trapping an error that should never happen in the normal course of program execution, a genuine programming error. Exceptions are a valid choice for this too – some assertion mechanisms can be configured to throw exceptions when they trigger.
- If you can pull forward any tests to compile time, then do so. The sooner you detect and rectify an error, the less hassle it can cause.
- Make it hard for people to ignore your errors. Given half a chance, someone will use your code badly. Exceptions are good for this – you have to act deliberately to hide an exception.
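Several of these considerations can be shown together in a short C++17 sketch. The functions `findUser` and `parsePercentage` and the constants are hypothetical; `[[nodiscard]]`, `static_assert`, `std::optional` and `std::stoi` are standard facilities. "Not found" is treated as a rare but legitimate result, while input that breaks the contract raises an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

constexpr int kMinPercent = 0;
constexpr int kMaxPercent = 100;
// A test pulled forward to compile time: a bad edit breaks the build.
static_assert(kMinPercent < kMaxPercent, "percentage range must not be empty");

// "Not found" is unusual but not erroneous, so it is an ordinary return value.
// [[nodiscard]] asks the compiler to warn when a caller drops the result.
[[nodiscard]] std::optional<std::size_t> findUser(
        const std::vector<std::string>& users, const std::string& name) {
    for (std::size_t i = 0; i < users.size(); ++i) {
        if (users[i] == name) return i;
    }
    return std::nullopt;
}

// Text that is not a percentage means the contract cannot be met: throw.
int parsePercentage(const std::string& text) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::logic_error&) {  // invalid_argument or out_of_range
        throw std::invalid_argument("not a number: " + text);
    }
    if (used != text.size() || value < kMinPercent || value > kMaxPercent) {
        throw std::invalid_argument("not a percentage: " + text);
    }
    assert(value >= kMinPercent && value <= kMaxPercent);  // postcondition
    return value;
}
```

Note that the error message echoes only the caller's own input, which keeps internal details out of the report.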
What kind of errors should you be looking out for? This obviously depends on what the function is doing. Here’s a checklist of the general kinds of error checks you should make in each function:
- Check all function parameters. Ensure you have been given correct and consistent input. Consider using assertions for this, depending on how strictly your contract was written. (Is it an offence to supply bad parameters?)
- Check that invariants are satisfied at interesting points in execution.
- Check all values from external sources for validity before you use them. File contents and interactive input must be sensible, with no missing pieces.
- Check the return status of all system calls and other subordinate function calls.
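The checklist could be applied to a single routine roughly as follows. This is a sketch only: `parseReadings`, the 0..1000 valid range and the error messages are assumptions chosen for illustration. Here bad parameters are reported with exceptions; a stricter contract might use assertions instead.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical routine: reads whitespace-separated readings in 0..1000.
std::vector<int> parseReadings(std::istream& in, std::size_t maxCount) {
    // Parameters: reject input that breaks the contract.
    if (maxCount == 0) throw std::invalid_argument("maxCount must be positive");
    if (!in.good()) throw std::invalid_argument("input stream is not readable");

    std::vector<int> readings;
    std::string token;
    while (readings.size() < maxCount && in >> token) {
        // External values: validate every token before using it.
        std::size_t used = 0;
        int value = 0;
        try {
            value = std::stoi(token, &used);
        } catch (const std::logic_error&) {
            throw std::runtime_error("malformed reading: " + token);
        }
        if (used != token.size())
            throw std::runtime_error("malformed reading: " + token);
        if (value < 0 || value > 1000)
            throw std::out_of_range("reading outside 0..1000: " + token);

        readings.push_back(value);
        // Invariant: never hold more readings than the caller asked for.
        assert(readings.size() <= maxCount);
    }

    // Subordinate call status: find out why extraction stopped.
    if (in.bad()) throw std::runtime_error("I/O error while reading");
    return readings;
}
```

Calling it with a `std::istringstream` holding `"10 20 30"` yields three readings, while `"10 abc"` is rejected as malformed instead of being silently truncated.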
Exceptions are a powerful error reporting mechanism. Used well, they can simplify your code greatly while helping you to write robust software. In the wrong hands, though, they are a deadly weapon.
I once worked on a project where it was routine for programmers to break a while loop or end recursion by throwing an exception, using it as a non-local goto. It’s an interesting idea, and kind of cute when you first see it. But this behaviour is nothing more than an abuse of exceptions: It isn’t what exceptions are idiomatically used for. More than one critical bug was caused by a maintenance programmer not understanding the flow of control through a complex, magically terminated loop.
Follow the idioms of your language, and don’t write cute code for the sake of it.
Managing errors
The common principle uniting the raising and handling of errors is to have a consistent strategy for dealing with failure, wherever it manifests. These are general considerations for managing the occurrence, detection, and handling of program errors:
- Avoid things that could cause errors. Can you do something that is guaranteed to work, instead? For example, avoid allocation errors by reserving enough resource beforehand. With an assured pool of memory, your routine cannot suffer memory restrictions. Naturally, this will only work when you know how much resource you need up front, but you often do.
- Define the program or routine’s expected behaviour under abnormal circumstances. This determines how robust the code needs to be and therefore how thorough your error handling should be. Can a function silently generate bad output, subscribing to the historic GIGO principle (that is, Garbage in, garbage out – feed it trash, and it will happily spit out trash)?
- Clearly define which components are responsible for handling which errors. Make it explicit in the module’s interface. Ensure that your client knows what will always work and what may one day fail.
- Check your programming practice: When do you write error-handling code? Don’t put it off until later; you’ll forget to handle something. Don’t wait until your development testing highlights problems before writing handlers – that’s not an engineering approach.
Write all error detection and handling now, as you write the code that may fail. Don’t put it off until later. If you must be evil and defer handling, at least write the detection scaffolding now.
- When trapping an error, have you found a symptom or a cause? Consider whether you’ve discovered the source of a problem that needs to be rectified here or if you’ve discovered a symptom of an earlier problem. If it’s the latter, then don’t write reams of handling code here; put that in a more appropriate (earlier) error handler.
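The first consideration above, reserving enough resource beforehand, can be sketched in C++ with `std::vector::reserve`. The function `collectSamples` and its sample values are hypothetical. Once capacity has been reserved, appending up to that capacity does not reallocate, so the only place an allocation failure can surface is the single `reserve` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical routine that knows up front how many samples it will store.
std::vector<int> collectSamples(std::size_t count) {
    std::vector<int> samples;
    samples.reserve(count);  // may throw std::bad_alloc, but only here

    const int* storage = samples.data();
    for (std::size_t i = 0; i < count; ++i) {
        // Cannot reallocate: the capacity was secured before the loop.
        samples.push_back(static_cast<int>(i) * 2);
    }
    assert(samples.data() == storage);  // the storage never moved
    return samples;
}
```

This works only because the required size is known before the loop starts, which is the caveat noted in the list.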
Conclusion
To err is human; to repent, divine; to persist, devilish.
~ Benjamin Franklin
To err is human (but computers seem quite good at it, too). To handle these errors is divine.
Every line of code you write must be balanced by appropriate and thorough error checking and handling. A program without rigorous error handling will not be stable. One day an obscure error may occur, and the program will fall over as a result.
Handling errors and failure cases is hard work. It bogs programming down in the mundane details of the Real World. However, it’s absolutely essential. As much as 90 percent of the code you write handles exceptional circumstances [2]. That’s a surprising statistic, so write code expecting to put far more effort into the things that can go wrong than the things that will go right.
Questions
- Are return values and exceptions equivalent error reporting mechanisms?
- How should you handle the occurrence of errors in your error-handling code?
- How thorough is the error handling in your current codebase? How does this contribute to the stability of the program?
- Do you naturally consider error handling as you write code, or do you find it a distraction, preferring to come back to it later?
References
[1] Stroustrup (1997)
[2] Bentley, Jon Louis (1982) Writing Efficient Programs. Prentice Hall Professional, ISBN-10: 013970244X
Pete Goodliffe is a programmer who never stays at the same place in the software food chain. He has a passion for curry and doesn’t wear shoes.