Journal: CVu 322
Title: Expect the Unexpected (Part 1)
Author: Bob Schmidt
Date: Sun, 03 May 2020 18:28:21 +01:00
Summary: Pete Goodliffe looks into dealing with the inevitable.
Body:
We know that the only way to avoid error is to detect it, that the only way to detect it is to be free to enquire.
~ J. Robert Oppenheimer
At some point in life, everyone has this epiphany: The world doesn’t work as you expect it to. My one-year-old friend Tom learned this when climbing a chair four times his size. He expected to get to the top. The actual result surprised him: He ended up under a pile of furniture.
Is the world broken? Is it wrong? No. The world has plodded happily along its way for the last few million years, and looks set to continue for the foreseeable future. It’s our expectations that are wrong and need to be adjusted. As they say: Bad things happen, so deal with it. We must write code that deals with the Real World and its unexpected ways.
This is particularly difficult because the world mostly works as we’d expect it to, constantly lulling us into a false sense of security. The human brain is wired to cope, with built-in fail-safes. If someone bricks up your front door, your brain will process the problem, and you’ll stop before walking into an unexpected wall. But programs are not so clever; we have to tell them where the brick walls are and what to do when they hit one.
Don’t presume that everything in your program will always run smoothly. The world doesn’t always work as you’d expect it to: You must handle all possible error conditions in your code. It sounds simple enough, but that statement leads to a world of pain.
From whence it came
To expect the unexpected shows a thoroughly modern intellect.
~ Oscar Wilde
Errors can and will occur. Undesirable results can arise from almost any operation. They are distinct from bugs in a faulty program because you know beforehand that an error can occur. For example, the database file you want to open might have been deleted, a disk could fill up at any time and your next save operation might fail, or the web service you’re accessing might not currently be available.
If you don’t write code to handle these error conditions, you will almost certainly end up with a bug; your program will not always work as you intend it to. But if the error happens only rarely, it will probably be a very subtle bug!
An error may occur for one of a thousand reasons, but it will fall into one of these three categories:
- User error
The stupid user manhandled your lovely program. Perhaps they provided the wrong input or attempted an operation that’s absolutely absurd. A good program will point out the mistake and help the user to rectify it. It won’t insult them or whine in an incomprehensible manner.
- Programmer error
The user pushed all the right buttons, but the code is broken. This is the consequence of a bug elsewhere, a fault the programmer introduced that the user can do nothing about (except to try and avoid it in the future). This kind of error should (ideally) never occur.
There’s a cycle here: Unhandled errors can cause bugs. And those bugs might result in further error conditions occurring elsewhere in your code. This is why we consider ‘defensive programming’ an important practice.
- Exceptional circumstances
The user pushed all the right buttons, and the programmer didn’t mess up. Fate’s fickle finger intervened, and we ran into something that couldn’t be avoided. Perhaps a network connection failed, we ran out of printer ink, or there’s no hard disk space left.
We need a well-defined strategy to manage each kind of error in our code. An error may be detected and reported to the user in a pop-up message box, or it may be detected by a middle-tier code layer and signalled to the client code programmatically. The same principles apply in both cases: whether a human chooses how to handle the problem or your code makes a decision – someone is responsible for acknowledging and acting on errors.
Errors are raised by subordinate components and communicated upwards, to be dealt with by the caller. They are reported in a number of ways; we’ll look at these in the next section. To take control of program execution, we must be able to:
- Raise an error when something goes wrong
- Detect all possible error reports
- Handle them appropriately
- Propagate errors we can’t handle
Errors are hard to deal with. The error you encounter is often not related to what you were doing at the time (most fall under the ‘exceptional circumstances’ category). They are also tedious to deal with – we want to focus on what our program should be doing, not on how it may go wrong. However, without good error management, your program will be brittle – built upon sand, not rock. At the first sign of wind or rain, it will collapse.
Take error handling seriously. The stability of your code rests on it.
Error-reporting mechanisms
There are several common strategies for propagating error information to client code. You’ll run into code that uses each of them, so you must know how to speak every dialect. Observe how these error-reporting techniques compare, and notice which situations call for each mechanism.
Each mechanism has different implications for the locality of error. An error is local in time if it is discovered very soon after it is created. An error is local in space if it is identified very close to (or even at) the site where it actually manifests. Some approaches specifically aim to reduce the locality of error to make it easier to see what’s going on (e.g., error codes). Others aim to extend the locality of error so that normal code doesn’t get entwined with error-handling logic (e.g., exceptions).
The favoured reporting mechanism is often an architectural decision. The architect might consider it important to define a homogeneous hierarchy of exception classes or a central list of shared reason codes to unify error-handling code.
No reporting
The simplest error-reporting mechanism is don’t bother. This works wonderfully in cases where you want your program to behave in bizarre and unpredictable ways and to crash randomly.
If you encounter an error and don’t know what to do about it, blindly ignoring it is not a viable option. You probably can’t continue the function’s work, but returning without fulfilling your function’s contract will leave the world in an undefined and inconsistent state.
Never ignore an error condition. If you don’t know how to handle the problem, signal a failure back up to the calling code. Don’t sweep an error under the rug and hope for the best.
An alternative to ignoring errors is to instantly abort the program upon encountering a problem. It’s easier than handling errors throughout the code, but hardly a well-engineered solution!
Return values
The next simplest mechanism is to return a success/failure value from your function. A boolean return value provides a simple yes or no answer. A more advanced approach enumerates all the possible exit statuses and returns a corresponding reason code. One value means success; the rest represent the many and varied abortive cases. This enumeration may be shared across the whole codebase, in which case your function returns a subset of the available values. You should therefore document what the caller can expect.
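A minimal C++ sketch of an enumerated reason code might look like this. The Status enumeration and the remove_word function are hypothetical names invented for illustration; they are not from any particular codebase.

```cpp
#include <cassert>
#include <string>

// Hypothetical reason codes shared across a codebase. One value means success.
enum class Status { Ok, NotFound, DiskFull, PermissionDenied };

// Removes the first occurrence of 'word' from 'text'.
// Documented contract: returns only Status::Ok or Status::NotFound.
Status remove_word(std::string& text, const std::string& word) {
    const auto pos = text.find(word);
    if (pos == std::string::npos) {
        return Status::NotFound;   // signal the failure to the caller
    }
    text.erase(pos, word.size());
    return Status::Ok;
}

// Usage: the caller tests the reason code at the point of the call.
void remove_word_demo() {
    std::string text = "hello world";
    assert(remove_word(text, "world") == Status::Ok);
    assert(remove_word(text, "absent") == Status::NotFound);
}
```

The comment on remove_word names the subset of shared codes it can return, which is the documentation the caller needs.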
While this works well for procedures that don’t return data, passing error codes back with returned data gets messy. If int count() walks down a linked list and returns the number of elements, how can it signify a list structure corruption? There are three approaches:
- Return a compound data type (or tuple) containing both the return value and an error code. This is rather clumsy in the popular C-like languages and is seldom seen in them.
- Use an ‘optional’ data type that can represent ‘no value, or a specific value’. This is a syntactically nicer version of a compound data type.
- Pass the error code back through a function parameter. In C++ or .NET, this parameter would be passed by reference. In C, you’d direct the variable access through pointers. This approach is ugly and non-intuitive; there is no syntactic way to distinguish a return value from a parameter.
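The ‘optional’ approach from the list above could be sketched in C++ (C++17 or later) roughly as follows. The Node type, and the decision to treat a cycle in the links as ‘corruption’, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical singly linked list node.
struct Node {
    int value;
    Node* next;
};

// Returns the number of elements, or std::nullopt if the list is corrupt.
// Assumption: "corrupt" means the links form a cycle, which is detected by
// running a slow pointer behind a fast one (Floyd's algorithm).
std::optional<int> count(const Node* head) {
    const Node* slow = head;
    const Node* fast = head;
    int n = 0;
    while (fast != nullptr) {
        fast = fast->next;
        ++n;
        if (fast == nullptr) break;
        fast = fast->next;
        ++n;
        slow = slow->next;
        if (fast != nullptr && fast == slow) {
            return std::nullopt;   // no value: the list loops back on itself
        }
    }
    return n;
}

// Usage: the caller must look inside the optional before using the count.
void count_demo() {
    Node b{2, nullptr};
    Node a{1, &b};
    assert(count(&a).value() == 2);
    b.next = &a;                      // corrupt the list with a cycle
    assert(!count(&a).has_value());
}
```

The full range of int stays available for successful results, at the price of an extra check at every call site.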
Alternatively, reserve a range of return values to signify failure. The count example can nominate all negative numbers as error reason codes; they’d be meaningless answers anyway. Negative numbers are a common choice for this. Pointer return values may be given a specific invalid value, which by convention is zero (or NULL). In Java and C#, you can return a null object reference.
This technique doesn’t always work well. Sometimes it’s hard to reserve an error range – all return values are equally meaningful and equally likely. It also has the side effect of reducing the available range of success values; reserving the negative values halves the range available for positive results. (If you used an unsigned int, the number of values available would double, reusing the signed int’s sign bit.)
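A reserved negative range might be sketched in C++ like this. The Node type, the kErrCorrupt code, and the max_nodes bound used to decide that a list is corrupt are all assumptions for the sake of the example.

```cpp
#include <cassert>

// Hypothetical singly linked list node.
struct Node {
    int value;
    Node* next;
};

// Hypothetical reason code taken from the reserved negative range.
constexpr int kErrCorrupt = -1;

// Returns the element count (>= 0) or a negative reason code.
// Assumption: a list with more than max_nodes elements is treated as corrupt.
int count(const Node* head, int max_nodes) {
    int n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) {
        if (n == max_nodes) {
            return kErrCorrupt;
        }
        ++n;
    }
    return n;
}

// Usage: any negative result means failure.
void count_demo() {
    Node b{2, nullptr};
    Node a{1, &b};
    assert(count(&a, 100) == 2);
    b.next = &a;                      // corrupt the list with a cycle
    assert(count(&a, 100) < 0);
}
```

The signature stays simple, but nothing forces the caller to test for a negative result before using it as a count.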
Error status variables
This method attempts to manage the contention between a function’s return value and its error status report. Rather than return a reason code, the function sets a shared global error variable. After calling the function, you must then inspect this status variable to find out whether or not it completed successfully.
The shared variable reduces confusion and clutter in the function’s signature, and it doesn’t restrict the return value’s data range at all. However, errors signalled through a separate channel are much easier to miss or wilfully ignore. A shared global variable also has nasty thread safety implications.
The C standard library employs this technique with its errno variable. It has very subtle semantics: Before using any standard library facility, you must manually clear errno. Nothing ever sets a succeeded value; only failures touch errno. This is a common source of bugs, and makes calling each library function tedious. To add insult to injury, not all C standard library functions use errno, so it is less than consistent.
This technique is functionally equivalent to using return values, but it has enough disadvantages to make you avoid it. Don’t write your own error reports this way, and use existing implementations with the utmost care.
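To show the care that an existing errno-based interface demands, here is a small sketch around the standard strtol function, written in C++. The parse_long wrapper is a hypothetical helper; strtol, errno, and ERANGE are the standard facilities.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical wrapper: converts 'text' to a long and reports success.
// On failure 'out' is left untouched.
bool parse_long(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;                                  // clear first: success never resets it
    const long value = std::strtol(text, &end, 10);
    if (errno == ERANGE) {
        return false;                           // value out of range for long
    }
    if (end == text || *end != '\0') {
        return false;                           // no digits, or trailing junk
    }
    out = value;
    return true;
}

// Usage: the wrapper turns the side channel back into a return value.
void parse_long_demo() {
    long v = 0;
    assert(parse_long("42", v) && v == 42);
    assert(!parse_long("abc", v));
}
```

Note that the second check cannot rely on errno at all: input with no digits is detected through the end pointer, which illustrates how inconsistent the side channel can be.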
Exceptions
Exceptions are a language facility for managing errors; not all languages support exceptions. Exceptions help to distinguish the normal flow of execution from exceptional cases – when a function has failed and cannot honour its contract. When your code encounters a problem that it can’t handle, it stops dead and throws up an exception – an object representing the error. The language runtime then automatically steps back up the call stack until it finds some exception-handling code. The error lands there, for the program to deal with.
There are two operational models, distinguished by what happens after an exception is handled:
- The termination model
Provided by C++, .NET, and Java. Execution continues after the handler that caught the exception.
- The resumption model
Execution resumes at the point where the exception was raised.
The former model is easier to reason about, but it doesn’t give ultimate control. It only allows error handling (you can execute code when you notice an error), not fault rectification (a chance to fix the problem and try again).
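Under the termination model, ‘try again’ has to be written by hand around the whole operation. The following C++ sketch shows one way to do that; flaky_read and read_with_retry are hypothetical names, and the operation is rigged to fail twice purely for demonstration.

```cpp
#include <cassert>
#include <stdexcept>

int calls = 0;   // how many times flaky_read has run (for the demo)

// Hypothetical operation that fails on its first two calls, then succeeds.
int flaky_read() {
    ++calls;
    if (calls < 3) {
        throw std::runtime_error("transient failure");
    }
    return 42;
}

// Retries the whole operation; rethrows once max_attempts is used up.
int read_with_retry(int max_attempts) {
    for (int attempt = 1; ; ++attempt) {
        try {
            return flaky_read();
        } catch (const std::runtime_error&) {
            if (attempt >= max_attempts) {
                throw;   // out of attempts: propagate to the caller
            }
        }
        // Control arrives here, after the handler, not back inside
        // flaky_read, so the loop restarts the operation from the top.
    }
}

// Usage: two failures are absorbed, the third call succeeds.
void retry_demo() {
    calls = 0;
    assert(read_with_retry(5) == 42);
    assert(calls == 3);
}
```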
An exception cannot be ignored. If it isn’t caught and handled, it will propagate to the very top of the call stack and will usually stop the program dead in its tracks. The language runtime automatically cleans up as it unwinds the call stack. This makes exceptions a tidier and safer alternative to hand-crafted error-handling code. However, throwing exceptions through sloppy code can lead to memory leaks and problems with resource clean-up. (For example, you could allocate a block of memory and then exit early as an exception propagates through. The allocated memory would leak. This kind of problem makes writing code in the face of exceptions a complex business.) You must take care to write exception-safe code. The sidebar explains what this means in more detail.
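The leak described above, and one common C++ remedy, can be sketched as follows. Buffer, may_fail, sloppy, and tidy are hypothetical names; std::unique_ptr and std::make_unique are standard (C++14 or later). This is only one illustration of exception-safe code, not a complete treatment.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int live_buffers = 0;   // counts Buffer objects currently alive (for the demo)

struct Buffer {
    Buffer()  { ++live_buffers; }
    ~Buffer() { --live_buffers; }
};

// Hypothetical operation that fails on request.
void may_fail(bool fail) {
    if (fail) {
        throw std::runtime_error("operation failed");
    }
}

// Leaks when may_fail throws: the early exit skips the delete.
void sloppy(bool fail) {
    Buffer* b = new Buffer;
    may_fail(fail);
    delete b;
}

// Exception safe: the unique_ptr destructor runs as the stack unwinds.
void tidy(bool fail) {
    auto b = std::make_unique<Buffer>();
    may_fail(fail);
}

// Usage: after the exception has passed through tidy, nothing is left alive.
void tidy_demo() {
    try {
        tidy(true);
    } catch (const std::runtime_error&) {
    }
    assert(live_buffers == 0);
}
```

Calling sloppy(true) instead would leave live_buffers at 1, because nothing ever deletes the buffer.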
The code that handles an exception is distinct from the code that raises it, and it may be arbitrarily far away. Exceptions are usually provided by OO languages, where errors are defined by a hierarchy of exception classes. A handler can elect to catch a quite specific class of error (by accepting a leaf class) or a more general category of error (by accepting a base class). Exceptions are particularly useful for signalling errors in a constructor.
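A small C++ sketch of such a hierarchy, with a constructor that signals failure by throwing, might look like this. StorageError, DiskFullError, Document, save, and try_save are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical hierarchy: a general category with one specific leaf.
struct StorageError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct DiskFullError : StorageError {
    DiskFullError() : StorageError("disk full") {}
};

// A constructor has no return value, so throwing is how it reports failure.
class Document {
public:
    explicit Document(const std::string& name) : name_(name) {
        if (name.empty()) {
            throw StorageError("document needs a name");
        }
    }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

void save(const Document&, bool disk_full) {
    if (disk_full) {
        throw DiskFullError();
    }
}

// Returns a description of how the attempt ended.
std::string try_save(const std::string& name, bool disk_full) {
    try {
        Document doc(name);
        save(doc, disk_full);
        return "saved";
    } catch (const DiskFullError&) {       // leaf class: one specific error
        return "disk full";
    } catch (const StorageError& e) {      // base class: the general category
        return std::string("storage error: ") + e.what();
    }
}

// Usage: the handler order decides which clause sees each error.
void try_save_demo() {
    assert(try_save("notes", true) == "disk full");
    assert(try_save("", false) == "storage error: document needs a name");
}
```

The leaf handler must come first; a base-class handler placed above it would catch DiskFullError as well.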
Exceptions don’t come for free; the language support incurs a performance penalty. In practice, this isn’t significant and only manifests around exception-handling statements – exception handlers reduce the compiler’s optimization opportunities. This doesn’t mean that exceptions are flawed; their expense is justified compared to the cost of not doing any error handling at all!
[Sidebar: Whistle-stop tour of exception safety]
Signals
Signals are a more extreme reporting mechanism, largely used for errors sent by the execution environment to the running program. The operating system traps a number of exceptional events, like a floating point exception triggered by the maths coprocessor. These well-defined error events are delivered to the application in signals that interrupt the program’s normal flow of execution, jumping into a nominated signal handler function. Your program could receive a signal at any time, and the code must be able to cope with this. When the signal handler completes, program execution continues at the point it was interrupted.
Signals are the software equivalent of a hardware interrupt. They are a Unix concept, now provided on most platforms (a basic version is part of the ISO C standard [1]). The operating system provides sensible default handlers for each signal, some of which do nothing, others of which abort the program with a neat error message. You can override these with your own handler.
The defined C signal events include program termination, execution suspend/continue requests, and maths errors. Some environments extend the basic list with many more events.
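Installing a handler with the ISO C signal facility could be sketched in C++ like this. The names got_signal, on_signal, and install_handler are hypothetical; std::signal, std::raise, and std::sig_atomic_t are the standard facilities. The handler only records the signal number, since very little else is portable inside a handler.

```cpp
#include <cassert>
#include <csignal>

// A flag of this type is what a handler can portably write to.
volatile std::sig_atomic_t got_signal = 0;

// The nominated handler: record the event and return.
extern "C" void on_signal(int signum) {
    got_signal = signum;
}

// Replaces the default handler for 'signum'; returns false if that failed.
bool install_handler(int signum) {
    return std::signal(signum, on_signal) != SIG_ERR;
}

// Usage: raise delivers the signal to the running program itself.
void signal_demo() {
    assert(install_handler(SIGTERM));
    std::raise(SIGTERM);
    assert(got_signal == SIGTERM);   // the handler ran before raise returned
}
```

Normal code then polls got_signal at a convenient point and does the real work there, outside the handler.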
Detecting errors
How you detect an error obviously depends on the mechanism reporting it. In practical terms, this means:
- Return values
You determine whether a function failed by looking at its return code. This failure test is bound tightly to the act of calling the function; by making the call, you are implicitly checking its success. Whether or not you do anything with that information is up to you.
- Error status variables
After calling a function, you must inspect the error status variable. If it follows C’s errno model of operation, you don’t actually need to test for errors after every single function call. First reset errno, then call any number of standard library functions back-to-back. Afterwards, inspect errno. If it contains an error value, then one of those functions failed. Of course, you don’t know what fell over, but if you don’t care, then this is a streamlined error detection approach.
- Exceptions
If an exception propagates out of a subordinate function, you can choose to catch and handle it or to ignore it and let the exception flow up a level. You can only make an informed choice when you know what kinds of exceptions might be thrown. You’ll only know this if it has been documented (and if you trust the documentation).
Java’s exception implementation places this documentation in the code itself. The programmer has to write an exception specification for every method, describing which checked exceptions it can throw; it is a part of the function’s signature. Java is the only mainstream language to enforce this approach. You cannot leak a checked exception that isn’t in the list, because the compiler performs static checking to prevent it. (Unchecked exceptions, those derived from RuntimeException or Error, are exempt.) C++ also supported exception specifications, but left their use optional, and it was idiomatic to avoid them – for performance reasons, among others. Unlike Java’s, they were enforced at run time. Dynamic exception specifications were deprecated in C++11 and removed in C++17; only noexcept remains.
- Signals
There’s only one way to detect a signal: Install a handler for it. There’s no obligation. You can also choose not to install any signal handlers at all, and accept the default behaviour.
As various pieces of code converge in a large system, you will probably need to detect errors in more than one way, even within a single function. Whichever detection mechanism you use, the key point is this:
Never ignore any errors that might be reported to you. If an error report channel exists, it’s there for a reason.
It is good practice to always write error-detection scaffolding – even if an error has no implication for the rest of your code. This makes it clear to a maintenance programmer that you know a function may fail and have consciously chosen to ignore any failures.
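Such scaffolding can be very small. In this C++ sketch, discard_temp_file and ignored_failures are hypothetical names; std::remove is the standard library function, which returns non-zero on failure.

```cpp
#include <cassert>
#include <cstdio>

int ignored_failures = 0;   // kept for diagnostics only

// Hypothetical cleanup step where a failure does not matter to the caller.
void discard_temp_file(const char* path) {
    if (std::remove(path) != 0) {
        // Detected and deliberately ignored: the file may already be gone,
        // and a leftover temporary file does not affect later work.
        ++ignored_failures;
    }
}

// Usage: removing a file that does not exist is noted, then ignored.
void discard_demo() {
    const int before = ignored_failures;
    discard_temp_file("no-such-directory/no-such-file.tmp");
    assert(ignored_failures == before + 1);
}
```

The test and the comment together tell a maintenance programmer that the failure case was considered.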
When you let an exception propagate through your code, you are not ignoring it – you can’t ignore an exception. You are allowing it to be handled by a higher level. The philosophy of exception handling is quite different in this respect. It’s less clear what the most appropriate way to document this is – should you write a try/catch block that simply rethrows the exception, should you write a comment claiming that the code is exception safe, or should you do nothing? I’d favour documenting the exception behaviour.
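The first two options can be put side by side in C++. This is only a sketch; load_settings, start_explicit, and start_documented are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subordinate function that fails on request.
void load_settings(bool fail) {
    if (fail) {
        throw std::runtime_error("settings unreadable");
    }
}

// Option 1: a try/catch block that simply rethrows.
void start_explicit(bool fail) {
    try {
        load_settings(fail);
    } catch (...) {
        throw;   // rethrows the same exception, unchanged
    }
}

// Option 2: a comment documenting the exception behaviour.
// Throws: std::runtime_error from load_settings; not handled here.
// Holds no resources, so nothing leaks when the exception passes through.
void start_documented(bool fail) {
    load_settings(fail);
}

// Usage: both versions let the same exception reach the caller.
void start_demo() {
    bool caught = false;
    try {
        start_documented(true);
    } catch (const std::runtime_error&) {
        caught = true;
    }
    assert(caught);
}
```

Both behave identically at run time; the difference is only in what the reader of the code is told.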
Next time
So that’s an investigation of the landscape of ‘error conditions’ in our code. We’ve seen what errors are, what causes them, how we detect and report error situations, and why we care. Errors are not (necessarily) caused by failures of the programmer. But not considering error conditions would be a failure of the programmer.
In the next instalment, we’ll consider the best strategies to handle and recover from error situations. We’ll see the very practical code implications of good error case handling.
Reference
[1] ISO: The C Standard, the original was published in 1999 but has been superseded by the 2018 version of the document: https://www.iso.org/standard/74528.html
Pete Goodliffe is a programmer who never stays at the same place in the software food chain. He has a passion for curry and doesn’t wear shoes.