CVu Journal Vol 32, #2 - May 2020 + Design of applications and programs

Title: Expect the Unexpected (Part 1)

Author: Bob Schmidt

Date: Sun, 03 May 2020 18:28:21 +01:00

Summary: Pete Goodliffe looks into dealing with the inevitable.

Body: 

We know that the only way to avoid error is to detect it, that the only way to detect it is to be free to enquire.
~ J. Robert Oppenheimer

At some point in life, everyone has this epiphany: The world doesn’t work as you expect it to. My one-year-old friend Tom learned this when climbing a chair four times his size. He expected to get to the top. The actual result surprised him: He ended up under a pile of furniture.

Is the world broken? Is it wrong? No. The world has plodded happily along its way for the last few million years, and looks set to continue for the foreseeable future. It’s our expectations that are wrong and need to be adjusted. As they say: Bad things happen, so deal with it. We must write code that deals with the Real World and its unexpected ways.

This is particularly difficult because the world mostly works as we’d expect it to, constantly lulling us into a false sense of security. The human brain is wired to cope, with built-in fail-safes. If someone bricks up your front door, your brain will process the problem, and you’ll stop before walking into an unexpected wall. But programs are not so clever; we have to tell them where the brick walls are and what to do when they hit one.

Don’t presume that everything in your program will always run smoothly. The world doesn’t always work as you’d expect it to: You must handle all possible error conditions in your code. It sounds simple enough, but that statement leads to a world of pain.

From whence it came

To expect the unexpected shows a thoroughly modern intellect.
~ Oscar Wilde

Errors can and will occur. Undesirable results can arise from almost any operation. They are distinct from bugs in a faulty program because you know beforehand that an error can occur. For example, the database file you want to open might have been deleted, a disk could fill up at any time and your next save operation might fail, or the web service you’re accessing might not currently be available.

If you don’t write code to handle these error conditions, you will almost certainly end up with a bug; your program will not always work as you intend it to. But if the error happens only rarely, it will probably be a very subtle bug!

An error may occur for one of a thousand reasons, but it will fall into one of these three categories:

We need a well-defined strategy to manage each kind of error in our code. An error may be detected and reported to the user in a pop-up message box, or it may be detected by a middle-tier code layer and signalled to the client code programmatically. The same principles apply in both cases: whether a human chooses how to handle the problem or your code makes a decision – someone is responsible for acknowledging and acting on errors.

Errors are raised by subordinate components and communicated upwards, to be dealt with by the caller. They are reported in a number of ways; we’ll look at these in the next section. To take control of program execution, we must be able to:

Errors are hard to deal with. The error you encounter is often not related to what you were doing at the time (most fall under the ‘exceptional circumstances’ category). They are also tedious to deal with – we want to focus on what our program should be doing, not on how it may go wrong. However, without good error management, your program will be brittle – built upon sand, not rock. At the first sign of wind or rain, it will collapse.

Take error handling seriously. The stability of your code rests on it.

Error-reporting mechanisms

There are several common strategies for propagating error information to client code. You’ll run into code that uses each of them, so you must know how to speak every dialect. Observe how these error-reporting techniques compare, and notice which situations call for each mechanism.

Each mechanism has different implications for the locality of error. An error is local in time if it is discovered very soon after it is created. An error is local in space if it is identified very close to (or even at) the site where it actually manifests. Some approaches specifically aim to reduce the locality of error to make it easier to see what’s going on (e.g., error codes). Others aim to extend the locality of error so that normal code doesn’t get entwined with error-handling logic (e.g., exceptions).

The favoured reporting mechanism is often an architectural decision. The architect might consider it important to define a homogeneous hierarchy of exception classes or a central list of shared reason codes to unify error-handling code.

No reporting

The simplest error-reporting mechanism is don’t bother. This works wonderfully in cases where you want your program to behave in bizarre and unpredictable ways and to crash randomly.

If you encounter an error and don’t know what to do about it, blindly ignoring it is not a viable option. You probably can’t continue the function’s work, but returning without fulfilling your function’s contract will leave the world in an undefined and inconsistent state.

Never ignore an error condition. If you don’t know how to handle the problem, signal a failure back up to the calling code. Don’t sweep an error under the rug and hope for the best.

An alternative to ignoring errors is to instantly abort the program upon encountering a problem. It’s easier than handling errors throughout the code, but hardly a well-engineered solution!

Return values

The next simplest mechanism is to return a success/failure value from your function. A boolean return value provides a simple yes or no answer. A more advanced approach enumerates all the possible exit statuses and returns a corresponding reason code. One value means success; the rest represent the many and varied abortive cases. This enumeration may be shared across the whole codebase, in which case your function returns a subset of the available values. You should therefore document what the caller can expect.
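As a rough illustration of reason codes, here is a minimal C++ sketch. The Status enumeration, open_database() and describe() are hypothetical names invented for this example, and the empty-path check merely stands in for a real failure.

```cpp
#include <string>

// Hypothetical reason codes shared across a codebase; the names are
// illustrative and do not come from the article.
enum class Status {
    Ok,                 // the single "success" value
    FileNotFound,
    DiskFull,
    ServiceUnavailable
};

// A procedure that returns no data, so the return value is free to carry
// the status. Documented to return only Ok or FileNotFound, a subset of
// the shared enumeration.
Status open_database(const std::string& path) {
    if (path.empty()) {
        return Status::FileNotFound;   // stand-in for a real failure check
    }
    // ... real work would go here ...
    return Status::Ok;
}

// Translate a reason code into text for logs or messages.
const char* describe(Status s) {
    switch (s) {
        case Status::Ok:                 return "ok";
        case Status::FileNotFound:       return "file not found";
        case Status::DiskFull:           return "disk full";
        case Status::ServiceUnavailable: return "service unavailable";
    }
    return "unknown status";
}
```

A caller would compare the result against Status::Ok and branch on anything else. Nothing forces the caller to look at the return value at all.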

While this works well for procedures that don’t return data, passing error codes back with returned data gets messy. If int count() walks down a linked list and returns the number of elements, how can it signify a list structure corruption? There are three approaches:

Error status variables

This method attempts to manage the contention between a function’s return value and its error status report. Rather than return a reason code, the function sets a shared global error variable. After calling the function, you must then inspect this status variable to find out whether or not it completed successfully.

The shared variable reduces confusion and clutter in the function’s signature, and it doesn’t restrict the return value’s data range at all. However, errors signalled through a separate channel are much easier to miss or wilfully ignore. A shared global variable also has nasty thread safety implications.

The C standard library employs this technique with its errno variable. It has very subtle semantics: Before using any standard library facility, you must manually clear errno. Nothing ever sets a succeeded value; only failures touch errno. This is a common source of bugs, and makes calling each library function tedious. To add insult to injury, not all C standard library functions use errno, so it is less than consistent.
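The clear-call-check dance can be sketched with std::strtol, which reports an out-of-range value through errno. The wrapper parse_long() is a hypothetical helper written for this example. Note that it needs a second, unrelated check for input with no digits, because the C standard does not require strtol to report that case through errno.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse a decimal long. It reports failure through
// its return value, using errno only to detect out-of-range input.
bool parse_long(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;                       // clear first: success never resets it
    const long value = std::strtol(text, &end, 10);
    if (errno == ERANGE) {
        return false;                // the value did not fit in a long
    }
    if (end == text || *end != '\0') {
        return false;                // no digits, or trailing junk
    }
    out = value;
    return true;
}
```

Without the errno = 0 line, a stale ERANGE left behind by some earlier call would make a perfectly good parse look like a failure.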

This technique is functionally equivalent to using return values, but it has enough disadvantages to make you avoid it. Don’t write your own error reports this way, and use existing implementations with the utmost care.

Exceptions

Exceptions are a language facility for managing errors; not all languages support exceptions. Exceptions help to distinguish the normal flow of execution from exceptional cases – when a function has failed and cannot honour its contract. When your code encounters a problem that it can’t handle, it stops dead and throws up an exception – an object representing the error. The language runtime then automatically steps back up the call stack until it finds some exception-handling code. The error lands there, for the program to deal with.
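A small C++ sketch may make the stepping-back behaviour concrete. All three functions are hypothetical and exist only for this example. The middle layer contains no error-handling code, yet the error still reaches the handler at the top.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical low-level function: it cannot honour its contract for an
// empty name, so it throws instead of returning.
std::string load_record(const std::string& name) {
    if (name.empty()) {
        throw std::invalid_argument("record name is empty");
    }
    return "record:" + name;
}

// Middle layer: no error-handling code at all. An exception thrown
// below simply passes through it.
std::string format_record(const std::string& name) {
    return "[" + load_record(name) + "]";
}

// Top layer: the first handler the runtime finds while stepping back up
// the call stack.
std::string show_record(const std::string& name) {
    try {
        return format_record(name);
    } catch (const std::invalid_argument& e) {
        return std::string("error: ") + e.what();
    }
}
```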

There are two operational models, distinguished by what happens after an exception is handled:

  • Termination model: Execution continues after the handler that caught the exception; the operation that failed is abandoned.
  • Resumption model: Execution resumes at the point where the exception was raised, once the handler has had a chance to correct the problem.

The former model is easier to reason about, but it doesn’t give ultimate control. It only allows error handling (you can execute code when you notice an error), not fault rectification (a chance to fix the problem and try again).

An exception cannot be ignored. If it isn’t caught and handled, it will propagate to the very top of the call stack and will usually stop the program dead in its tracks. The language runtime automatically cleans up as it unwinds the call stack. This makes exceptions a tidier and safer alternative to hand-crafted error-handling code. However, throwing exceptions through sloppy code can lead to memory leaks and problems with resource clean-up. (For example, you could allocate a block of memory and then exit early as an exception propagates through. The allocated memory would leak. This kind of problem makes writing code in the face of exceptions a complex business.) You must take care to write exception-safe code. The ‘Whistle-stop tour of exception safety’ section below explains what this means in more detail.
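The leak described in the parenthesis can be sketched in C++ as follows. Buffer, risky_step() and the two process functions are hypothetical names, and the live counter exists only to make the leak observable. The second version relies on std::unique_ptr (C++14 for std::make_unique), which is one common remedy rather than the only one.

```cpp
#include <memory>
#include <stdexcept>

// Counts live Buffer objects so that a leak is observable. Illustrative only.
struct Buffer {
    static int live;
    Buffer()  { ++live; }
    ~Buffer() { --live; }
};
int Buffer::live = 0;

// Stand-in for any operation that might fail.
void risky_step(bool fail) {
    if (fail) throw std::runtime_error("step failed");
}

// Sloppy: if risky_step throws, the delete is skipped and the Buffer leaks.
void process_leaky(bool fail) {
    Buffer* b = new Buffer;
    risky_step(fail);
    delete b;
}

// Safer: the unique_ptr's destructor runs as the stack unwinds, so the
// Buffer is released whether or not risky_step throws.
void process_safe(bool fail) {
    auto b = std::make_unique<Buffer>();
    risky_step(fail);
}
```

Calling process_leaky(true) leaves Buffer::live at 1 after the exception has been caught, whereas process_safe(true) leaves it at 0.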

The code that handles an exception is distinct from the code that raises it, and it may be arbitrarily far away. Exceptions are usually provided by OO languages, where errors are defined by a hierarchy of exception classes. A handler can elect to catch a quite specific class of error (by accepting a leaf class) or a more general category of error (by accepting a base class). Exceptions are particularly useful for signalling errors in a constructor.
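Here is a hedged C++ sketch of such a hierarchy. StorageError, FileMissing, DiskFull and Database are hypothetical classes invented for this example, and the failure conditions are simulated. The handlers are ordered from most to least specific, and the constructor reports failure by throwing because it has no return value to use.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical hierarchy: one base class and two leaf classes.
struct StorageError : std::runtime_error {
    explicit StorageError(const std::string& what) : std::runtime_error(what) {}
};
struct FileMissing : StorageError {
    explicit FileMissing(const std::string& what) : StorageError(what) {}
};
struct DiskFull : StorageError {
    explicit DiskFull(const std::string& what) : StorageError(what) {}
};

class Database {
public:
    // A constructor cannot return a status, so it throws on failure.
    explicit Database(const std::string& path) {
        if (path.empty()) throw FileMissing("no database path given");
    }
    // The flag simulates a failure; a real save would detect it itself.
    void save(bool simulate_full_disk) {
        if (simulate_full_disk) throw DiskFull("no space left");
    }
};

std::string open_and_save(const std::string& path, bool simulate_full_disk) {
    try {
        Database db(path);
        db.save(simulate_full_disk);
        return "saved";
    } catch (const FileMissing&) {       // leaf class: one specific error
        return "missing";
    } catch (const StorageError& e) {    // base class: any other storage error
        return std::string("storage error: ") + e.what();
    }
}
```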

Exceptions don’t come for free; the language support incurs a performance penalty. In practice, this isn’t significant and only manifests around exception-handling statements – exception handlers reduce the compiler’s optimization opportunities. This doesn’t mean that exceptions are flawed; their expense is justified compared to the cost of not doing any error handling at all!

Whistle-stop tour of exception safety

Resilient code must be exception safe. It must work correctly (for some definition of correctly, which we’ll investigate below), no matter what exceptions come its way. This is true regardless of whether or not the code catches any exceptions itself.

Exception-neutral code propagates all exceptions up to the caller; it won’t consume or change anything. This is an important concept for generic programs like C++ template code – the template types may generate all sorts of exceptions that template implementors don’t understand.

There are several different levels of exception safety. They are described in terms of guarantees to the calling code. These guarantees are:

  • Basic guarantee: If exceptions occur in a function (resulting from an operation you perform or the call of another function), it will not leak resources. The code state will be consistent (i.e., it can still be used correctly), but it will not necessarily be left in a known state. For example: A member function should add 10 items to a container, but an exception propagates through it. The container is still usable; maybe no objects were inserted, maybe all 10 were, or perhaps every other object was added.
  • Strong guarantee: This is far more strict than the basic guarantee. If an exception propagates through your code, the program state remains completely unchanged. No object is altered, no global variables changed, nothing. In the example above, nothing was inserted into the container.
  • Nothrow guarantee: The final guarantee is the most restrictive: that an operation can never throw an exception. If we are exception neutral, then this implies the function cannot do anything else that might throw an exception.

Which guarantee you provide is entirely your choice. The more restrictive the guarantee, the more widely (re)usable the code is. In order to implement the strong guarantee, you will generally need a number of functions providing the nothrow guarantee.
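The ten-item example can be sketched in C++ to contrast the basic and strong guarantees. The functions below are hypothetical, and make_item() fails on demand purely for illustration. The strong version uses the common copy-then-swap approach, which is one way to get the guarantee and not the only one. It leans on vector::swap, which does not throw for a std::vector with the default allocator.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical item source that can fail part-way through.
int make_item(int index, int fail_at) {
    if (index == fail_at) throw std::runtime_error("cannot make item");
    return index * 10;
}

// Basic guarantee: on failure the vector is still valid and nothing
// leaks, but it may hold some of the new items.
void add_ten_basic(std::vector<int>& items, int fail_at) {
    for (int i = 0; i < 10; ++i) {
        items.push_back(make_item(i, fail_at));
    }
}

// Strong guarantee: do all the risky work on a copy, then commit with a
// non-throwing swap. On failure the original is untouched.
void add_ten_strong(std::vector<int>& items, int fail_at) {
    std::vector<int> copy = items;
    for (int i = 0; i < 10; ++i) {
        copy.push_back(make_item(i, fail_at));
    }
    items.swap(copy);
}
```

The price of the strong version is a full copy of the container on every call, which is why the guarantee you offer is a design decision rather than a default.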

Most notably, every destructor you write must honour the nothrow guarantee. (That’s the case in C++ and Java, at least. C# stupidly called ~X() a destructor, even though it was a finalizer in disguise. Throwing an exception in a C# destructor has different implications.) Otherwise, all exception handling bets are off. In the presence of an exception, object destructors are called automatically as the stack is unwound. Raising a second exception while the stack is being unwound is not permissible; in C++, it terminates the program.

Signals

Signals are a more extreme reporting mechanism, largely used for errors sent by the execution environment to the running program. The operating system traps a number of exceptional events, like a floating point exception triggered by the maths coprocessor. These well-defined error events are delivered to the application in signals that interrupt the program’s normal flow of execution, jumping into a nominated signal handler function. Your program could receive a signal at any time, and the code must be able to cope with this. When the signal handler completes, program execution continues at the point it was interrupted.

Signals are the software equivalent of a hardware interrupt. They are a Unix concept, now provided on most platforms (a basic version is part of the ISO C standard [1]). The operating system provides sensible default handlers for each signal, some of which do nothing, others of which abort the program with a neat error message. You can override these with your own handler.
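Overriding a default handler can be sketched with the standard std::signal and std::raise functions. The handler on_interrupt() and the wrapper demo_signal() are hypothetical names for this example. The sketch sends SIGINT to itself so that it is self-contained, whereas a real signal would normally arrive from outside the program. The handler only sets a flag, since very little else is safe to do inside one.

```cpp
#include <csignal>

// A flag of this type is what portable code may safely set in a handler.
volatile std::sig_atomic_t interrupted = 0;

// The nominated handler: record that the signal arrived and return.
extern "C" void on_interrupt(int) {
    interrupted = 1;
}

// Install the handler, deliver SIGINT to ourselves, and report whether
// the handler ran. Returns false if the handler could not be installed.
bool demo_signal() {
    interrupted = 0;
    if (std::signal(SIGINT, on_interrupt) == SIG_ERR) {
        return false;
    }
    std::raise(SIGINT);             // execution resumes here afterwards
    std::signal(SIGINT, SIG_DFL);   // restore the default handler
    return interrupted == 1;
}
```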

The defined C signal events include program termination, execution suspend/continue requests, and maths errors. Some environments extend the basic list with many more events.

Detecting errors

How you detect an error obviously depends on the mechanism reporting it. In practical terms, this means:

As various pieces of code converge in a large system, you will probably need to detect errors in more than one way, even within a single function. Whichever detection mechanism you use, the key point is this:

Never ignore any errors that might be reported to you. If an error report channel exists, it’s there for a reason.

It is good practice to always write error-detection scaffolding – even if an error has no implication for the rest of your code. This makes it clear to a maintenance programmer that you know a function may fail and have consciously chosen to ignore any failures.
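One possible shape for such scaffolding, sketched in C++ under the assumption that a failed clean-up really is harmless: discard_scratch_file() and the counter are hypothetical, and std::remove is the standard library call for deleting a file.

```cpp
#include <cstdio>

// Illustrative counter so that ignored failures stay visible in diagnostics.
int ignored_cleanup_failures = 0;

// Best-effort clean-up of a scratch file.
void discard_scratch_file(const char* path) {
    if (std::remove(path) != 0) {
        // Consciously ignored: a leftover scratch file is harmless here,
        // and the file may simply never have been created.
        ++ignored_cleanup_failures;
    }
}
```

The check and its comment cost a few lines, but they tell the next reader that the failure case was considered rather than overlooked.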

When you let an exception propagate through your code, you are not ignoring it – you can’t ignore an exception. You are allowing it to be handled by a higher level. The philosophy of exception handling is quite different in this respect. It’s less clear what the most appropriate way to document this is – should you write a try/catch block that simply rethrows the exception, should you write a comment claiming that the code is exception safe, or should you do nothing? I’d favour documenting the exception behaviour.

Next time

So that’s an investigation of the landscape of ‘error conditions’ in our code. We’ve seen what errors are, what causes them, how we detect and report error situations, and why we care. Errors are not (necessarily) caused by failures of the programmer. But not considering error conditions would be a failure of the programmer.

In the next instalment, we’ll consider the best strategies to handle and recover from error situations. We’ll see the very practical code implications of good error case handling.

Reference

[1] ISO: The C Standard. The original was published in 1999 but has been superseded by the 2018 version of the document: https://www.iso.org/standard/74528.html

Pete Goodliffe is a programmer who never stays at the same place in the software food chain. He has a passion for curry and doesn’t wear shoes.
