Journal Articles

CVu Journal Vol 32, #2 - May 2020 + Design of applications and programs

Title: Expect the Unexpected (Part 1)

Author: Bob Schmidt

Date: Sun, 03 May 2020 18:28:21 +01:00

Summary: Pete Goodliffe looks into dealing with the inevitable.

Body:

We know that the only way to avoid error is to detect it, that the only way to detect it is to be free to enquire.
~ J. Robert Oppenheimer

At some point in life, everyone has this epiphany: The world doesn’t work as you expect it to. My one-year-old friend Tom learned this when climbing a chair four times his size. He expected to get to the top. The actual result surprised him: He ended up under a pile of furniture.

Is the world broken? Is it wrong? No. The world has plodded happily along its way for the last few million years, and looks set to continue for the foreseeable future. It’s our expectations that are wrong and need to be adjusted. As they say: Bad things happen, so deal with it. We must write code that deals with the Real World and its unexpected ways.

This is particularly difficult because the world mostly works as we’d expect it to, constantly lulling us into a false sense of security. The human brain is wired to cope, with built-in fail-safes. If someone bricks up your front door, your brain will process the problem, and you’ll stop before walking into an unexpected wall. But programs are not so clever; we have to tell them where the brick walls are and what to do when they hit one.

Don’t presume that everything in your program will always run smoothly. The world doesn’t always work as you’d expect it to: You must handle all possible error conditions in your code. It sounds simple enough, but that statement leads to a world of pain.

From whence it came

To expect the unexpected shows a thoroughly modern intellect.
~ Oscar Wilde

Errors can and will occur. Undesirable results can arise from almost any operation. They are distinct from bugs in a faulty program because you know beforehand that an error can occur. For example, the database file you want to open might have been deleted, a disk could fill up at any time and your next save operation might fail, or the web service you’re accessing might not currently be available.

If you don’t write code to handle these error conditions, you will almost certainly end up with a bug; your program will not always work as you intend it to. But if the error happens only rarely, it will probably be a very subtle bug!

An error may occur for one of a thousand reasons, but it will fall into one of these three categories:

User error
The stupid user manhandled your lovely program. Perhaps they provided the wrong input or attempted an operation that’s absolutely absurd. A good program will point out the mistake and help the user to rectify it. It won’t insult them or whine in an incomprehensible manner.
Programmer error
The user pushed all the right buttons, but the code is broken. This is the consequence of a bug elsewhere, a fault the programmer introduced that the user can do nothing about (except to try and avoid it in the future). This kind of error should (ideally) never occur.

There’s a cycle here: Unhandled errors can cause bugs. And those bugs might result in further error conditions occurring elsewhere in your code. This is why we consider ‘defensive programming’ an important practice.
Exceptional circumstances
The user pushed all the right buttons, and the programmer didn’t mess up. Fate’s fickle finger intervened, and we ran into something that couldn’t be avoided. Perhaps a network connection failed, we ran out of printer ink, or there’s no hard disk space left.

We need a well-defined strategy to manage each kind of error in our code. An error may be detected and reported to the user in a pop-up message box, or it may be detected by a middle-tier code layer and signalled to the client code programmatically. The same principles apply in both cases: whether a human chooses how to handle the problem or your code makes a decision – someone is responsible for acknowledging and acting on errors.

Errors are raised by subordinate components and communicated upwards, to be dealt with by the caller. They are reported in a number of ways; we’ll look at these in the next section. To take control of program execution, we must be able to:

Raise an error when something goes wrong
Detect all possible error reports
Handle them appropriately
Propagate errors we can’t handle

Errors are hard to deal with. The error you encounter is often not related to what you were doing at the time (most fall under the ‘exceptional circumstances’ category). They are also tedious to deal with – we want to focus on what our program should be doing, not on how it may go wrong. However, without good error management, your program will be brittle – built upon sand, not rock. At the first sign of wind or rain, it will collapse.

Take error handling seriously. The stability of your code rests on it.

Error-reporting mechanisms

There are several common strategies for propagating error information to client code. You’ll run into code that uses each of them, so you must know how to speak every dialect. Observe how these error-reporting techniques compare, and notice which situations call for each mechanism.

Each mechanism has different implications for the locality of error. An error is local in time if it is discovered very soon after it is created. An error is local in space if it is identified very close to (or even at) the site where it actually manifests. Some approaches specifically aim to reduce the locality of error to make it easier to see what’s going on (e.g., error codes). Others aim to extend the locality of error so that normal code doesn’t get entwined with error-handling logic (e.g., exceptions).

The favoured reporting mechanism is often an architectural decision. The architect might consider it important to define a homogeneous hierarchy of exception classes or a central list of shared reason codes to unify error-handling code.

No reporting

The simplest error-reporting mechanism is don’t bother. This works wonderfully in cases where you want your program to behave in bizarre and unpredictable ways and to crash randomly.

If you encounter an error and don’t know what to do about it, blindly ignoring it is not a viable option. You probably can’t continue the function’s work, but returning without fulfilling your function’s contract will leave the world in an undefined and inconsistent state.

Never ignore an error condition. If you don’t know how to handle the problem, signal a failure back up to the calling code. Don’t sweep an error under the rug and hope for the best.

An alternative to ignoring errors is to instantly abort the program upon encountering a problem. It’s easier than handling errors throughout the code, but hardly a well-engineered solution!

Return values

The next most simple mechanism is to return a success/failure value from your function. A boolean return value provides a simple yes or no answer. A more advanced approach enumerates all the possible exit statuses and returns a corresponding reason code. One value means success, the rest represent the many and varied abortive cases. This enumeration may be shared across the whole codebase, in which case your function returns a subset of the available values. You should therefore document what the caller can expect.
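
As a rough illustration of reason codes, here is a minimal C++ sketch. The Status enumeration, the save and save_with_fallback functions, and the disk_full flag (which stands in for a real I/O failure) are all invented for this example. Note how save documents the subset of the shared codes it can return.

```cpp
#include <cassert>
#include <string>

// Hypothetical reason codes shared across a codebase.
enum class Status { Ok, NotFound, AccessDenied, DiskFull };

// Saves text under the given name.
// Returns Ok, NotFound (empty name) or DiskFull; never AccessDenied.
// The disk_full flag stands in for a real I/O failure in this sketch.
Status save(const std::string& name, const std::string& text, bool disk_full) {
    if (name.empty()) return Status::NotFound;
    if (disk_full && !text.empty()) return Status::DiskFull;
    return Status::Ok;
}

// The caller tests the code at the call site, handles what it can,
// and passes everything else up to its own caller.
Status save_with_fallback(const std::string& name, const std::string& text,
                          bool disk_full) {
    Status s = save(name, text, disk_full);
    if (s == Status::NotFound) {
        s = save("untitled", text, disk_full);  // handled here: retry with a default name
    }
    return s;  // Ok, or DiskFull propagated upwards
}
```

The check sits right next to the call, which keeps the error local, but every layer in between has to carry the code upwards by hand.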

While this works well for procedures that don’t return data, passing error codes back with returned data gets messy. If int count() walks down a linked list and returns the number of elements, how can it signify a list structure corruption? There are four approaches:

Return a compound data type (or tuple) containing both the return value and an error code. This is rather clumsy in the popular C-like languages and is seldom seen in them.
Use an ‘optional’ data type that can represent ‘no value, or a specific value’. This is a syntactically nicer version of a compound data type.
Pass the error code back through a function parameter. In C++ or .NET, this parameter would be passed by reference. In C, you’d direct the variable access through pointers. This approach is ugly and non-intuitive; there is no syntactic way to distinguish a return value from a parameter.
Alternatively, reserve a range of return values to signify failure. The count example can nominate all negative numbers as error reason codes; they’d be meaningless answers anyway. Negative numbers are a common choice for this. Pointer return values may be given a specific invalid value, which by convention is zero (or NULL). In Java and C#, you can return a null object reference.

This technique doesn’t always work well. Sometimes it’s hard to reserve an error range – all return values are equally meaningful and equally likely. It also has the side effect of reducing the available range of success values; reserving the negative values of a signed int halves the range that could otherwise hold successes. (If you used an unsigned int instead, the number of available non-negative values would double, because the sign bit is reused as a value bit.)
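
To make the count example concrete, here is a hedged C++17 sketch of two of these approaches: the ‘optional’ type and the reserved negative range. The Node type, the max_len sanity bound used to detect corruption, and the function names are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <optional>

struct Node { int value; Node* next; };

// Assumption for this sketch: a list longer than max_len is treated as
// corrupt (for example, a cycle introduced by a stray pointer write).
constexpr int max_len = 1000;

// Optional approach: std::nullopt means "no valid count".
std::optional<int> count_opt(const Node* head) {
    int n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) {
        if (++n > max_len) return std::nullopt;
    }
    return n;
}

// Reserved-range approach: negative results are failure codes.
constexpr int err_corrupt = -1;

int count_reserved(const Node* head) {
    const std::optional<int> n = count_opt(head);
    return n ? *n : err_corrupt;
}
```

With the optional version the caller cannot read a count without first acknowledging that there might not be one. With the reserved range, nothing stops the caller from treating -1 as a length.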

Error status variables

This method attempts to manage the contention between a function’s return value and its error status report. Rather than return a reason code, the function sets a shared global error variable. After calling the function, you must then inspect this status variable to find out whether or not it completed successfully.

The shared variable reduces confusion and clutter in the function’s signature, and it doesn’t restrict the return value’s data range at all. However, errors signalled through a separate channel are much easier to miss or wilfully ignore. A shared global variable also has nasty thread safety implications.

The C standard library employs this technique with its errno variable. It has very subtle semantics: Before using any standard library facility, you must manually clear errno. Nothing ever sets a succeeded value; only failures touch errno. This is a common source of bugs, and makes calling each library function tedious. To add insult to injury, not all C standard library functions use errno, so it is less than consistent.

This technique is functionally equivalent to using return values, but it has enough disadvantages to make you avoid it. Don’t write your own error reports this way, and use existing implementations with the utmost care.
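
Here is what that care looks like in practice: a small sketch that wraps std::strtol, one of the standard functions that reports range errors through errno. The parse_long wrapper is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Returns true and stores the parsed value on success; returns false on
// overflow or if no digits were consumed. parse_long is a hypothetical helper.
bool parse_long(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;                          // clear first: success never resets it
    const long value = std::strtol(text, &end, 10);
    if (errno == ERANGE) return false;  // result does not fit in a long
    if (end == text)     return false;  // no conversion was performed
    out = value;
    return true;
}
```

Notice that strtol needs two separate checks: errno covers the range error, while the end pointer covers input with no digits at all. That inconsistency is typical of code built around a status variable.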

Exceptions

Exceptions are a language facility for managing errors; not all languages support exceptions. Exceptions help to distinguish the normal flow of execution from exceptional cases – when a function has failed and cannot honour its contract. When your code encounters a problem that it can’t handle, it stops dead and throws up an exception – an object representing the error. The language runtime then automatically steps back up the call stack until it finds some exception-handling code. The error lands there, for the program to deal with.

There are two operational models, distinguished by what happens after an exception is handled:

The termination model
Provided by C++, .NET, and Java. Execution continues after the handler that caught the exception.
The resumption model
Execution resumes where the exception was raised.

The former model is easier to reason about, but it doesn’t give ultimate control. It only allows error handling (you can execute code when you notice an error), not fault rectification (a chance to fix the problem and try again).
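
A minimal C++ sketch of the termination model follows. The function names (parse_port, connect_to, try_connect) are made up for illustration. The middle layer contains no error-handling code at all, and the exception passes straight through it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Lowest level: cannot honour its contract, so it throws.
// (std::stoi itself throws std::invalid_argument for non-numeric text.)
int parse_port(const std::string& text) {
    const int port = std::stoi(text);
    if (port < 1 || port > 65535) {
        throw std::out_of_range("port out of range: " + text);
    }
    return port;
}

// Intermediate level: no handler here, so exceptions flow through untouched.
int connect_to(const std::string& port_text) {
    return parse_port(port_text);
}

// Top level: the handler. Once the catch block finishes, execution carries
// on from here; it never returns to the throw site inside parse_port.
std::string try_connect(const std::string& port_text) {
    try {
        connect_to(port_text);
        return "connected";
    } catch (const std::exception& e) {
        return std::string("failed: ") + e.what();
    }
}
```

Calling try_connect("70000") yields "failed: port out of range: 70000". To try again, the caller must call try_connect afresh; the handler has no way to patch up the value and resume inside parse_port.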

An exception cannot be ignored. If it isn’t caught and handled, it will propagate to the very top of the call stack and will usually stop the program dead in its tracks. The language runtime automatically cleans up as it unwinds the call stack. This makes exceptions a tidier and safer alternative to hand-crafted error-handling code. However, throwing exceptions through sloppy code can lead to memory leaks and problems with resource clean-up. (For example, you could allocate a block of memory and then exit early as an exception propagates through. The allocated memory would leak. This kind of problem makes writing code in the face of exceptions a complex business.) You must take care to write exception-safe code. The sidebar explains what this means in more detail.
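
The leak described above can be sketched in a few lines of C++. The Buffer type and the live_buffers counter exist only to make the leak observable in this example; may_throw stands in for any operation that can fail.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Counts live Buffer objects so that a leak shows up in this sketch.
int live_buffers = 0;

struct Buffer {
    Buffer()  { ++live_buffers; }
    ~Buffer() { --live_buffers; }
};

void may_throw(bool fail) {
    if (fail) throw std::runtime_error("disk full");
}

// Sloppy: if may_throw throws, the delete is skipped and the Buffer leaks.
void leaky(bool fail) {
    Buffer* b = new Buffer;
    may_throw(fail);
    delete b;
}

// Exception safe: the unique_ptr destructor runs during stack unwinding.
void safe(bool fail) {
    auto b = std::make_unique<Buffer>();
    may_throw(fail);
}
```

After an exception escapes safe, live_buffers is back at zero; after one escapes leaky, a Buffer is stranded for good. Tying clean-up to an object's destructor is the usual C++ answer to this problem.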

The code that handles an exception is distinct from the code that raises it, and it may be arbitrarily far away. Exceptions are usually provided by OO languages, where errors are defined by a hierarchy of exception classes. A handler can elect to catch a quite specific class of error (by accepting a leaf class) or a more general category of error (by accepting a base class). Exceptions are particularly useful for signalling errors in a constructor.
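
As a sketch of these ideas in C++: the IoError and FileNotFound classes, the Config class, and the path rules below are all hypothetical, chosen only to show a leaf handler, a base-class handler, and a throwing constructor together.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical hierarchy: a general category with one specific leaf.
struct IoError : std::runtime_error {
    explicit IoError(const std::string& what) : std::runtime_error(what) {}
};
struct FileNotFound : IoError {
    explicit FileNotFound(const std::string& what) : IoError(what) {}
};

// A constructor has no return value, so an exception is the natural way
// for it to report failure. The path rules here are invented for the demo.
class Config {
public:
    explicit Config(const std::string& path) {
        if (path.empty()) throw FileNotFound("no config path given");
        if (path == "/dev/full") throw IoError("cannot read " + path);
    }
};

std::string load(const std::string& path) {
    try {
        Config cfg(path);
        (void)cfg;  // construction succeeded; nothing more to do in this sketch
        return "loaded";
    } catch (const FileNotFound&) {  // leaf class: one specific error
        return "using defaults";
    } catch (const IoError& e) {     // base class: any other I/O failure
        return std::string("I/O error: ") + e.what();
    }
}
```

The order of the handlers matters: the leaf class must come first, because a handler for IoError would also match a FileNotFound.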

Exceptions don’t come for free; the language support incurs a performance penalty. In practice, this isn’t significant and only manifests around exception-handling statements – exception handlers reduce the compiler’s optimization opportunities. This doesn’t mean that exceptions are flawed; their expense is justified compared to the cost of not doing any error handling at all!

Whistle-stop tour of exception safety

Resilient code must be exception safe. It must work correctly (for some definition of correctly, which we’ll investigate below), no matter what exceptions come its way. This is true regardless of whether or not the code catches any exceptions itself.

Exception-neutral code propagates all exceptions up to the caller; it won’t consume or change anything. This is an important concept for generic programs like C++ template code – the template types may generate all sorts of exceptions that template implementors don’t understand.

There are several different levels of exception safety. They are described in terms of guarantees to the calling code. These guarantees are:

Basic guarantee
If exceptions occur in a function (resulting from an operation you perform or the call of another function), it will not leak resources. The code state will be consistent (i.e., it can still be used correctly), but it will not necessarily be left in a known state. For example: A member function should add 10 items to a container, but an exception propagates through it. The container is still usable; maybe no objects were inserted, maybe all 10 were, or perhaps every other object was added.
Strong guarantee
This is far more strict than the basic guarantee. If an exception propagates through your code, the program state remains completely unchanged. No object is altered, no global variables changed, nothing. In the example above, nothing was inserted into the container.
Nothrow guarantee
The final guarantee is the most restrictive: that an operation can never throw an exception. If we are exception neutral, then this implies the function cannot do anything else that might throw an exception.

Which guarantee you provide is entirely your choice. The more restrictive the guarantee, the more widely (re)usable the code is. In order to implement the strong guarantee, you will generally need a number of functions providing the nothrow guarantee.
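
The ‘add 10 items’ example can be sketched in C++ to contrast the basic and strong guarantees. The make_item function and its fail_midway flag are invented here to force a failure half-way through; the strong version uses the common copy-then-swap idiom, which is one way (not the only way) to get this guarantee.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical item source that can fail part-way through.
int make_item(int i, bool fail_midway) {
    if (fail_midway && i == 5) throw std::runtime_error("item 5 unavailable");
    return i * i;
}

// Basic guarantee: on failure the vector is still valid, but partly filled.
void add_ten_basic(std::vector<int>& v, bool fail_midway) {
    for (int i = 0; i < 10; ++i) v.push_back(make_item(i, fail_midway));
}

// Strong guarantee: do the risky work on a copy, then commit with a swap
// that does not throw.
void add_ten_strong(std::vector<int>& v, bool fail_midway) {
    std::vector<int> tmp = v;  // may throw, but v is untouched
    for (int i = 0; i < 10; ++i) tmp.push_back(make_item(i, fail_midway));
    v.swap(tmp);               // the commit point
}
```

The strong version pays for its promise with a full copy of the container, and it leans on swap providing the nothrow guarantee.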

Most notably, every destructor you write must honour the nothrow guarantee. (That’s the case in C++ and Java, at least. C# stupidly called ~X() a destructor, even though it was a finalizer in disguise. Throwing an exception in a C# destructor has different implications.) Otherwise, all exception handling bets are off. In the presence of an exception, object destructors are called automatically as the stack is unwound. Raising an exception while handling an exception is not permissible.

Signals

Signals are a more extreme reporting mechanism, largely used for errors sent by the execution environment to the running program. The operating system traps a number of exceptional events, like a floating point exception triggered by the maths coprocessor. These well-defined error events are delivered to the application in signals that interrupt the program’s normal flow of execution, jumping into a nominated signal handler function. Your program could receive a signal at any time, and the code must be able to cope with this. When the signal handler completes, program execution continues at the point it was interrupted.

Signals are the software equivalent of a hardware interrupt. They are a Unix concept, now provided on most platforms (a basic version is part of the ISO C standard [1]). The operating system provides sensible default handlers for each signal, some of which do nothing, others of which abort the program with a neat error message. You can override these with your own handler.

The defined C signal events include program termination, execution suspend/continue requests, and maths errors. Some environments extend the basic list with many more events.
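
A minimal sketch of installing a handler with the standard C signal interface, as exposed to C++ through <csignal>, might look like this. The names got_signal, on_signal and demo are hypothetical, and std::raise is used only to simulate the environment delivering SIGTERM.

```cpp
#include <cassert>
#include <csignal>

// A handler can portably write to a volatile std::sig_atomic_t object.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int signum) {
    got_signal = signum;  // just record it; do the real work elsewhere
}

// Installs the handler, sends ourselves SIGTERM, and reports whether the
// handler ran. The default handler is restored before returning.
bool demo() {
    if (std::signal(SIGTERM, on_signal) == SIG_ERR) return false;
    std::raise(SIGTERM);
    // Execution continues here once the handler has returned.
    std::signal(SIGTERM, SIG_DFL);
    return got_signal == SIGTERM;
}
```

Because a signal may arrive at any moment, the handler does as little as possible: it sets a flag for the main flow of the program to notice later.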

Detecting errors

How you detect an error obviously depends on the mechanism reporting it. In practical terms, this means:

Return values
You determine whether a function failed by looking at its return code. This failure test is bound tightly to the act of calling the function; by making the call, you are implicitly checking its success. Whether or not you do anything with that information is up to you.
Error status variables
After calling a function, you must inspect the error status variable. If it follows C’s errno model of operation, you don’t actually need to test for errors after every single function call. First reset errno, then call any number of standard library functions back-to-back. Afterwards, inspect errno. If it contains an error value, then one of those functions failed. Of course, you don’t know what fell over, but if you don’t care, then this is a streamlined error detection approach.
Exceptions
If an exception propagates out of a subordinate function, you can choose to catch and handle it or to ignore it and let the exception flow up a level. You can only make an informed choice when you know what kinds of exceptions might be thrown. You’ll only know this if it has been documented (and if you trust the documentation).

Java’s exception implementation places this documentation in the code itself. The programmer has to write an exception specification (a throws clause) for every method, describing the checked exceptions it can throw; it is a part of the function’s signature. Java is the only mainstream language to enforce this approach. You cannot leak a checked exception that isn’t in the list, because the compiler performs static checking to prevent it. (C++ also supported exception specifications, but left their use optional. It was idiomatic to avoid them – for performance reasons, among others. Unlike Java’s, they were enforced at run time. Specifications that list exception types were deprecated in C++11 and removed in C++17; noexcept remains.)
Signals
There’s only one way to detect a signal: Install a handler for it. There’s no obligation. You can also choose not to install any signal handlers at all, and accept the default behaviour.

As various pieces of code converge in a large system, you will probably need to detect errors in more than one way, even within a single function. Whichever detection mechanism you use, the key point is this:

Never ignore any errors that might be reported to you. If an error report channel exists, it’s there for a reason.

It is good practice to always write error-detection scaffolding – even if an error has no implication for the rest of your code. This makes it clear to a maintenance programmer that you know a function may fail and have consciously chosen to ignore any failures.

When you let an exception propagate through your code, you are not ignoring it – you can’t ignore an exception. You are allowing it to be handled by a higher level. The philosophy of exception handling is quite different in this respect. It’s less clear what the most appropriate way to document this is – should you write a try/catch block that simply rethrows the exception, should you write a comment claiming that the code is exception safe, or should you do nothing? I’d favour documenting the exception behaviour.
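
One possible shape for such scaffolding, sketched in C++ around std::remove: the discard_scratch helper and the scenario (best-effort clean-up of a temporary file) are assumptions for this example.

```cpp
#include <cassert>
#include <cstdio>

// Best-effort clean-up of a scratch file. discard_scratch is a hypothetical
// helper; the point is the visible, commented check on the result.
void discard_scratch(const char* path) {
    if (std::remove(path) != 0) {
        // Deliberately ignored: the file may never have been created, and
        // nothing later in the program depends on its removal.
    }
}
```

The empty branch does nothing at run time, but it tells the next reader that the failure case was considered rather than overlooked.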

Next time

So that’s an investigation of the landscape of ‘error conditions’ in our code. We’ve seen what errors are, what causes them, how we detect and report error situations, and why we care. Errors are not (necessarily) caused by failures of the programmer. But not considering error conditions would be a failure of the programmer.

In the next instalment, we’ll consider the best strategies to handle and recover from error situations. We’ll see the very practical code implications of good error case handling.

Reference

[1] ISO: The C Standard, the original was published in 1999 but has been superseded by the 2018 version of the document: https://www.iso.org/standard/74528.html

Pete Goodliffe is a programmer who never stays at the same place in the software food chain. He has a passion for curry and doesn’t wear shoes.
