Design of applications and programs + Overload Journal #93 - October 2009

Title: The Generation, Management and Handling of Errors (Part 2)

Author: webeditor

Date: Sun, 25 October 2009 08:56:00 +00:00

Summary: Dealing with errors is a vital activity. Andy Longshaw and Eoin Woods conclude their pattern language.

Body: 

This is the second part of a paper that presents patterns for handling error conditions in distributed systems. The patterns in the collection are illustrated in Figure 1.

Figure 1

Some of the patterns (Split Domain and Technical Errors, Log at Distribution Boundary, Unique Error Identifier) were discussed in the first part of the paper in the previous issue. The remaining patterns are covered in this second part. At the end of the paper, a set of proto-patterns is briefly described. These are considered to be important concepts that may or may not become fully fledged patterns as the paper evolves.

Big Outer Try Block

Problem

Unexpected errors can occur in any system, no matter how well it is tested. Such truly exceptional conditions are rarely anticipated in the design of the system and so are unlikely to be dealt with by the system's error handling strategy. This means that these errors will propagate right to the edge of the system and will appear to 'crash' the application if not handled at that point. Some or all of the information associated with such unexpected errors may then be lost, making it difficult to rectify the underlying problem in the system.

Context

A distributed system with a largely 'lay' user community, probably using graphical user interfaces. The interface is likely to be very simple: possibly even a 'kiosk style' interface. Users are mostly on remote sites and will not do much to report errors if they can work around them.

Forces

Solution

Implement a Big Outer Try Block at the 'edge' of the system to catch and handle errors that cannot be handled by other tiers of the system. The error handling in the block can report errors in a consistent way at a level of detail appropriate to the user base.

Implementation

In the system's ultimate client, wrap the top-level invocation of the system in a Big Outer Try Block that will catch any error - domain or technical - propagating up from the rest of the system. The Big Outer Try Block should differentiate between technical errors (such as databases not being available) and domain errors (such as performing business process steps in the wrong order) as suggested in Split Domain and Technical Errors.

Technical errors should be logged for possible use by technical support staff. The user should then be informed, in general terms, that something terrible has happened, making it clear that what has happened is not related to their use of the system.

A domain error that reaches the Big Outer Try Block is probably a failure in the design of the user interface that resulted in an unanticipated business process state being reached and as such should be treated as a system fault. In such cases, again the error should be logged and a user-friendly message displayed, but in this case the message can include details of the problem encountered, as these details are likely to be meaningful to the user since they relate to the business process that they were performing.

Finally, a totally unpredictable error (such as an exception indicating a resource shortage due to having run out of memory) that reaches the Big Outer Try Block is some form of internal or environmental error that could not be handled at a lower level. As with a technical error, a generic error should be displayed to the user and the details of the error logged locally.

An example of the structure of a Big Outer Try Block's implementation is shown in Listing 1.

    public class ApplicationMain
    {
      ...
      public static void main(String[] args)
      {
        try
        {
          ApplicationMain m = new ApplicationMain() ;
          m.initialize() ;
          m.execute() ;
          m.terminate() ;
        }
        catch(AppDomainException de)
        {
          // Domain exceptions shouldn't get to this
          // level as they should be handled in the
          // user interface. If they get here, report
          // the text to the user and log them in a
          // local log file
        }
        catch(AppTechnicalException te)
        {
          // Technical exceptions here are probably
          // user interface problems. Display a
          // generic apology and log to a local log file
        }
        catch(Throwable t)
        {
          // Other exception objects must be internal
          // errors that could not be caught and
          // handled elsewhere. Display a generic
          // apology and log to a local log file
        }
      }
    }
Listing 1
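
The catch blocks in Listing 1 are left as comments. One way they might be filled in is sketched below. This is only an illustration under stated assumptions: the two exception classes are hypothetical stand-ins for the listing's AppDomainException and AppTechnicalException, the runGuarded() helper and the message wording are invented for the example, and java.util.logging stands in for whatever local log the client really uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-ins for the exception types named in Listing 1.
class AppDomainException extends RuntimeException {
    AppDomainException(String message) { super(message); }
}

class AppTechnicalException extends RuntimeException {
    AppTechnicalException(String message, Throwable cause) { super(message, cause); }
}

class BigOuterTryBlock {
    private static final Logger LOG = Logger.getLogger("app.client");

    // Assumed wording; the paper only asks for a generic apology.
    static final String GENERIC_APOLOGY =
        "Sorry, an internal problem has occurred. It was not caused by anything you did.";

    // Runs the top-level task. Returns the text to show the user,
    // or null if the task completed without an error reaching this level.
    static String runGuarded(Runnable application) {
        try {
            application.run();
            return null;
        } catch (AppDomainException de) {
            // Domain error: log it, but the text is meaningful to the user.
            LOG.log(Level.SEVERE, "Domain error reached the outer block", de);
            return "The operation could not be completed: " + de.getMessage();
        } catch (AppTechnicalException te) {
            // Technical error: log the detail, show only the apology.
            LOG.log(Level.SEVERE, "Technical error reached the outer block", te);
            return GENERIC_APOLOGY;
        } catch (Throwable t) {
            // Anything else is an internal or environmental error.
            LOG.log(Level.SEVERE, "Unexpected internal error", t);
            return GENERIC_APOLOGY;
        }
    }
}
```

Returning the message rather than displaying it keeps the sketch independent of any particular user interface toolkit; a real client would pass the text to its own dialog or status area.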

Positive consequences

Negative consequences

Related patterns

Hide Technical Error Detail From Users

Problem

The technical details of errors that occur are typically of no interest to the end-users of a system. If exposed to such users, this error information may cause unnecessary concern and support overhead.

Context

An application with a largely non-technical user community, probably using the system via some sort of graphical interface.

Forces

Solution

Implement a standard mechanism for reporting unexpected technical errors to end-users. The mechanism can report all errors in a consistent way at a level of detail appropriate to the different user constituencies who need to be informed about the error.

Known uses

The authors are aware of a number of instances of this pattern in enterprise systems, although none of them are available for public study. Some examples of using this pattern outside the domain of enterprise systems include the following.

Implementation

Within the system's user interface implementation, provide a single, straightforward mechanism for reporting technical errors to end-users. The mechanism is almost certainly going to be a simple API call of the general form:

      void notifyTechnicalError(Throwable t) ;  
 

The mechanism created should perform two key tasks:

Use this mechanism to handle all technical errors encountered by the system's user interface.
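
A minimal sketch of such a mechanism follows. It assumes, in keeping with the Problem and Solution statements above, that the mechanism records the full technical detail for support staff and shows the user only a non-technical message. The class name, the message text and the use of a Consumer<String> as the display hook are all assumptions made for the example.

```java
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class TechnicalErrorReporter {
    private static final Logger LOG = Logger.getLogger("app.ui.errors");

    // Assumed wording; deliberately free of technical detail.
    static final String USER_MESSAGE =
        "The system is temporarily unable to complete your request. "
        + "The problem has been recorded.";

    // Hypothetical hook into the user interface (a dialog, a status bar...).
    private final Consumer<String> display;

    TechnicalErrorReporter(Consumer<String> display) {
        this.display = display;
    }

    void notifyTechnicalError(Throwable t) {
        // Full detail, including the stack trace, goes to the log only.
        LOG.log(Level.SEVERE, "Technical error reported by the user interface", t);
        // The user sees the same generic text whatever the cause.
        display.accept(USER_MESSAGE);
    }
}
```

Because every caller goes through notifyTechnicalError(), the wording and level of detail shown to users can be changed in one place.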

Positive consequences

Negative consequences

Related patterns

Log Unexpected Errors

Problem

Much domain code includes handling of exceptional conditions and is designed to recognize and handle each condition according to a business process definition (typically the offending transaction being rejected or a new domain entity being created). If such routine error conditions are logged, this makes real errors requiring operator intervention difficult to spot.

Context

Where systems are created in organizations with complex domain processing, or systems with a large number of routinely expected error conditions to which the processes specify the response.

Forces

Solution

Implement separate error handling mechanisms for expected and unexpected errors. Error conditions that are expected to arise in the course of normal domain processing should not be logged but handled in the code or by the user. Hence, any logged error should be viewed as requiring investigation.

Implementation

Throughout the system's implementation, use two distinct error handling approaches for expected and unexpected errors:

Alternatively, the application may interact with the user, inform them of the problem (in appropriate terms - see Hide Technical Error Detail From Users) and prompt them to re-start part or all of the current operation.

By following these principles, errors such as 'could not connect to database' are not hidden by hundreds of routine error conditions such as 'no such product code' (perhaps caused by a user misreading a code from a piece of product packaging). The former is a significant error requiring investigation, while the latter is an expected error condition. The former would therefore be logged and the latter handled algorithmically by the business logic, without logging the condition.
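
The distinction can be sketched as follows. The names are hypothetical: Catalogue stands in for the real data access code, and an IllegalStateException stands in for whatever the platform raises when the database cannot be reached. An unknown product code yields an empty result and no log entry; the connection failure is logged and then propagated.

```java
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

class PriceLookup {
    private static final Logger LOG = Logger.getLogger("app.operations");

    // Hypothetical data access interface: returns null for an unknown
    // code and throws IllegalStateException if the database is unreachable.
    interface Catalogue {
        Integer priceInPence(String productCode);
    }

    private final Catalogue catalogue;

    PriceLookup(Catalogue catalogue) {
        this.catalogue = catalogue;
    }

    Optional<Integer> priceFor(String productCode) {
        try {
            // Expected condition ('no such product code'): the caller
            // handles the empty result; nothing is written to the log.
            return Optional.ofNullable(catalogue.priceInPence(productCode));
        } catch (IllegalStateException e) {
            // Unexpected condition: log it so operators can investigate.
            LOG.log(Level.SEVERE, "Could not connect to database", e);
            throw e;
        }
    }
}
```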

One variation on this approach is to log different types of error message to different places. For example, in terms of the application itself a user failing to authenticate may not be worth recording. However, from the system's point of view (i.e. the operating system) the security policy may require all failed authentications to be logged. This is usually resolved by logging different types of errors to different logs, such as the application event log and security event log provided under Windows. Such partitioning allows different logs to be created to serve the needs of different areas of concern. Another example of this is where knowledge of the patterns in which errors occur would be of interest to developers - large numbers of failed searches at a search engine site may indicate a usability problem. However, such errors are not of interest to the operations team who are responsible for keeping the system running. In this case, the expected errors could be logged to a different location where they will not interfere with the operational errors but can be retrieved later by the development team for further analysis.
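
As a rough illustration of this first variation, the sketch below gives each area of concern its own named logger. The logger names and helper methods are assumptions made for the example; java.util.logging is used only because it ships with the platform.

```java
import java.util.logging.Logger;

class PartitionedLogs {
    // One logger per area of concern; each can be routed to its own destination.
    static final Logger SECURITY = Logger.getLogger("audit.security");
    static final Logger USABILITY = Logger.getLogger("analysis.usability");

    // Required by the security policy, even if the application does not care.
    static void recordFailedLogin(String userName) {
        SECURITY.warning("Failed authentication for user " + userName);
    }

    // Of interest to developers, not to the operations team.
    static void recordEmptySearch(String query) {
        USABILITY.info("Search returned no results for: " + query);
    }
}
```

In a deployment, each logger would be given its own handler (for example a java.util.logging.FileHandler writing to a separate file), so that neither stream of messages appears in the operational error log.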

A second variation is to log different types of error message in one location but to mark each log message with one or more attributes that allow a set of filters to be created to provide the ability to extract various subsets of the log content on demand to support different uses (such as error monitoring versus usability analysis).
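
The second variation might look something like the sketch below, in which every message goes to a single logger but carries a tag that filters can select on. Passing the tag as the first log record parameter is a convention invented for this example, as are the class and method names; the Handler and Filter types are the standard java.util.logging ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class TaggedLog {
    // Collects the messages of records whose first parameter is the wanted tag.
    static class TagHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        TagHandler(String wantedTag) {
            setFilter(record -> {
                Object[] params = record.getParameters();
                return params != null && params.length > 0
                    && wantedTag.equals(params[0]);
            });
        }

        @Override public void publish(LogRecord record) {
            // isLoggable() applies the handler's level and the filter above.
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }

        @Override public void flush() { }
        @Override public void close() { }
    }

    // Writes to the shared logger, attaching the tag as an attribute.
    static void log(Logger logger, String tag, String message) {
        logger.log(Level.INFO, message, tag);
    }
}
```

Here the filtering happens as records are written; the same tags could equally be used to extract subsets from a stored log after the event.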

Positive consequences

Negative consequences

Related patterns

Make Exceptions Exceptional

Problem

A number of languages include exception handling facilities and these are powerful additions to the error handling toolkit available to programmers. However, if exceptions are used to indicate expected error conditions occurring, then the calling code becomes much more difficult to understand.

Context

Any situation where a language with exception handling built into it is in use.

Forces

Solution

Indicate expected domain errors by means of return codes. Only use exceptions to indicate runtime problems such as underlying platform errors or configuration/data errors.

Implementation

When designing the interfaces in your system you should classify errors into two types:

Errors of the first type will be handled as part of the standard business logic in the system. On the other hand, errors of the second type will normally be handled by a combination of logging and exiting the current code block via an exception path.

It is worth briefly exploring the differences and the blurring of the boundaries here through an example. Consider a component in a retail system that offers two methods to look up product information. One of these methods allows you to look up products either by keyword or wildcard text string and returns a list of matching products. The other method requires a numeric product code such as a barcode and returns the single product matching that code. The component is backed by a database containing all the products stocked by the retailer.

The search by keyword/wildcard has no guarantee of finding a matching product. Typically, the keyword/wildcard will be entered by a user and so could be subject to all forms of data problems such as mis-spelling or unrealistic expectations (e.g. entering "Elton John" when the retailer just sells food - not CDs). Hence, semantically you could expect no products to be returned - this is an expected business condition, however unhelpful it is to the user of the calling application. Having said that, the user can always get the answer they want by trying again - providing sensible input to the search.

On the other hand, there is more of a semantic implication that the method that requires a product code should find something. Unless users of the system are prone to scanning in barcodes from random products they bring into work, any product scanned in store should be in the database: you should not be able to provide a code that cannot be found. In this case, you could justifiably throw an exception as the only way this condition can occur is if there is a problem with the data in your database. Not only can the user not get the right answer by re-scanning the product (same answer each time...), but in terms of the system this situation needs resolving (i.e. the data in the database needs correcting).

Finally, in either case if the component cannot connect to the database for whatever reason a technical exception should be raised (indeed, the underlying platform will probably raise one for you).
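
The retail example could be sketched along the following lines. All of the names are hypothetical and an in-memory map stands in for the product database, so the technical failure case (no database connection) is not shown. The keyword search reports 'nothing found' through its return value, an empty list, while the lookup by code throws because a missing code implies bad data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical exception signalling that the product data itself is wrong.
class ProductDataException extends RuntimeException {
    ProductDataException(String message) { super(message); }
}

class ProductCatalogue {
    // Stand-in for the retailer's product database.
    private final Map<Long, String> productsByCode;

    ProductCatalogue(Map<Long, String> productsByCode) {
        this.productsByCode = productsByCode;
    }

    // Expected condition: no match is a normal outcome, so return an
    // empty list and let the caller ask the user to try again.
    List<String> searchByKeyword(String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String name : productsByCode.values()) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(name);
            }
        }
        return matches;
    }

    // Exceptional condition: a scanned code should always be present,
    // so a miss indicates a data problem that needs resolving.
    String findByCode(long code) {
        String name = productsByCode.get(code);
        if (name == null) {
            throw new ProductDataException("No product for code " + code);
        }
        return name;
    }
}
```

Calling code can then treat an empty search result as part of its normal flow, and leave ProductDataException to propagate to whichever tier logs and reports it.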

Positive consequences

Negative consequences

None

Related patterns

Proto-Patterns

Ignore Irrelevant Errors

Single Type for Technical Errors

References

[Cunningham] CHECKS: A Pattern Language of Information Integrity, http://c2.com/ppr/checks.html

[Dyson04] Architecting Enterprise Solutions: Patterns for High-Capability Internet-based Systems, Paul Dyson and Andy Longshaw, John Wiley and Sons, 2004

[Haase] Java Idioms - Exception Handling, linked from http://hillside.net/patterns/EuroPLoP2002/papers.html

[Renzel97] 'Error Handling for Business Information Systems', Eoin Woods, linked from http://hillside.net/patterns/onlinepatterncatalog.htm

Acknowledgements

We'd like to thank our EuroPLoP 2005 shepherd, Ofra Homsky, for her thorough and valuable feedback during this paper's review process; our original EuroPLoP 2004 shepherd, Bob Hanmer, for providing very valuable advice on the original paper; the members of the EuroPLoP 2004 and 2005 workshops; and the members of the OT2004 workshop at which this paper was first presented.
