hourly 1 http://backend.userland.com/rss

Journal Articles

Overload Journal #61 - Jun 2004 + Programming Topics

Browse in :

All > Journals > Overload > 61 (10)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Efficient Exceptions?

Author:

Date: 01 June 2004 16:51:14 +01:00 or Tue, 01 June 2004 16:51:14 +01:00

Summary:

Body:

I recently read a comment on a code review that produced a fair amount of discussion about exceptions. Slightly simplified, the C# code being reviewed was:

public bool isNumeric(string input) {
  bool ret = true;
  bool decimalFound = false;

  if(input == null
        || input.Length < 1) {
    ret = false;
  }
  else {
    for(int i = 0
        ; i < input.Length
        ; i ++) {
      if(!Char.IsNumber(input[i]))
        if((input[i] == '.')
              && !decimalFound) {
          decimalFound = true;
        }
        else {
          ret = false;
        }
    }
  return ret;
}

The review comment that started the discussion was quite short:

Your isNumeric function might be more efficient as:

public bool isNumeric(string input) {
  try {
    Double.Parse(input);
    return true;
  }
  catch(FormatException /*ex*/) {
    return false;
  }
}

I thought some of the discussion inspired by this comment might be of more general interest. There are several important issues that are relevant here, and later in the article I will return to some of them. However the first thing that struck me was the use of the word "efficient" and it is this word that the bulk of the article addresses.

What Does it Mean to be Efficient?

In the coding context efficiency is usually concerned with size and/or speed.

The second piece of code is more efficient in terms of source code size. And it is probably slightly more efficient in terms of image code size; but it is almost certainly not more efficient in terms of runtime memory use, particularly on failure, since exceptions in C# will allocate memory.

So I started wondering about runtime efficiency - which for simplicity I will from here on refer to as 'performance'. Would the proposed replacement function be any faster than the original function?

The Dangers of Performance Figures

Note: performance figures are very dangerous! They depend on all sorts of factors, such as the language being used, the compiler settings, the operating system being used and the hardware that the program runs on.

Although I've done my best to produce repeatable performance figures for this article please do not take any of the figures as being more than indicative of the overall performance of the languages mentioned. A small change to the programs being tested could produce variations in the figures produced.

For those who care I was using Windows 2000 SP4 on a 733 MHz single CPU machine with 768 Mb of RAM. (Yes, maybe it's time I bought a newer machine!)

I was using:

C# with csc from Visual Studio .NET (version 7.00.9466), both with and without optimising
Java with JDK 1.3.0 and JDK 1.4.2
C++ with Microsoft VC++ 6 (version 12.00.8804) both with (/Ox) and without (/Od) optimising. In both cases I was also using /GX /GR.
C++ with gcc 2.95.3-4 under Cygwin (with and without -O)

(I also repeated a couple of the C++ tests with the Microsoft .NET and .NET 2003 C++ compilers but the results did not change enormously.)

It is important to note that I was not principally looking to optimise the hand written code - I was interested in the effect on performance of using exceptions. For this reason I deliberately kept the implementations of each function similar to ones in the other languages - hence, for example, the use of the member function at() in the C++ code rather than the more idiomatic [] notation. In fact, after being challenged about this, I tested both methods and to my shock found that at() was actually faster than operator[] with MSVC6. If you find this unbelievable it only goes to show how unexpected performance measurements can be, and how dependent they are on the optimiser!

I also made the IsNumeric method an instance method of a class in all languages for consistency and ease of testing. Changing this would have equally affected the performance of both the exception and the non-exception code so I left it as it was.

It is often said that it is better to use library functions than to write your own code; apart from any other considerations library functions are often optimised by experts using a wide variety of techniques. However, in this case using the library function adds exception handling into the equation - would the advice still stand?

I thought I'd try to get some actual performance figures.

I wrote a simple test harness that called the first and the second functions 1,000,000 times.

(execution times in seconds)

Table 1. Unoptimised C#

Argument	Function #1	Function #2
1	0.19	0.92
12345	0.56	1.13
10 digits	1.01	1.38
20 digits	1.91	1.79

Table 2. Optimised C#

Argument	Function #1	Function #2
1	0.10	0.89
12345	0.33	1.01
10 digits	0.65	1.36
20 digits	1.28	1.76
30 digits	2.07	1.92
40 digits	2.55	2.38

So the first function is quite a bit faster for relatively short strings, but degrades until it is eventually slower than the second function. Similar results are generated when optimisation is turned on, although the number of digits at the 'break even' point is slightly more.

The main question I was investigating though is what happens when a non-numeric value is supplied and an exception is thrown.

Table 3. Exception Results

Argument	Function #1	Function #2
X	0.20	147.60 (unoptimised)
X	0.11	143.24 (optimised)

Yes, that's right - the decimal point is in the right place for function #2! The code path through the exception throwing route took almost 3 orders of magnitude longer than the raw code.

This is why, for this article, I'm just not interested in minor optimisations of the source code, since the impact of exceptions dwarfs them.

This was very intriguing - I wondered whether it was only a C# issue or it was also an issue with Java and C++.

Here is an approximately equivalent pair of functions in Java:

public boolean isNumeric(String input) {
  boolean ret = true;
  boolean decimalFound = false;
  if(input == null
        || input.length() < 1) {
    ret = false;
  }
  else {
    for(int i = 0
        ; i < input.length()
        ; i ++) {
      if(!Character.isDigit(
            input.charAt(i)))
        if((input.charAt(i) == '.')
              && !decimalFound) {
          decimalFound = true;
        }
        else {
          ret = false;
        }
    }
  }
  return ret;
}

and:

public boolean isNumeric(String input) {
  try {
    Double.parseDouble(input);
    return true;
  }
  catch(NumberFormatException ex) {
    return false;
  }
}

Surely code that looks so similar must behave the same way :-) ?

Here are the results for the non-exception case:

Table 4. jdk 1.3.0

Argument	Function #1	Function #2
1	0.13	0.81
12345	0.42	1.15
10 digits	0.76	1.68
20 digits	1.48	23.16

Table 5. jdk 1.4.2

Argument	Function #1	Function #2
1	0.10	0.76
12345	0.29	1.12
10 digits	0.51	1.63
20 digits	0.94	28.08

The results here are comparable to the optimised C# results - apart from the last line. What happens here? The parseDouble method is much slower once you exceed 15 digits - this is to do (at least with the versions of Java I'm using) with optimisations inside Double.parseDouble when the number is small enough to be represented as an integer value. Whether this matters in practice of course depends on the range of input values the program actually passes to the isNumeric function.

The exception results look like this:

Table 6. Exception Results

Argument	Function #1	Function #2
X	0.16	15.33 (jdk 1.3.0)
X	0.12	18.15 (jdk 1.4.0)

Well, this is not quite as awful as the C# case - the performance of the second function is 'only' two orders of magnitude worse than the first function when the exception is thrown.

For completeness, how about a C++ solution?

The roughly equivalent functions I came up with were:

bool IsNumeric1::isNumeric(std::string
                   const & input) const {
  bool ret = true;
  bool decimalFound = false;
  if(input.length() < 1) {
    ret = false;
  }
  else {
    for(int i = 0
        ; i < input.length()
        ; i ++) {
      if(!isdigit(input.at(i)))
        if((input.at(i) == '.')
              && !decimalFound) {
          decimalFound = true;
        }
        else {
          ret = false;
        }
    }
  }
  return ret;
}

and:

bool IsNumeric2::isNumeric(std::string
                   const & input) const {
  try {
    convert<double>(input);
    return true;
  }
  catch(std::invalid_argument const & ex) {
    return false;
  }
}

where convert was derived from code in boost::lexical_cast from www.boost.org (in the absence of a standard C++ library function with similar syntax and semantics to the C# and Java parse functions) and looks like this:

template<typename Target>
Target convert(std::string const & arg) {
  std::stringstream interpreter;
  Target result;
  if(!(interpreter << arg)
        || !(interpreter >> result)
        || !(interpreter >> std::ws).eof())
    throw std::invalid_argument( arg );
  return result;
}

I decided using a reference in C++ kept the source code looking more equivalent although a smart pointer could have been used instead as its behaviour is more like that of a reference in the other two languages.

How did C++ fare in the comparison?

Table 7. MSVC optimised

Argument	Function #1	Function #2
1	0.07	5.73
12345	0.21	7.65
10 digits	0.40	11.25
20 digits	0.74	17.12

Our initial choice for the convert function is very slow - perhaps it is a bad choice. The cost of using stringstream objects seems to be very high, although that might be a problem with my compilers' implementations. This is not really an entirely fair comparison either - the convert template function is generic whereas the C# and Java code is type-specific. So let me replace the generic convert function with:

double convert(std::string const & arg) {
  const char *p = arg.c_str();
  char *pend = 0;
  double result = strtod(p, &pend);
  if(*pend != '\0')
    throw std::invalid_argument(arg);
  return result;
}

This produces the following improved performance figures:

Table 8. MSVC unoptimised

Argument	Function #1	Function #2
1	0.14	1.82
12345	0.46	2.71
10 digits	0.83	5.1
20 digits	1.57	8.62

Table 9. MSVC optimised

Argument	Function #1	Function #2
1	0.07	1.80
12345	0.21	2.71
10 digits	0.40	4.94
20 digits	0.74	8.39

And finally I recompiled the C++ code with gcc under Cygwin.

Table 10. gcc unoptimised

Argument	Function #1	Function #2
1	0.29	0.63
12345	0.98	0.70
10 digits	1.79	0.85
20 digits	3.44	3.87

Table 11. gcc optimised

Argument	Function #1	Function #2
1	0.05	0.33
12345	0.11	0.40
10 digits	0.17	0.55
20 digits	0.27	3.55

However, what about the exception throwing case (which is after all the motivating example) ?

Table 12. Exception Results

Argument	Function #1	Function #2
X	0.17	11.69 (MSVC unoptimised)
X	0.08	11.03 (MSVC optimised)
X	0.40	4.15 (gcc unoptimised)
X	0.06	3.17 (gcc optimised)

Even discounting the cost of solution #2 there is one to two orders of magnitude difference between the return code and exception throwing case, but with some significant differences between the two compilers.

What is the Cost of an Exception?

Exceptions tend to be expensive for a number of reasons, described below:

The exception object itself must be created.

This is not usually very expensive in C++, although it does obviously depend on the exact class being used. In Java and in C# the exception object contains a call stack, and the runtime environment has to create this before the exception is thrown. This may be quite expensive, particularly if the function call stack is deep.

The act of throwing the exception can be expensive.

For example, when using Microsoft compilers under Windows, throwing a C++ exception involves calling the OS kernel to raise an operating system exception, which includes capturing the state of the thread for passing to the exception handler. This approach is by no means universal - gcc under Cygwin does not use native operating system exceptions for its C++ exceptions, which seems to have as a consequence that the execution time cost of an exception is lower.

Then, in C++, a copy of the supplied object is thrown, which can impose some overhead for non-trivial exception objects.

There is the cost of catching the exception.

This in general involves unwinding the stack and finding suitable catch handlers, using run time type identification to match the types of the thrown object to each potential catch handler. For example, if I throw a std::invalid_argument object in C++ this might be caught by:

catch(std::invalid_argument const &)
catch(std::exception)
catch(...)

with different behaviour in each case. The cost of this rises with both the depth of the exception class hierarchy and the number of catch statements that there are between the throw and the successful catch.

Note that some experts in compiler and library implementation claim that high performance exception handling is theoretically possible; however in practice it seems than many of the popular compilers out there do have less than optimal performance in this area.

Should I Care How Slow Exceptions Are?

Let's take stock of where we have reached. I've investigated the 'efficiency' claims for the proposed replacement code and found that it is almost always slower for numeric input and very significantly slower for non-numeric input.

On examining the two functions you can quickly see that they do not produce the same answers for all inputs; this is probably much more significant than which function runs faster since in most applications 'right' is better than 'fast but wrong'.

Consider the results the two C# implementations give for the following inputs:

Table 13. Optimised C#

Argument	Function #1	Function #2
"+1"	False	True
"-1"	False	True
"."	True	False
" 1"	False	True
"1 "	False	True
"1e3"	False	True
"1,000"	False	True
"Infinity"	False	True
null	False	Exception

The library function understands a much broader range of numeric input than the hand-crafted code does. And that's leaving aside all discussion about locales (should ',' be a decimal point or a thousands separator?), which the library function takes in its stride. This probably provides an explanation of why our own conversion function is faster than the library call - it isn't a complete solution!

The problem with the initial code review was the word 'efficient'; I would like to make use of the library call to take advantage of its rich functionality despite the loss of efficiency. However I'd like to reduce the expense of the exception - is this possible?

The exception is being thrown when the input is not numeric so its cost only matters in this case. Ideally I'd like to find out how many times the function returns false in typical use; unfortunately a simple profiler will only tell me how many times the function is called and not differentiate on return code. I either need to use a better profiler or to add some instrumentation to my program.

In the best case I might find that the function usually succeeds and then I probably don't mind taking a performance hit on the rare failures. However I might find that the function is called a lot and is roughly evenly divided between success and failure - in this case I will want to reduce the cost.

As it happens, it is fairly easy to do this in the C# case. Closer investigation of the Double class reveals a TryParse method that has exactly the behaviour we require in IsNumeric. It needs a couple of additional arguments but the resultant code is clear:

using System.Globalization;
...
public bool isNumeric(string input) {
  double ignored;
  return = Double.TryParse(input,
        NumberStyles.Float |
              NumberStyles.AllowThousands,
        NumberFormatInfo.CurrentInfo,
        out ignored);
}

The results of running this function are:

Table 14. Optimised C#

Argument	Function #1	Function #2	Function #3
1	0.10	0.89	1.08
12345	0.33	1.01	1.29
10 digits	0.65	1.36	1.55
20 digits	1.28	1.76	1.95
X	0.11	143.24	1.90

Unfortunately Double.Parse(string) is slightly faster than TryParse for the 'good' case but this is outweighed by the drastic improvement in speed on 'bad' inputs. In the absence of specific measurements of performance I would prefer this solution.

The Java case is more difficult - there is no direct equivalent to TryParse. I tried the following:

public boolean isNumeric(String input) {
  java.text.NumberFormat numberFormat
        = java.text.NumberFormat.getInstance();
  java.text.ParsePosition parsePosition
        = new java.text.ParsePosition(0);
  Number value = numberFormat.parse(
        input, parsePosition);
    return ((value != null)
        && (parsePosition.getIndex()
        == input.length()));
}

However the performance is 'disappointing'. The new method is indeed faster when an exception occurs - but an order of magnitude slower when the input is in fact numeric. The biggest cost is creating the numberFormat object - caching this makes it a lot faster, but additional coding work would need to be done to make it threadsafe (see the JDK 1.4 documentation for NumberFormat).

Table 15. jdk 1.3.0

Argument	Function #1	Function #2	Function #3	Funct'n #3 + caching
1	0.13	0.81	14.54	2.39
12345	0.42	1.15	16.00	3.77
10 digits	0.76	1.68	18.11	5.88
20 digits	1.48	23.16	51.19	34.70
X	0.16	15.33	11.79	0.85

The decision is much harder here - can I do anything else? One option is to check for common failure cases before passing the string into Double.Parse. This means measuring or guessing what the 'common failures' are - an example of such a guess would be to check if the first digit is alphabetic.

Moving on, the C++ case is easier - I can simply return failure from strtod by using a return code rather than throwing an exception.

bool try_convert(std::string const & arg) {
  const char *p = arg.c_str();
  char *pend = 0;
  (void)strtod(p, &pend);
  return (*pend == '\0');
}

Table 16.

Argument	Function #1	Function #2	Function #3
X	0.17	11.69	0.31 (MSVC unoptimised)
X	0.08	11.03	0.26 (MSVC optimised)

Anything Else?

There are a couple of other points worth noting about using exceptions.

It can be hard to correctly identify which exceptions should be caught, and mismatches can cause other problems.

Firstly, catching too much. If your code catches too broad a category of exceptions (for example "catch (Exception)", or "catch (...)" in C++) can mean that error cases other than the one you are expecting are caught and do not flow to the appropriate higher level handler where they can be correctly dealt with. This can be even more of an issue in some C++ environments, such as MSVC, where non-C++ exceptions are also swallowed by catch (...).

Conversely, failing to make the exception net wide enough can lead to exceptions leaking out of the function and causing a failure higher up. This has happened to me when using JDBC in Java where the exception types thrown for data conversion errors, such as invalid date format, seem to vary depending on the driver being used.

Debugging exceptions can be a problem. Many debuggers cannot easily filter exceptions, so if your program throws many exceptions it can make the debugging process slow or unwieldy, or swamp output with spurious warnings.

In some environments you can stop when an exception is about to be thrown, but it is very hard to follow the flow of control after that point. The standard flow-of-control mechanisms are usually easier to trace.

Finally the code you write must be exception safe - when exceptions occur you must make sure that the unwinding of the stack up to the catch handler doesn't leave work undone. The main dangers to avoid are leaving objects in inconsistent states and neglecting to release resources. This includes, but is not restricted to, dynamically allocated memory - don't fall for the popular misconception that exception safety is only an issue for C++ programs (see, for example, [Griffiths]).

When Are Exceptions Exceptional?

Let's go back to first principles - what are exceptions for?

The exception mechanism can be seen as a way to provide separation of concerns for error handling. It is particularly useful when the code detecting the error is distant from the code handling the error; exceptions provide a structured way of passing information about the error up the call stack to the error handler.

Exceptions also make errors non-ignorable by default, since uncaught exceptions terminate the process. More traditional alternatives such as error return values are often ignored and also the flow of error information has to be explicitly coded which leads to increased code complexity.

Exceptions can, in principle, be viewed as no more than a mechanism to transfer control within a program. However, unless care is taken, using exceptions as a flow of control mechanism can produce obscure code. Stroustrup wrote: "Whenever reasonable, one should stick to the "exception handling is error handling" view" [Stroustrup].

If exceptions are being used for handling errors that need nonlocal processing then the possible runtime overhead is unlikely to be an issue. Typical uses of exceptions of this type, where errors are relatively uncommon and the performance impact is secondary, include:

signalling errors which require aborting an entire unit of work, for example an unexpected network disconnection
support for pre/post conditions and asserts

Grey areas where, since exceptions are thrown for 'nonexceptional' or 'non-error' conditions, programmers disagree about the validity of using exceptions include:

dealing with invalid user input
handling uncommon errors in a recursive algorithm - for example a parse failure for a SQL statement or numeric overflow in a calculation
handling end of file (or, more generally, handling any kind of 'end of data' condition)

Examples of abuse include:

using exceptions to handle optional fields missing
using exceptions to give early return for common special cases

My own rough guideline for the 'grey areas' is that if all exceptions became fatal then most programs should still run at least four days out of five.

Others have a more flexible approach and use exceptions more widely than this, sometimes unaware of the consequences of this decision.

Conclusion

It is important to recognise that using exceptions has an associated cost in C# and to a slightly lesser extent in Java and C++.

Using exceptions in the main execution path through the program may have major performance implications. Their use in time-critical software, in particular to deal with non-exceptional cases, should be carefully justified and the impact on performance measured.

When this is an issue alternative techniques which may be faster include: using return values instead of exceptions to indicate 'expected' error conditions; and checking for common failures before risking the exception.

Acknowledgements

Thanks to Phil Bass and Alan Griffiths for reviewing earlier drafts of this article.

References

[Griffiths] Alan Griffiths, "More Exceptional Java", Overload, June 2002

[Stroustrup] Bjarne Stroustrup, C++ programming language, 3rd edition, p375

Notes:

More fields may be available via dynamicdata ..