Title: Professionalism in Programming #25

Author:

Date: 01 April 2004 22:53:48 +01:00 or Thu, 01 April 2004 22:53:48 +01:00

Summary:

In the first part of this series, we looked at what it means to optimise code, and saw the cases for and against optimisation. In this article, we'll look at the process of optimisation. We'll see the correct, methodical approach that will lead to solid, worthwhile code optimisations.

Body:

The nuts and bolts

So how do you optimise? Rather than learn a list of specific code optimisations, it's far more important to understand the correct approach to optimising. Don't panic; we will see some programming techniques later, but they must be read in the context of this wider optimisation process.

The six steps for speeding up a program are:

Determine that it's too slow, and prove you do need to optimise.
Identify the slowest code. Target this point.
Test the performance of the optimisation target.
Optimise the code.
Test that the optimised code still works (very important).
Test the speed increase, and decide what to do next.

This sounds like quite a performance, but without it you'll waste time and effort, and end up with crippled code that runs no faster. If you're not trying to improve execution speed, adjust this process accordingly; for example tackle memory consumption problems by identifying which data structures are consuming all the memory and target those.

This optimisation procedure is essential; you cannot successfully optimise as you go along. First strive for clear, correct code. Only later look at improving it. But do pay continuous attention to efficiency as you write.

Performance requirements

Certain kinds of system have very strict requirements for program performance; this is especially true of embedded systems controlling critical pieces of hardware.

General purpose software also has performance requirements, although they tend to be specified less rigorously. You won't see many mandated performance requirements - the software is expected to execute in 'reasonable' time on a 'reasonable' computer. The problem comes when you need to know the value of 'reasonable'. A user with a modest PC will have a different opinion than a developer with a cutting edge machine.

The adequate level of performance is usually captured in a product's Requirements Specification. This contains a list of all requirements, including those for the program's quality of service. These specifications translate operational requirements into more technical language that describes:

what is correct behaviour, and
how we will determine if the software is acceptable.

As well as the expected system behaviour, performance requirements will describe the operational environment (in what conditions the code should work, and when it doesn't matter). Sometimes a UI Specification (describing how the user expects the system to respond) indirectly covers a large number of performance requirements.

A good performance specification should be quantitative, not qualitative. This helps us to design acceptance tests based on performance criteria. If a requirement is not verifiable, it is useless.

The requirements specification should not be too prescriptive. If a system is required to position a mechanism accurate to within one millimetre, it is specified. However, the requirements specification does not detail how this criterion is met - it might be a function of the mechanical hardware, of the electronic motor control, or of the controlling software (or more likely, a combination of the three). This is a design decision, often taken much later when the appropriate tradeoffs can be made.

Ensure that you don't write needlessly bloated code. This alone will not lead to code that performs well, but it means that you'll start optimising from a more reasonable place, and will not have to address tedious code issues before getting on to more serious optimisations.

It's important to begin optimisation with a clear goal in sight - the more optimisation you perform, the less readable the code becomes. Know the level of performance you require, and stop when it's sufficiently fast. It's tempting to keep going, trying to continually squeeze out a little extra performance.

To stand any chance of optimising correctly, you must take great care to prevent external factors changing the way your code works. When the world is changing under your feet, you can't compare measurements realistically. There are two essential techniques that help here:

Optimise your code separately from any other work, so the outcome of one task doesn't cloud the other.
Optimise release builds of your program, not development builds. The development builds may run in a very different manner to release builds, due to the inclusion of debugging trace information, object file symbols, etc.

Now we'll look at each of these optimisation steps in more detail:

Prove you need to optimise

The first thing to do is make sure you really do need to optimise. If the code's performance is acceptable then there's no point in tinkering with it. Knuth said (himself quoting C.A.R. Hoare): "We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil." There are so many compelling reasons not to optimise that the quickest and safest optimisation technique is to prove that you don't need to do it.

You make this decision based on program requirements or usability studies. With this information you can establish a commercial imperative for optimisation, and determine whether it takes priority over adding new features.

Identify the slowest code

This is the part that most programmers get wrong. If you're going to spend time optimising, you need to target the places where it will make a difference. Investigations show that the average program spends more than 80% of its time in less than 20% of the code [Boehm87]. This is known as the 80/20 rule^[1]. That's a relatively small target; it's very easy to miss, and spend time optimising code that's rarely run.

You might notice that a part of your program has some relatively easy optimisations, but if it's seldom executed, then there's no point in doing so - clear code is better than faster code in this situation.

So how do you work out where to focus your attention? The most popular - and often the most effective - technique is to use a profiler. This times the flow of control around your program. It shows where that 80% of execution time is going, so you know where to concentrate your effort.

A profiler doesn't tell you which parts of the code are slowest; this is a common misconception. It actually tells you where the CPU spends most of its time. This is subtly different^[2]. You have to interpret these results, and use your brain. The program might spend most of its execution time in a few perfectly valid functions which cannot be improved at all - it's a sad fact that you can't always optimise. Sometimes the laws of physics win.

There are plenty of benchmarking programs around: many excellent commercial programs, and a number of freely available tools. It's worth spending money on a decent profiler - optimisation can easily eat into your time; this is also an expensive commodity. If you don't have a profiler available, there are a few other timing techniques you can try:

Put manual timing tests throughout your code. Make sure you use an accurate clock source, and that the time taken to read the clock will not affect program performance too much.
Count how often each function is called (some debug libraries provide support for this kind of activity).
Exploit compiler-supplied 'hooks' to insert your own accounting code when each function is entered/exited. Many compilers provide a means to do this - some profilers are implemented using such a mechanism.
Sample the program counter; interrupt your program periodically in a debugger to see where control is. This is harder in multithreaded programs, and is a very slow, manual approach. If you have control over the execution environment, you can write scaffolding to automate this kind of test - effectively writing your own form of profiler.
Test an individual function's impact on the total program execution time by making it slower. If you suspect that a particular function is causing a slowdown, try replacing its call with two calls in succession, and measure how it affects execution time^[3]. If the program takes 10% longer to run, then the function consumes approximately 10% of execution time. You can use this as a very basic timing test.

Whilst a profiler (or equivalent) is a good starting point to choose optimisation targets, you can easily miss quite fundamental problems. The profiler only shows how the code in the current design executes - and encourages you to perform code-level improvement only. Look at larger design issues, too. The lack of performance may not be due to a single function, but a more pervasive design flaw. If it is, then you'll have to work harder to remedy the problem. This shows how important it is to get the initial code design right, with knowledge of established performance requirements. Don't rely solely on a profiler to find the causes of program inefficiency - you might miss important problems.

When profiling make sure that you use realistic input data, simulating Real World events. The way your code executes may be drastically affected by the kind of input you feed it, or by the way it is driven, so make sure that you provide true representative input sets. If possible, capture a set of real input data from a live system.

Try profiling several different data sets, to see what difference this makes. Select a very 'basic' set, a 'heavy use' set, and a number of 'general use' sets. This will prevent you optimising for the particular quirks of one input data set. Select profiling test data carefully to represent Real World program use. Otherwise, you might optimise parts of the program that are not normally run.

Using a profiler

Owning a hammer or a saw doesn't instantly make you an expert carpenter; owning a profiler doesn't make you an expert optimiser. You have to know how to use it to perform the task - any tool increases your productivity, it doesn't give you ability.

Although each profiler is different, they follow the same general principles. A profiler is intimately entwined with a particular compiler - it has to reach into the object files and extract the compiler's 'debugging' information to work out which function is currently being run. The basic use of a profiler is:

Rebuild your program to support profiling. This usually means linking the executable without stripping out debug information. When you do this, you're no longer probing a release build of the code; this contravenes one of the previous optimisation laws. Sigh.
Inevitably using a profiler will interfere with running code, but generally this interference will not affect the way the code runs, or skew the timing results too much. We have to live with this.
Run the program in the profiler. This is probably an option in your IDE, or a simple profiler command.
The profiler will spit out its results when the program finishes. You can browse the profiler's output file, or use a neat graphical tool to interpret this information and display nice graphs.
Work out what the results are telling you, and act on them.

The most interesting part of the profiler output is the list of functions that the CPU spent most time in. Use this as a starting point for your optimisation. Read the list, taking into account the number of times each function was called:

If a function runs very quickly but gets called frequently, try to reduce the number of times it is invoked.
If a function is only ever called once but takes an age to complete, look for ways to speed up its execution.

Having completed this step, you've found the areas of your code where a performance improvement will have the most benefit. Now it's time to attack them...

Testing the code

We recognised three testing phases in the optimisation procedure. For each piece of code targeted, we test:

its performance before optimisation,
that it still works correctly once optimised, and then
its performance after optimisation.

Programmers often forget the second check: that the optimised code still works correctly in all possible situations. It's easy to check the 'normal' mode of operation, but it's not in our nature to test each and every corner case. This can be the cause of weird bugs late in the day, so be very rigorous about this.

You must measure the code's performance before and after modification, to make sure that you have made a real difference - and to make sure that it is a change for the better; sometimes an 'optimisation' can be an unwitting pessimisation. You can perform these timing tests with your profiler, or by inserting timing instrumentation by hand. Never try to optimise code without performing some kind of before and after measurement.

These are some very important things to think about when running your timing tests:

Run both the 'before' and 'after' tests with exactly the same set of input data, so that you're testing exactly the same thing. Otherwise your tests are meaningless; you're not comparing like with like. An automated test suite is best^[4] - with the same kind of 'live' representative data, we used in the profiling step.
Run all tests under identical prevailing conditions, so that factors like the CPU load or amount of free memory don't affect your measurements.
Ensure that your tests don't rely on user input. Humans can cause timings to fluctuate wildly. Automate every possible aspect of the test procedure.

Optimising the code

We'll investigate some specific optimisation techniques in the next article. Speed-ups vary from the simple refactoring of small sections of code to more serious design-level alterations. The trick is to optimise without totally destroying the code.

Determine how many different ways exist to optimise the identified code, and pick the best. Only perform one change at a time; it's less risky, and you'll have a better idea of what improved performance the most. Sometimes it's the least expected things that have the most significant optimisation effects.

After optimisation

We already mentioned performance testing after an optimisation. It's really important to run these before and after performance benchmarks - how else do you know you've made a successful modification?

If an optimisation proves to be unsuccessful, remove it. Back out your changes. This is where a source control system is useful, helping you to revert to the previous code version.

You should also remove the slightly successful optimisations. Prefer clear code to modest optimisations (unless you're absolutely desperate for an improvement, and there are no other avenues to explore).

Next time

We'll round off this series by looking at some specific optimisation techniques, and discover how to avoid optimisation altogether.

[Boehm87] Boehm, Barry, "Improving Software Productivity", 1987, IEEE Computer, Vol 20, No 9.

^[1] Although some go so far as to claim this should be the 90/10 rule.

^[2] All code runs at a fixed rate, based on the speed of the CPU clock, the number of other processes being juggled by the OS, and this thread's priority.

^[3] This won't necessarily make the function run twice as slowly. File system buffers, or CPU memory caches can enhance the performance of repeated code sections. Treat this as a very rough guide - more qualitative than quantitative.

^[4] Perhaps you can even use a part of your regression test suite?

Notes:

More fields may be available via dynamicdata ..