Programming Topics + CVu Journal Vol 27, #6 - January 2016


Title: Bug Hunting

Author: Martin Moene

Date: Wed, 06 January 2016 21:09:57 +00:00

Summary: Pete Goodliffe continues the hunt for software faults.

Body:

You can see a lot by just looking.
~ Yogi Berra

In the previous article we looked at the (somewhat obvious) reasons that an effective programmer has to be an effective debugger (or is that debuggist?) (or debugorisator?). And we began to look at strategies and tools that help us perform this task.

In this concluding part, we'll continue our journey through the useful strategies and tools that help us to find, and remove, those pesky varmints.

Invest in sharp tools

There are many tools that are worth getting accustomed to, including memory checkers like Electric Fence, and Swiss Army knife tools like Valgrind. These are worth learning about now rather than reaching for them at the last minute. If you know how to use a tool before you have a problem that demands it, you’ll be far more effective.

Learning a range of tools will prevent you from cracking a nut with a pneumatic drill.

Of course, the tool of debugging champions is the debugger. This is the king of tools that allows you to break into the execution of a running program, step forward by a single instruction, or step in and out of functions. Other very handy facilities include the ability to watch variables for changes, set conditional breakpoints (e.g., "break if x > y"), and change variable values on the fly to quickly experiment with different code paths. Some advanced debuggers even allow you to step backward (now that’s real voodoo).
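As a rough illustration of the conditional breakpoint idea, the Python sketch below only drops into the debugger when x > y. The function name differences and the on_suspicious parameter are invented for this example; breakpoint() is Python's built-in hook into its configured debugger (pdb by default).

```python
def differences(pairs, on_suspicious=breakpoint):
    """Return y - x for each (x, y) pair, pausing when x > y.

    `differences` and `on_suspicious` are hypothetical names for this
    sketch. By default the hook is the built-in breakpoint(), which
    opens the debugger (pdb unless configured otherwise).
    """
    results = []
    for x, y in pairs:
        if x > y:            # the "break if x > y" condition
            on_suspicious()  # execution pauses here in the debugger
        results.append(y - x)
    return results
```

Most debuggers can attach the same condition without touching the code. In pdb, for instance, a breakpoint can be given a condition with a command of the form "break <lineno>, x > y".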

Most IDEs come with a debugger built in, so you’re never far from deploying a breakpoint. But you may find it worth investing in a higher quality alternative; don’t rely on the first tool that falls to hand.

In some circles there is a real disdain for the debugger. Real programmers don’t need a debugger. To some extent this is true; being overly reliant on the debugger is a bad thing. Single-stepping through code mindlessly can trick you into focusing on the micro level, rather than thinking about the macro, overall shape of the code.

But it’s not a sign of weakness. Sometimes it’s just far easier and quicker to pull out the big guns. Don’t be afraid to use the right tool for the job.

Learn how to use your debugger well. Then use it at the right times.

Remove code to exclude it from cause analysis

When you can reproduce a fault, consider removing everything that doesn’t appear to contribute to the problem to help focus in on the offending lines of code. Disable other threads that shouldn’t be involved. Remove subsections of code that do not look like they’re related.

It’s common to discover objects indirectly attached to the ‘problem area’ – for example, via a message bus or a notifier-listener mechanism. Physically disconnect this coupling (even if you’re convinced it’s benign). If you still reproduce the fault, you have proven your hunch about isolation, and have reduced the problem space.

Then consider removing, or skipping over, sections of code leading up to the error (as much as makes practical sense). Delete, or comment out, blocks that don’t appear to be involved.
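The same removal process can sometimes be automated when the suspects can be listed (setup steps, input records, enabled components). The sketch below is a greedy reduction loop, a simplified relative of delta debugging rather than a full implementation. The names shrink and still_fails are hypothetical, and it assumes the fault reproduces deterministically each time still_fails is called.

```python
def shrink(items, still_fails):
    """Drop chunks of `items` for as long as the fault still reproduces.

    `still_fails(candidate)` is a caller-supplied check (hypothetical
    here) that returns True when the fault occurs with only `candidate`
    present. Chunk sizes are halved until single items are tried.
    """
    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            candidate = items[:i] + items[i + chunk:]
            if candidate and still_fails(candidate):
                items = candidate  # the removed chunk was not needed
            else:
                i += chunk         # keep this chunk, move to the next
        chunk //= 2
    return items
```

Whatever survives is not guaranteed to be the smallest possible failing set, but it is usually a much smaller place to start looking.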

Cleanliness prevents infection

Don’t allow bugs to stay in your software for longer than necessary. Don’t let them linger.

Don’t dismiss niggling problems as known issues. This is a dangerous practice. It can lead to broken window syndrome [1], making it gradually feel normal and acceptable to have buggy behaviour. This lingering bad behaviour can mask the causes of other bugs you’re hunting.

Fix bugs as soon as you can. Don't let them pile up until you're stuck in a code cesspit.

One project I worked on was demoralisingly bad in this respect. When given a bug report to fix, before managing to reproduce the initial bug you’d encounter 10 different issues that all also needed to be fixed, and may (or may not) have contributed to the bug in question.

Oblique strategies

Sometimes you can bash your head against a gnarly problem for hours and get nowhere. Try an oblique strategy to avoid getting stuck in a debugging rut.

Take a break
It’s important to learn when you should simply stop and walk away. A break can give you fresh perspective.

This can help you to think more carefully. Rather than running headlong back into the code, take a break to consider the problem description and code structure.

Go for a walk to force you to step away from the keyboard. (How many times have you had those ‘eureka’ moments in the shower? Or in the bathroom?! It happens to me all the time.)

Explain it to someone else
Describe the problem to someone else. Often when describing any problem (including a bug hunt) to another person, you instantly explain it to yourself and solve it.

Failing another actual, live person, you can follow the rubber duck strategy described by Andrew Hunt and David Thomas [2].

Talk to an inanimate object on your desk to explain the problem to yourself. It’s only a problem if the rubber duck starts to talk back.

Don’t rush away

Once you find and fix a bug, don’t rush mindlessly on. Stop for a moment and consider if there are other related problems lurking in that section of code. Perhaps the problem you’ve fixed is a pattern that repeats in other sections of the code. Is there further work that you could do to shore up the system with the knowledge you just gained?

Keep notes on which parts of the code harbour more faults. There are always hotspots. These hotspots are either the 20% of the code that 80% of users actually run, or a sign of ropey, badly written software.

When you have spent enough time gathering notes, it may be worth devoting time to those problem areas: perhaps a rewrite, a deep code review, or an extra unit test harness.
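The note-keeping can be as light as a tally. A minimal sketch, assuming each fix is recorded as a (bug id, module) pair; that record format and the name hotspots are invented for illustration:

```python
from collections import Counter


def hotspots(fix_log, top=3):
    """Return the `top` modules with the most recorded fixes.

    `fix_log` is an iterable of (bug_id, module) pairs, a hypothetical
    note format chosen for this sketch.
    """
    return Counter(module for _, module in fix_log).most_common(top)
```

The modules at the head of the list are the candidates for the rewrite, review, or extra tests.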

Non-reproducible bugs

Sometimes you discover a bug for which you can't easily form a set of reproduction steps. The bug defies logic and reason; it’s not possible to determine the cause-and-effect. These nasty, intermittent bugs seem to be caused by cosmic rays rather than any direct user interaction. They take ages to track down, often because we never get a chance to see them on a development machine, or when running in a debugger.

How do we go about finding, and fixing, these fiends?

Keep records of the factors that contribute to the fault. Over time you may spot a pattern that will help you identify the common causes.
As you get more information, start to draw conclusions. Perhaps you can identify more data points to keep in the record.
Consider adding more logging and assertions in beta or release builds to help gather information from the field.
If it’s a really pressing problem, set up a test farm to run long-running soak tests. If you can automate driving the system in a representative manner, then you can accelerate the hunting season.
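The record keeping and soak testing suggested above can be combined in a very small harness. This is only a sketch under assumptions: soak, operation, and the shape of the failure records are hypothetical, and a real harness would also capture timestamps, build ids, and environment details.

```python
def soak(operation, inputs):
    """Drive `operation` over many inputs, recording each failure.

    Rather than stopping at the first exception, every failure is kept
    with the run number and the input that provoked it, so that
    patterns can be spotted afterwards. All names are illustrative.
    """
    failures = []
    for run, value in enumerate(inputs):
        try:
            operation(value)
        except Exception as exc:
            failures.append({"run": run, "input": value, "error": repr(exc)})
    return failures
```

Sorting or grouping the returned records by input or error text is often the quickest way to see whether the "random" failures share a cause.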

There are a few things that are known to contribute to such unreliable bugs. You may find they provide hints for where to start investigating:

Threaded code
As threads entwine and interact in non-deterministic and hard-to-reproduce ways, they often contribute to freaky intermittent failure.

Often this behaviour is very different when you pause the code in a debugger, so it is hard to observe forensically. Logging can also change the interaction of the threads and mask the problem. And non-optimised ‘debug’ builds of your software can perform rather differently from the ‘release’ builds.

These are affectionately known as Heisenbugs, after the physicist Werner Heisenberg’s ‘observer effect’ in quantum mechanics. The act of observing a system can alter its state.

Network interaction
Networks are, by definition, laggy and may drop or stall at any point in time. Most code presumes that all access to local storage works (because, most often, it does). This is careless, and will not scale to storage over a network, where failures and intermittent long load times are common.

The variable speed of storage
It’s not just network latency that can cause this. Slow spinny disks, or database operations, may change the behaviour of your program, especially if you are balanced precariously on the edge of timeout thresholds.

Memory corruption
Oh, the humanity! When your aberrant code overwrites part of the stack or the heap, you can see a myriad of unreproducible strangenesses that are very hard to detect. Software archaeology is often the easiest route to diagnose these errors.

Global variables/singletons
Hardcoded communication points can be a clearing house for unpredictable behaviour. It can be impossible to reason about the correctness of your code, or predict what will happen, when anyone at any time can reach into a piece of global state and adjust it under your feet.
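For the network and storage items above, one cheap source of evidence is to record how close each slow operation comes to its timeout. The sketch below does not enforce a timeout; it only measures and logs. The names timed_call, warn_fraction, and the "slow_ops" logger are assumptions made for this example.

```python
import logging
import time

log = logging.getLogger("slow_ops")  # hypothetical logger name


def timed_call(operation, timeout_s, warn_fraction=0.8, clock=time.monotonic):
    """Run `operation` and report whether it came close to `timeout_s`.

    Returns (result, elapsed_seconds, near_limit). A warning is logged
    when the elapsed time reaches `warn_fraction` of the budget. The
    `clock` parameter exists so that tests can inject a fake clock.
    """
    start = clock()
    result = operation()
    elapsed = clock() - start
    near_limit = elapsed >= warn_fraction * timeout_s
    if near_limit:
        log.warning("operation took %.3fs of a %.3fs budget", elapsed, timeout_s)
    return result, elapsed, near_limit
```

A cluster of near-limit warnings shortly before an intermittent failure is a strong hint that timing, rather than logic, is the culprit.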

Conclusion

Debugging isn’t easy. But it’s our own fault. We wrote the bugs. Effective debugging is an essential skill for any programmer.

Questions

What tools or techniques do you fall back on when hunting a bug?
Are there other techniques you should try?
What was the trickiest bug you’ve ever had to find? What was the key thing that helped you find the cause?
Do you know other programmers who are better at finding and fixing bugs? What makes them more capable? How can you learn from them?
How can you close the gap between the introduction of a bug into a software system and the point at which it is observed, and the point at which it is fixed?

Notes and references

[1] Broken windows theory implies that keeping neighbourhoods in good condition prevents vandalism and crime. See http://en.wikipedia.org/wiki/Broken_windows_theory

[2] Andrew Hunt and David Thomas, The Pragmatic Programmer (Boston: Addison Wesley, 1999).
