    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Professionalism in Programming #22</title>
        <link>https://members.accu.org/index.php/journals/1247</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 15, #5 - Oct 2003 + Professionalism in Programming, from CVu journal</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Professionalism in Programming #22</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> Mon, 06 October 2003 13:16:00 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Finding fault.</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e22" id="d0e22"></a></h2>
</div>
<p><img src="/var/uploads/journals/resources/bug.png" align="right">Nobody's perfect.
Well, except for me that is. All day I have to sit down and work
through tedious problems in other people's code. The test
department discovers that our software falls over when they do
<span class="emphasis"><em>such-and-such</em></span>. So I trawl
through the system to find what Programmer Fred did wrong three
years ago, patch it up and send it back to test for them to break
again.</p>
<p>Of course, you wouldn't find me making those sorts of elementary
mistakes, not a chance. My code is watertight. Faultless. Low fat
and cholesterol free. I don't write a line until I've gone over
everything in my head, I don't complete a code statement without
considering all the special cases that might occur, and I type so
carefully that I've never once misplaced <tt class="literal">=</tt>
for <tt class="literal">==</tt> in an <tt class="literal">if</tt>
statement.</p>
<p>Totally fault free, me. Really.</p>
<p>Well, perhaps not quite.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e45" id="d0e45"></a>The facts of
life</h2>
</div>
<p>I don't think anyone sits trainee programmers down and explains
the facts of life to them. <span class="emphasis"><em>It's like
this, son. There are the birds and the bees. Oh, and the
bugs.</em></span> Bugs are the inevitable dark side of constructing
software, a simple fact of life. Sad, but true. Whole departments,
and even industries, exist to manage them.</p>
<p>Everyone reading this will be only too aware of the
proliferation of faults that exist in released software. How do
bugs appear with such frightening regularity and in such great
magnitude? It's all down to human nature. Programs are written by
humans. Humans make mistakes. They make mistakes for a number of
reasons (or excuses). They make mistakes because they don't
understand the system they're working on well enough, because they
don't correctly understand what they are implementing, but more
often than not because they just don't pay enough attention to what
they're doing. Most bugs are due to mindlessness. I once saw a
wonderfully simple illustration of this; play along at home:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>The tree that grows from an acorn is called an
....................</p>
</li>
<li>
<p>The noise a frog makes is a ....................</p>
</li>
<li>
<p>The vapour that rises from fire is called
....................</p>
</li>
<li>
<p>The white of an egg is called the ....................</p>
</li>
</ul>
</div>
<p>The yolk, right? Think about it. If you didn't fall for that
one, then you were probably only paying attention because I'd just
warned you. Hey, give yourself a brownie point anyway. But tell me
who warns you every time that you're about to write a potentially
flawed line of code? They'd deserve a lifetime supply of brownie
points.</p>
<p>So as programmers we're all to blame for the bad state of
software. We're all guilty. Do we learn to live with the guilt, or
do we do something about it? There are two types of response. The
first school is the <span class="emphasis"><em>it's not a fault,
it's a feature</em></span> school. A fault turns up and we respond
in the words of the great philosopher Bart Simpson: <span class=
"emphasis"><em>I didn't do it. Nobody saw me do it. You can't prove
anything</em></span> [<a href="#">Simpsons</a>]. We blame compiler
quirks, OS flaws, random climate changes, or computers with a mind
of their own. Or as I alluded to in the opening paragraphs, we
blame other people. A Teflon raincoat can be a very handy
programming tool.</p>
<p>However, we should really subscribe to the second school, the
school that concedes that software errors are not entirely
inevitable. Many of these kinds of mindless mistake can be picked
up or even prevented, and as responsible programmers we should be
taking steps to do so. In this article we'll find out about some of
this, and look at some good debugging techniques to employ when
bugs do slip through the net.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e83" id="d0e83"></a>Nature of the
beast</h2>
</div>
<p>Contrary to popular belief the term bug was in use before the
advent of computers. In the 1870s Thomas Edison talked about bugs
in electrical circuits. The story of the Harvard University Mark II
Aiken Relay Calculator tells of the first recorded computer bug. In
1947, the early days of computers when they took up whole rooms, a
moth flew in and managed to lodge itself in some circuits, causing
a system failure. They taped it into the logbook and wrote below:
<span class="emphasis"><em>First actual case of bug being
found</em></span>. For posterity's sake it has been preserved in
the Smithsonian Institution.</p>
<p>Bugs are bad news. But what are they really? It's worth
identifying the different varieties of bug we encounter, and understanding
how they are born, how they survive and how they can be exterminated. It's also
important to know what to call them; see the sidebar for more on
this matter.</p>
<div class="sidebar">
<p class="title c3">Nomen nudum: what <span class=
"emphasis"><em>shall</em></span> we call them?</p>
<p>The term 'bug' is remarkably evocative, and incredibly
imprecise. It's very easy to throw around words without really
understanding what they mean. If we use more specific terminology
then we'll get straight in our head some key facts.</p>
<p>The exact meaning of the three terms below depends on who's
defining them; this can get a bit philosophical. These
interpretations are largely inspired by IEEE literature [<a href=
"#ANSI-IEEE">ANSI-IEEE</a>].</p>
<div class="variablelist">
<dl>
<dt><span class="term">Error</span></dt>
<dd>
<p>An error is something that we do wrong. It is a specific human
action that results in software containing a fault. Whilst merrily
coding away, for example, forgetting to check a condition (like the
size of an array before indexing into it) is an error.</p>
</dd>
<dt><span class="term">Fault</span></dt>
<dd>
<p>A fault is the consequence of an error, embodied in the
software. I made an error, and this resulted in a fault in the
code. Now at first this is a <span class=
"emphasis"><em>latent</em></span> problem. If the code I've just
written is never executed then this fault will never have a chance
to cause problems. If execution often passes through the faulty
code, but never in the particular way that triggers the fault,
we'll never notice that there is a fault at all. This subtle little
point is what makes debugging so notoriously difficult. A faulty
line of code may appear to work flawlessly for years, and then one
day it causes the most bizarre system tantrum you've ever seen;
you'll not suspect the aged code since it's been so reliable for so
long.</p>
</dd>
<dt><span class="term">Failure</span></dt>
<dd>
<p>So a fault, if encountered, may cause a failure. It may not. The
failure is what we really care about, the manifestation of the
fault, and it's the only thing we'll probably take notice
of<sup>[<a name="d0e128" href="#ftn.d0e128" id=
"d0e128">1</a>]</sup>. A failure is the departure of your program's
operation from its requirements, from its expected behaviour. This
is where we are verging on philosophy. If a tree falls over in a
forest does it make a sound; if the running program doesn't
exercise a bug, is the mistake still a fault? These definitions
help to answer this.</p>
</dd>
</dl>
</div>
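<p>The array-indexing example above can be sketched in C. Treat this
as an illustration only: the <tt class="literal">scores</tt> table and
both function names are made up for the purpose. The unchecked version
contains the fault, yet it produces no failure for as long as every
caller happens to pass an index that is in range.</p>

```c
#include <assert.h>
#include <stddef.h>

#define SCORE_COUNT 4

/* Hypothetical data, used only for this illustration. */
static const int scores[SCORE_COUNT] = {10, 20, 30, 40};

/* The ERROR: the programmer forgot to check `index`.
 * The FAULT: this function, which reads outside the array for
 * index >= SCORE_COUNT. It stays latent while callers behave. */
int score_at_unchecked(size_t index)
{
    return scores[index];
}

/* Repaired version: the index is validated before use, and a bad
 * index is reported to the caller instead of being acted upon.
 * Returns 0 on success, -1 if the index is out of range. */
int score_at(size_t index, int *out)
{
    assert(out != NULL);
    if (index >= SCORE_COUNT)
        return -1;
    *out = scores[index];
    return 0;
}
```

<p>Calling <tt class="literal">score_at_unchecked(1)</tt> works for
years; the failure only appears on the day some caller passes an
out-of-range index, which is exactly why the aged code escapes
suspicion.</p>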
<p>If you want a hard definition of bug then it is a synonym for
fault. The problem with the word &quot;bug&quot; is that users throw it
around without knowing exactly what they're describing; this
dilutes any true meaning. When being precise it's best ignored.
There are other related words that can be thrown into the lexicon
for good measure: defect, for example. Again, they'll mean
different things if you ask different people, and we can happily
survive here without getting too anal about them.</p>
<p>In most situations these (perhaps arbitrary) distinctions don't
really matter, you can happily talk about a fault, an error, or a
bug and not worry about being pedantically misinterpreted. However,
in an article about bugs it's good to be clear what we're talking
about.</p>
</div>
<p>Software bugs fall into a few broad categories, and
understanding these will help us to reason about them. Some bugs
are naturally harder to find than others, and this usually turns
out to be related to their category. Stepping right back and
squinting at them from a distance, we see these three classes
emerge:</p>
<div class="variablelist">
<dl>
<dt><span class="term">Failure to compile</span></dt>
<dd>
<p>It's really annoying when the code you've spent ages writing
fails to compile. It means that you'll have to go and fix a tedious
little typo or some parameter type mismatch, then wait for the
compiler to run again before you can get to the real job of testing
your handiwork. It may come as a surprise to learn that this is the
best type of error you can get. Why? Simply because it's the
easiest to detect and fix<sup>[<a name="d0e145" href="#ftn.d0e145"
id="d0e145">2</a>]</sup>. It's the most immediate, and the most
obvious.</p>
<p>Faults cost more to fix the longer it takes to detect them. We
saw in the previous article that the cost of changing software
rises dramatically over the life of a project, and this holds for
fixing faults. The sooner we catch them and fix them, the sooner we
can move on, the less fuss and cost they incur. Compilation
failures are easy to notice, and usually very easy to fix. You
can't run the code until you have.</p>
<p>Most of the time a compilation failure will be a silly syntactic
mistake, or something simple like calling a function with the wrong
number or type of parameters. The failure might be due to a fault
in a makefile, it might be a link stage error (say, a missing
function implementation), or even a build server running out of
disk space.</p>
</dd>
<dt><span class="term">Runtime crash</span></dt>
<dd>
<p>After enough donkeying about fixing your compilation errors, out
pops your executable and you merrily run it. Then it crashes. You
probably swear and mutter something about random cosmic rays. After
the sixtieth crash you're threatening to throw your computer out of
the window. These kinds of error are far harder to deal with than
compilation errors, but they're still reasonable to work with.</p>
<p>This is because, like compilation errors, they are blindingly
obvious. You can't argue with an ex-program. You can't pretend a
crash is a feature. When it has kicked the bucket and shuffled off
its mortal coil, you step back and begin to figure out where your
program went wrong. You'll have some clues (what input sequence
preceded the crash, what had happened previously), and can employ
tools to discover more information (more on this later).</p>
</dd>
<dt><span class="term">Unexpected behaviour</span></dt>
<dd>
<p>Now this is the really nasty one, when your program isn't
pushing up the daisies, just pining for the fjords. Suddenly it
does the wrong thing. You expected a blue square and out popped a
yellow triangle. The code continues to meander on its happy way
with total disregard for your frustration. What caused the yellow
triangle to appear? Has the program been overthrown by a militant
army of guerrilla COM objects? It will almost certainly be a minute
logic problem in the bowels of the code that executed over half an
hour ago. Good luck finding it...</p>
</dd>
</dl>
</div>
<p>A failure may manifest itself because of a defective single line
of code, or may only show up when several interconnecting modules
are finally glued together, their assumptions not quite matching
up.</p>
<p>Moving in a bit, and looking more closely at runtime errors, a
few more groupings of fault become clear. Here they are ranked in
order of pain, from splinter to decapitation.</p>
<div class="variablelist">
<dl>
<dt><span class="term">Syntactical errors</span></dt>
<dd>
<p>Whilst these <span class="emphasis"><em>are</em></span> mostly
caught by the compiler at build-time, sometimes language grammar
errors slip through undetected. They can generate weird and
unexpected behaviour. The syntax error will often be one of:
mistaking == for =, or &amp;&amp; for &amp; in a conditional
expression, forgetting a semicolon or adding one in the wrong place
(the classic is after a for statement), forgetting to enclose a set
of loop statements in braces, or mismatching parentheses. The
simplest way to avoid being tripped up by these sorts of error is
to keep all warnings switched on; compilers tend to moan about a
lot of these potential problems.</p>
</dd>
<dt><span class="term">Build errors</span></dt>
<dd>
<p>Whilst not necessarily a runtime fault <span class=
"emphasis"><em>per se</em></span>, the build error manifests itself
at run time. Be on the lookout and always distrust your build
system, no matter how good you think it is. In these enlightened
times you're unlikely to come across a compiler bug. However, you
may not always be running what you thought you built. Several times
I've been hit by this: the build system failed to create a program
or shared library, perhaps because makefiles didn't contain
adequate dependency information, or the old executable had a bad
timestamp. Every time I tested a modification I was still running
the old buggy code unawares. There are a number of ways to confuse a
build system, but the worst part is you don't notice it failing -
like a leprous limb.</p>
<p>It can take quite some time (and maybe even a brief stint in the
funny farm) to notice that this is biting you. For this reason,
when you feel at all wary of what's going on it can be sensible to
do a total clean out of your project, and then rebuild from
scratch. This should flush out any possible build system
problems<sup>[<a name="d0e192" href="#ftn.d0e192" id=
"d0e192">3</a>]</sup>.</p>
</dd>
<dt><span class="term">Basic semantic bugs</span></dt>
<dd>
<p>The majority of runtime faults are due to very simple errors
causing incorrect behaviour. Using uninitialised variables is a
classic example, and can be quite hard to track since the program's
behaviour may depend on the garbage value that was previously in the
memory location used by the variable. One time the program will
work fine, another time it may fail. Other basic semantic faults
are: comparing floats for equality, writing calculations that don't
handle numerical overflow, and rounding errors from implicit type
conversions (losing the sign of a char is common). This type of
semantic fault is often caught with static analysis tools.</p>
</dd>
<dt><span class="term">Semantic bugs</span></dt>
<dd>
<p>These are much harder to identify, the insidious errors that
won't be caught by inspection tools. A semantic bug might be a
low-level error like the wrong variable being used in the wrong
place, not validating a function's input parameters, or getting
a loop wrong. It may be a higher-level piece of wrong-headedness,
calling an API incorrectly, or not keeping an object's
state internally consistent. A pile of memory related errors fall in
this category - they can be evil to find due to their ability to
warp and corrupt your running code, so that it behaves in totally
unpredictable and unreasonable ways. Programs often behave weirdly.
The only consolation is that they're doing exactly what we told
them to.</p>
</dd>
</dl>
</div>
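<p>Two of the basic semantic bugs listed above, comparing floats for
equality and losing the sign of a <tt class="literal">char</tt>, have
well-worn defensive idioms. The sketch below shows one plausible form of
each; the function names and the idea of passing the tolerance as a
parameter are my own choices for illustration, not a prescription.</p>

```c
#include <assert.h>

/* Compare two doubles with an explicit tolerance instead of ==.
 * The right tolerance depends on the calculation, so it is left
 * to the caller here. Returns non-zero if the values are close. */
int nearly_equal(double a, double b, double tolerance)
{
    double difference = a - b;

    assert(tolerance >= 0.0);
    if (difference < 0.0)
        difference = -difference;
    return difference <= tolerance;
}

/* Whether plain char is signed is implementation-defined, so a byte
 * such as 0xE9 may arrive as a negative number. Converting through
 * unsigned char gives a value in 0..UCHAR_MAX on every platform. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

<p>Adding <tt class="literal">0.1</tt> to itself ten times and testing
the result against <tt class="literal">1.0</tt> with
<tt class="literal">==</tt> is the classic trap; on typical IEEE 754
hardware the sum is not exactly one, whereas
<tt class="literal">nearly_equal(sum, 1.0, 1e-9)</tt> behaves as
intended.</p>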
<p>The best kind of runtime failures are the reliable ones. If
they're reproducible, they are much easier to write tests for, and
track down the cause of. The failures that don't always occur tend
to be memory corruptions.</p>
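<p>Once a failure is reproducible it can be pinned down by a small test
that stays in the code base. The following is a minimal sketch under
invented assumptions: <tt class="literal">average</tt> is a hypothetical
function whose imagined original fault was a division by zero on empty
input, and <tt class="literal">run_average_tests</tt> is an equally
hypothetical hand-rolled check rather than any particular test
framework.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function under test. The imagined original fault was
 * dividing by `count` without checking it for zero.
 * Returns 0 on success, -1 if there is nothing to average. */
int average(const int *values, size_t count, int *out)
{
    long total = 0;
    size_t i;

    assert(out != NULL);
    if (values == NULL || count == 0)
        return -1;
    for (i = 0; i < count; i++)
        total += values[i];
    *out = (int)(total / (long)count);
    return 0;
}

/* Returns the number of failed checks; 0 means every check passed. */
int run_average_tests(void)
{
    const int sample[] = {2, 4, 6};
    int result = 0;
    int failures = 0;

    /* Ordinary case: (2 + 4 + 6) / 3 is 4. */
    if (average(sample, 3, &result) != 0 || result != 4)
        failures++;
    /* The input that used to crash is now exercised on every run. */
    if (average(sample, 0, &result) != -1)
        failures++;
    return failures;
}
```

<p>If the fault ever creeps back in, the second check fails immediately
instead of waiting for the test department to rediscover it.</p>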
<p>Now that we have things in neat little boxes, let's zoom right
in and take a look at some of the specific types of runtime
failure. These are some common semantic faults that we come
across.</p>
<div class="variablelist">
<dl>
<dt><span class="term">Segmentation faults (or protection
faults)</span></dt>
<dd>
<p>come from accessing memory locations that have not been
allocated for the program's use. They result in the operating
system aborting the application code and producing some form of
error message, usually with diagnostic information. This can be
triggered by dodgy pointer arithmetic, or far too easily by typing
errors involving pointers. A common C typo causing a segfault is
<tt class="literal">scanf(&quot;%d&quot;, number);</tt> The missing
<tt class="literal">&amp;</tt> before <tt class=
"literal">number</tt> makes <tt class="literal">scanf</tt> try to
write into the memory location referenced by the (garbage) contents
of number, and <span class="emphasis"><em>poof!</em></span> the
program disappears in blue smoke. If you're really unlucky, though,
number happens to hold a value that equates to a valid memory
address. Now your code will continue as if nothing was wrong, until
the memory you just wrote over is used and your fate is in the lap
of the gods.</p>
</dd>
<dt><span class="term">Memory overruns</span></dt>
<dd>
<p>are caused by writing past memory that has been allocated for
your data structure, be it an array, a vector, or some other custom
construct. When writing values into the wide blue yonder, you'll
generally end up clobbering data from some other part of your
program. If you're running on an unprotected operating system (more
common in embedded environments) you may even tamper with data from
another process or the OS itself. Ouch. Memory overrun is a common
problem and difficult to detect; usually the symptom is random
unexpected behaviour manifesting at a much later point than the
overrun, many thousands of instructions later. If you're lucky the
memory overrun hits an invalid memory address and you get a
segfault which is hard not to notice. Use 'safe' data structures
wherever possible to insulate yourself from the possibility of such
disaster.</p>
</dd>
<dt><span class="term">Memory leaks</span></dt>
<dd>
<p>are a constant threat in non-garbage collected
languages<sup>[<a name="d0e248" href="#ftn.d0e248" id=
"d0e248">4</a>]</sup>. When you want some memory you have to ask
the runtime for it nicely (using malloc in C or new in C++), and
then you have to be polite and give it back when you're done (using
free and delete respectively). If you rudely forget to release
memory, your program slowly consumes more and more of the
computer's scarce resources. You may not notice it at first, but
gradually your computer's response will degrade, as memory pages
thrash to and from the disk. Two other classes of error relate to
this: freeing a memory block too many times causing unpredictable
environmental failures, and not managing other scarce resources
carefully, like file handles and network connections.</p>
</dd>
<dt><span class="term">Running out of memory</span></dt>
<dd>
<p>is always a possibility, as is running out of file handles or
any other managed resource. It might be rare (modern computers have
so much memory, how could this possibly happen?) but that's no
excuse to ignore the potential for failure. Only sloppy code fails
to make appropriate checks and will consequently perform in a very
brittle manner when run in constrained situations. Always validate
the return status of a memory allocation or file open system call.
It is worth noting that some modern operating systems<sup>[<a name=
"d0e258" href="#ftn.d0e258" id="d0e258">5</a>]</sup> will never
return a failure from a memory allocation call - every allocation
returns a pointer to a reserved but unallocated memory page. When
the program eventually tries to access this page, an OS mechanism
traps the access and then really allocates memory to the page,
resuming normal program operation. This all works nicely until the
available memory finally is exhausted. Your program will then be
sent error signals, a long time after the relevant allocation
occurred.</p>
</dd>
<dt><span class="term">Maths errors</span></dt>
<dd>
<p>(<span class="emphasis"><em>or &quot;Math&quot; errors for those using
strange variants of the English language - Ed</em></span>) come in
a number of guises: floating point exceptions, incorrect
mathematical constructions or incorrect use of floating point
numbers (for example, divide by zero). Even trying to output a
<tt class="literal">float</tt> but passing an <tt class=
"literal">int</tt> through <tt class="literal">printf(&quot;%f&quot;)</tt>
can cause your program to bomb with a maths error.</p>
</dd>
<dt><span class="term">Program hangs</span></dt>
<dd>
<p>are usually caused by bad program logic. Infinite loops with
badly crafted termination conditions are the most common; we also see
deadlock or race conditions in threaded code, and, in event-driven
code, waiting on events that will never occur. It is usually
fairly easy to interrupt the running program, see where the code
has stalled and determine the cause of the hang.</p>
</dd>
</dl>
</div>
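<p>Several of the failures in this list are avoided by the same dull
habits: pass the right address, stay inside the block you allocated, and
check what the allocator hands back. Here is a sketch of those habits in
C. The helper names are hypothetical, and
<tt class="literal">sscanf</tt> stands in for
<tt class="literal">scanf</tt> purely so that the example needs no
terminal input.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The scanf family writes through a pointer, so it must be given the
 * ADDRESS of the destination. `number` is already a pointer here; with
 * a plain int variable you would write sscanf(text, "%d", &value).
 * Returns 0 if an integer was parsed, -1 otherwise. */
int parse_number(const char *text, int *number)
{
    assert(text != NULL && number != NULL);
    return sscanf(text, "%d", number) == 1 ? 0 : -1;
}

/* Copy at most `limit` characters of `src` into freshly allocated
 * memory. The allocation is checked, and nothing is written beyond
 * the len + 1 bytes requested. Returns NULL if allocation fails;
 * otherwise the caller owns the result and must free() it. */
char *copy_bounded(const char *src, size_t limit)
{
    size_t len;
    char *copy;

    assert(src != NULL);
    len = strlen(src);
    if (len > limit)
        len = limit;
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;        /* out of memory: report it, don't crash */
    memcpy(copy, src, len);
    copy[len] = '\0';
    return copy;
}
```

<p>Every successful <tt class="literal">copy_bounded</tt> call needs a
matching <tt class="literal">free</tt>; forget it and you have written
the memory leak described above.</p>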
<p>Different OSes, languages, and environments report these errors
in different ways, with different wording. Some languages try to
avoid types of error by not providing features you can shoot
yourself in the foot with. Java, for example, has no pointer arithmetic and
automatically checks every array access and object reference you use.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e291" id="d0e291"></a>Pest
extermination</h2>
</div>
<p>Like a hypochondriac, our code is constantly complaining about
being ill. More often than not it genuinely is in need of some
attention. We're the doctors. If our code is sick then we've got to
perform the diagnosis, the surgery, and nurture it through its
convalescence.</p>
<p>Weeding out bugs is hard. Not only do humans make mistakes when
writing, they also make mistakes when reading. When I proof read
these articles I have a tendency to read what I <span class=
"emphasis"><em>meant</em></span> to write and not what I
<span class="emphasis"><em>really</em></span> wrote; it works the
same for software. When we look at our faulty code we'll tend to
see what we intended, not how the compiler actually interprets our
instructions. In this respect the compiler is really quite
pedantic, it can only produce exactly what we asked, not what we
were hoping for.</p>
<p>Some programmers introduce far fewer faults into their code than
their peers (as many as 60% fewer), can find and fix faults quicker
(in as little as 35% of the time), and introduce fewer faults as
they do so (figures from [<a href="#Gould">Gould</a>]). How do they
do it? They are naturally able to pay more attention to the task,
and can focus on the microscopic level of the code they're writing
whilst keeping the broader picture in mind.</p>
<p>The professional programmer is always mindful of introducing
faults, and will try to fix a detected problem sooner rather than
later. Certainly, it's wrong to presume that we only check for
problems when the software has been written. I've known many
programmers who believe that the test department exists to detect
their bugs for them. This is just plain wrong.</p>
<p>There is a clear difference between testing and debugging.
Testing identifies the presence of a fault, e.g. the program output
is incorrect, whereas debugging is the process of reproducing,
locating, understanding, and fixing a fault.</p>
<p>Testing is QA, that is quality assurance; debugging is repairing
a problem. You don't get quality by fixing bugs, you can't add it
in at the end of software development, you must plan the quality
into the architecture and implementation. Testing won't prove the
absence of faults, it won't catch all errors. It's impossible to
draft exhaustive test cases; software is just too complex. We will
inevitably release software into the field containing faults that
may still crop up. Yes, the quality of our software is in part down
to the quality of our testing department, but also to our personal
testing, and the quality of the fixes that we implement.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e315" id="d0e315"></a>Debugging
techniques and tools</h2>
</div>
<p>There is an art to debugging, and it's very much something to be
learnt. It's a skill. Experience shows you how to become an
effective debugger. And this is something that we <span class=
"emphasis"><em>will</em></span> all get plenty of experience at.
Now, different people's brains work in very different ways, and
they have different ways of problem solving. What works for one
programmer may not for another. However, there are some general
principles that always apply.</p>
<p>The sidebar below offers a whistle-stop tour of the tools
available to aid our bug hunting. How we use these tools and where
and when they are applicable will differ from situation to
situation. However, one of the most potent weapons in our debugging
arsenal is a distrust of anyone's code mixed with a healthy dose of
cynicism. The cause of your errant behaviour could be absolutely
anything, and in the act of diagnosis we should start by
eliminating even the most unlikely of candidates.</p>
<p>How difficult it is to find a fault depends on how well you know
the code it's lurking in. It's hard to jump into some random source
and make any kind of judgement about it without knowing the
structure and how it's intended to work. For this reason, if you
have to debug some new code, take time to learn it first; it really
will pay off in the long run.</p>
<p>The ease of debugging is also dependent on the control you have
over the execution environment, how much you can play around with
the running program and inspect its state. In an embedded
environment debugging can be much harder because the tool support
is sparser. You're also probably running in an environment that is
providing a lot less insulation from your own stupidity; little
mistakes can have much bigger consequences.</p>
<p>There are two distinct facets to debugging: <span class=
"emphasis"><em>finding</em></span> the fault and <span class=
"emphasis"><em>fixing</em></span> the fault. The following sections
describe a sensible approach to both.</p>
<p>The golden rule when debugging is this: <span class=
"emphasis"><em>Use Your Brain.</em></span> Think. Consider what
you're doing. Don't flail around thoughtlessly hacking at bits of
code until something begins to look like it might be working. Now,
sometimes a quick fiddle about <span class=
"emphasis"><em>will</em></span> get you results, sometimes some
hacky little exploratory tests will pinpoint the problem quickly.
So is it a justifiable thing to do? Perhaps, but if you make the
conscious decision to do some quick-and-dirty stabbing around, set
yourself a hard time limit to do it in. It's all too easy to spend
an entire morning with the 'just one more little go' approach.
After the time limit is up, follow the more methodical approach
laid out below.</p>
<p>If your quick stab turns up trumps and you do find the fault,
reengage your thinking gear. Look at the <span class="bold"><b>How
to fix faults</b></span> section below, make the change carefully
and thoughtfully. Just because the fault was easy to find, it
doesn't necessarily mean that the fix is quite as obvious as it
looks.</p>
<div class="sidebar">
<p class="title c3">Wasp spray, slug repellant, fly paper...</p>
<p>Debugging would be a lot nicer if there was someone else to do
the job for us. Whilst that'll never happen, we can make the job a
lot more palatable with a little help. Many useful tools exist;
you'd be stupid not to take advantage of them. A little time
learning how they work may reduce your debugging time
immeasurably.</p>
<p>Some tools are <span class=
"emphasis"><em>interactive</em></span>, allowing you to inspect the
code in various ways whilst a program is actually running. In
advanced development environments these tools may be seamlessly
integrated, or they may need to be run as separate programs. Other
tools are <span class="emphasis"><em>non-interactive</em></span>,
often running as a code filter or parser spitting out information
about the code following analysis. In this list we'll also consider
tools you may not have thought of as debugging aids, and even some
helpful procedures.</p>
<div class="variablelist">
<dl>
<dt><span class="term">Debugger.</span></dt>
<dd>
<p>This is perhaps the best-known debugging tool; its name
rather gives its purpose away. A debugger is an interactive tool
that allows you to view the internals of your running program and
poke around with it. You can follow the flow of control, inspect
the contents of variables, set <span class=
"emphasis"><em>breakpoints</em></span> in the code for later
interruption, even run arbitrary sections of code at will.
Debuggers come in many shapes and sizes, some command line tools,
some graphical applications. Usually there will be at least one
available for your particular development platform (although the
ubiquitous <tt class="literal">gdb</tt> seems to get ported to
every conceivable platform these days). A debugger relies on
<span class="emphasis"><em>symbols</em></span> being left in your
executable (these are the compiler's debugging information which
are normally stripped out at the final link stage) - it uses these
to provide you with information about function and variable names,
and the location of the source files. Debuggers are rich and
powerful tools; however, I believe that they are often misused or
overused, and can actually inhibit good debugging. Programmers
easily get wrapped up chasing what the program is doing, getting
side tracked by observing the wrong variable values, stepping into
the wrong functions, and don't sit back and <span class=
"emphasis"><em>think</em></span> about the problem they are trying
to solve. A little more thought about a failure may pinpoint the
specific fault far quicker than trying to hunt it down in a
debugger.</p>
</dd>
<dt><span class="term">Memory access validator.</span></dt>
<dd>
<p>This interactive tool inspects your running program for memory
leaks and overruns. It can be remarkably useful, showing up reams
of memory release failures you never knew existed.</p>
</dd>
<dt><span class="term">System call trace utilities,</span></dt>
<dd>
<p>like Linux's <tt class="literal">strace</tt>, show all the system
calls issued by an application. This can be a good way to see how a
program is interacting with its environment, particularly useful
when it appears to be stalled on some external activity that is not
happening.</p>
</dd>
<dt><span class="term">Core dump.</span></dt>
<dd>
<p>This is a Unix term for the OS-generated snapshot of a program
that can be produced when it exits abnormally. The term derives
from archaic machines with ferrite core memory, but the dump
file is still called <span class="emphasis"><em>core</em></span>.
It contains a copy of the program's memory when it died, the state
of the CPU registers, and the function call stack. The core dump
can be loaded into an analyser (which is most often the debugger)
to query a number of useful bits of information.</p>
</dd>
<dt><span class="term">Logging facilities</span></dt>
<dd>
<p>allow you to programmatically generate information about your
application as it runs. Rich logging systems allow you to assign
priorities to the output (e.g. debug, warning, fatal), and then
filter out a particular message level at run time. The program's
log gives a history of activity that can help pinpoint what
circumstances triggered a failure. The logging facility may be an
integral part of the operating environment, or provided by a third
party library. Without such support you'll see the use of
<tt class="literal">printf/cerr</tt> diagnostic information,
introduced on a very ad hoc basis. This is about as basic as you
can get, and must be carefully removed in the production code
release. <tt class="literal">printf</tt>s may also clobber the
normal program output. I have worked in environments where even
lowly <tt class="literal">printf</tt>s weren't available; when
bringing up a system board the only diagnostic output I had was a
single eight segment LED display, and a scope attached to a spare
system bus! There are downsides to logging: it can slow down
program execution and bloat the executable size if the logging
statements can't be compiled out completely. Some logging systems
are useless for trapping a program crash, since at the crash time
messages may still be stuck in an output buffer that will never get
flushed. Be sure you know how well your logging mechanism works,
and always send diagnostic <tt class="literal">printf</tt>s to the
unbuffered <tt class="literal">stderr</tt>, not <tt class=
"literal">stdout</tt>.</p>
</dd>
<dt><span class="term">Static analyser.</span></dt>
<dd>
<p>This is a type of non-interactive tool that inspects source code
for potential problem areas. Many compilers include support for
this kind of functionality when set to their maximum warning level,
but good static analysis tools go far beyond this. Products exist
to discover problem code, any usage of undefined behaviour or
non-portable constructs, to identify dangerous programming
practices, to provide code metrics, to enforce coding standards,
and to create test harnesses. Use of a static analysis tool can
eradicate many errors before they have a chance to bite. A handy
safety net. It's a sound pragmatic idea to use a static analyser
from a different company than your compiler manufacturer - they're
less likely to have made the same set of mistakes.</p>
</dd>
<dt><span class="term">Code reviews</span></dt>
<dd>
<p>often identify problem areas that would otherwise go undetected.
They were described in an earlier article [<a href=
"#Goodliffe4">Goodliffe4</a>]. If you've never done one, you'll be
surprised how many faults can get unearthed this way.</p>
</dd>
<dt><span class="term">Defensive programming techniques [<a href=
"#Goodliffe9">Goodliffe9</a>]</span></dt>
<dd>
<p>greatly reduce the likelihood of all sorts of errors. In
particular, the use of assertions to check logical invariant
conditions can be crucial. Whilst tracking a bug you can insert
more assertions to validate the assumptions you've made about the
code.</p>
</dd>
<dt><span class="term">Fault logging/reporting database
systems</span></dt>
<dd>
<p>such as <tt class="literal">Bugzilla</tt> provide persistent
records of all failures so no problem, no matter how small, is ever
forgotten. Such a system helps you gather statistics on the quality of
the project, so you know when it has reached a releasable state. It is
a key tool, integral to the development process. It won't find
faults for you, but helps co-ordinate the process of doing so. It
allows you to assign problems to engineers, to mark issues as
resolved or duplicated, and acts as a bridge between the test
department and development. No software development organisation
should function without such a system in place, although it's
frightening that many do.</p>
</dd>
<dt><span class="term">Source code editor.</span></dt>
<dd>
<p>A good editor will prevent you from making a whole pile of silly
mistakes. Syntax highlighting often provides visual cues when
you've made an error. You'll see when you mismatch comment
delimiters, or get brace or parenthesis mismatches. A good editing
environment also provides navigation around your code so you can
find offending areas easily.</p>
</dd>
<dt><span class="term">A version management system</span></dt>
<dd>
<p>stores the source code and a history of its development. It
allows you to review changes that have been made, find out who made
them and when. When a fault rears its head you can revert to a
previously working revision and inspect the differences that have
been made.</p>
</dd>
</dl>
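<p>The priority filtering described under Logging facilities above
can be sketched in a few lines. This is only an illustration, not a
production logger: the names <tt class="literal">LogLevel</tt>,
<tt class="literal">g_threshold</tt> and
<tt class="literal">log_message</tt> are hypothetical, and the
sketch assumes a single-threaded program. It writes to
<tt class="literal">std::cerr</tt>, which flushes after each
output operation, so a message should not be left stranded in a
buffer if the program dies straight afterwards.</p>

```cpp
#include <iostream>
#include <string>

// Hypothetical priority levels, ordered from least to most severe.
enum class LogLevel { Debug = 0, Warning = 1, Fatal = 2 };

// Run-time threshold: messages below this level are dropped.
// (Illustrative global; a real system would probably hide this.)
LogLevel g_threshold = LogLevel::Warning;

// Returns true if the message passed the filter and was written.
bool log_message(LogLevel level, const std::string& text) {
    if (level < g_threshold) {
        return false;  // filtered out at run time
    }
    static const char* const names[] = {"debug", "warning", "fatal"};
    // std::cerr is flushed after every write, unlike std::cout.
    std::cerr << '[' << names[static_cast<int>(level)] << "] " << text << '\n';
    return true;
}
```

<p>Setting <tt class="literal">g_threshold</tt> to
<tt class="literal">LogLevel::Debug</tt> while hunting a fault
turns the extra output on without touching any call sites.</p>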
</div>
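<p>As a small illustration of the assertions mentioned under
Defensive programming techniques above, here is a sketch that checks
one assumption at the point where it matters. The function
<tt class="literal">average</tt> and its precondition are
invented for the example; note that the standard
<tt class="literal">assert</tt> macro is compiled out when
<tt class="literal">NDEBUG</tt> is defined.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: mean of a non-empty set of samples.
double average(const std::vector<double>& samples) {
    // Assumption being validated: callers never pass an empty vector.
    // If one does, the assertion fires here, close to the real fault,
    // instead of a division by zero surfacing somewhere downstream.
    assert(!samples.empty() && "average() requires at least one sample");

    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    return sum / static_cast<double>(samples.size());
}
```

<p>Whilst tracking a bug you might sprinkle several checks like this
along the suspect path, and keep the useful ones afterwards.</p>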
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e475" id="d0e475"></a>Bug
hunting</h2>
</div>
<p>So how do we find bugs? If there was a simple three-step process
we'd all have learnt it and our programs would be perfect by now.
As it is, there isn't and they aren't. Let's try to distil the
available bug hunting wisdom.</p>
<p><span class="bold"><b>Compile time errors.</b></span> We'll look
at these first, since they are comparatively easy to deal with.
When your compiler comes across something unpleasant it will not
normally just complain the once, but take the opportunity to sound
off about life in general, spitting out a ream of other subsequent
error messages. It's been told to do this; upon encountering any
error the compiler tries to pick itself back up and carry on
parsing away. It's not always too good at it, but with code like
yours who could blame it?</p>
<p>The upshot is that the later compiler messages can all be quite
random and irrelevant. You should only need to look at the very
first error reported, and sort out that problem. Have a glance
further down the list by all means, there may be some other good
things down there, but more often than not there isn't.</p>
<p>Even this first compiler error may be cryptic or misleading,
depending on the quality of the compiler (if you're really stumped
by what an error means try another compiler, perhaps). Hardcore C++
template code can produce inspired errors from some compilers. The
reported fault usually is on the line that the compiler reports,
but sometimes it may actually be on the preceding line - a syntax
error there causes the following line to be nonsensical, and this
is what the compiler notices and moans about.</p>
<p><span class="bold"><b>Linker errors</b></span>, on the whole,
are far less cryptic. The linker will tell you that it's missing a
function or library and so you'd better go off and find it (or
write it). Sometimes the linker may complain about arcane
vtable-related C++ problems; this is usually a symptom of a missing
destructor implementation or something like that.</p>
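<p>To make the vtable complaint concrete, here is a hedged sketch.
The class <tt class="literal">Widget</tt> and the function
<tt class="literal">describe</tt> are made up, and the behaviour
described in the comments is what GCC-style toolchains typically do;
other linkers may word the error differently.</p>

```cpp
#include <string>

// Hypothetical class used only to illustrate the linker complaint.
class Widget {
public:
    virtual ~Widget();  // declared here...
    virtual std::string name() const { return "widget"; }
};

// ...and defined here. Comment this definition out and the code still
// compiles, but with GCC-style toolchains the link step typically fails
// with an "undefined reference to vtable for Widget" style message,
// because the vtable is emitted alongside the first non-inline virtual
// function, which here is the destructor.
Widget::~Widget() = default;

std::string describe(const Widget& w) { return w.name(); }
```

<p>The cure is the one the linker is hinting at: supply the missing
definition.</p>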
<p><span class="bold"><b>Run time errors</b></span> require a
little more of a game plan. If your program contains a bug then
it's likely that somewhere in the code a condition you believed to
be true isn't. Finding the bug is a process of confirming what you
think is correct until you find the place where the condition
doesn't hold. You have to develop a model of how the code really
works and compare this with how you'd intended it to. The only
sensible way to do this is methodically.</p>
<p>Scientific method is the process scientists use to develop an
accurate representation of the world. That sounds akin to what we
are trying to do. There are four steps to scientific method: (i)
observe a phenomenon, (ii) form a hypothesis to explain it, (iii)
use this hypothesis to predict the results of further observations,
and finally (iv) perform experiments to test these predictions. Now
I'm not proposing that we use scientific method wholesale; for a
start we're trying to get <span class="emphasis"><em>rid
of</em></span> the errant phenomenon rather than build a model of
it. However, scientific method is a good backbone and you'll see it
reflected in the steps below.</p>
<div class="variablelist">
<dl>
<dt><span class="term">Identify</span></dt>
<dd>
<p>a failure. It all starts here, when you notice that the program
doesn't do what it's supposed to. It may crash, it may just produce
a yellow triangle, but you know something's up and you've got to
fix it. The first thing you do is put a fault report into the fault
database. This is particularly valuable if you're in the middle of
tracking some other bug or have no time to handle the fault right
now. Making a record ensures the fault doesn't get lost. Don't just
make a mental note to come back to a problem later. You'll
forget.</p>
<p>Even if you're going to start fixing the fault immediately,
having the record in the database serves a useful purpose - it
shows other developers that a problem has been identified and is
under investigation. It also allows reports to be generated about
the number of issues remaining/resolved in the codebase.</p>
<p>Identify the nature of the errant behaviour. Characterise the
problem as completely as possible by answering questions like: Is
it timing sensitive? Does it depend on input, system load, or
program state? If you don't understand the bug before you try to
fix it you'll just be changing code until the symptom disappears.
You may only have masked a cause so the fault will crop up
elsewhere.</p>
</dd>
<dt><span class="term">Reproduce</span></dt>
<dd>
<p>it. This goes alongside characterising the failure. Work out the
set of steps you can take to reliably trigger the problem. If there
is more than one way then document them all.</p>
<p>You have a problem if the bug isn't reproducible; the best you
can do is set mousetraps for the fault and see what you can find
out when it does occur. For these unreliable failures, keep careful
notes of the information you collect; it may be a while until you
next see the problem crop up.</p>
</dd>
<dt><span class="term">Locate</span></dt>
<dd>
<p>the fault. This is the big one. You've got the scent, now you
need to track the beast and pinpoint its location from what you've
learnt. That's far more easily said than done. This is a process of
eliminating all the things that don't contribute to the failure, or
are working correctly, Sherlock Holmes-style. You may need to draft
new tests. You may need to poke around in the seedy underbelly of
the system. You will probably find that there is more information
you need to gather as you progress.</p>
<p>Analyse what you have found about the failure. Without jumping
to conclusions, draw up a list of code suspects. See if you can
spot patterns of events that hint at causes. If possible, keep a
record of the inputs and outputs that demonstrate the problem. A
good starting point for the investigation is where the error
manifests itself - although this is rarely the actual habitat of
the fault. Remember, just because a failure exhibits itself in one
module that doesn't necessarily mean that that module is to blame.
Determining this position is easy if your program crashed: you can
use a debugger to get information like the line of code in
question, the value of all variables at that point, and what called
this function. In the absence of a crash, start from a point you
know exhibits incorrect behaviour. Work backwards from there
following the flow of control, checking that the code is doing what
you expect at each point.</p>
<p>There are a few common bug hunting strategies. The worst is
randomly changing things to see if the failure goes away. This is an
immature approach. (A professional will at least try to make it
look scientific!) A far better strategy is <span class=
"emphasis"><em>divide and conquer</em></span>. Say you have the
fault pinned down to a single function that consists of ten steps.
After the fifth step, print out the intermediate result, or set a
breakpoint and investigate it in your debugger. If the value is
good then the fault lies in the instructions after this, otherwise
it's in the instructions before. Concentrate on those instructions
and repeat until you've cornered the fault.</p>
<p>Another technique is the dry run method. Rather than relying on
intuition to locate the error, you play the role of the computer,
tracing program execution through a trial run, calculating all
intermediate values, to get the final result. If your result and
reality don't match then you know a fault lies in the code.
Although time consuming this can be very effective, highlighting
your bad assumptions.</p>
</dd>
<dt><span class="term">Understand</span></dt>
<dd>
<p>the real problem once you've found where it's lurking. If it's a
simple syntactical error then getting your head round it isn't too
bad. For more complex semantic problems make sure you really know
what the problem is, and all the ways that it may manifest itself
before you move on.</p>
</dd>
<dt><span class="term">Create a test.</span></dt>
<dd>
<p>Write a test case for the failure that exercises it. You may
have done this in the 'reproduce it' step if you were clever. If
you didn't, then you really want to write one now. With your new
understanding make sure the test is rigorous.</p>
</dd>
<dt><span class="term">Fix</span></dt>
<dd>
<p>the fault. See the following section for a discussion of this
part.</p>
</dd>
<dt><span class="term">Prove</span></dt>
<dd>
<p>you've fixed it. Now you know why you wrote a test case. Run it,
and prove the world is a better place. The test case can be added
to your regression test suite to ensure that the fault is never
reintroduced at a later point.</p>
</dd>
</dl>
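<p>The divide and conquer strategy from the Locate step can be
mechanised when the suspect code is a sequence of steps and you have
some check that says whether an intermediate value still looks sane.
The sketch below is one way to do it, under two stated assumptions:
the input itself is good, and once a value has gone bad it stays bad.
The names <tt class="literal">Step</tt>,
<tt class="literal">run_prefix</tt> and
<tt class="literal">first_bad_step</tt> are hypothetical.</p>

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// A step takes the intermediate value and produces the next one.
using Step = std::function<int(int)>;

// Runs the first `count` steps on `input` and returns the intermediate value.
int run_prefix(const std::vector<Step>& steps, std::size_t count, int input) {
    int value = input;
    for (std::size_t i = 0; i < count; ++i) {
        value = steps[i](value);
    }
    return value;
}

// Returns the index of the step that first turns a good value bad,
// or steps.size() if the final result still passes `is_good`.
std::size_t first_bad_step(const std::vector<Step>& steps, int input,
                           const std::function<bool(int)>& is_good) {
    std::size_t good = 0;            // number of steps known to be fine
    std::size_t bad = steps.size();  // number of steps known to be broken
    if (is_good(run_prefix(steps, bad, input))) {
        return steps.size();         // nothing to find
    }
    while (bad - good > 1) {
        // Inspect the value half way, exactly as you would by hand.
        const std::size_t mid = good + (bad - good) / 2;
        if (is_good(run_prefix(steps, mid, input))) {
            good = mid;              // fault lies after this point
        } else {
            bad = mid;               // fault lies at or before this point
        }
    }
    return bad - 1;
}
```

<p>With ten steps this inspects only a handful of intermediate values
rather than all ten, which is the whole point of halving the search
space each time.</p>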
</div>
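<p>For the Create a test and Prove steps, a regression test need not
be elaborate. The sketch below assumes a made-up fault report: a
hypothetical <tt class="literal">count_words</tt> function that
used to count leading or trailing spaces as extra words. The first
assertion pins down the reported input; the others probe nearby
cases.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical function under test, shown here in its fixed form.
int count_words(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    int count = 0;
    while (in >> word) {  // operator>> skips any run of whitespace
        ++count;
    }
    return count;
}

// Regression test for the (invented) fault report.
void test_count_words_regression() {
    assert(count_words("  two words  ") == 2);  // input from the report
    assert(count_words("") == 0);
    assert(count_words("   ") == 0);
    assert(count_words("one") == 1);
}
```

<p>Run it before the fix to watch it fail, then after the fix to
watch it pass, and leave it in the regression suite.</p>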
<p>Sometimes you try all this but it just doesn't work, you're left
wailing and gnashing your teeth, with a sore head from banging it
against a brick wall for too long. When things get this bad I
always find it helps to explain the whole problem to someone else.
Somewhere in the description everything seems to slip into place
and I see the one key piece of information I had been missing all
along. Try it and see. Perhaps this is why pair programming is such
a successful strategy.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e561" id="d0e561"></a>How to fix
faults</h2>
</div>
<p>You'll notice that this section is much smaller than the
preceding one. Funny that. Usually the whole problem is finding the
darned fault. Once you've worked out where it is, then the fix is
obvious.</p>
<p>But don't let that lure you into a false sense of security.
Don't stop thinking once you've diagnosed the source of your errant
behaviour. It's very important not to break anything else as you
make the fix - it's surprisingly easy to trample over something in
the flower bed as you stroll over to pluck out a weed.</p>
<p>As you modify code always ask yourself 'what are the
consequences of this change?' Be aware of whether the fix is
isolated to a single statement, or affects other surrounding bits
of code. Might the effect of your change ripple out to any code
that calls this function? Does it subtly alter the behaviour of the
function?</p>
<p>Convince yourself that you have really found the cause of the
problem, and not just another symptom. Then you can feel confident
you've put a fix in the right place. Consider whether similar
mistakes may have been made elsewhere in related modules, and go
and fix them if necessary<sup>[<a name="d0e572" href="#ftn.d0e572"
id="d0e572">6</a>]</sup>.</p>
<p>Finally, try to learn from your mistake. We must learn or
otherwise be doomed to repeat the same errors for all eternity. Is
it a simple programming error you keep making, or something more
fundamental, the incorrect application of an algorithm?</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e578" id=
"d0e578"></a>Prevention</h2>
</div>
<p>Anyone will tell you that prevention is better than a cure. The
best way to manage bugs is to not introduce them. Sadly I don't
think we'll ever completely reach this ideal, but careful
programming can avoid so many problems. Good programming is about
discipline and attention to detail.</p>
<p>This section could be enormous, but all prevention advice boils
down to one simple statement: <span class="emphasis"><em>Use Your
Brain</em></span>. Enough said.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e588" id=
"d0e588"></a>Conclusion</h2>
</div>
<p>Like death and taxes, no matter how hard we try to avoid them,
bugs happen. Sure, we should use every sort of anti-wrinkle cream
available and manipulate our money in cunning ways to mitigate the
effects. But if we don't know how to deal with faults when they
stare us in the face then our code is doomed.</p>
<p>Debugging is a skill you develop. It doesn't rely on guesswork,
but on methodical detection and thoughtful repair.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e595" id="d0e595"></a>References</h2>
</div>
<div class="bibliomixed"><a name="Simpsons" id="Simpsons"></a>
<p class="bibliomixed">[Simpsons] The Simpsons. Do the Bart Man.
1991, Geffen. GEF87CD.</p>
</div>
<div class="bibliomixed">
<p class="bibliomixed">ANSI/IEEE. <span class="citetitle"><i class=
"citetitle"><a name="ANSI-IEEE" id="ANSI-IEEE"></a>IEEE Standard
Glossary of Software Engineering Terminology</i></span>. 1984,
ANSI/IEEE Standard 729.</p>
</div>
<div class="bibliomixed"><a name="Gould" id="Gould"></a>
<p class="bibliomixed">[Gould] John Gould. &quot;Some Psychological
Evidence on How People Debug Computer Programs.&quot; 1975, <span class=
"citetitle"><i class="citetitle">International Journal of
Man-Machine Studies</i></span>. No 7.</p>
</div>
<div class="bibliomixed"><a name="Goodliffe4" id="Goodliffe4"></a>
<p class="bibliomixed">[Goodliffe4] Pete Goodliffe.
&quot;Professionalism in programming #4: Code reviews.&quot; <span class=
"citetitle"><i class="citetitle">C Vu</i></span>, Volume 12, No 5.
ISSN: 1354-3164.</p>
</div>
<div class="bibliomixed"><a name="Goodliffe9" id="Goodliffe9"></a>
<p class="bibliomixed">[Goodliffe9] Pete Goodliffe.
&quot;Professionalism in programming #9: Defensive programming.&quot;
<span class="citetitle"><i class="citetitle">C Vu</i></span>,
Volume 13, No 3. ISSN: 1354-3164.</p>
</div>
</div>
<div class="footnotes"><br>
<hr class="c4" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e128" href="#d0e128" id=
"ftn.d0e128">1</a>]</sup> This isn't necessarily the way it should
be. Code inspections, when done, should pick up on a lot of faults
that have never had a chance to manifest themselves as
failures.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e145" href="#d0e145" id=
"ftn.d0e145">2</a>]</sup> Provided you have a sane build
environment that stops when it encounters an error and provides
some reasonable diagnostic messages.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e192" href="#d0e192" id=
"ftn.d0e192">3</a>]</sup> This presumes that you trust your 'build
clean' facility. To be really thorough you can delete the project
and check it back out again afresh. Alternatively, manually remove
all intermediate object files, libraries and executables. For large
projects both of these options can be tedious in the extreme.
<span class="emphasis"><em>C'est la vie.</em></span></p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e248" href="#d0e248" id=
"ftn.d0e248">4</a>]</sup> OK, it is possible to leak memory in a
garbage collected language. Give two objects references to one
another and then let go of both of them. Unless your collector can
detect cycles (simple reference counting cannot), they will never be
swept up.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e258" href="#d0e258" id=
"ftn.d0e258">5</a>]</sup> This is certainly the case for Linux, at
least until you exhaust the virtual memory address space. At this
point <tt class="literal">malloc</tt> may return 0, but the system
would probably have keeled over before you got a chance to notice.
I'm not sure how Windows works in this respect.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e572" href="#d0e572" id=
"ftn.d0e572">6</a>]</sup> This is a good reason why &quot;cut and paste&quot;
programming is bad - it is far too dangerous. You may end up
mindlessly duplicating bugs, which then can't be fixed in one
single place.</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
