    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Professionalism in Programming #26</title>
        <link>https://members.accu.org/index.php/articles/676</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Professionalism in Programming, from CVu journal + CVu Journal Vol 16, #3 - Jun 2004</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c184/">Journal Columns</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c182/">Professionalism</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c77/">CVu</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c102/">163</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c182-102/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c182+102/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Professionalism in Programming #26</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 03 June 2004 13:16:05 +01:00 or Thu, 03 June 2004 13:16:05 +01:00</p>
<p><strong>Summary:</strong>&nbsp;<p>In my previous article we learnt the important optimisation
process, the steps that ensure any optimisation really is
worthwhile. Being eager code monkeys, I can see that you're all
desperate for practical code optimisation techniques.</p>
<p>So here they are - in this final part we'll investigate specific
code techniques for optimisation. Just don't tell anyone that I
showed you. To redress the theological balance, we'll also see how
to avoid optimisation in the first place.</p>
</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e22" id="d0e22"></a></h2>
</div>
<div class="blockquote">
<blockquote class="blockquote">
<p><img src="/var/uploads/journals/resources/goodliffe-prof.png" align=
"right">Technological progress has merely provided us with more
efficient means for going backwards. (Aldous Huxley)</p>
</blockquote>
</div>
<p>In my previous article we learnt the important optimisation
process, the steps that ensure any optimisation really is
worthwhile. Being eager code monkeys, I can see that you're all
desperate for practical code optimisation techniques.</p>
<p>So here they are - in this final part we'll investigate specific
code techniques for optimisation. Just don't tell anyone that I
showed you. To redress the theological balance, we'll also see how
to avoid optimisation in the first place.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e32" id="d0e32"></a>Optimisation
techniques</h2>
</div>
<p>OK, we've avoided it for long enough - now it's time to look at
some of the really gory optimisation details. You've proved that
your program performs badly, and have found the worst code culprit.
Now you need to whip it into shape. What can you do?</p>
<p>There's a palette of optimisations to choose from. Which is the
most appropriate will depend on: the exact cause of the problem,
what you're trying to achieve (e.g. increase execution speed, or
reduce code size), and how much of a performance improvement is
required. We'll look at some examples of specific optimisations,
but in any given situation you probably won't have much of a choice
- the best modification will be obvious.</p>
<p>I've grouped these optimisations into two broad categories:
<span class="emphasis"><em>design</em></span> changes and
<span class="emphasis"><em>code</em></span> changes. Generally, a
change at the design level will have a far more profound effect on
performance than a code level tweak. An inefficient design can
strangle quality more than a few dodgy lines of source code, so a
design fix - whilst harder - will have a bigger payoff.</p>
<p>Most often, our goal is to increase execution speed. The
speed-based optimisation strategies are to:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>speed up slow things,</p>
</li>
<li>
<p>do slow things less often, or</p>
</li>
<li>
<p>defer slow things until you really need them.</p>
</li>
</ul>
</div>
<p>The other common optimisation goals are to reduce memory
consumption (mainly by changing the data representation, or by
reducing the amount of data accessed at once), or to reduce
executable size (by removing functionality, or by exploiting
commonality). As we'll see, these goals often conflict - most speed
increases come at the expense of memory consumption, and vice
versa.</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e61" id="d0e61"></a>Design
changes</h3>
</div>
<p>These are the <span class="emphasis"><em>macro</em></span>
optimisations, the fixes on a large scale. Here we're looking at
the internal design of a module. This doesn't address the highest
level of performance degradation - <span class=
"emphasis"><em>architectural</em></span> deficiencies will have the
most serious impact on the performance of a system. Unfortunately
these are unlikely to be fixable, since so much code depends on
them. Hardware bottlenecks are probably the worst - I worked on an
embedded product that was eventually abandoned because the memory
subsystem was fundamentally incapable of delivering acceptable
performance.</p>
<p>Performance-sapping bad design is also hard (or uneconomical) to
fix for similar reasons. We end up papering over the cracks,
employing small code level fixes instead. The nearer a project is
to a release deadline, the less likely you are to perform design
changes; the risk is too great<sup>[<a name="d0e74" href=
"#ftn.d0e74" id="d0e74">1</a>]</sup>. When brave enough, the kinds
of design optimisation we can perform include:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p><span class="bold"><b>Remove functionality</b></span> - the
quickest code is code that doesn't run at all. A function will be
slow if it is doing too many things, some of which are unnecessary.
Cut out the superfluous stuff. Move it elsewhere in the program.
Defer all work until it's really necessary.</p>
</li>
<li>
<p><span class="bold"><b>Exploit parallelisation.</b></span> Use
threading to prevent one action being serialised after another.</p>
</li>
<li>
<p><span class="bold"><b>Avoid or remove excessive
locking.</b></span> It inhibits concurrency, generates overhead and
often leads to deadlock. Employ static checking to prove which
locks are necessary and which aren't.</p>
</li>
<li>
<p><span class="bold"><b>Compromise design quality to gain
speed.</b></span> For example: reduce indirection and increasing
coupling. You can do this by breaking encapsulation, leaking a
class' private data through it's public interface. Knocking down
module barriers will cause irreparable damage to the design. If
possible, try a less disruptive optimisation mechanism first.</p>
</li>
<li>
<p><span class="bold"><b>Change the data storage format</b></span>,
or its on-disk representation to something more suited to high
speed operation. For example: speed up file parsing by using a
binary format. Transmit or store compressed files to reduce network
bandwidth.</p>
</li>
<li>
<p><span class="bold"><b>Add layers of caching or
buffering</b></span>, to enhance slow data access or prevent
lengthy recalculations. Precompute values that you know will be
needed, and store them for immediate access.</p>
</li>
<li>
<p><span class="bold"><b>Create a pool of resource</b></span> to
reduce the overhead of allocating objects. For example: preallocate
memory, or hold a selection of files open rather than repeatedly
opening then closing them. This technique is particularly used to
speed up memory allocation; older OS memory allocation routines
were designed for simple non-threaded use. Their locks stall
multithreaded applications, leading to abysmal performance.</p>
</li>
<li>
<p><span class="bold"><b>Sacrifice accuracy for speed</b></span> if
you can get away with it. Dropping floating point precision is the
obvious example. Many devices have no FPU (Floating Point Unit
hardware), and instead employ slower software FPU emulation. You
can switch to fixed point arithmetic libraries to bypass a slow
emulator, at the expense of numeric resolution. This is
particularly easy in C++, by taking advantage of its abstract data
type facilities.</p>
<p>Accuracy is not solely due to your choice of data types; this
tactic can run far deeper, to your use of algorithms or the quality
of your output. Perhaps you can let the user make this decision -
allow them to select <span class="emphasis"><em>slow but accurate
or fast but approximate</em></span> operation modes.</p>
</li>
<li>
<p><span class="bold"><b>Avoid overuse of exceptions</b></span>.
They can inhibit compiler optimisations<sup>[<a name="d0e129" href=
"#ftn.d0e129" id="d0e129">2</a>]</sup>, and when used too
frequently will hamper timely operation.</p>
</li>
<li>
<p><span class="bold"><b>Forgo certain language
facilities</b></span> if it will save code space. Some C++
compilers allow you to disable RTTI and exceptions, with a
consequent saving in executable size.</p>
</li>
</ul>
</div>
<p>The major design level optimisations involve improvements in
<span class="emphasis"><em>algorithms</em></span> or <span class=
"emphasis"><em>data structures</em></span>. Most speed degradation
or memory consumption is down to a bad choice of one or both, and a
subsequent change will rectify this.</p>
<p>Algorithms have a profound impact on the speed of execution. A
function that works acceptably in a small local test may not scale
up when Real World data gets thrown at it. If profiling shows that
your code spends most of its time running a certain algorithm, you
must make it run faster. One approach is at the code level,
chipping small improvements from each instruction. A better
approach is to replace the entire algorithm with a more efficient
version.</p>
<p>Consider this realistic example. A particular algorithm runs a
loop 1000 times. Each iteration takes 5 milliseconds to execute.
The operation therefore completes in around 5 seconds. By tweaking
the code inside the loop, you can shave 1 millisecond from each
iteration - that's a saving of 1 second. But instead, you can plug
in a different algorithm, where an iteration takes 7 milliseconds,
although it only iterates 100 times. That's a saving of almost 4.5
seconds - significantly better.</p>
<p>For this reason, prefer to look at optimisations that change
fundamental algorithms, not that tweak specific lines of code.
There are many algorithms to chose from in the computer science
world, and unless your code is particularly dire you'll always gain
the most significant performance improvements by selecting a better
algorithm.</p>
<p>Data structures are intimately related to your choice of
algorithms; some algorithms require certain data structures, and
vice versa. If your program is consuming far too much memory then
changing the data storage format may improve matters, although
often at the expense of execution speed. If you need to search a
list of 1000 items quickly, don't store them in a linear array with
<span class="emphasis"><em>O(n)</em></span> search time; use a
binary tree with <span class="emphasis"><em>O(log n)</em></span>
performance.</p>
<div class="sidebar">
<p class="title c3">Complexity Notation</p>
<p><span class="emphasis"><em>Algorithmic complexity</em></span> is
a measure of how well an algorithm scales - how long it takes in
proportion to the size of input. It's a <span class=
"emphasis"><em>qualitative</em></span> mathematical model, allowing
you to quickly compare the performance characteristics of different
implementation approaches. It doesn't measure exact execution time
(this is highly dependent on CPU speed, OS configuration, etc).</p>
<p>Complexity is determined by the amount of work an algorithm must
perform: the number of basic operations it executes. A 'basic
operation' is something like: an arithmetic operation, an
assignment, a test, or a data read/write. Algorithmic complexity
doesn't count the exact number of operations performed, just how
this value relates to the problem size. We are usually interested
in the <span class="emphasis"><em>worst case</em></span>
performance of an algorithm, the most work that will ever need to
be done. A good comparison looks at the <span class=
"emphasis"><em>best case</em></span> and <span class=
"emphasis"><em>average</em></span> time complexity as well.</p>
<p>Algorithmic complexity is expressed using <span class=
"emphasis"><em>Big O</em></span> notation, invented by the German
number theorist Edmund Landau. For a problem with input size
<span class="emphasis"><em>n</em></span>, it might have a
complexity of:</p>
<div class="variablelist">
<dl>
<dt><span class="term"><span class="emphasis"><em>O(1): Order
1</em></span></span></dt>
<dd>
<p>This is a <span class="emphasis"><em>constant time</em></span>
algorithm. No matter how large the input set, it always takes the
same amount of time to complete the task. This is the best
performance characteristic possible.</p>
</dd>
<dt><span class="term"><span class="emphasis"><em>O(n): Order
n</em></span></span></dt>
<dd>
<p>A <span class="emphasis"><em>linear time</em></span> algorithm's
complexity rises in line with the input size. Searching a linked
list will involve visiting more nodes as the list size grows; the
number of operations is directly related to the size of the
list.</p>
</dd>
<dt><span class="term"><span class="emphasis"><em>O(n<sup>2</sup>):
Order n squared</em></span></span></dt>
<dd>
<p>This is where performance really begins to get bad: complexity
is increasing faster than the rate of input growth. A <span class=
"emphasis"><em>quadratic time</em></span> algorithm may seem fine
when you give it a small set of data, but large data sets take a
seriously long time. The bubblesort algorithm is <span class=
"emphasis"><em>O(n<sup>2</sup>)</em></span>.</p>
</dd>
</dl>
</div>
<p>Of course, complexity may be of any order; the quicksort
algorithm averages <span class="emphasis"><em>O(n log
n)</em></span>. This is worse than <span class=
"emphasis"><em>O(n)</em></span>, but far better than <span class=
"emphasis"><em>O(n<sup>2</sup>)</em></span>. A simple optimisation
route for a slow bubblesort algorithm is to replace it with a
quicksort algorithm, especially since there are plenty of freely
available quicksort implementations.</p>
<p>These big O expressions don't include constants or low-order
terms. You'll never see any talk about a complexity of <span class=
"emphasis"><em>O(2n+6)</em></span>. When <span class=
"emphasis"><em>n</em></span> gets large enough, these constants and
low-order terms dwarf into insignificance.</p>
</div>
<p>Selecting a different data structure seldom requires you to
implement the new representation yourself. Most languages come with
library support for all common data structures.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e258" id="d0e258"></a>Code
changes</h3>
</div>
<p>And so now we creep anxiously on to the really disgusting stuff:
the <span class="emphasis"><em>micro</em></span> level, small
scale, short sighted, code tweaking optimisations. There are many
ways to molest source code for the sake of performance. You must
experiment to see what works best in each situation - some changes
will work well, others will have little, or even negative effect.
Some may prevent the compiler's optimiser from performing it's
task, producing startlingly worse results.</p>
<p>The first task is easy: turn on compiler optimisation, or
increase the optimisation level. It often gets disabled for
development builds since the optimiser can take a very long time to
run<sup>[<a name="d0e268" href="#ftn.d0e268" id=
"d0e268">3</a>]</sup>, increasing the build time of large projects
by an order of magnitude. Try configuring the optimiser, and test
what affect this has. Many compilers allow you to bias optimisation
towards extra speed or reduced code size.</p>
<p>There are a collection of very low level optimisations that you
should know about, but generally avoid. These are the kind of
changes that a compiler is able to perform for you; if you've
switched the optimiser on, it'll be looking in these areas. When
applied by hand, they tend to butcher your code's readability the
most, since they warp the fundamental logic out of shape. Only
consider using one of these optimisations if you can <span class=
"emphasis"><em>prove</em></span> it's really required, and that
there are no alternatives:</p>
<div class="variablelist">
<dl>
<dt><span class="term">Loop unrolling</span></dt>
<dd>
<p>For loops with very short bodies, the loop scaffolding may be
more expensive than the looped operation itself. Remove this
overhead by flattening it out - turn your 10 iteration loop into 10
consecutive individual statements.</p>
<p>Loop unrolling can be done partially; this makes more sense for
large loops. You can insert four operations per iteration, and
increment the loop counter by four each time. This tactic gets
nasty if the loop doesn't always iterate over a whole number of
unrolls.</p>
</dd>
<dt><span class="term">Code inlining</span></dt>
<dd>
<p>For small operations, the overhead of calling a function might
be prohibitive. Splitting code into functions brings significant
benefits: clearer code, consistency through reuse, and the ability
to isolate areas of change, yet it can be removed to increase
performance. This process merges the caller(s) and the callee.</p>
<p>There are a number of ways to do this. With language support,
you can request it in the source code (in C/C++ using the
<tt class="literal">inline</tt> keyword); this method preserves a
lot of the code's readability. Otherwise you have to merge the code
yourself, either by duplicating the function over and over again,
or using a preprocessor macro to do the work for you.</p>
<p>It's hard to inline recursive function calls - how would you
know when to stop inlining? Try to find alternative algorithms to
replace recursion.</p>
<p>Often inlining opens the way for more code level optimisations
to be performed, that were not previously possible across a
function boundary.</p>
</dd>
<dt><span class="term">Constant folding</span></dt>
<dd>
<p>Calculations involving constant values can be computed at
compile time, to reduce the amount of work done at run time. The
simple expression <tt class="literal">return 6+4;</tt> can be
reduced to <tt class="literal">return 10;</tt>. Carefully ordering
the terms of a large calculation might bring two constants together
enabling them to be reduced into a simpler subexpression.</p>
<p>It's unusual for a programmer to write something as obvious as
<tt class="literal">return 6+4;</tt> However these sorts of
expression are common after macro expansion.</p>
</dd>
<dt><span class="term">Move to compile time</span></dt>
<dd>
<p>There is more you can do at compile time than just constant
folding. Many conditional tests can be proved statically, and
removed from the code. Some kinds of test can be avoided
altogether; for example, remove tests for negative numbers by using
unsigned data types.</p>
</dd>
<dt><span class="term">Strength reduction</span></dt>
<dd>
<p>This is the act of replacing one operation with an equivalent
that executes faster. This is most important on CPUs with poor
arithmetic support. For example: replace integer
multiplication/division with constant shifts or adds; <tt class=
"literal">x/4</tt> can be converted to <tt class=
"literal">x&lt;&lt;2</tt> if it's faster on your processor.</p>
</dd>
<dt><span class="term">Subexpressions</span></dt>
<dd>
<p><span class="emphasis"><em>Common subexpression
elimination</em></span> avoids the recalculation of expressions
whose values have not changed. In code like this:</p>
<pre class="programlisting">
int first = (a * b) + 10;
int second = (a * b) / c;
</pre>
<p>The expression (a * b) is evaluated twice. Once is enough. You
can factor out the common subexpression, and replace it with:</p>
<pre class="programlisting">
int temp = a * b;
int first = temp + 10;
int second = temp / c;
</pre></dd>
<dt><span class="term">Dead code elimination</span></dt>
<dd>
<p>Don't write needless code; prune anything that's not strictly
necessary to the program. Static analysis will show you the
functions that are never used, or the sections of code that will
never execute. Remove them.</p>
</dd>
</dl>
</div>
<p>Whilst the above are particularly distasteful code
optimisations, the following ones are slightly more socially
acceptable. They focus on increasing program execution speed.</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>If you find that you're repeatedly calling a slow function,
don't call it so often. Cache its result and reuse this value. This
might lead to less clear code, but it will run faster.</p>
</li>
<li>
<p>Reimplement the function in another language. For example:
rewrite a critical Java function in C using the JNI (<span class=
"emphasis"><em>Java Native Interface</em></span>) facility.
Conventional compilers still beat JIT code interpreters for
execution speed.</p>
<p>Don't naively assume that one language is 'faster' than another
- many programmers have been surprised how little difference using
JNI makes. It's been commonly claimed that OO languages are far
slower than their procedural counterparts. This is a lie. Bad OO
code can be slow, but so can bad procedural code. If you write 'OO'
style code in C it's likely to be slower than good C++; the C++
compiler will generate better tuned method dispatch code than your
attempts.</p>
</li>
<li>
<p>Reorder the code for improved performance.</p>
</li>
<li>
<p>Defer work until it's absolutely necessary. Don't open a file
until you're about to use it. Don't calculate a value if you might
not need it, wait until it's wanted. Don't call a function yet if
the code will work without it.</p>
</li>
<li>
<p>Hoist checking further up the function. After having done a lot
of work, it's a waste to perform a test that might lead to early
return. Make the check sooner to avoid all the previous work.</p>
</li>
<li>
<p>Move invariant calculations out of a loop. The most subtle
source of this problem is a loop condition. If you write:
<tt class="literal">for(int n = 0; n &lt; tree.numApples();
++n)</tt>, yet <tt class="function">numApples()</tt> manually
counts one thousand items on every call, you'll have a very slow
loop. Move the count operation before the loop:</p>
<pre class="programlisting">
int numApples = tree.numApples();
for (int n = 0; n &lt; numApples; ++n) {
... something ...
}
</pre></li>
<li>
<p>Use <span class="emphasis"><em>lookup tables</em></span> for
complex calculations; trading time for space. For example: rather
than write a set of trigonometric functions that individually
calculate their values, precalculate the return values and store
them in an array. Map input values to the closest index into this
array.</p>
</li>
<li>
<p>Exploit <span class="emphasis"><em>short circuit
evaluation</em></span>. Make sure that the tests likely to fail are
placed first to save time. If you write a conditional expression:
<tt class="literal">if (condition_one &amp;&amp;
condition_two)</tt> make sure that <tt class=
"literal">condition_one</tt> is statistically more likely to fail
than <tt class="literal">condition_two</tt>.</p>
</li>
<li>
<p>Don't reinvent the wheel - reuse standard routines that have
already been performance tuned. Library writers will have already
honed their code carefully. But be aware that a library may have
been optimised for different goals than yours; perhaps an embedded
product was profiled for memory consumption, not for speed.</p>
</li>
</ul>
</div>
<p>Size-focused code level optimisations include:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>Produce compressed executables, that unpack their code before
running. This doesn't necessarily affect the size of the running
program, but it reduces the storage space required<sup>[<a name=
"d0e420" href="#ftn.d0e420" id="d0e420">4</a>]</sup>. This might be
important if your program is stored in limited flash memory.</p>
</li>
<li>
<p>Factor common code into a shared function.</p>
</li>
<li>
<p>Move seldom used functions out of the way. Put them into a
dynamically loaded library, or into a separate program.</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e430" id="d0e430"></a>Writing
efficient code</h2>
</div>
<p>If the best approach is to not optimise, how can we avoid any
need to improve code performance? The answer is to <span class=
"emphasis"><em>design for performance</em></span>, planning to
provide adequate quality of service from the outset, rather than
trying to whittle it out at the last minute.</p>
<p>Some will argue that this is a dangerous road to follow. Indeed,
there are potential hazards for the unwary. If you try to optimise
as you go along, you'll write at a lower level than needed; you'll
end up with nasty, hacky code full of low-level 'performance'
enhancements and special 'back door' exploits.</p>
<p>How do we reconcile these seemingly opposing views? It isn't
hard, because they're not actually at odds. They are two
complementary strategies:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>write efficient code, and</p>
</li>
<li>
<p>optimise code later.</p>
</li>
</ul>
</div>
<p>If you make a point of writing clear, good, efficient code now
you will not need to perform heavy optimisations later. Some claim
that you don't know whether any optimisation is necessary at first,
so you should write everything <span class="emphasis"><em>as simply
as possible</em></span>, and only optimise when profiling proves
that there is a bottleneck.</p>
<div class="sidebar">
<p class="title c3">Pessimisations</p>
<p>Without careful measurement you can easily end up writing
'optimisations' that are not at all optimal. A perfectly good
optimisation for one situation might turn out to be a performance
disaster in another. Here's a little case study. Exhibit A: The
<span class="emphasis"><em>copy on write</em></span> string
optimisation.</p>
<p>This was a common optimisation applied to C++ standard library
implementations. Programs which performed intensive string
manipulation experienced a massive overhead when copying long
strings, both in terms of execution speed and memory consumption.
Copying large strings means duplicating and shoveling around large
quantities of data. Many string copies are automatically generated,
temporary objects that are created and then thrown away shortly
after - never actually modified. The expensive copy operation is an
unnecessary cost.</p>
<p>The copy on write (COW) optimisation turns the string data type
into a form of <span class="emphasis"><em>smart
pointer</em></span>; the actual string data is held in a (hidden)
shared representation. The string copy operation now only has to
perform an inexpensive 'smart pointer' copy (attaching a new smart
pointer to the shared representation), rather than duplicate the
entire string contents. Only when you make a modification to a
shared string is the internal representation copied, and the smart
pointer remapped. This optimisation avoids a large number of
unnecessary copy operations.</p>
<p>COW worked well in single threaded programs; it was shown to
speed up performance greatly. However, a problem became apparent
when multithreaded programs used COW strings. (Indeed, this problem
also manifests in single threaded programs, if the COW string class
is built with multithreading support). Many implementations
performed very conservative thread locking around the copy
operations - these locks become a <span class=
"emphasis"><em>major</em></span> bottleneck. Suddenly a lightning
fast program slowed down to a crawl. The COW optimisation proved to
be a serious pessimisation.</p>
<p>Far better multithreaded performance was achieved by reverting
to 'classic' string implementations, and writing more careful code
that reduced automatic string copying. Thankfully, library
implementors now provide more intelligent versions of the string
class, which are both thread safe and fast.</p>
</div>
<p>This approach has obvious flaws. If you know that you need a
list type with good search performance (because your program must
perform fast searches) pick a binary tree over an array. If you're
not aware of any such requirement, <span class=
"emphasis"><em>then</em></span> go for the most appropriate thing
that will work. This still might not be the simplest - an array is
a hard data structure to manage.</p>
<p>As you design each module, don't chase performance blindly -
only spend effort on it when necessary. Understand the mandated
performance requirements and at each stage justify how your choices
will meet these requirements. When you know what level of
performance is required, it's easier to design in appropriate
efficiency. It also helps you to write explicit tests to prove that
you do achieve these performance goals.</p>
<p>Some simple design choices that will increase efficiency and aid
later optimisation are:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>Minimise your reliance on functions that might be implemented on
remote machines, or that will access the network or a slow data
storage system.</p>
</li>
<li>
<p>Understand the target deployment and how the program is expected
to be run, so you can design it to work well in these
situations.</p>
</li>
<li>
<p>Write <span class="emphasis"><em>modular</em></span> code, so
it's easy to speed up one section without having to rewrite other
sections too.</p>
</li>
</ul>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e498" id=
"d0e498"></a>Conclusion</h2>
</div>
<p>High performance code is not as important as some people think.
Although you sometimes do have to roll your sleeves up and tinker
with code, optimisation is a task you should actively avoid. To do
this, make sure you know the system's performance requirements
before you start working on it. At each level of design, ensure you
provide this quality of service. That way, optimisation will be
unnecessary.</p>
<p>When you optimise, be very methodical and measured in your
approach. Have a clear goal, and prove that each step is getting
you closer to it. Be guided by solid data, not your hunches. As you
write code, ensure that your designs are efficient, but don't
compromise on quality. Worry about codelevel performance only when
it proves to be a problem.</p>
<p>That concludes this three part miniseries on optimisation. I
hope you've found it useful.</p>
</div>
<div class="footnotes"><br>
<hr class="c4" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e74" href="#d0e74" id=
"ftn.d0e74">1</a>]</sup> Sadly, it's often only near project
deadlines that anyone notices performance isn't good enough.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e129" href="#d0e129" id=
"ftn.d0e129">2</a>]</sup> <tt class="literal">try</tt>/<tt class=
"literal">catch</tt> blocks act, like functions, as a barrier to an
optimiser. It's not possible to look through the barrier to perform
optimisation, so some potential speed ups will be lost.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e268" href="#d0e268" id=
"ftn.d0e268">3</a>]</sup> It has to do complex inspection of the
parsed code to determine the set of possible speed ups, and select
the most appropriate ones.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e420" href="#d0e420" id=
"ftn.d0e420">4</a>]</sup> It may have the pleasant side-effect of
decreasing program start-up time; a compressed executable will load
from disk much faster.</p>
</div>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
