    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: More is Less</title>
        <link>https://members.accu.org/index.php/journals/318</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #59 - Feb 2004 + Design of applications and programs</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c153/">59</a>
                    (7)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c67/">Design</a>
                    (236)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c153-67/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c153+67/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;More is Less</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 02 February 2004 21:55:35 +00:00 or Mon, 02 February 2004 21:55:35 +00:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e16" id="d0e16"></a></h2>
</div>
<p>When it comes to optimisation the established wisdom amongst
&quot;serious&quot; programmers these days seems to be:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Don't do it</p>
</li>
<li>
<p>(For experts only) Don't do it yet [<a href=
"#Jackson">Jackson</a>].</p>
</li>
</ol>
</div>
<p>Fortunately established good practice tends to avoid
&quot;pessimisation&quot;. Sometimes, however, it is necessary to go one step
further, and actually perform some optimisations, based of course
on careful measurements. In this article I will show two methods
for streamlining object creation semantics, preventing the compiler
from generating code that is not needed, starting with established
good practice, and going on to a slightly unconventional technique
that can be used when dealing with less conventional situations. I
will also be describing an effective technique for optimising
routines where value types are created on the stack as temporary
storage during repeated calculations.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e32" id="d0e32"></a>Using
initialiser lists</h2>
</div>
<p>Construction should always result in an object that is in a
valid state - i.e. is initialised. Use of types that require
explicit (multi stage) initialisation is more error prone than ones
that do not, and should therefore be avoided. It is also generally
advisable to provide a default constructor for value types as value
types should generally be usable within a system in the same way as
you might use built in types such as <tt class=
"computeroutput">int</tt>, <tt class="computeroutput">float</tt>,
and <tt class="computeroutput">char</tt>.</p>
<p>The best way to initialise / construct an object in C++ is via
the initialiser list syntax [<a href="#Meyers">Meyers</a>]. From
the perspective of high performance computing, not using the
initialiser list syntax can and will result in double intialisation
- more code will be generated and executed, resulting in a larger,
slower executable - potentially more than double for many layers of
aggregate types.</p>
<p>For example, starting from the ground up:</p>
<pre class="programlisting">
struct A {
  inline A() {
    i = 0;
  }
  //...
  int i;
};
</pre>
<p>Does what it looks like it does, because i is a built in type
(<tt class="computeroutput">int</tt>) and built in types do not
have default initialisation.</p>
<pre class="programlisting">
struct B {
  inline B() {
    a.i = 1;
  }
  //...
  A a;
};
</pre>
<p>We now have double intialisation of the integer i, because on
entry to the body of the constructor, all the members and base
classes have already been constructed. Thus the code that is
executed is as follows<sup>[<a name="d0e64" href="#ftn.d0e64" id=
"d0e64">1</a>]</sup>:</p>
<pre class="programlisting">
  B b;
0040102B mov    dword ptr [b],0
00401032 mov    dword ptr [b],1
</pre>
<p>For readers unfamiliar with the instruction set used here, the
instructions above set a location in memory to 0 (zero)
<span class="emphasis"><em>and then set the same location in memory
to 1 (one)</em></span>. By providing a specific constructor for the
type A, and using it in the initialiser list of the constructor B
the unnecessary memory write operation can be avoided:</p>
<pre class="programlisting">
struct A {
  //...
  inline A(int a)
    : i(a) { }
  //...
};
struct B {
  inline B()
    : a(1) { }
  //...
};
</pre>
<p>By specifically initialising the member a we avoid assignment to
it in the body of the constructor and the code generated by the
compiler is as follows:</p>
<pre class="programlisting">
  B b;
0040102B mov    dword ptr [b],1
</pre>
<p>Unfortunately using initialiser lists is not always appropriate
or desirable. There are times when setting the initial state of an
object is non trivial and cannot be achieved within an initialiser
list, or to do so would result in complex and difficult to maintain
code.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e93" id="d0e93"></a>Using the
&quot;uninitialised&quot; constructor</h2>
</div>
<p>The solution to this problem - of wanting to perform
initialisation within the body of the constructor without paying
the runtime cost of default initialisation - relies on the
functionality provided by the members or base classes in the same
way as the initialiser lists shown above. In this case though it is
necessary for the component parts of the class to provide a
mechanism for constructing an instance that is not initialised.
This is where we need to write code that very deliberately does
nothing at all:</p>
<pre class="programlisting">
struct UnInitialised {} uninitialised;<sup>[<a name="d0e100" href=
"#ftn.d0e100" id="d0e100">2</a>]</sup>
struct A {
  //...
  inline A(const UnInitialised&amp;) { }
  //...
};
</pre>
<p>Now any class that uses instances of type A is free to defer
initialisation to the body of the constructor, like so:</p>
<pre class="programlisting">
struct B {
  inline B()
    : a(uninitialised) {
    a.i = 1;
  }
  //...
};
</pre>
<p>This, as expected, generates machine code that is exactly the
same as the version that used the initialiser list directly:</p>
<pre class="programlisting">
  B b;
0040102B mov    dword ptr [b],1
</pre>
<p>This use of function overloading is a technique known as Tags
(or Type Tags), and sometimes Discriminators, and is a degenerate
case of of the traits technique where the trait type has no
associated behaviour and no type parameters [<a href=
"#Matthews">Matthews</a>]. There are many examples of the traits
technique throughout the standard library: <tt class=
"computeroutput">allocator&lt;&gt;</tt>, <tt class=
"computeroutput">char_traits&lt;&gt;</tt>, etc. Most directly
similar to this is the use of <tt class=
"computeroutput">std::nothrow</tt>.</p>
<p>In this case the type of the parameter is selecting a
constructor implementation that does not initialise the object,
allowing the programmer who uses that type to deliberately select a
course of action that would not normally be desirable.</p>
<p>Even as an optimisation technique leaving an object
uninitialised can be counterproductive if doing so may have a
negative impact on the implementation of the rest of the class. I
would recommend that this facility only be provided for value
classes, by which I mean classes that represents a value or set of
values, such as a vector or matrix, and would typically be used in
the same way as any of the built in numerical types (<tt class=
"computeroutput">int</tt>, <tt class="computeroutput">float</tt>,
etc).</p>
<p>As a technique for optimising routines where a value type is
created on the stack as temporary storage during repeated
calculations, this is extremely effective, and can be shown to be
more efficient than using a <tt class="computeroutput">static</tt>
variable. When a function level <tt class=
"computeroutput">static</tt> variable is used the compiler must
generate both the code to construct it, and a test to see if it is
already constructed. For operations executed many thousands or
millions of times in succession the extra memory accesses and code
branches can have a surprisingly negative effect on the performance
of the processor level cache, and the time it takes for the
operations to complete.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e163" id="d0e163"></a>Dealing with
(illegal) unions</h2>
</div>
<p>I was recently given the task of optimising a math library that
used unions to provide a flexible interface. While this was not
standard conforming C++<sup>[<a name="d0e168" href="#ftn.d0e168"
id="d0e168">3</a>]</sup> it worked on the target compilers, with
the exception of the construction behaviour. It was found that the
constructors for each of the members of the union where executed,
causing the memory to be initialised many times over, something
that became a performance issue for the project. By using the
&quot;uninitialised&quot; constructor technique, I was able to get this
unwanted behaviour under control. Some before and after comparisons
are shown in Figure 1 and Figure 2.</p>
<p>Figure 1 shows the source code and compiler output for a
constructor for a 4x3 matrix class that contains an illegal union.
The union contains two structs, one with a 3x3 matrix and a 3D
vector, one with four 3D vectors. The 3x3 matrix is itself made up
of three 3D vectors. In this particular constructor the four
vectors are being initialised on construction via the initialiser
list, but the matrix / vector pair are not explicitly initialised,
and so their default constructors are being called. By specifically
triggering the &quot;uninitialised&quot; constructor for the matrix / vector
pair (shown in the right hand column) the redundant code generation
is suppressed. Note that the disassembly shown in this and the
following example were generated by a compiler configured to
perform full optimization.</p>
<p>In Figure 2, we see an even more dramatic improvement, in the
implementation of a copy constructor. In this example the
initialiser list was not in the first place used at all, and so
before the assignment in the body of the constructor was executed
all the elements of the union are first default initialised. In the
improved version, shown in the right hand column, the copy is
implemented via the rotation (matrix) and translation (vector) copy
constructors, and the unwanted default constructors (from the
second elements of the union) are suppressed via the use of the
uninitialised constructor.</p>
<p>This article is also available at : <a href=
"http://thad.notagoth.org/more_is_less/" target=
"_top">http://thad.notagoth.org/more_is_less/</a></p>
<div class="figure"><a name="d0e180" id="d0e180"></a>
<p class="title c2">Figure 1. </p>
<div class="mediaobject"><img src=
"/var/uploads/journals/resources/More%20is%20Less%20Figure%201.png"></div>
</div>
<div class="figure"><a name="d0e185" id="d0e185"></a>
<p class="title c2">Figure 2. </p>
<div class="mediaobject"><img src=
"/var/uploads/journals/resources/More%20is%20Less%20Figure%202.png"></div>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e190" id="d0e190"></a>With Thanks
To</h2>
</div>
<p>The Overload team (especially Phil Bass for some invaluable
feedback), Tobias Sicheritz, and the accu-general mailing list
(especially Ric Parkin, Alan Stokes, Hubert Matthews, David Nash,
Paul Grenyer, Jon Jagger, &amp; Kevlin Henney) and everyone from
HuSi who gave me feedback on this article.</p>
</div>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e195" id="d0e195"></a>References and
Further Reading</h2>
</div>
<div class="bibliomixed"><a name="Jackson" id="Jackson"></a>
<p class="bibliomixed">[Jackson] Michael Jackson, 1975,
<span class="citetitle"><i class="citetitle">Principles of Program
Design</i></span></p>
</div>
<div class="bibliomixed"><a name="Meyers" id="Meyers"></a>
<p class="bibliomixed">[Meyers] Scott Meyers, 1997, <span class=
"citetitle"><i class="citetitle">Effective C++ (Item
12)</i></span></p>
</div>
<div class="bibliomixed"><a name="Matthews" id="Matthews"></a>
<p class="bibliomixed">[Matthews] Hubert Matthews, 2003
(accu-general)</p>
</div>
<div class="bibliomixed">
<p class="bibliomixed">Andrei Alexandrescu, <span class=
"citetitle"><i class="citetitle">Modern C++ Design</i></span></p>
</div>
<div class="bibliomixed">
<p class="bibliomixed">Czarnecki &amp; Eisenecker, <span class=
"citetitle"><i class="citetitle">Generative
Programming</i></span></p>
</div>
<div class="bibliomixed">
<p class="bibliomixed">Jim Coplien, <span class=
"citetitle"><i class="citetitle">Multi-Paradigm Design for
C++</i></span></p>
</div>
</div>
<div class="footnotes"><br>
<hr class="c3" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e64" href="#d0e64" id=
"ftn.d0e64">1</a>]</sup> The compiler output examples shown in the
first half of the article were compiled using MS VC++ .NET,
targeting Intel/Win32, using a debug build, with inlining turned
on. For an example that trivial an optimised build will eliminate
the example class completely. In testing GCC exhibited equivalent
code generation.</p>
<p>The compiler output shown in Figures 1 and 2 were compiled by MS
VC++ .NET 2003 targeting XBOX (with SSE instruction set disabled).
A MIPS version was also compiled using CodeWarrior for PS2 R3.5,
and was also shown to have equivalent code generation.</p>
<p>Note also that it is important that the <span class=
"emphasis"><em>uninitialised</em></span> constructors be inline,
and that the <tt class="computeroutput">inline</tt> keyword is not
being ignored by the compiler, otherwise any gains seen in not
initialising the member twice could be minimal compared to the
overhead of a function call.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e100" href="#d0e100" id=
"ftn.d0e100">2</a>]</sup> An alternative to using an instance of an
empty struct is to use an <tt class="computeroutput">enum</tt>:</p>
<pre class="literallayout">
enum UnInitialised {
  uninitialised
};
</pre>
<p>Both techniques have advantages and disadvantages. Where an
empty <tt class="computeroutput">struct</tt> or <tt class=
"computeroutput">class</tt> is used an instance of that type must
also be provided in at least one translation unit. Where an
<tt class="computeroutput">enum</tt> is used one must be aware of
its silent conversion to an <tt class="computeroutput">int</tt>,
and any risks this may introduce.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.d0e168" href="#d0e168" id=
"ftn.d0e168">3</a>]</sup> A union containing non trivial types -
including any type with a constructor - is illformed, and creating
an instance of such a union leads to undefined behaviour. C++
Standard, 9.5/1.</p>
</div>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
