When it comes to optimisation the established wisdom amongst "serious" programmers these days seems to be:
-
Don't do it
-
(For experts only) Don't do it yet [Jackson].
Fortunately established good practice tends to avoid "pessimisation". Sometimes, however, it is necessary to go one step further, and actually perform some optimisations, based of course on careful measurements. In this article I will show two methods for streamlining object creation semantics, preventing the compiler from generating code that is not needed, starting with established good practice, and going on to a slightly unconventional technique that can be used when dealing with less conventional situations. I will also be describing an effective technique for optimising routines where value types are created on the stack as temporary storage during repeated calculations.
Construction should always result in an object that is in a valid state - i.e. is initialised. Use of types that require explicit (multi stage) initialisation is more error prone than ones that do not, and should therefore be avoided. It is also generally advisable to provide a default constructor for value types as value types should generally be usable within a system in the same way as you might use built in types such as int, float, and char.
The best way to initialise / construct an object in C++ is via the initialiser list syntax [Meyers]. From the perspective of high performance computing, not using the initialiser list syntax can and will result in double intialisation - more code will be generated and executed, resulting in a larger, slower executable - potentially more than double for many layers of aggregate types.
For example, starting from the ground up:
struct A { inline A() { i = 0; } //... int i; };
Does what it looks like it does, because i is a built in type (int) and built in types do not have default initialisation.
struct B { inline B() { a.i = 1; } //... A a; };
We now have double intialisation of the integer i, because on entry to the body of the constructor, all the members and base classes have already been constructed. Thus the code that is executed is as follows[1]:
B b; 0040102B mov dword ptr [b],0 00401032 mov dword ptr [b],1
For readers unfamiliar with the instruction set used here, the instructions above set a location in memory to 0 (zero) and then set the same location in memory to 1 (one). By providing a specific constructor for the type A, and using it in the initialiser list of the constructor B the unnecessary memory write operation can be avoided:
struct A { //... inline A(int a) : i(a) { } //... }; struct B { inline B() : a(1) { } //... };
By specifically initialising the member a we avoid assignment to it in the body of the constructor and the code generated by the compiler is as follows:
B b; 0040102B mov dword ptr [b],1
Unfortunately using initialiser lists is not always appropriate or desirable. There are times when setting the initial state of an object is non trivial and cannot be achieved within an initialiser list, or to do so would result in complex and difficult to maintain code.
The solution to this problem - of wanting to perform initialisation within the body of the constructor without paying the runtime cost of default initialisation - relies on the functionality provided by the members or base classes in the same way as the initialiser lists shown above. In this case though it is necessary for the component parts of the class to provide a mechanism for constructing an instance that is not initialised. This is where we need to write code that very deliberately does nothing at all:
struct UnInitialised {} uninitialised;[2] struct A { //... inline A(const UnInitialised&) { } //... };
Now any class that uses instances of type A is free to defer initialisation to the body of the constructor, like so:
struct B { inline B() : a(uninitialised) { a.i = 1; } //... };
This, as expected, generates machine code that is exactly the same as the version that used the initialiser list directly:
B b; 0040102B mov dword ptr [b],1
This use of function overloading is a technique known as Tags (or Type Tags), and sometimes Discriminators, and is a degenerate case of of the traits technique where the trait type has no associated behaviour and no type parameters [Matthews]. There are many examples of the traits technique throughout the standard library: allocator<>, char_traits<>, etc. Most directly similar to this is the use of std::nothrow.
In this case the type of the parameter is selecting a constructor implementation that does not initialise the object, allowing the programmer who uses that type to deliberately select a course of action that would not normally be desirable.
Even as an optimisation technique leaving an object uninitialised can be counterproductive if doing so may have a negative impact on the implementation of the rest of the class. I would recommend that this facility only be provided for value classes, by which I mean classes that represents a value or set of values, such as a vector or matrix, and would typically be used in the same way as any of the built in numerical types (int, float, etc).
As a technique for optimising routines where a value type is created on the stack as temporary storage during repeated calculations, this is extremely effective, and can be shown to be more efficient than using a static variable. When a function level static variable is used the compiler must generate both the code to construct it, and a test to see if it is already constructed. For operations executed many thousands or millions of times in succession the extra memory accesses and code branches can have a surprisingly negative effect on the performance of the processor level cache, and the time it takes for the operations to complete.
I was recently given the task of optimising a math library that used unions to provide a flexible interface. While this was not standard conforming C++[3] it worked on the target compilers, with the exception of the construction behaviour. It was found that the constructors for each of the members of the union where executed, causing the memory to be initialised many times over, something that became a performance issue for the project. By using the "uninitialised" constructor technique, I was able to get this unwanted behaviour under control. Some before and after comparisons are shown in Figure 1 and Figure 2.
Figure 1 shows the source code and compiler output for a constructor for a 4x3 matrix class that contains an illegal union. The union contains two structs, one with a 3x3 matrix and a 3D vector, one with four 3D vectors. The 3x3 matrix is itself made up of three 3D vectors. In this particular constructor the four vectors are being initialised on construction via the initialiser list, but the matrix / vector pair are not explicitly initialised, and so their default constructors are being called. By specifically triggering the "uninitialised" constructor for the matrix / vector pair (shown in the right hand column) the redundant code generation is suppressed. Note that the disassembly shown in this and the following example were generated by a compiler configured to perform full optimization.
In Figure 2, we see an even more dramatic improvement, in the implementation of a copy constructor. In this example the initialiser list was not in the first place used at all, and so before the assignment in the body of the constructor was executed all the elements of the union are first default initialised. In the improved version, shown in the right hand column, the copy is implemented via the rotation (matrix) and translation (vector) copy constructors, and the unwanted default constructors (from the second elements of the union) are suppressed via the use of the uninitialised constructor.
This article is also available at : http://thad.notagoth.org/more_is_less/
The Overload team (especially Phil Bass for some invaluable feedback), Tobias Sicheritz, and the accu-general mailing list (especially Ric Parkin, Alan Stokes, Hubert Matthews, David Nash, Paul Grenyer, Jon Jagger, & Kevlin Henney) and everyone from HuSi who gave me feedback on this article.
Andrei Alexandrescu, Modern C++ Design
Czarnecki & Eisenecker, Generative Programming
Jim Coplien, Multi-Paradigm Design for C++
[1] The compiler output examples shown in the first half of the article were compiled using MS VC++ .NET, targeting Intel/Win32, using a debug build, with inlining turned on. For an example that trivial an optimised build will eliminate the example class completely. In testing GCC exhibited equivalent code generation.
The compiler output shown in Figures 1 and 2 were compiled by MS VC++ .NET 2003 targeting XBOX (with SSE instruction set disabled). A MIPS version was also compiled using CodeWarrior for PS2 R3.5, and was also shown to have equivalent code generation.
Note also that it is important that the uninitialised constructors be inline, and that the inline keyword is not being ignored by the compiler, otherwise any gains seen in not initialising the member twice could be minimal compared to the overhead of a function call.
[2] An alternative to using an instance of an empty struct is to use an enum:
enum UnInitialised { uninitialised };
Both techniques have advantages and disadvantages. Where an empty struct or class is used an instance of that type must also be provided in at least one translation unit. Where an enum is used one must be aware of its silent conversion to an int, and any risks this may introduce.
[3] A union containing non trivial types - including any type with a constructor - is illformed, and creating an instance of such a union leads to undefined behaviour. C++ Standard, 9.5/1.
Overload Journal #59 - Feb 2004 + Design of applications and programs
Browse in : |
All
> Journals
> Overload
> 59
(7)
All > Topics > Design (236) Any of these categories - All of these categories |