# Complex Logic in the Member Initialiser List

Overload Journal #112 - December 2012 + Programming Topics   Author: Cassio Neri
The syntactic form of the member initialiser list restricts the logic that it contains. Cassio Neri presents some techniques to overcome these constraints.

In C++, during a constructor call, all subobjects of the class – base classes and non-static data members – are initialised before execution enters the constructor’s body. (In C++11, this rule has an exception which we shall exploit later.) The member initialiser list (MIL) lets the programmer customise this initialisation. A subobject is initialised from a parenthesised¹ list of expressions that follows its identifier in the MIL. Listing 1 shows the MIL of `bar`’s constructor: it is everything between the colon and the opening brace of the body.

```
class base {
  ...
public:
  base(double b);
};

class foo {
  ...
public:
  foo(double f1, double f2);
};

class bar : public base {
  const double x_, y_;
  foo& r_;
  foo f_;
  double d_;
  ...
public:
  bar(double d, foo& r1, foo& r2);
};

bar::bar(double d, foo& r1, foo& r2)
  : base(d * d),
    x_(cos(d * d)),
    y_(sin(d * d)),
    r_(d > 0.0 ? r1 : r2),
    f_(exp(d), -exp(d)) {
  d_ = d;
}
```
Listing 1

Most often the MIL simply forwards the constructor’s arguments to the subobject initialisers. In contrast, the MIL of `bar`’s constructor first performs computations with the arguments and then passes the results through. The operations here are still simple enough to fit in full expressions but, had they been more complex (e.g. with branches and loops), the syntactic form of the MIL would be an obstacle.

This article presents some techniques that allow more complex logic in the MIL. It doesn’t advocate complexity in the MIL; it only shows some ways to achieve it if you have to.

Before looking at these methods, we consider the possibility of avoiding the MIL altogether.

## Avoiding the MIL

Notice that `d_` isn’t initialised in the MIL. In this case, the compiler implicitly initialises² `d_` and then we assign `d` to it in the constructor’s body. Could we do the same for the other subobjects? Not always. Assume that `foo` doesn’t have an accessible default constructor. Then the compiler can’t implicitly initialise `f_` and yields an error. We simply don’t have a choice and must initialise `f_` in the MIL. In addition to subobjects of types without an accessible default constructor, reference members (e.g. `r_`) and `const` members of non-class type (e.g. `x_` and `y_`) must be explicitly initialised, otherwise the compiler complains. Although not enforced by the language, we can add to this list subobjects of immutable types – types with no non-`const` methods apart from constructors and a destructor.

It’s possible for some subobjects to be default initialised first and then changed in the constructor’s body. Nevertheless, this two-step set-up process might be wasteful. Actually, this argument is the most commonly stated reason to prefer initialisation in the MIL to assignment in the constructor’s body [Meyers05, §4]. For fundamental types, however, there’s no penalty because default initialisation does nothing and costs nothing.
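The rule can be seen in miniature in the following sketch. The classes `no_default` and `needs_mil` are hypothetical and are not part of the article’s running example; they merely gather one member of each kind that has to appear in the MIL next to one that doesn’t.

```cpp
#include <type_traits>

// Hypothetical type without a default constructor (illustration only).
class no_default {
  int v_;
public:
  explicit no_default(int v) : v_(v) {}
  int value() const { return v_; }
};

class needs_mil {
  no_default n_;  // no default constructor: must be in the MIL
  int&       r_;  // reference member: must be in the MIL
  const int  c_;  // const member of non-class type: must be in the MIL
  int        a_;  // fundamental type: may be assigned in the body
public:
  needs_mil(int v, int& r)
    : n_(v), r_(r), c_(2 * v) {  // dropping any of these is a compile error
    a_ = v + 1;                  // a_ was left indeterminate until here
  }
  int  n() const { return n_.value(); }
  int& r() const { return r_; }
  int  c() const { return c_; }
  int  a() const { return a_; }
};

static_assert(!std::is_default_constructible<no_default>::value,
              "no_default cannot be implicitly initialised");
```

Removing `n_`, `r_` or `c_` from the MIL should make the constructor ill-formed, whereas `a_` can safely wait for the body.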

## Initialiser functions

The first idea for complex initialisation is very simple and consists of writing an initialiser function that delivers the final result to direct initialise a subobject. Listing 2 shows this technique applied to our example.

```
double init_x(double d) {
  const double b = d * d;
  const double x = cos(b);
  return x;
}

bar::bar(double d, foo& r1, foo& r2)
  : ...
    x_(init_x(d)),
    ...
```
Listing 2

We emphasise that, in our toy example, `x_` can be directly initialised in the MIL (as seen in Listing 1). Listing 2 is merely a sample for more complex cases.

Most frequently the initialiser function creates a local object of the same type as the subobject that it initialises and returns it by value. Then the subobject is copy- or move-initialised from this value. Therefore, the subobject’s type must be constructible (in particular, it can’t be an abstract class) and also copy- or move-constructible.

Calling the copy- or move-constructor might have a cost. Nevertheless, mainstream compilers implement the return value optimisation [RVO] which, under certain circumstances, elides this call. Unfortunately, this doesn’t eliminate the need for the subobject’s type to be copy- or move-constructible.
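To make the case for loops and branches more concrete, here is a minimal sketch of an initialiser function that could not be written as a single expression in the MIL. The class `series` and the function `init_sums` are hypothetical names chosen for this illustration only.

```cpp
#include <cstddef>
#include <vector>

namespace {

// Hypothetical initialiser: the first n partial sums of 1/k^2.
// Being an ordinary function, it is free to use loops and locals.
std::vector<double> init_sums(std::size_t n) {
  std::vector<double> sums;
  sums.reserve(n);
  double total = 0.0;
  for (std::size_t k = 1; k <= n; ++k) {
    const double kd = static_cast<double>(k);
    total += 1.0 / (kd * kd);
    sums.push_back(total);
  }
  return sums;  // returned by value: elided or moved in C++11
}

} // unnamed namespace

class series {
  const std::vector<double> sums_;  // const, so it must be set in the MIL
public:
  explicit series(std::size_t n) : sums_(init_sums(n)) {}
  std::size_t size() const { return sums_.size(); }
  double last() const { return sums_.empty() ? 0.0 : sums_.back(); }
};
```

The member type here, `std::vector<double>`, is both copy- and move-constructible, so the requirement stated above is met whether or not the compiler elides the call.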

In another variation, there are initialisers for various arguments that the subobjects’ constructors take. For instance, an initialiser function for `base` might compute `d * d` and return this value which is then passed to `base`’s constructor. In this way, the argument types, rather than the subobjects, must be constructible and copy- or move-constructible.

It’s worth mentioning that when the subobject is a reference member, the initialiser function must return a reference to a non-local object, otherwise the member will dangle. For instance, an initialiser function for `r_` could be as follows.

```
foo& init_r(double d, foo& r1, foo& r2) {
  // r1 and r2 are non-local
  return d > 0.0 ? r1 : r2;
}
```

A positive aspect of having an initialiser function is that it can be used (and it most likely will be) by many constructors. When there’s no need to reuse the initialiser, C++11 offers the tempting possibility of writing the initialiser function as a lambda expression as shown below. Notice, however, that readability suffers.

```
x_([&]() -> double {
  const double b = d * d; // d is captured
  const double x = cos(b);
  return x;
} (/* parentheses for calling the lambda */) )
```

Where should the initialiser function be? Assuming that its sole purpose is initialising a class member (so it’s not going to be used anywhere else), then placing it in the global or in a named `namespace` is pollution. Making the initialiser a member of the class might come to mind but this isn’t ideal because it decreases encapsulation [Meyers00]. Additionally, this requires the initialiser’s declaration to be in the class header file forcing on clients an artificial dependency on the initialiser function. The best place for it is inside the class source file (which we’re assuming is not its header file). Making the initialiser invisible outside the file (by declaring it either static or in an unnamed `namespace`) improves encapsulation and decreases linking time.
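The recommended layout can be sketched as follows. The two files are shown in one block for brevity; the names `circle`, `circle.hpp`, `circle.cpp` and `init_area` are assumptions made for this illustration and don’t come from the running example.

```cpp
#include <cmath>

// ---- circle.hpp (hypothetical header): no trace of the initialiser ----
class circle {
  const double area_;
public:
  explicit circle(double radius);
  double area() const { return area_; }
};

// ---- circle.cpp (hypothetical source file) ----
namespace {  // internal linkage: invisible to other translation units

double init_area(double radius) {
  if (radius < 0.0)   // a branch that would be awkward inside the MIL
    radius = -radius;
  return std::acos(-1.0) * radius * radius;  // acos(-1) is pi
}

} // unnamed namespace

circle::circle(double radius) : area_(init_area(radius)) {}
```

Clients that include the header never see `init_area`, so it can be renamed or removed without forcing them to recompile.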

Using an initialiser function is the best technique presented in this article as far as encapsulation, clarity and safety are concerned. However, one feature that this solution lacks is the ability to reuse results obtained by one initialiser in another. For instance, the value of `d * d` must be calculated by the initialiser functions of `base`, `x_` and `y_`. In this example, this issue isn’t a big deal but it could be if the result was obtained through a very costly operation.

Classes can have a member whose only purpose is storing a result to be used by different initialiser functions (e.g. `bar` could have a member `b_` to store `d * d`). This is obviously wasteful: we want partial results to have a short lifetime, as they do inside the initialiser functions of this section. The next sections present methods to achieve this goal.

## Bundling members

We can bundle some related members into a nested `struct` and create an initialiser function for the `struct` rather than for individual members. Listing 3 shows relevant changes to `bar` needed to initialise the two `const` members in one go.

```
class bar : public base {
  struct point {
    double x, y;
  };
  const point p_;
  static point init_p(double d);
  ...
};

bar::point bar::init_p(double d) {
  const double b = d * d;
  const bar::point p = {cos(b), sin(b)};
  return p;
}

bar::bar(double d, foo& r1, foo& r2)
  : ...
    p_(init_p(d)),
    ...
```
Listing 3

As in the previous section, the type returned by the initialiser function must be copy- or move-constructible and so must the `struct` members.

The initialiser function needs access to the nested `struct`. Ideally, this type will be `private` and the initialiser will be a `static private` member. The initialiser could be a `friend` but, being an implementation detail, hiding it inside the class is advisable. (Unfortunately, it can’t be hidden as much as in the previous section.) Alternatively, the initialiser function can be non-member and non-`friend` provided that the `struct` is made `public` but this decreases encapsulation even further.
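A compilable reduction of this arrangement might look like the sketch below. To keep it self-contained, this `bar` drops `base`, `foo` and the reference member, and it adds two accessors, `x()` and `y()`, which are assumptions made only so that the result can be observed.

```cpp
#include <cmath>

class bar {
  struct point { double x, y; };  // private nested type
  const point p_;
  static point init_p(double d);  // private static initialiser
public:
  explicit bar(double d) : p_(init_p(d)) {}
  double x() const { return p_.x; }  // hypothetical accessor
  double y() const { return p_.y; }  // hypothetical accessor
};

bar::point bar::init_p(double d) {
  const double b = d * d;  // computed once, shared by x and y
  const point p = {std::cos(b), std::sin(b)};
  return p;
}
```

Both the `struct` and its initialiser stay `private`, yet their declarations still have to appear in the class definition, which is the loss of hiding mentioned above.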

We can’t include base classes in the `struct` and each of them needs a different initialiser function. However, as in our example, the initialiser function of a base class could profit from results obtained by other initialiser functions. The next section shows how to achieve this goal.

## Using an argument for temporary storage

In rare cases we can change the value of an argument to something that is more reusable. Listing 4 is an attempt for our example and consists of changing `d` to `d * d` just before initialising `base`. Unfortunately, this doesn’t work here: the initialisations of `r_`, `f_` and `d_` need the original value of `d` but they get the new one instead.

```
bar::bar(double d, foo& r1, foo& r2)
  : base(d = d * d),        // d has a new value
    x_(cos(d)), y_(sin(d)), // OK : uses new value
    r_(d > 0.0 ? r1 : r2),  // BUG: uses new value
    f_(exp(d), -exp(d)) {   // BUG: uses new value
  d_ = d;                   // BUG: uses new value
}
```
Listing 4

A fix for the issue above is to use a dummy argument for temporary storage and give it a default value to avoid bothering clients. Listing 5 puts this technique into practice.

```
class bar : public base {
  ...
public:
  bar(double d, foo& r1, foo& r2, double b = 0.0);
};

bar::bar(double d, foo& r1, foo& r2, double b)
  : base(b = d * d),        // b has a new value
    x_(cos(b)), y_(sin(b)), // OK : uses b = d * d
    r_(d > 0.0 ? r1 : r2),  // OK : uses d
    f_(exp(d), -exp(d)) {   // OK : uses d
  d_ = d;                   // OK : uses d
}
```
Listing 5

This works because the dummy argument persists for a short period but long enough to be reused by different initialisers. More precisely, its lifetime starts before the first initialisation of a subobject (`base` in our example) and ends after the constructor exits.
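The lifetime argument can be checked with a reduced, self-contained variant of the running example. In this sketch `foo` and the reference member are left out, and the accessors `squared()`, `x()`, `y()` and `d()` are hypothetical additions used only to inspect the outcome.

```cpp
#include <cmath>

class base {
  double b_;
public:
  explicit base(double b) : b_(b) {}
  double squared() const { return b_; }  // hypothetical accessor
};

class bar : public base {
  const double x_, y_;
  double d_;
public:
  // b is a dummy parameter used as scratch storage; callers omit it.
  explicit bar(double d, double b = 0.0);
  double x() const { return x_; }
  double y() const { return y_; }
  double d() const { return d_; }
};

bar::bar(double d, double b)
  : base(b = d * d),  // base is initialised first, so b is set here
    x_(std::cos(b)),  // reuses b
    y_(std::sin(b)),  // reuses b
    d_(d) {           // still sees the original d
}
```

A client simply writes `bar v(2.0);` and never mentions `b`, which lives only for the duration of the constructor call.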

A problem (alas, there will be others) with this approach is that the constructor’s extended signature might conflict with another one. If it doesn’t today, it might tomorrow. As an improvement, we create a new type for the storage. For better encapsulation this type is nested in the `private` section of the class as Listing 6 illustrates.

```
class bar : public base {
  struct storage {
    double b;
  };
  ...
public:
  bar(double d, foo& r1, foo& r2, storage tmp = storage());
};

bar::bar(double d, foo& r1, foo& r2, storage tmp)
  : base(tmp.b = d * d),
    x_(cos(tmp.b)),
    y_(sin(tmp.b)),
    ...
```
Listing 6

The simplicity of our example is misleading because the assignment `tmp.b = d * d` can be nicely put in the MIL whereas in more realistic scenarios `tmp` might need a more complex set-up. It can be done, for instance, in `base`’s initialiser function by making it take a `storage` argument by reference as Listing 7 shows.

```
double bar::init_base(double d, storage& tmp) {
  tmp.b = d * d;
  return tmp.b;
}

double bar::init_x(const storage& tmp) {
  const double x = cos(tmp.b);
  return x;
}

bar::bar(double d, foo& r1, foo& r2, storage tmp)
  : base(init_base(d, tmp)),
    x_(init_x(tmp)),
    ...
```
Listing 7

Notice that `tmp` is passing through the two-step set up process that we have previously advised against. Could we forward `d` to `storage`’s constructor to avoid the default initialisation? For this, `bar`’s constructor requires a declaration similar to

```
bar(double d, foo& r1, foo& r2,
    storage tmp = storage(d));
```

Unfortunately, this isn’t legal: a default argument can’t use the function’s other parameters. Indeed, it’s fairly well known that in a function call the order of evaluation of the arguments is unspecified. If the code above were allowed, then we could not be sure that the evaluation of `tmp` occurs after that of `d`. Recall that if `storage` consists of fundamental types only, then the default initialisation costs nothing. If it contains a member of non-fundamental type, then the technique presented in the next section applies to prevent default initialisation of a member. The method is general and equally applies to `bar` itself.

A very important warning is in order before leaving this section. Unfortunately, the method presented here is unsafe! The main issue is that the technique is very dependent on the order of initialisation of subobjects. In our example, `base` is the first subobject to be initialised. For this reason, `init_base` had the responsibility of setting up `tmp` before it could be used by `init_x`. The order of initialisation of subobjects is very sensitive to changes in the class. To mitigate this issue you can create a reusable empty class, say, `first_base`, that as its name indicates, must be the first base of a class to which we want to apply the technique presented here. Furthermore, this class’ initialiser function will have the responsibility of setting up the temporary storage as shown in Listing 8.

```
class first_base {
protected:
  explicit first_base(int) {
    // does nothing
  }
};

class bar : first_base, public base {
  ...
};

int bar::init_first_base(double d, storage& tmp) {
  tmp.b = d * d;
  return 0;
}

double bar::init_base(const storage& tmp) {
  return tmp.b;
}

bar::bar(double d, foo& r1, foo& r2, storage tmp)
  : first_base(init_first_base(d, tmp)),
    base(init_base(tmp)),
    ...
```
Listing 8

The use of `first_base` makes the code safer and clearer, and almost solves the problem. Even when `first_base` is the first in the list of base classes, there’s still a chance that it’s not going to be the first subobject to be initialised. This occurs when the derived class has a direct or indirect virtual base class because virtual bases are initialised first. Experience shows that only a minority of inheritances are virtual and, therefore, this issue is unlikely to happen. However, it’s always good to play safe. So, to be 100% sure, it suffices to virtually inherit from `first_base` (always keeping it as the first base in the list). The price that a class has to pay for this extra safety is carrying an extra pointer.
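Putting the pieces of this section together, a self-contained sketch with the virtual `first_base` could read as follows. As before, `foo` and the reference member are omitted, and `squared()` and `x()` are hypothetical accessors added only for inspection.

```cpp
#include <cmath>

// Reusable empty class: listed first and inherited virtually.
class first_base {
protected:
  explicit first_base(int) {}
};

class base {
  double b_;
public:
  explicit base(double b) : b_(b) {}
  double squared() const { return b_; }  // hypothetical accessor
};

class bar : virtual first_base, public base {
  struct storage { double b; };
  const double x_;
  static int init_first_base(double d, storage& tmp) {
    tmp.b = d * d;  // sets up the scratch storage before anything else
    return 0;
  }
  static double init_base(const storage& tmp) { return tmp.b; }
  static double init_x(const storage& tmp) { return std::cos(tmp.b); }
public:
  explicit bar(double d, storage tmp = storage());
  double x() const { return x_; }  // hypothetical accessor
};

bar::bar(double d, storage tmp)
  : first_base(init_first_base(d, tmp)),  // always runs first
    base(init_base(tmp)),
    x_(init_x(tmp)) {
}
```

Since `storage` is `private`, clients cannot even name the extra parameter; they construct the object with `bar v(3.0);` and the default argument does the rest.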

## Delaying initialisation

We arrive at the final technique of this article. The basic idea is delaying the initialisation of a subobject until the constructor’s body where more complex code can sit.

Compilers have a duty of trying to ensure that every object of class type is properly initialised before being used. Their way to perform this task is calling the default constructor whenever the programmer doesn’t explicitly call one. However, C++11 offers a loophole that we can exploit to prevent the compiler calling the default constructor.

The underlying pattern that supports delayed initialisation is the tagged union [TU], also known by various other names (e.g. discriminated union, variant type). A tagged union can hold objects of different types but at any time keeps track of the type currently held. Frequently, default initialisation of a tagged union means either no initialisation at all or default initialisation of a particular type (which again might mean no initialisation at all).
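For readers who haven’t met the pattern, here is a minimal sketch of a tagged union. The class `number` and all its members are hypothetical; it holds nothing, an `int` or a `double`, and its tag records which one.

```cpp
#include <stdexcept>

// Hypothetical tagged union holding nothing, an int or a double.
class number {
public:
  enum kind_t { none, integer, real };
private:
  kind_t kind_;  // the tag: which member is currently held
  union { int i_; double d_; };
public:
  number() : kind_(none) {}  // default initialisation builds neither member
  void set(int i)    { i_ = i; kind_ = integer; }
  void set(double d) { d_ = d; kind_ = real; }
  kind_t kind() const { return kind_; }
  int as_int() const {
    if (kind_ != integer) throw std::logic_error("not an int");
    return i_;
  }
  double as_double() const {
    if (kind_ != real) throw std::logic_error("not a double");
    return d_;
  }
};
```

Both members are of fundamental types, so this particular sketch doesn’t need the relaxed `union` rules of C++11.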

In general, tagged unions are implemented in C and C++ through unions. Unfortunately, the constraints that C++03 imposes on types that can be members of unions are quite strict and implementing tagged unions demands a lot of effort [Alexandrescu02]. C++11 relaxes the constraints on union members and gives more power to programmers. However, this comes at a cost: now the programmer is responsible for ensuring proper initialisation of union members. The technique that we shall see now relies on C++11. Later we shall see what can be done in C++03.

Class `foo` has no accessible default constructor and we are forced to initialise `f_` in the MIL to prevent a compiler error. We want to postpone the initialisation of `f_` to the constructor’s body where we can compute, store and reuse `exp(d)`. This can be achieved by putting `f_` inside an unnamed `union` as shown in Listing 9.

```
class bar : public base {
  union { // unnamed union type
    foo f_;
  };
  ...
};

bar::bar(double d, foo& r1, foo& r2)
  : ... /* no f_ in the MIL */ {
  const double e = exp(d);
  new (&f_) foo(e, -e);
}

bar::~bar() {
  (&f_)->~foo();
}
```
Listing 9

Since the `union` is unnamed, all its members (only `f_` in this case) are seen as if they were members of `bar` but the compiler forgoes their initialisation. A member of the `union` can be initialised in the constructor’s body through a placement `new`. In Listing 9 this builds an object of type `foo` at the address `&f_` or, in other words, the `this` pointer inside `foo`’s constructor will be set to `&f_`. Simple, beautiful and efficient – but this isn’t the end of the story.

The compiler neither initialises a member of a `union` nor destroys it. Ensuring proper destruction is again the programmer’s responsibility. Previously (Listings 1–8), `f_` was destroyed automatically when its containing `bar` object was destroyed. To imitate this behaviour, the new `bar`’s destructor calls `~foo()` on the object at `&f_`.

We have just written a destructor, and the rule of three says that we probably need to write a copy-constructor and an assignment operator as well. This is the case here. In addition, there are extra dangers that we must consider. For instance, a new constructor might be added to `bar` and the writer might forget to initialise `f_`. If a `bar` object is built by this constructor, then at destruction time (probably earlier) an uninitialised `f_` will be used and the code has undefined behaviour. To avoid this and other issues, we use a `bool` flag to signal whether `f_` has been initialised or not. When an attempt to use an uninitialised `f_` is made, the code might inform you by, say, throwing an exception. However, `bar`’s destructor can be more forgiving and ignore `f_` if it’s uninitialised. (Recall that a destructor shouldn’t throw anyway.)
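A sketch of `bar` managing the flag by hand, together with the copy operations demanded by the rule of three, is given below. It assumes a simplified `foo` with a hypothetical accessor `first()`, and it omits `base` and the other members. The flag `has_f_` and the accessor `f()` are names invented for this illustration.

```cpp
#include <cmath>
#include <new>
#include <stdexcept>

class foo {
  double f1_, f2_;
public:
  foo(double f1, double f2) : f1_(f1), f2_(f2) {}
  double first() const { return f1_; }  // hypothetical accessor
};

class bar {
  bool has_f_;        // tracks whether f_ has been constructed
  union { foo f_; };  // the compiler neither builds nor destroys f_
public:
  explicit bar(double d) : has_f_(false) {
    const double e = std::exp(d);  // computed once, used twice
    new (&f_) foo(e, -e);
    has_f_ = true;
  }
  bar(const bar& other) : has_f_(false) {
    if (other.has_f_) {
      new (&f_) foo(other.f_);
      has_f_ = true;
    }
  }
  bar& operator=(const bar& other) {
    if (this != &other) {
      if (has_f_) { f_.~foo(); has_f_ = false; }
      if (other.has_f_) { new (&f_) foo(other.f_); has_f_ = true; }
    }
    return *this;
  }
  ~bar() {
    if (has_f_) f_.~foo();  // forgiving: skips an unbuilt f_
  }
  const foo& f() const {
    if (!has_f_) throw std::logic_error("f_ used before initialisation");
    return f_;
  }
};
```

Every member function has to consult the flag, which is a good deal of bookkeeping for a single member.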

Instead of forcing `bar` to manage `f_`’s usage and lifetime, it’s better to encapsulate this task in a generic template class called, say, `delayed_init`. Listing 10 shows a rough draft of an implementation. A more complete version is available in [Neri] but don’t use it (I repeat, don’t use it) because Boost.Optional [Optional] is a better alternative. Indeed, it’s a mature library that has been heavily tested over the last few years and also works with C++03. `delayed_init` is presented for didactic purposes only. As mentioned above, `union` rules in C++03 are strict and make the implementation of `boost::optional` more complex and difficult to understand. In contrast, `delayed_init` assumes C++11 rules and has simpler code. See `delayed_init` as a draft of what `boost::optional` could be if written in C++11. Meanwhile, Fernando Cacciola – the author of Boost.Optional – and Andrzej Krzemienski are working on a proposal [Proposal] for `optional` to be added to the C++ Standard Library. This idea has already been praised by a few members of the committee.

```
#include <new>       // placement new
#include <stdexcept> // std::logic_error
#include <utility>   // std::forward

template <typename T>
class delayed_init {
  bool is_init_ = false;
  union {
    T obj_;
  };
public:
  delayed_init() {
  }
  ~delayed_init() {
    if (is_init_)
      (&obj_)->~T();
  }
  template <typename... Args>
  void init(Args&&... args) {
    new (&obj_) T(std::forward<Args>(args)...);
    is_init_ = true;
  }
  T* operator->() {
    return is_init_ ? &obj_ : nullptr;
  }
  T& operator*() {
    if (is_init_)
      return obj_;
    throw std::logic_error("attempt to use "
      "uninitialised object");
  }
  ...
};
```
Listing 10

Let’s see what `delayed_init` looks like. Its member `is_init_` is initialised to `false` using the new brace-or-equal initialiser feature of C++11. Therefore, we don’t need to do it in the MIL. This leaves the default constructor empty and you might wonder why bother writing this constructor since the compiler would automatically implement one exactly like ours. Actually, in general it won’t: because `delayed_init` has an unnamed `union` member (which is the whole point of this template class), under C++11 rules the implicitly declared default constructor is deleted whenever `T` lacks a trivial default constructor.

When the time comes to initialise the inner object, it suffices to call `init()`. This method is a variadic template function – another welcome and celebrated C++11 novelty – that takes an arbitrary number of arguments (indicated by the ellipsis `...`) of arbitrary types by universal reference [Meyers12] (indicated by `Args&&` where `Args` is deduced). These arguments are simply handed over to `T`’s constructor via `std::forward`. (Take another look at this pattern since it’s expected to become more and more frequent.)
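The sketch below shows how `bar` might use such a wrapper. It repeats a trimmed-down `delayed_init` so that the block is self-contained; the deleted copy operations, the simplified `foo` and its accessors `first()` and `second()` are assumptions of this illustration. Calling `init()` twice isn’t handled here.

```cpp
#include <cmath>
#include <new>
#include <stdexcept>
#include <utility>

template <typename T>
class delayed_init {
  bool is_init_ = false;
  union { T obj_; };
public:
  delayed_init() {}
  delayed_init(const delayed_init&) = delete;  // copying left out of this sketch
  delayed_init& operator=(const delayed_init&) = delete;
  ~delayed_init() { if (is_init_) obj_.~T(); }

  template <typename... Args>
  void init(Args&&... args) {
    new (&obj_) T(std::forward<Args>(args)...);  // forwards to T's constructor
    is_init_ = true;
  }
  T* operator->() { return is_init_ ? &obj_ : nullptr; }
  T& operator*() {
    if (is_init_) return obj_;
    throw std::logic_error("attempt to use uninitialised object");
  }
};

class foo {
  double f1_, f2_;
public:
  foo(double f1, double f2) : f1_(f1), f2_(f2) {}
  double first() const { return f1_; }   // hypothetical accessor
  double second() const { return f2_; }  // hypothetical accessor
};

class bar {
  delayed_init<foo> f_;  // needs no entry in the MIL
public:
  explicit bar(double d) {
    const double e = std::exp(d);  // computed once
    f_.init(e, -e);                // foo is built here, in the body
  }
  double first() { return f_->first(); }
  double second() { return (*f_).second(); }
};
```

With the wrapper in place `bar` needs no destructor of its own: the one in `delayed_init` takes care of `f_`.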

Also note the presence of `operator->()`. Essentially, the class `delayed_init<T>` is a wrapper to a type `T`. We wish it could be used as a `T` by implementing `T`’s `public` interface and simply forwarding calls to `obj_`. This is impossible since `T` is unknown. A close alternative is returning a pointer to `obj_` because `T*` replicates `T`’s interface with slightly different syntax and semantics. Actually, pointer semantics fits very naturally here. Indeed, it’s common for a class to hold a pointer to an object rather than the object itself. In this way, the class can delay the object’s initialisation to a later moment where all data required for the construction is gathered. At this time the object is created on the heap and its address is stored by the pointer. Through `delayed_init`, we are basically replacing the heap with internal storage and, like in a smart pointer, managing the object’s lifetime. Finally, the `operator*()` is also implemented. It provides access to `obj_` and throws if `obj_` hasn’t been initialised.
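The same pointer-like interface is found in `std::optional`, the form in which an `optional` utility was eventually standardised in C++17. The sketch below therefore assumes a C++17 compiler, which goes beyond the C++11 baseline of this article; `foo` is again a simplified stand-in with hypothetical accessors.

```cpp
#include <cmath>
#include <optional>

class foo {
  double f1_, f2_;
public:
  foo(double f1, double f2) : f1_(f1), f2_(f2) {}
  double first() const { return f1_; }   // hypothetical accessor
  double second() const { return f2_; }  // hypothetical accessor
};

class bar {
  std::optional<foo> f_;  // starts empty; foo needs no default constructor
public:
  explicit bar(double d) {
    const double e = std::exp(d);  // computed once
    f_.emplace(e, -e);             // constructs the foo in place
  }
  // value() throws std::bad_optional_access if f_ is empty
  const foo& f() const { return f_.value(); }
};
```

Here `emplace()` plays the role of `init()`, and destruction and copying of the held object are handled by the library.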

## Conclusion

Initialisation in the MIL rather than assignment in the constructor has been advocated for a long time. However, in some circumstances, there’s a genuine need for not-so-simple initialisations, which conflicts with the limited syntax of the MIL. This article has presented four techniques to overcome this situation. They vary in applicability, clarity and safety. Along the way it presented some of the new C++11 features.

## Acknowledgements

Cassio Neri thanks Fernando Cacciola and Lorenz Schneider for their suggestions and careful reading of this article. He also thanks the Overload team for valuable remarks and feedback.

## References

[Alexandrescu02] Andrei Alexandrescu, Generic: Discriminated Unions (I), (II) & (III), Dr.Dobb’s, June 2002. http://tinyurl.com/8srld2z http://tinyurl.com/9tofeq4 http://tinyurl.com/8ku347d

[Meyers00] Scott Meyers, How Non-Member Functions Improve Encapsulation, Dr.Dobb’s, February 2000. http://tinyurl.com/8er3ybp

[Meyers05] Scott Meyers, Effective C++, Addison-Wesley 2005.

[Meyers12] Scott Meyers, Universal References in C++11, Overload 111, October 2012. http://tinyurl.com/9akcqjl

[Neri] Cassio Neri, delayed_init implementation. https://github.com/cassioneri/delayed_init

[Optional] Fernando Cacciola, Boost.Optional. http://tinyurl.com/8ctk6rf

[Proposal] Fernando Cacciola and Andrzej Krzemienski, A proposal to add a utility class to represent optional objects (Revision 2), September 2012. http://tinyurl.com/bvyfjq7

[RVO] Return Value Optimization, Wikipedia. http://tinyurl.com/kpmvdw

[TU] Tagged Union, Wikipedia. http://tinyurl.com/42p5tuz

1. C++11 also allows the use of braces but their semantics are different and outside the scope of this article. Therefore, we shall consider only parenthesised initialisations and their C++03 semantics.
2. It’s unfortunate but according to C++ Standard definitions, sometimes – as in this particular case – initialisation means doing nothing and the value of the object is indeterminate.