Title: Auto – a necessary evil? (Part 2)

Author: Martin Moene

Date: Thu, 01 August 2013 18:01:13 +01:00

Summary: Should you declare (almost) everything auto? Roger Orr considers when auto is appropriate.

Body:

To have a right to do a thing
is not at all the same as
to be right in doing it.

~ G.K.Chesterton

In the first article we covered the rules governing the auto keyword that was added to the language in C++11 (or added back, if your memory of C++ goes back far enough!).

It is important with a feature like auto not only to know the rules about what is permitted by the language – and the meaning of the consequent code – but also to be able to decide when the use of the feature is appropriate and what design forces need to be considered when taking such decisions.

In this article we look in more detail at some uses of auto with the intent of identifying some of these issues.

A ‘complex type’ example

One of the main motivations for auto was to simplify the declaration of variables with ‘complicated’ types. One such example is in the use of iterators over standard library containers in cases such as:

  std::vector<std::set<int>> setcoll;
  std::vector<std::set<int>>::const_iterator it =
     setcoll.cbegin();  // ¹ see footnote

Many programmers were put off using the STL because of the verbosity of the variable declarations. With C++03 one recommendation was to use a typedef – and this approach remains valid in C++11:

  typedef std::vector<std::set<int> > collType;
  // C++03 code still works fine
  collType setcoll;
  collType::const_iterator it = setcoll.begin();

With the addition of auto to the language the code can be shortened considerably:

  std::vector<std::set<int>> setcoll;
  auto it = setcoll.cbegin();

But is it better?

To help answer that question let us consider the alternatives in more detail.

The original code is often seen as hard to read because the length of the variable declaration dwarfs the name itself. Many programmers dislike the way that the meaning of the code is masked by the scaffolding required to get the variable type correct.

Additionally, the code is fragile in the face of change. The type of the iterator is heavily dependent on the type of the underlying container so the two declarations (for setcoll and it) must remain in step if the type of one changes.

The second code, using a typedef, improves both the readability of the code and also the maintainability as, should the type of the container change, the nested type const_iterator governed by the typedef will change too. However, having to pick a type name adds to the cognitive overhead; additionally good names are notoriously hard to pin down.

In the final code the use of auto further helps readability by focussing the attention on the expression used to initialise it as this defines the type that auto will resolve to. Given this, code maintainability is improved as the type of it will track the type required by the initialising expression.

We retain the type safety of the language – the variable is still strongly typed – but implicitly not explicitly. The main downside of the final version of the code is that if you do need to know the precise type of the variable then you have to deduce it from the expression, to do which also means knowing the type of the container. On the other hand, it can be argued that to understand the semantics of the line of code you already have to know this information, so the new style has not in practice made understanding the code any more difficult.

In this case I am inclined to agree with this view and I can see little downside to the use of auto to declare variables for iterators and other such entities. So:

the code is quicker and easier to write and, arguably, to read
the purpose is not lost in the syntax
the generated code is identical to that produced with the explicit type
the variable automatically changes type if the collection type changes

However, the last point can be reworded as ‘the variable silently changes type if the collection type changes’. In particular this can be an issue with the difference between a const and non-const container. Note that the C++11 code uses cbegin():

  auto it = setcoll.cbegin();

If we’d retained the use of begin() we would have got a modifiable iterator from a non-const collection. The C++03 code makes it explicit by using the actual type name:

  std::vector<std::set<int>>::const_iterator it;

The stress is slightly different and may mean making some small changes to some class interfaces, as with the addition of cbegin().
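The difference between begin() and cbegin() can be checked at compile time. The following is a minimal sketch rather than code from the original presentation: the function name demo_iterator_types and the single-element container are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <vector>

typedef std::vector<std::set<int>> collType;

// Hypothetical helper: returns how many times 42 is found via the const iterator.
inline std::size_t demo_iterator_types()
{
    collType setcoll(1);           // one empty set, so the iterators can be dereferenced

    auto it  = setcoll.begin();    // non-const container: a modifiable iterator
    auto cit = setcoll.cbegin();   // always a const_iterator

    static_assert(std::is_same<decltype(it), collType::iterator>::value,
                  "begin() on a non-const container yields iterator");
    static_assert(std::is_same<decltype(cit), collType::const_iterator>::value,
                  "cbegin() yields const_iterator");

    it->insert(42);                // compiles: 'it' permits modification
    // cit->insert(42);            // would not compile: 'cit' is read-only
    assert(&*it == &*cit);         // both iterators refer to the same element
    return cit->count(42);
}
```

With an explicit const_iterator declaration the intent is visible in the type; with auto it is visible only in the choice of member function.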

DRY example

auto allows you to specify the type name once. Consider this code:

    std::shared_ptr<std::string> str =
        std::make_shared<std::string>("Test");

We’ve repeated the std::string
make_shared exists solely to create std::shared_ptr objects

We can write it more simply as:

  auto str = std::make_shared<std::string>("Test");

The resulting code is just over half as long to write (and read) and I don’t think we’ve lost any information. Additionally the code is easier to change.
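As a quick check that nothing is lost, the sketch below (the function name demo_make_shared is invented for illustration) confirms that auto deduces exactly the type that the longer declaration spelled out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical helper: returns a copy of the string held by the shared_ptr.
inline std::string demo_make_shared()
{
    auto str = std::make_shared<std::string>("Test");

    // The deduced type is the one written by hand in the longer form.
    static_assert(std::is_same<decltype(str),
                               std::shared_ptr<std::string>>::value,
                  "str is a std::shared_ptr<std::string>");

    assert(str.use_count() == 1);  // a single owner, as with the explicit declaration
    return *str;
}
```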

Using auto rather than repeating the type is indicated most strongly when:

the type names are long or complex
the types are identical or closely related

auto is less useful when:

the type name is simple – or important
the cognitive overhead on the reader of the code is higher

So I think auto may be less useful in an example like that in Listing 1.

// in some header
struct X {
 int *mem_var;
 void aMethod();
};

// in a cpp file
void X::aMethod() {
  auto val = *mem_var; // what type is val?
  // ...
}

Listing 1

YMMV (Your mileage may vary) – opinions differ here. The ease of answering the question about the type of val may also depend on whether you are using an IDE with type info.

For example, with Microsoft Visual Studio you get the type for the example in Listing 1 displayed in the mouse-over as shown in Figure 1.

Figure 1

Dependent return type example

auto can simplify member function definitions. Consider the class and member function definition in Listing 2.

class Example
{
public:
  typedef int Result;

  Result getResult();
};

Example::Result Example::getResult()
{ return ...; }

Listing 2

We have to use the prefix of Example:: for the return type Result as at this point in the definition the scope does not include Example. auto allows the removal of the class name from the return type.

The syntax is to place the auto where the return type would otherwise go, then follow the function prototype with -> and the actual return type:

  auto Example::getResult() -> Result
  { return ...; }

Whether or not this makes the code clearer depends on factors including:

familiarity
consistent use of this style.

Personally, I still can’t decide on this one. I think the new style is an improvement over the old one, but until use of C++11 is sufficiently widespread trying to use the style may simply result in a mix of the old and new styles being used. I do not think this would be a great step forward for existing code bases, but might be worth trying out for new ones.
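For comparison, here is a compilable sketch of the two definition styles side by side. The second member function getResultNewStyle, the helper demo_trailing_return and the returned value 42 are placeholders invented for this illustration; they are not part of the original listing.

```cpp
#include <cassert>
#include <type_traits>

class Example
{
public:
  typedef int Result;

  Result getResult();
  Result getResultNewStyle();   // hypothetical second member, for comparison
};

// Old style: the return type appears before 'Example::', so the
// nested name Result has to be qualified.
Example::Result Example::getResult()
{ return 42; }                  // 42 is an arbitrary placeholder value

// New style: the trailing return type comes after 'Example::', where
// the class scope is already in effect, so plain Result is enough.
auto Example::getResultNewStyle() -> Result
{ return 42; }

// Both members have exactly the same type.
static_assert(std::is_same<decltype(&Example::getResult),
                           decltype(&Example::getResultNewStyle)>::value,
              "the two styles declare the same function type");

inline Example::Result demo_trailing_return()
{
  Example e;
  assert(e.getResult() == e.getResultNewStyle());
  return e.getResultNewStyle();
}
```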

Polymorphism?

One problem with auto is the temptation to code to the implementation rather than to the interface. Imagine a class hierarchy with an abstract base class Shape and various concrete implementations such as Circle and Ellipse. We might write code like this:

  auto shape = make_shared<Ellipse>(2, 5);
  ...
  shape->minor_axis(3);

The use of auto has given the generic variable shape the explicit type ‘shared pointer to Ellipse’. This makes it too easy to call methods – such as minor_axis above – that are not part of the interface but of the implementation.

When the type of shape is ‘shared pointer to the abstract base class’, you can’t make this mistake. (Aside: I think this is a bigger problem with var in C# than with auto in C++ but your experience may be different.) The trouble is that auto is too ‘plastic’ – it fits the exact type that matches whereas without auto the author needs to make a decision about the most appropriate type to use. This doesn’t only affect polymorphism: const, signed/unsigned integer types and sizes are other possible pinch points where the deduction of the type done by auto is not the best choice.
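The contrast can be sketched as follows. The article does not show the class definitions, so the members of Shape and Ellipse here (name(), the two axis fields) and the helper demo_interface_type are assumptions invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical hierarchy, invented for this sketch.
struct Shape {
    virtual ~Shape() {}
    virtual const char* name() const = 0;
};

struct Ellipse : Shape {
    Ellipse(double major, double minor) : major_(major), minor_(minor) {}
    const char* name() const override { return "ellipse"; }
    void minor_axis(double value) { minor_ = value; }   // not part of Shape
    double major_;
    double minor_;
};

inline std::string demo_interface_type()
{
    // auto fits the exact type: shared pointer to the concrete class.
    auto concrete = std::make_shared<Ellipse>(2, 5);
    static_assert(std::is_same<decltype(concrete),
                               std::shared_ptr<Ellipse>>::value,
                  "auto deduces shared_ptr<Ellipse>");
    concrete->minor_axis(3);       // compiles: implementation detail is reachable
    assert(concrete->minor_ == 3);

    // An explicit type records the decision to code to the interface.
    std::shared_ptr<Shape> shape = std::make_shared<Ellipse>(2, 5);
    // shape->minor_axis(3);       // would not compile: not a member of Shape
    return shape->name();
}
```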

What type is it?

It is possible to go to the extreme of making everything in the program use auto, but I’m not convinced this is a good idea. For example, what does the program in Listing 3 do?

auto main() -> int {
  auto i = '1';
  auto j = i * 'd';
  auto k = j * 100l;
  auto l = k * 100.;
  return l;
}

Listing 3

It is all too easy to assume the auto types are all the same – miss the promotion, the 'l' or the '.'. Opinions also vary on whether writing main using auto aids readability – I am not at all sure it does, especially given the large amount of existing code predating this use of auto.
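The four deduced types in Listing 3 can be pinned down with static_assert. This is a sketch only: the function name demo_listing3_types is invented, and the int result for j assumes the usual case where int can represent every value of char.

```cpp
#include <cassert>
#include <type_traits>

inline bool demo_listing3_types()
{
    auto i = '1';
    auto j = i * 'd';
    auto k = j * 100l;
    auto l = k * 100.;

    static_assert(std::is_same<decltype(i), char>::value,
                  "a character literal has type char");
    static_assert(std::is_same<decltype(j), int>::value,
                  "char * char is promoted to int");
    static_assert(std::is_same<decltype(k), long>::value,
                  "int * long gives long (note the 'l' suffix)");
    static_assert(std::is_same<decltype(l), double>::value,
                  "long * double gives double (note the '.')");

    assert(j == i * 'd');
    return l == k * 100.0;
}
```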

You can use the auto rules (on some compilers) to tell you the type. For example, if we want to find out the actual type of j we could write this code:

  auto main() -> int {
    auto i = '1';
    auto j = i * 'd', x = "x";
    ...

When compiled this will error as the type deduction for auto for the variables j and x produces inconsistent types. A possible error message is:

    error: inconsistent deduction for 'auto':
  'int' and then 'const char*'

You may also be able to get the compiler to tell you the type by forcing an error inside a function template that is instantiated with the type in question, for example:

  template <typename T>
  void test() { T::dummy(); }
  
  auto val = '1';
  test<decltype(val)>();

This generates an error and the error text (depending on the compiler) is likely to include text such as:

see reference to function template instantiation 'void test<char>(void)' being compiled

What are the actual rules?

The meaning of an auto variable declaration follows the rules for template argument deduction.

We can consider the invented function template

  template <typename T>
  void f(T t) {}

and then in the declaration auto val = '1'; the type of val is the same as that deduced for T in the call f('1').

This meaning was picked for good reason – type deduction can be rather hard to understand and it was felt that having a subtly different set of rules for auto from existing places where types are deduced would be a bad mistake. However, this does mean that the type deduced when using auto differs from a (naïve) use of decltype:

  const int ci = 0;
  auto val1 = ci;
  decltype(ci) val2 = ci;

val1 is of type int as the rules for template argument deduction will drop the top-level const; but the type of val2 will be const int as that is the declared type of ci.
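These claims can be verified directly. In the sketch below the helper template deduced and the variant of f that returns it are hypothetical devices, introduced only so that the T deduced for a by-value parameter can be inspected; they are not part of the article's code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag type that records a deduced template argument.
template <typename T>
struct deduced { typedef T type; };

// Variant of the invented function template: same by-value parameter,
// but it returns the tag so that T can be examined with decltype.
template <typename T>
deduced<T> f(T) { return deduced<T>(); }

inline int demo_auto_vs_decltype()
{
    const int ci = 0;
    auto val1 = ci;
    decltype(ci) val2 = ci;

    static_assert(std::is_same<decltype(val1), int>::value,
                  "auto drops the top-level const");
    static_assert(std::is_same<decltype(val2), const int>::value,
                  "decltype keeps the declared type of ci");
    static_assert(std::is_same<decltype(f(ci))::type, int>::value,
                  "template argument deduction agrees with auto");

    val1 = 1;                      // allowed: val1 is a plain int
    // val2 = 1;                   // would not compile: val2 is const
    assert(val2 == 0);
    return val1;
}
```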

Adding modifiers to auto

Variables declared using auto can be declared with various combinations of const and various sorts of references. So what’s the difference?

  auto          i   = <expr>;
  auto const    ci  = <expr>;
  auto       &  ri  = <expr>;
  auto const &  cri = <expr>;
  auto       && rri = <expr>;

As above, auto uses the same rules as template argument deduction so we can ask the equivalent question about what type is deduced for the following uses of a function template:

  template <typename T>   // prefixes each declaration below in turn
    void f(T          i);
    void f(T const    ci);
    void f(T       &  ri);
    void f(T const &  cri);
    void f(T       && rri);

The answer to the question is, of course, ‘it depends’ ... especially for the && case (which is an example of what Scott Meyers has named the ‘Universal Reference’).

const inference (values)

Let us start by looking at a few examples of using auto together with const for simple value declarations.

  int i(0); int const ci(0);
  
  auto       v0 = 0;
  auto const v1 = 0;
  auto       v2 = i;
  auto const v3 = i;
  auto       v4 = ci;
  auto const v5 = ci;

This is the easiest case and, as in the earlier discussion of the difference between auto and decltype, v0 is of type int and v1 is of type int const (you may be more used to calling it const int). Similarly v2 and v4 are of type int and v3 and v5 are of type int const.
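The deduced types for these six declarations can be confirmed with static_assert, as in this sketch (the wrapping function demo_const_values is invented for illustration).

```cpp
#include <cassert>
#include <type_traits>

inline int demo_const_values()
{
    int i(0); int const ci(0);

    auto       v0 = 0;
    auto const v1 = 0;
    auto       v2 = i;
    auto const v3 = i;
    auto       v4 = ci;
    auto const v5 = ci;

    static_assert(std::is_same<decltype(v0), int>::value,       "v0 is int");
    static_assert(std::is_same<decltype(v1), int const>::value, "v1 is int const");
    static_assert(std::is_same<decltype(v2), int>::value,       "v2 is int");
    static_assert(std::is_same<decltype(v3), int const>::value, "v3 is int const");
    static_assert(std::is_same<decltype(v4), int>::value,       "v4 is int: const dropped");
    static_assert(std::is_same<decltype(v5), int const>::value, "v5 is int const");

    v4 = 1;                        // v4 is an independent, modifiable copy of ci
    assert(ci == 0);
    assert(v0 + v1 + v2 + v3 + v5 == 0);
    return v4;
}
```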

In general, with simple variable declarations, I prefer using auto const by default as the reader knows the value will remain fixed. This means if they see a use of the variable later in the block they do not have to scan the intervening code to check whether or not the value has been modified.

const inference (references)

Let’s take the previous example but make each variable an l-value reference:

  int i(0); int const ci(0);
  
  auto       & v0 = 0;  // Error
  auto const & v1 = 0;
  auto       & v2 = i;
  auto const & v3 = i;
  auto       & v4 = ci;
  auto const & v5 = ci;

The first one fails as you may not bind a non-const l-value reference to a temporary value. However, you are allowed to form a const reference to a temporary and so v1 is valid (and of type int const &).

v2 is valid and is of type int & and the three remaining variables are all of type int const &. Notice that the const for v4 is not removed, unlike in the previous example, as it is not a top-level use of const.
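The same kind of compile-time check works for the reference cases. In this sketch (the function name demo_const_references is invented) the invalid declaration is left as a comment so that the rest compiles.

```cpp
#include <cassert>
#include <type_traits>

inline int demo_const_references()
{
    int i(0); int const ci(0);

    // auto    & v0 = 0;   // error: non-const l-value reference to a temporary
    auto const & v1 = 0;
    auto       & v2 = i;
    auto const & v3 = i;
    auto       & v4 = ci;
    auto const & v5 = ci;

    static_assert(std::is_same<decltype(v1), int const &>::value, "v1 is int const &");
    static_assert(std::is_same<decltype(v2), int &>::value,       "v2 is int &");
    static_assert(std::is_same<decltype(v3), int const &>::value, "v3 is int const &");
    static_assert(std::is_same<decltype(v4), int const &>::value, "v4 keeps the const");
    static_assert(std::is_same<decltype(v5), int const &>::value, "v5 is int const &");

    v2 = 7;                        // writes through to i
    assert(v3 == 7);               // v3 is a read-only view of the same object
    assert(&v4 == &ci && &v5 == &ci);
    assert(v1 == 0);
    return i;
}
```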

Reference collapsing and auto

Things get slightly more complicated again when we use the (new) r-value reference in conjunction with auto.

  int i(0); int const ci(0);
  
  auto       && v0 = 0;
  auto const && v1 = 0;
  auto       && v2 = i;
  auto const && v3 = i;   // Error

  auto       && v4 = ci;
  auto const && v5 = ci;  // Error

The first variable, v0, becomes an r-value reference to the temporary 0 (type int &&) and the second, v1, is the const equivalent (int const &&). When it comes to v2, however, the reference type ‘collapses’ to an l-value reference and so the type of v2 is simply int &. v3 is invalid as the presence of the const suppresses the reference collapsing and you are not allowed to bind an r-value reference to an l-value. v4 reference-collapses to int const & and the declaration of v5 is an error for the same reason as for v3.
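A sketch of the four valid cases, with the two invalid declarations commented out, makes the collapsing visible (the function name demo_reference_collapsing is invented for illustration).

```cpp
#include <cassert>
#include <type_traits>

inline int demo_reference_collapsing()
{
    int i(0); int const ci(0);

    auto       && v0 = 0;
    auto const && v1 = 0;
    auto       && v2 = i;
    // auto const && v3 = i;   // error: r-value reference cannot bind to an l-value
    auto       && v4 = ci;
    // auto const && v5 = ci;  // error, for the same reason

    static_assert(std::is_same<decltype(v0), int &&>::value,       "v0 is int &&");
    static_assert(std::is_same<decltype(v1), int const &&>::value, "v1 is int const &&");
    static_assert(std::is_same<decltype(v2), int &>::value,        "v2 collapses to int &");
    static_assert(std::is_same<decltype(v4), int const &>::value,  "v4 collapses to int const &");

    v2 = 5;                        // v2 refers to i
    assert(&v4 == &ci);            // v4 refers to ci
    assert(v0 == 0 && v1 == 0);
    return i;
}
```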

So this is the complicated one: auto && var = <expr>; as, depending on the expression, var could be

  T &
  T &&
  T const &
  T const &&

Deducing the last case is a little more obscure – you need to bind to a const temporary that is of class type. Here’s an example of deducing const &&:

  class T{};
  const T x() { return T(); }
  auto && var = x();   // var is of type T const &&

Note that non-class types, like int, decay to &&. This changed during the development of C++11 and at one point Microsoft’s compiler and the Intellisense disagreed over the right answer (see Figure 2)!

Figure 2

(The compiler in the Visual Studio 2013 preview edition does now get this right.)
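Both the class-type case and the non-class case can be checked on a conforming compiler. In this sketch the function y and the helper demo_const_rvalue are hypothetical additions; some compilers may warn that the const on y's return type is ignored, which is exactly the behaviour being illustrated.

```cpp
#include <cassert>
#include <type_traits>

class T {};
const T x() { return T(); }
const int y() { return 0; }       // hypothetical non-class counterpart of x

inline int demo_const_rvalue()
{
    auto && var = x();            // const temporary of class type
    auto && num = y();            // the const is dropped for a non-class prvalue

    static_assert(std::is_same<decltype(var), T const &&>::value,
                  "class type: var is T const &&");
    static_assert(std::is_same<decltype(num), int &&>::value,
                  "non-class type: num is int &&");

    (void)var;
    num = 1;                      // allowed: num is not const
    assert(num == 1);
    return num;
}
```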

More dubious cases

auto does not work well with initializer lists as the somewhat complicated rules for parsing these result in behaviour, when used with auto, that may not be what you expect:

  int main() {
    int var1{1};
    auto var2{1};

You might expect var1 and var2 to have the same type. Sadly the C++ rules have introduced a new ‘vexing parse’ into the language. Under the C++11 rules the type of var2 is std::initializer_list<int>. There is a proposal to make this invalid as almost everyone who stumbles over this behaviour finds it unexpected.
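The effect can be demonstrated with the closely related form that uses an equals sign. This is a sketch under stated assumptions: later revisions of the language changed the direct form auto var2{1}, and current compilers generally deduce int for it, so the example uses auto var3 = {1}, which still deduces std::initializer_list<int>. The names var3 and demo_braced_auto are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

inline std::size_t demo_braced_auto()
{
    int  var1{1};
    auto var3 = {1};              // braces with '=': a list, not an int

    static_assert(std::is_same<decltype(var1), int>::value,
                  "var1 is int");
    static_assert(std::is_same<decltype(var3),
                               std::initializer_list<int>>::value,
                  "var3 is std::initializer_list<int>");

    assert(*var3.begin() == var1); // the single element holds the value 1
    return var3.size();
}
```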

A mix of signed and unsigned integers – or integers of different sizes – can cause problems with auto. In many cases the compiler generates a warning, if you set the appropriate flag(s), and if you heed the warning you can resolve possible problems. But not in all cases ...

  std::vector<int> v;
  ...
  for (int i = v.size() - 1; i > 0; i -= 2)
  {
    process(v[i], v[i-1]);
  }

If you change int to auto then the code breaks. The trouble here is that v.size() returns std::vector<int>::size_type which is an unsigned integer type. The usual arithmetic conversions mean that i is also of an unsigned integer type. If it starts out odd it will decrease by 2 round the loop as far as 1, then the next subtraction will wrap around – to a large positive value. Of course, when using int, care must be taken to ensure that an int will be large enough for all possible values of size() that the program might encounter.
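The wrap-around can be isolated without running the faulty loop itself. The sketch below performs just the final subtraction for a two-element vector; both function names are invented for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// With auto the index is unsigned, so stepping from 1 by -2 wraps around
// and the loop condition 'i > 0' remains true.
inline bool unsigned_index_wraps()
{
    std::vector<int> v(2);        // size 2, so the index starts at 1
    auto i = v.size() - 1;
    static_assert(std::is_unsigned<decltype(i)>::value,
                  "auto deduces an unsigned type from v.size() - 1");
    assert(i == 1);
    i -= 2;                       // 1 - 2 wraps to a very large value
    return i > 0;                 // true: the loop would carry on
}

// With int the same step yields -1 and the loop terminates.
inline bool signed_index_stops()
{
    std::vector<int> v(2);
    int i = static_cast<int>(v.size()) - 1;
    assert(i == 1);
    i -= 2;                       // now -1
    return !(i > 0);              // true: the loop would stop
}
```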

I’m less convinced by the use of auto for variables defined by the results of arithmetic expressions as the correct choice of variable type may be necessary to ensure the desired behaviour of the program.

Conclusion

auto is a very useful tool in the programmer’s armoury as it allows you to retain type safety without needing to write out the explicit types of the variables. I expect that use of auto will become fairly widespread once use of pre-C++11 compilers becomes less common.

However, I do have a concern that thoughtless use of auto may result in code that does not behave as expected, especially when the data type chosen implicitly is not the one the reader of the code anticipates.

Please don’t use auto without thought simply to save typing, but make sure you use it by conscious choice and being aware of the potential issues and possible alternatives.

Acknowledgements

Many thanks to Rai Sarich and the Overload reviewers for their suggestions and corrections which have helped to improve this article.

This article is based on the presentation of that title at ACCU 2013.

¹ cbegin is another C++11 addition: it explicitly returns a const iterator even from a mutable container.
