Programming Topics + CVu Journal Vol 29, #3 - July 2017
Browse in : All > Topics > Programming
All > Journals > CVu > 293
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 106

Author: Bob Schmidt

Date: 02 July 2017 17:41:54 +01:00 or Sun, 02 July 2017 17:41:54 +01:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issue’s code

I was writing a simple program to analyse the programming languages known by our team. In code review I was instructed to change languages() method in the programmer class to return a const reference for performance. However, when I tried this the code broke. I’ve simplified the code as much as I can: can you help me understand what’s wrong? It compiles without warnings with both MSVC 2015 and gcc 6. I am expecting the same output from both ‘Original way’ and ‘New way’ but I get nothing printed when using the new way.

The listings are as follows:

#pragma once
class programmer
{
public:
  // Add a language
  void add_language(std::string s)
  { languages_.push_back(s); }
  // the programmer's languages - original
  std::vector<std::string>
  original() const
  {
    return {languages_.begin(),
            languages_.end()};
  }
  // new - avoid copying for performance
  std::list<std::string> const &
  languages() const
  { return languages_; }
  programmer() = default;
  programmer(programmer const& rhs)
  : languages_(rhs.languages_) {}
  programmer(programmer &&tmp)
  : languages_(std::move(tmp.languages_)) {}
  ~programmer() { languages_.clear(); }
private:
  std::list<std::string> languages_;
};
			
Listing 1
#pragma once
#include <map>
class team
{
public:
  // Add someone to the team
  void add(std::string const &name,
    programmer const & details)
  {
    team_.emplace(
      std::make_pair(name, details));
  }
  // Get a team member's details
  auto get(std::string const &name) const
  {
    auto it = team_.find(name);
    if (it == team_.end())
      throw std::runtime_error("not found");
    return it->second;
  }
private:
  std::map<std::string, programmer> team_;
};
			
Listing 2
#include <iostream>
#include <list>
#include <map>
#include <string>
#include <vector>
#include "programmer.h"
#include "team.h"

int main()
{
  team t;
  programmer p;
  p.add_language("C++");
  p.add_language("Java");
  p.add_language("C#");
  p.add_language("Cobol");
  t.add("Roger", std::move(p));

  p.add_language("javascript");
  t.add("John", std::move(p));

  std::cout << "Original way:\n";
  for (auto lang : t.get("Roger").original())
  {
    std::cout << lang << '\n';
  }
  std::cout << "\nNew way:\n";
  for (auto lang : t.get("Roger").languages())
  {
    std::cout << lang << '\n';
  }
}
			
Listing 3

Critique

James Holland

The problem with the student’s code lies in the ‘New way’ range-based for loop. Calling the get() member function on t results in a temporary object of type programmer. The temporary object’s languages() function is then called which returns an std::list of std::strings. The range-based for loop then effectively calls begin() on the std::list thus returning an iterator to the list. The iterator will be then compared with the end of the list to determine whether there are any items in the list. Unfortunately, it is around this time (and I must admit I am not exactly sure when) that the temporary programmer object will go out of scope, its destructor called and will no longer be valid. We are now in the realms of undefined behaviour. Anything could happen and we should not be too surprised if nothing is printed.

Obtaining a reference to t’s programmer object, instead of making a copy, will result in the range-based for loop referring to a valid list of strings and ultimately in the expected printed output. This can be achieved by amending team’s get() function to return a reference to a programmer object. As the t object will remain valid while the for loop is executing, there is no need to make a copy.

I think there are other oddities with the student’s code. The std::move() within the second parameter of the t.add() has no effect as the parameter is being passed by (const) reference. This means object p contains the same programming languages after the call to t.add() as before. This results in John possessing all the programming languages of Roger plus Javascript. This is probably not what was required. If std::move(p) is used, it must be assumed that p is left in a valid but unknown state. It may contain values or it may not. It is best to modify the programmer class to have a public clear() function. This function should then be called before repopulating p with another member of staff’s programming languages.

Commentary

The main problem with this critique, as James explained, is holding a reference to a temporary after its destruction. This is, unfortunately, quite easy to do in C++ and the new-style range based for provides a potential site.

In C++ the statement

  for (auto lang : t.get("Roger").languages()){}

is equivalent to this rewritten block of statements:

  {
    auto &&__range = t.get("Roger").languages();
    auto __begin = __range.begin();
    auto __end = __range.end();

    for ( ; __begin != __end; ++__begin ) {
      auto lang = *__begin;
      {}
   }
  }

Since languages() returns a const reference to an std::list this means __range in turn is a const reference to an std::list – but the target of the reference is member data of a temporary that is destroyed at the end of the first statement in the re-written code.

James’ solution in this case, of making the get() function return a reference not a copy, solves the problem as __range now refers to an object owned by t, whose lifetime lasts after the end of the for loop, and so there is no longer a ‘dangling reference’.

The root cause of the problem is the declaration of the method:

  auto get(std::string const &name) const;

Somehow it seems the presence of auto partially disables the programmer’s critical mind. Without auto the programmer, in my experience, is more likely to notice the implied copy in:

  programmer get(std::string const &name) const;

and to have written the method to return a reference:

  programmer const
    &get(std::string const &name) const;

In other cases, though, this may not be a possible solution (for example, the data returned by get() might be generated during the call). We can make the code safe by binding the temporary to a named variable that outlasts the for loop:

  auto temp = t.get("Roger");
  for (auto lang : temp.languages()) {}

One drawback with this solution is that the variable introduced remains in scope until the end of the enclosing block; it would be nicer to ensure it gets deleted at the end of the for loop. By a bit of a co-incidence, two proposals published this year would provide alternative ways of making the code safe. (Note that neither proposal has, as yet, been approved by the standards committee.)

Firstly Zhihao Yuan’s proposal P0577R0 (‘Keep that temporary!’) would allow using the proposed ‘register’ expression to extend the lifetime of the temporary to the end of the loop, using this syntax:

  for (auto lang :
    (register t.get("Roger")).languages()){}

Secondly Thomas Köppe’s proposal P0614R0 (‘Range-based for statements with initializer’) would allow using an initializer in a range for loop, using this syntax:

  for (auto tmp = t.get("Roger");
       auto lang : tmp.languages()){}

This elegantly ensures the scope of the introduced variable ends at the end of the for loop.

Even if either or both of these proposals are accepted, the underlying problem of indirectly binding a reference to a temporary object remains untouched, and can be hard to diagnose. The problem occurs when chaining method calls where the final call returns a reference but one or more of the intermediate calls return a temporary. This is something a static analysis tool could in principle detect; does anyone know of an existing tool that does so?

The second problem with the code is the re-use of the moved-from p –James correctly explains the problem. In order for the std::move() to have any effect, the argument to add() would need to be either a value or an r-value reference, for example:

  void add(std::string const &name,
    programmer details)
  {
    team_.emplace(
      std::make_pair(name,
        std::move(details)));
  }

(Note you also need to add std::move() when passing the details object to emplace().)

However, it is still unspecified what the state of the object will be after the call, so it cannot be assumed to be empty.

The Winner of CC 105

After a full post-bag of six entries for CC104 I was disappointed to only receive one entry for CC105 – but it was a clear and concise critique of the code and so I have no qualms in awarding James this issue’s prize.

Code Critique 106

(Submissions to scc@accu.org by Aug 1st)

I am learning some C++ by writing a simple date class. The code compiles without warnings but I’ve made a mistake somewhere as the test program doesn’t always produce what I expect.

    > testdate
    Enter start date (YYYY-MM-DD): 2017-06-01
    Enter adjustment (YYYY-MM-DD): 0000-01-30
    Adjusted Date: 2017-07-31
    >testdate
    Enter start date (YYYY-MM-DD): 2017-02-01
    Enter adjustment (YYYY-MM-DD): 0001-01-09
    Adjusted Date: 2018-03-10
    >testdate
    Enter start date (YYYY-MM-DD): 2017-03-04
    Enter adjustment (YYYY-MM-DD): 0001-00-30
    Adjusted Date: 2017-04-03

That last one ought to be 2018-04-04, but I can’t see what I’m doing wrong.

Please can you help the programmer find his bug – and suggest some possible improvements to the program!

// A start of a basic date class in C++.
#pragma once

class Date
{
  int year;
  int month;
  int day;

public:
  void readDate();
  void printDate();
  void addDate(Date lhs, Date rhs);
  bool leapYear();
};
			
Listing 4
#include "date.h"
#include <iostream>
using namespace std;

// Read using YYYY-MM-DD format
void Date::readDate()
{
  cin >> year;
  cin.get();
  cin >> month;
  cin.get();
  cin >> day;
}
// Print using YYYY-MM-DD format
void Date::printDate()
{
  cout << "Date: " << year << '-' <<
    month/10 << month%10 << '-' <<
    day/10 << day %10;
}

void Date::addDate(Date lhs, Date rhs)
{
  year = lhs.year + rhs.year;
  month = lhs.month + rhs.month;
  day = lhs.day + rhs.day;

  // first pass at the day -- no months
  // are over 31 days
  if (day > 31)
  {
    day -= 31;
    month = month + 1;
    if (month > 12)
    {
      year += 1;
      month -= 12;
    }
  }
  // normalise the month
  if (month > 12)
  {
    year += 1;
    month -= 12;
  }
  // now check for the shorter months
  int days_in_month = 31;
  switch (month)
  {
  default: return; // done 31 earlier
  case 2: // Feb
    days_in_month = 28 + leapYear()?1:0; 
    break;
  case 4:  // Apr
  case 6:  // Jun
  case 9:  // Sep
  case 11: // Nov
    days_in_month = 30;
  }
  if (day > days_in_month)
  {
    day -= days_in_month;
    month += 1;
    if (month > 12)
    {
      month -= 12;
      year += 1;
    }
  }
}
bool Date::leapYear()
{
  // Every four years, or every four centuries
  if (year % 100 == 0) return year % 400 == 0;
  else return year % 4 == 0;
}
			
Listing 5
#include "date.h"
#include <iostream>

using std::cout;

int main()
{
  Date d1, d2, d3;
  cout << "Enter start date (YYYY-MM-DD): ";
  d1.readDate();
  cout << "Enter adjustment (YYYY-MM-DD): ";
  d2.readDate();

  // Add the two dates
  d3.addDate(d1, d2);
  cout << "Adjusted ";
  d3.printDate();
}
			
Listing 6

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes: 

More fields may be available via dynamicdata ..