Journal Articles

CVu Journal Vol 28, #1 - March 2016 + Student Code Critiques from CVu journal.
Browse in : All > Journals > CVu > 281 (11)
All > Journal Columns > Code Critique (70)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 98

Author: Martin Moene

Date: 04 March 2016 20:34:23 +00:00 or Fri, 04 March 2016 20:34:23 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (We will remove email addresses!)

Last issue’s code

I have a simple template class that holds a collection of values but sometimes code using the class crashes. I’ve written a simple test for the class, which works, but I do get a warning about a signed/unsigned mismatch on the for loop. I thought using auto would stop that. Is that anything to do with the crash? I’ve commented out all the other methods apart from add and remove.

The code is in Listing 1 (values.h) and Listing 2 (test.cpp).

#include <utility>
#include <vector>

#pragma once

// An unordered collection of values
template <typename T>
class Values
{
public:
  void add(T t);
  bool remove(T t);
  std::vector<T> const & values() const
  { return v; }
private:
  std::vector<T> v;
};

// Add a new item
template <typename T>
void Values<T>::add(T t)
{
   v.push_back(t);
}

// Remove an item
// Returns true if removed, false if not present
template <typename T>
bool Values<T>::remove(T t)
{
  bool found(false);
  for (auto i = 0; i != v.size(); ++i)
  {
    if (v[i] == t)
    {
      v[i] = std::move(v.back());
      v.pop_back();
      found = true;
    }
  }
  return found;
}
			
Listing 1
#include <iostream>
#include "values.h"

void test()
{
  Values<int> vi;
  for (int i = 0; i != 10; ++i)
  {
    vi.add(i);
  }

  if (!vi.remove(1))
  {
     std::cout << "Can't remove 1\n";
  }

  if (vi.remove(1))
  {
     std::cout << "Can remove 1 twice\n";
  }

  if (!vi.remove(9))
  {
    std::cout << "Can't remove 9\n";
  }

  std::cout << "size: " << vi.values().size()
    << std::endl;
}

int main()
{
  test();
}
			
Listing 2

Critiques

Mathias Gaunard

The answer is simple: pop_back() reduces the size by 1, so if i is already on the last element, when it enters the next iteration it is past the new size.

Two ways to solve this:

The test should also be amended to test removing the last element (it does remove 9, but by the time it does so it’s not the last element anymore, since remove reorders).

Paul Floyd

To paraphrase Douglas Adams, we can get an answer to the problem, but we don’t really know what the question is.

We have the comment telling us that Values contains an unordered collection, but are the items unique or not? If it is a multi-collection, should remove take out just the first instance or all instances? Furthermore, what are the complexity requirements for values::remove()?

I played around with the code a bit, and then read it a bit more carefully. There were three issues with remove that struck me:

  1. If the item to be removed is the last in the vector, pop_back() will reduce the size by 1 whilst i will increment to the old size(), which is now the new size() plus one, missing the loop termination condition. The loop will continue reading from the end of the vector, doing bad things. Probably this is the cause of the crash described. This could be fixed by changing the for loop termination condition to be i < v.size().
  2. If the condition is true, then the item at back() jumps the queue to a position that won’t be tested by the loop. So if the collection contains {0,1,2,1} and remove(1) is called, then when the first 1 is detected, the second 1 is moved to replace it and the next loop iteration continues with the 2. The second 1 thus evades detection. This could be fixed by decrementing i when an element is being removed so that the next iteration will test the moved value.
  3. In the case where t (and thus v[i]) matches v.back()
    v[i] = std::move(v.back());

    is potentially performing self move assignment. This isn’t a problem for the int type used in this example, but I believe that it is undefined behaviour for complex types.

    This could be fixed by adding a self assignment detection check.

Putting all that together, we have:

  template <typename T>
  bool Values<T>::remove(T t)
  {
    bool found(false);
  
    for (auto i = 0; i < v.size(); ++i)
    {
      if (v[i] == t)
      {
        // avoid self move assignment
        if (i != v.size()-1)
        {
            v[i] = std::move(v.back());
            // ensure that the value at v.back()
            // that just got moved
            // is checked next time round
            --i;
        } 
        v.pop_back();
        found = true;
      }
    }
    return found;
  }

In practice, however, I would use the standard algorithms, for instance:

  size_t oldSize(v.size());
  v.erase(std::remove(v.begin(), v.end(), t),
    v.end());
  return oldSize != v.size();

The complexity of the erase/std::remove solution may be quite different to the hand-rolled remove function. Since it preserves the order of the container, the worst case of removing 1 element from the front of the vector is that it has to move n-1 elements, which could be expensive. The hand rolled remove does not preserve order and may need to do fewer moves.

Robert Jones

My first thought on reading this Code Critique is that if at all possible I’d be inclined to try to avoid writing container classes. Users tend to have an expectation that any container will conform to the same style, and offer the same facilities, as the standard containers, which would make any container class a non-trivial exercise. If you’re performing non-standard manipulations on a collection it may well be better to do it explicitly inline, so readers can see what's going on, rather than hide it behind the facade of a container class.

Coming to the code in detail, the first point is the signed/unsigned mismatch in

  for (auto i = 0; i != v.size(); ++i)

which the author expected would have been eliminated by the use of auto. The mismatch occurs because the type of i is inferred from the initial assignment, which is with a literal '0', a signed value, which then doesn’t match the type of v.size(), an unsigned value.

The smallest minimal change to correct this would be

  for (auto i = 0u; i != v.size(); ++i)

In modern C++ this would really be better expressed directly using iterators, when the spurious signed type problem disappears

  for ( auto i = v.begin(); i != v.end(); ++i )

and subsequently all occurrences of v[i] become *i

Rather more succinctly in C++11 we can just write

  for ( auto & i : v )

and avoid the need to dereference at each usage. Notice that this syntax requires the introduction of the '&' l-value reference qualifier, otherwise 'i' would always be a copy of the container value, not a reference to it.

Next we come to the question of the unexplained crashes.

All of the methods of iterating over the range implicitly or explicitly depend on the value of end() or size(), which is invalidated or changed from within the iteration loop by the use of pop_back() when the element is found, so the loop termination condition, based on equality, may never be met. One solution to this would be to break out of the loop when a matching element is found, however this would mean that only the first matching element is removed. Generally a better solution is the erase-remove idiom:

https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom

Finally, the use of std::move() is creating more uncertainty. Thomas Becker covers this rather well in

http://thbecker.net/articles/rvalue_references/section_04.html

when he writes:

Now consider the line

      a = std::move(b);

If move semantics are implemented as a simple swap, then the effect of this is that the objects held by a and b are being exchanged between a and b. Nothing is being destructed yet. The object formerly held by a will of course be destructed eventually, namely, when b goes out of scope. Unless, of course, b becomes the target of a move, in which case the object formerly held by a gets passed on again. Therefore, as far as the implementer of the copy assignment operator is concerned, it is not known when the object formerly held by a will be destructed.

So the line in code v[i] = std::move(v.back()); is potentially introducing an indeterminacy about when the value of v[i] will ultimately be destructed.

Generally the use of std::move() can be confined to constructors and assignment operators, so if you find yourself tempted to use it outside that context you want to think very carefully about the semantics of what you’re writing.

James Holland

First, let’s get the more obvious issues out of the way. The signed/unsigned mismatch warning can be resolved in several ways. As the student says, auto has been used in an attempt to solve this problem. Unfortunately, it has not worked as the student had hoped. The problem is that auto, in this case, makes i the same type as 0 and 0’s type is int. The type of such numeric literals can be changed to unsigned by adding the suffix U or u. Doing this will make i of type unsigned int (the same type of size()) and thus remove any mismatch. Another way to remove the type mismatch is to declare i of type size_t. size_t is unsigned, which is what is required, and makes the type of i clear.

Unfortunately the type mismatch has nothing to do with the crashing problem. It is also unfortunate that the student’s test code does not highlight the problem with the code. What the test code fails to do is to attempt to remove a value from the (back) end of the collection. It is this that causes the crash as described by the following scenario.

Assume there are eight elements in the container and remove() is about to inspect the last element to discover whether it should be removed. In this situation the index i has the value 7 and size() returns 8. Further assume the last element is to be removed. The function remove() now moves the contents of the last element of the container to the element having index i. As i is an index to the last element, the last element is moved to itself. This is not necessarily a problem but what happens next is. The function remove() then removes the last element, making the size of the container 7, and increments i, making its value 8. When remove() determines whether there are any more elements to process, by means of the for loop, it discovers that i is not equal to size() (because 8 is not equal to 7) and so executes the body of the for loop again with disastrous consequences. Specifically, it attempts to access an element beyond the bounds of the container. The problem is that i has increased in value and the size of the container has decreased in value within one iteration. The for loop never saw their values being equal and so did not stop processing the container when required.

It may be tempting to change the for loop to execute when i is less than v.size(). This would stop the program crashing, but making this change would not correct another problem with remove(); it does not necessarily remove all the required elements. This can happen when more than one element is to be removed and one of them is at the back of the container. The function moves the element at the back of the container to one of the other elements being removed and then moves on, not realising that the moved-to element still needs to be removed. To cure this problem, the index i should not be incremented when a move is made. This will give remove() the opportunity to process the moved element appropriately.

A revised version of remove() that does not suffer from the crashing problem and removes all required elements is shown below.

  template <typename T>
  bool Values<T>::remove(T t)
  {
    bool found(false);
    for (size_t i = 0; i != v.size();)
    {
      if (v[i] == t)
      {
        v[i] = std::move(v.back());
        v.pop_back();
        found = true;
      }
      else
      {
        ++i;
      }
    }
    return found;
  }

As can be seen, the for loop is not now responsible for incrementing the index i. The index is now incremented only if the current element is not to be removed from the container.

The student has made Values a class template. This implies that the type of object that Values contains is not necessarily known. Such a type must be well behaved when copied to itself as this could occur during the execution of remove(). A type that uses heap based memory, for example, must not allow the heap to become corrupted. The easiest way to prevent this is for its copy assignment operator to check for self-assignment as shown below.

  U & operator=(const U & rhs) noexcept
  {
    if (this != &lhs)
    {
      ...
    }
    return *this;
  }

A similar check should be made in its move assignment operator (operator=(U && rhs)).

One final point; #pragma once is not part of the C++ standard and so may not be supported on all platforms. The alternative is to use the standard include guard as follows.

  #ifndef HEADER_H
  #define HEADER_H
  …
  #endif

Commentary

There were a number of problems with the supplied code including the warning which was troubling the user. The warning was in this case a bit of a red herring as in this example the possible values over which i might range are small enough to fit exactly into the (signed) int. However, this is not always the case, and so avoiding the warning is good practice. As the critiques that tried to address this showed though, it is not as easy as you might like.

Changing i from int to unsigned int will typically double the maximum value i can reach – but the type of v.size() may well be larger than either of these (for example, it might be a 64-bit integer on a platform where int is only 32 bits) so removing the warning here actually hides a potential problem.

There was a long discussion on Herb Sutter’s blog early last year – see http://herbsutter.com/2015/01/14/reader-qa-auto-and-for-loop-index-variables/ – which includes many different ways of trying to solve this particular problem!

The crashing, caused by iterating past the end of the collection if the last element is removed, is a specific instance of a general problem – it is easy to make mistakes when modifying a collection while iterating over it. In this example, if remove() needs only remove the first occurrence found, a safe solution is to return as soon as pop_back() has been called. In general though care must be taken or maybe the modification can deferred until after the iteration has completed.

There is some worrying code in the example when self-assignment occurs: v[i] = std::move(v.back()); when both refer to the same element. While copy-assignment should be normally be designed to be safe under self-assignment – and to leave the item unchanged – there is no such requirement for self move assignment. The following simple program demonstrates the problem:

  #include <iostream>
  #include <vector>

  int main()
  {
    std::vector<int> v;
    v.push_back(1);
    v = std::move(v);
    std::cout << "size: " << v.size() << '\n';
  }

Some compilers may print 0, others may print 1. It is also possible some might do other things... such as crash.

In this particular case it doesn’t matter, as long as self move-assignment leaves the object in a valid state and does not invoke the dreaded ‘undefined behaviour’, as the very next thing the code does is to pop this item off the back of the vector and so destroy it. However, not all classes are necessarily going to produce defined behaviour so it is important, when using std::move, to ensure that the target is safe to treat as the unique reference to the object in this context.

Finally, as a couple of people pointed out, the program demonstrates the importance of checking that tests do actually test what is expected – the remove(9) appears to check removal of the last item but doesn’t as by that point 9 is no longer the last item!

The winner of CC 97

There were four critiques which between them did a pretty good job of covering the various issues in the code.

I am aware that the problem was inadequately specified and Paul in particular explicitly raises this concern. In my professional experience, I find all too often that I am dealing with a bug (for example, as in this case, a crashing program) but the right solution is unclear as the specification of the function is incomplete. The missing ingredient with a code critique is being able to talk to the original author – but often that is a problem in a commercial environment too as the original author may have moved on.

Suggesting that the user look at standard solutions first is a good idea. Even if the complexity constraints are important, having a working simple solution can be a very useful first step in the right direction. I share Robert’s concern that writing a container class can give people expectations about its behaviour that may not be the same as those of the author of the class.

Overall I think Paul asked good questions and covered the points well so he wins the prize for this issue’s critique.

Code critique 98

(Submissions to scc@accu.org by February 1st)

I am writing a simple program to index written text but it doesn’t quite work. I want to print out every word in the document with a list of each line it appears on. I’m only getting the first occurrence listed but can’t work out why.

Please explain why they have this problem... and suggest some other possible improvements to the program. The program is in Listing 3.

#include <iostream>
#include <map>
#include <sstream>
#include <vector>

int main()
{
  using namespace std;

  map<string, vector<int>> index;

  // read and index standard input
  string line;
  int lineno{};
  while (getline(cin, line))
  {
    ++lineno;
    istringstream iss(line);
    string word;
    while (iss >> word)
    {
      auto start = 
        word.find_first_not_of(":;.,'\"?!-");
      auto end = 
        word.find_last_not_of(":;.,'\"?!-");
      if (start != end)
        word.replace(end + 1, end, "");
        word.replace(0, start, "");

      if (word.empty()) continue;

      auto iter = index.find(word);
      if (iter == index.end())
      {
        index[word].push_back(lineno);
      }
      else
      {
        auto lines = iter->second;
        if (lines.back() == lineno)
          ; // ignore dups
        else
          lines.push_back(lineno);
      }
    }
  }
  // print the index
  for (auto entry : index)
  {
    cout << entry.first << ": ";
    string delim;
    for (auto line : entry.second)
    {
      cout << delim << line;
      delim = ", ";
    }
    cout << '\n';
  }
}
			
Listing 3

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://www.accu.org/journals/). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes: 

More fields may be available via dynamicdata ..