Journal Articles

CVu Journal Vol 29, #2 - May 2017 + Programming Topics
Browse in : All > Journals > CVu > 292 (9)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 105

Author: Martin Moene

Date: 03 May 2017 09:23:32 +01:00 or Wed, 03 May 2017 09:23:32 +01:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issue’s code

I was trying to write a simple template that gets the unique values from a range of values (rather like the Unix program uniq) but my simple test program throws up a problem.

    test with strings
    a a b b c c => a b c
    test with ints
    1 1 2 2 3 3 => 1 1 2 3

Why is the duplicate 1 not being removed?

The code is in Listing 1 (unique.h) and Listing 2 (unique.cpp).

#include <iterator>
#include <vector>
// get unique values in the range [one, two)
template <typename iterator>
std::vector<typename iterator::value_type>
unique(iterator one, iterator two)
{
  if (distance(one, two) < 2)
  {
    // no duplicates
    return {one, two};
  }
  // first one can't be a duplicate
  std::vector<typename iterator::value_type>
  result{1, *one};
  while (++one != two)
  {
    auto next = *one;
    bool is_unique =
     (*result.rbegin() != next);
    if (is_unique)
    {
       result.push_back(next);
    }
  }
  return result;
}
			
Listing 1
#include <iostream>
#include <string>
#include <vector>
#include "unique.h"
template <typename T>
void test(std::ostream &os,
  std::vector<T> const &vector)
{
   auto result =
     unique(vector.begin(), vector.end());
   auto out =
     std::ostream_iterator<T>(os, " ");
   copy(vector.begin(), vector.end(), out);
   os << "=> ";
   copy(result.begin(), result.end(), out);
   os << "\n";
}
int main()
{
   std::cout << "test with strings\n";
   std::vector<std::string> ptrs;
   ptrs.push_back("a");
   ptrs.push_back("a");
   ptrs.push_back("b");
   ptrs.push_back("b");
   ptrs.push_back("c");
   ptrs.push_back("c");
   test(std::cout, ptrs);
   
   std::cout << "test with ints\n";
   std::vector<int> ints;
   ints.push_back(1);
   ints.push_back(1);
   ints.push_back(2);
   ints.push_back(2);
   ints.push_back(3);
   ints.push_back(3);
   test(std::cout, ints);
}
			
Listing 2

Critique

Jon Summers

Header file unique.h initialises the result vector in this statement:

  std::vector<typename iterator::value_type>
  result{ 1, *one };

The initial values are passed in an initialiser list. Unfortunately, at least for this question, the initialiser list passed to a vector can have two meanings. The data can either be a list of values, each having the same type; or an integer that specifies the initial size, followed by data either of the same or a different type.

The data type of *one is templatised. When instantiated with a std::string, the initialiser list is

CodeListing>
  { 1, std::string ("some string") }

The compiler understands that to mean: “Create a vector having one element, whose value is ‘string’”.

When instantiated with an int, the initialiser list is

  { 1, 1 }

The compiler understands that to mean: “Create a vector having two elements, whose values are 1 and 1.”

The following loop that examines successive values for uniqueness doesn’t see the first element in the vector<int>, because its logic assumes an initial length of 1.

Paul Floyd

My first reactions on reading the code were

The compiler [clang++ on Mac OS] also isn’t too happy with this re-inventing of the wheel:

  cc104.cpp:10:6: error: call to 'unique' is 
  ambiguous
    unique(vector.begin(), vector.end());

In order for it to compile, I had to rename ‘unique’.

The behaviour of this unique is different to that of std::unique. The version presented here copies unique elements to a new vector (without any call to reserve). It then returns this vector, which may require another copy operation – ‘unique’ has more than one return path which may inhibit NRVO. std::unique modifies the container, moving unique elements towards the head. This minimizes the amount of allocation and copying, and still allows the user to make a copy if she doesn’t want the original vector to be modified. Personally I prefer std::unique as it is more in the C++ philosophy that it doesn’t make you pay for something that you don’t necessarily want.

As to which constructor is being called for result in unique, let’s look at the code:

  result{1, *one};

Unfortunately, this could have two meanings. If the type of *one is the same as 1, i.e., an int, then this can be a std::initializer_list, corresponding to the following vector constructor:

  vector(std::initializer_list<T> init, 
    const Allocator& alloc = Allocator());

If the type of *one isn’t int, then the constructor is

vector(size_type count, const T& value, const Allocator& alloc = Allocator());

The second case occurs when the vector is instantiated with std::string, consequently result contains a single value, the first value of referenced by the input iterator one. This is probably the intent of the code.

The first case occurs when the vector is instantiated with int. In this case, the first 1 isn’t considered to be the count, it is treated as a value in the init list. By coincidence it happens to be the same as the input vector values. So it isn’t the case that the first 1 isn’t being removed, rather an extra 1 is always being inserted at the head position.

To wrap up, I have a few nits. I don’t like the arguments called vector and iterator. In fact I don’t like any language keywords or standard library names to be used as variable names. This just makes the code harder to read and more difficult to talk about.

Lastly, the std::vectors in main could be more concisely initialized with init lists.

Raimondo Sarich

Firstly, I could not compile the code with “error: call to 'unique' is ambiguous”. I renamed the function to myunique, I assume the developer is using a more permissive compiler.

The bug is pretty easy to spot, this:

  std::vector<typename iterator::value_type>
  result{1, *one};

has to change to this:

  std::vector<typename iterator::value_type> 
  result(1, *one);

This problem is covered Scott Meyers’ Effective Modern C++, Item 7:

If, however, one or more constructors declare a parameter of type std::initalizer_list, calls using the braced initialization syntax strongly prefer the overloads taking std:initalizer_lists. Strongly. If there’s any way for compilers to construe a call using a braced initializer to be a constructor taking a std::initalizer_list, compilers will employ that interpretation.

He goes on to explain exactly this difference between constructing a std::vector of numeric types with curly braces versus parentheses. Beware std::vector!

There seems to be little else to remark on. The code could use braced initialization of the test vectors, a scattering of const (the vectors, two, next, and is_unique), and cbegin/cend.

std::vector::back() would be better than *std::vector::rbegin(). And of course, std::unique should be preferred outside academic purposes.

James Holland

At first glance, the result of the student’s program seems baffling. How can the software produce two different results just because on one occasion the vector contains elements of type int and on another occasion elements of type std::string? The reason for this behaviour hinges on the set of constructors std::vector possesses and the way a particular constructor is selected.

Consider what happens when the compiler comes across the statement std::vector<std::string> result{1, "a"}, for example. The compiler tries to match the parameter types to one of std::vector’s constructors. The compiler’s first choice is to select the constructor with a parameter type of std::initializer_list<T>. This is feasible as the vector’s definition used braces in the initialisation. However, the compiler cannot use this constructor as the first value in the initialisation list is not convertible to a string. The compiler has to discard the constructor and attempt to select another. Eventually the compiler comes across a constructor that takes a numerical value and an std::string. The compiler will use this constructor to create an std::vector containing, in this case, one string of value "a".

Now, consider what happens when the definition is changed to std::vector<int> result{1, 1}. As brace initialisation is used, the compiler first considers a constructor with std::initializer_list<T> as a parameter type, as before. This time the compiler does not reject the constructor as all the braced initialisation values are of type int (the type specified in the angle brackets of the definition). In this case the selected constructor creates an std::vector with two elements, both of value 1.

What we have seen is that similar looking definitions can initialise vectors with a varying number of elements. This behaviour is regrettable and probably represents a flaw in the design of std::vector. It is this problem that the student has unwittingly encountered.

The troublesome statement in the student’s code is just after the comment "// first one can't be a duplicate". Using parenthesises instead of braces will result in the appropriate constructor being called and will provide the required initialisation. The student’s program will now work as expected.

When designing code to fulfil some requirement, it is always beneficial to see if library functions can be used in place of writing code from scratch. In the student’s code, unique() contains two if statements and a while loop and is quite difficult to understand and reason about. I suggest it is possible to use functions from the standard library to perform the same function while being simpler to understand. Using such library functions will go a long way in ensuring the software performs as required. I offer the code below as a directreplacement for the student’s unique() function.

  template <typename iterator>
  std::vector<typename iterator::value_type>
  unique(iterator one, iterator two)
  {
    std::vector<typename iterator::value_type>
      result{one, two};
    const auto end = std::unique(result.begin(),
      result.end());
    result.resize(std::distance(result.begin(),
      end));
    return result;
  }

The body of the replacement function has only three statements of any significance. The first creates a vector containing the range of values, the second moves any adjacent duplicates to the end of the vector and the fourth reduces the size of the vector by discarding the duplicate values. Finally, the function returns the processed vector.

Robert Lytton

Elizabeth Barrett Browning once said of C++, “How do I construct thee? Let me count the ways”, and let us follow her lead with a class T.

  T t1; T t2(t1); T t3=t1; T t4(1,2,3);
  T t5{}; T t6{t1}; T t7={t1}; T t8{1,2,3};
  T t9={1,2,3};

As an aside, T’s implementation uses the non-rule “rule of zero”, outsourcing ownership to smart pointers, thus the compiler produces just what we need, including rvalue constructors such as:

T t2b(returnsT());

Phew!

Regarding all this construction Liz may “love thee to the depth and breadth and height” but I am not so sure. It seems less “smiles” and more “tears, of all my life”.

So “as men strive for right” can’t we cut through all of this for one true construction syntax and forget about the rest?

The brace constructor syntax arrived in C++11 and at first glance looks like a uniform initializing syntax, with benefits. As intimated above, all we need to do is replace the parentheses with braces and get a few bonus constructs for free viz:

We can also construct containers more elegantly.

For example, in the exercise main() function, the container initialisation:

  std::vector<std::string> ptrs;
  ptrs.push_back("a");
  ptrs.push_back("a");
  ptrs.push_back("b");
  ...

can use an initializer_list as the constructor argument:

  std::vector<std::string> ptrs {"a","a","b",...

Hang on, what does “initializer_list as the constructor argument” mean?

Ah, yes. A class may declare a T(std::initializer_list<U>) constructor along side its other constructors. Ay, there’s the rub - the compiler will do all it can to use the initializer_list constructor, even if it needs to convert types to make it fit and one of the other constructors is an exact fit.

For example, in the exercise unique() function, the vector result is constructed using the literal integer 1 and a value of type ‘dereferenced iterator’ inside of braces:

  std::vector<typename iterator::value_type>
    result{1, *one};

In the first test:

In the second test:

Hence the output when the test is run.

Changing from braces to parentheses force the compiler to use vector(size_type,const T&) viz:

  std::vector<typename iterator::value_type> 
    result(1, *one);

As auto with braces prefers std::initializer_list<> too (unless it’s a return or lambda type or ...) but templates don’t, it seems a uniform initializing syntax is out of reach. I have to be content with, as Liz puts it, “if God choose, I shall but construct thee better after this.”

Herman Pijl

To start with, I don’t like the name iterator as a name for a template parameter type. The template parameter type should use a naming convention, indicating which iterator category (e.g. std::input_iterator_tag) is expected by the template algorithm. Concretely, I would like to see something like

  template<typename In>

I would even suggest to use a static_assert at the beginning of the template function definition. When the static_assert fails, a descriptive string would explain that assumption about the iterator category is not met, e.g.

  {
    static_assert(std::is_base_of<
      std::input_iterator_tag,
      typename std::iterator_traits<In>::
        iterator_category>(),
      "iterator category must be (derived from)"
      " std::input_iterator_tag");
    ...
  }

I like this ‘concept’ ;-).

The second problem I see in the template declaration is the use of iterator::value_type. This assumes that the iterator is a ‘complex’ type, i.e. some struct or class. In other words, the iterator cannot be a simple (plain old Kernighan and Ritchie) pointer. I would like to call the template with ‘legacy’ code

  int intArray[] = {1, 1, 2, 3};
  auto result = unique(intArray + 0,
    intArray + sizeof(intArray)/sizeof(int));

In order to achieve this, replace

  typename iterator::value_type

by

  typename std::iterator_traits<In>::value_type

It looks like std::iterator is deprecated in C++17, so get used to the std::iterator_traits instead.

Some comments about the template definition

The implementation starts with an attempt to find out the size of the iterator range. It is clear that if the range contains 0 or 1 element, then there can be no duplicates. Unfortunately, the distance algorithm is a serious overhead. I would guess that the standard implementation of ‘distance’ will use the iterator_traits to have an almost immediate answer for random access iterators, but for the other categories, the overhead is enormous.

The intention was clearly to have a (premature) optimisation for small collections, more in particular the empty collection and the singleton collection, but trying to find the size of the iterator range by entirely traversing the range is far from optimal.

Traversing the whole iterator range becomes a disaster when the iterators are traversing an input stream. As the iterators are incremented, the input stream is effectively consumed!

  std::istringstream is("1 1 2 2 3 3");
  std::istream_iterator<int> isit(is), isend;
  auto result = unique(isit, isend);

After the call to the distance template, the while-loop will not even be entered!

More critique to come.

When the (distance < 2) condition is met, the template function returns a braced initialiser. As the initialiser doesn’t fit the std::initializer_list<T> constructor, the compiler tries to find some non-explicit constructors that fit and it finds a template constructor. Personally I (still) prefer to use the traditional constructor instead of an inialiser because I want to keep control. I only use the curly braces for default construction or when I want to the construct the object with the std::initializer_list<T> constructor.

  return std::vector<typename 
    std::iterator_traits <In>::value_type>(
      one, two);

As the vector has move semantics there is no overhead. You only have to type some more characters.

Now we arrive at the bug.

  std::vector<typename In::value_type>
  result {1, *one};

When the value type is int, the initialiser (curly braces) syntax causes the compiler to find two potential constructors

And the Standard says that the winner is: the initializer_list! Unfortunately this was not the one that we wanted. You can solve the ambiguity by using the parentheses instead of the curly braces.

  std::vector<typename In::value_type>
  result (1, *one);

When the value type is string, there is only one constructor to consider.

At first sight I liked the trick to find the previous element

  *result.rbegin()

but when looking closer to the vector template there exists a back member function.

  result.back()

could be slightly more performant.

I made another version my_unique that should do the trick. I experimented a bit with a binary predicate.

  #include <iterator>
  #include <vector>
  //extra
  #include <iostream>
  #include <string>
  #include <sstream>
  #include <type_traits>
  #include <functional>
  template<typename In> std::vector<
    typename In::value_type>
    unique(In one, In two)
  {
    // concept
    static_assert(std::is_base_of<
      std::input_iterator_tag,
      typename std::iterator_traits<In>
                  ::iterator_category>(),
      "iterator category must be (derived"
      " from) std::input_iterator_tag");
    if (std::distance(one,two) < 2)
    {
      return std::vector<
        typename In::value_type>(one, two);
        //return {one, two};
    }
    std::vector<typename In::value_type>
      result (1, *one);
    while(++one != two)
    {
      auto next = *one;
      bool is_unique =
        (*result.rbegin() != next);
      if (is_unique){
        result.push_back(next);
      }
    }
    return result;
  }

  //template<typename In, typename BinPred = 
  //std::not_equal_to<typename In::value_type>>
  template<typename In>
  std::vector<typename 
    std::iterator_traits<In>::value_type>
  my_unique(In one,
    In two,
    std::function<bool(typename 
      std::iterator_traits<In>::value_type
        const &,
      typename
       std::iterator_traits<In>::value_type
         const &)> binPred =
    std::not_equal_to<typename 
      std::iterator_traits<In>::value_type>{})
  {
    static_assert(std::is_base_of<
      std::input_iterator_tag,
      typename std::iterator_traits<In>:: 
        iterator_category>(),
      "iterator category must be (derived from)"
      " std::input_iterator_tag");
    std::vector<typename
      std::iterator_traits<In>::value_type>
    result;
    if (one != two)
    {
      result.push_back(*one);
      while (++one != two){
        if(binPred(*one ,result.back())){
          result.push_back(*one);
      }
      }
    }
    // post condition: one == two
    return result;
  }
  template<typename In>
  void test(std::ostream &os, In one, In two)
  {
    auto result = my_unique(one, two);
    auto out = std::ostream_iterator<typename 
      std::iterator_traits<In>::value_type>(
        os, " ");
    std::copy(one, two, out);
    os << "=> ";
    std::copy(result.begin(),
      result.end(), out);
    os << "\n";
  }
  template<typename T>
  void test(std::ostream &os,
    std::vector<T> const &vec)
  {
    test(os, vec.begin(), vec.end());
  }
  int main()
  {
    std::vector<std::string> strPtrs
    {"a", "a", "b", "b", "c", "c"};
    test(std::cout, strPtrs);
    std::vector<int> intPtrs{1, 1, 2, 2, 3, 3};
    test(std::cout, intPtrs);
    int intArray[] = {1, 1, 2, 3};
    std::cout << "intArray ";
    test(std::cout, intArray + 0,
      intArray + sizeof(intArray)/sizeof(int));
    // auto resultForArray = my_unique(intArray
    // + 0, intArray +
    // sizeof(intArray)/sizeof(int)
    std::istringstream is("1 1 2 2 3 3");
    std::istream_iterator<int> isit(is), isend;
    auto result = my_unique(isit, isend);
    std::copy(result.cbegin(),
      result.cend(),
      std::ostream_iterator<int>(std::cout,
        " "));
    std::cout << '\n';
    std::vector<int> intPtrsOrig
    {1, 1, 2, 2, 3, 3};
    auto resultOrig = 
      unique(intPtrsOrig.begin(), 
        intPtrsOrig.end());
    std::copy(resultOrig.cbegin(),
      resultOrig.cend(),
      std::ostream_iterator<int>(std::cout,
        " "));
    std::cout << '\n';
  }

Commentary

This is, as a couple of people noted, a fairly straightforward critique with one main point - the troublesome construction of std::vector when used with braced initialisers.

The subsidiary point, which was about preferring the standard library over hand-written code, was slightly overtaken by events for those whose compilers detected the ambiguous overload with std::unique for them. (This occurs when some of the contents of <algorithm> are included indirectly by one of the headers explicitly listed.)

My own replacement for the hand-written unique was:

  std::vector<T> result;
  std::unique_copy(one, two,
    std::back_inserter(result));
  return result;

In production code which approach one might take would depend on the precise context.

I was a little surprised that no-one commented on this line:

  auto next = *one;

as this takes an unnecessary copy of the object: I would prefer to see auto const & used here. Somehow poor use of auto seems be easy to overlook.

Additionally, I dislike the use of the names one and two - it would be clearer, I feel, to follow the naming convention for algorithms in the standard library and name them first and last.

Finally I would like to expand on Robert’s closing remark about the interaction of auto and initializer_list. The following simple program demonstrates the issue:

  #include <initializer_list>
  int main()
  {
    auto i{1};
    return i; // Is this valid?

  }

In the published wording for C++11 and C++14, i is deduced as a variable of type std::initializer_list<int> and so the return statement is invalid.

The good news is that this behaviour has been changed: the relevant paper N3922 was voted into the C++17 working paper in Urbana-Champaign in Nov 2014. With this change, i is now deduced as int. Even better, it was decided to treat the resolution as a defect for previous versions of C++.

So this example already compiles successfully with g++ 5.1, clang 3.8 and Visual Studio 2015.

The Winner of CC 104

I was pleased to see all five entrants found the problem and were all able to explain, with more or less detail, what it was and how to fix it. Rai’s reference to Scott Meyers was helpful for any who wish to read further.

Several people commented that it would be preferable to make use of the standard library; it is a good habit to get into, when you are needing a fairly standard algorithm, to start by searching the standard library to see if it has already been written!

Herman pointed out a couple of places where the existing code was insufficiently general; input iterators can cause particular problems to generic algorithms (this surfaced recently in the discussions over the changes that were necessary when adding parallel versions of various algorithms into C++17).

Overall though I think Rai picked up the largest number of incidental improvements and so I have awarded him this issue’s prize.

iterators with pointers was valid and did have the side-effect of removing the undefined behaviour. I did like his introduction of a simple helper struct NameScoreEntry to assist with reading the data in: it is very easy to create such structs in C++ and there can be significant benefits for readability with having named fields.

Overall, I think James provided the best critique, so I have awarded him the prize.

Code Critique 105

(Submissions to scc@accu.org by Jun 1st)

I was writing a simple program to analyse the programming languages known by our team. In code review I was instructed to change languages() method in the programmer class to return a const reference for performance. However, when I tried this the code broke. I’ve simplified the code as much as I can: can you help me understand what’s wrong? It compiles without warnings with both MSVC 2015 and gcc 6. I am expecting the same output from both ‘Original way’ and ‘New way’ but I get nothing printed when using the new way.

The listings are as follows:

#pragma once

class programmer
{
public:
  // Add a language
  void add_language(std::string s)
  { languages_.push_back(s); }

  // the programmer's languages - original
  std::vector<std::string>
  original() const
  {
    return {languages_.begin(),
            languages_.end()};
  }

  // new - avoid copying for performance
  std::list<std::string> const &
  languages() const
  { return languages_; }

  programmer() = default;

  programmer(programmer const& rhs)
  : languages_(rhs.languages_) {}

  programmer(programmer &&tmp)
  : languages_(std::move(tmp.languages_)) {}

  ~programmer() { languages_.clear(); }
private:
  std::list<std::string> languages_;
};
			
Listing 3
#pragma once

#include <map>

class team
{
public:
  // Add someone to the team
  void add(std::string const &name,
    programmer const & details)
  {
    team_.emplace(
      std::make_pair(name, details));
  }

  // Get a team member's details
  auto get(std::string const &name) const
  {
    auto it = team_.find(name);
    if (it == team_.end())
      throw std::runtime_error("not found");
    return it->second;
  }

private:
  std::map<std::string, programmer> team_;
};
			
Listing 4
#include <iostream>
#include <list>
#include <map>
#include <string>
#include <vector>

#include "programmer.h"
#include "team.h"

int main()
{
  team t;
  programmer p;
  p.add_language("C++");
  p.add_language("Java");
  p.add_language("C#");
  p.add_language("Cobol");
  t.add("Roger", std::move(p));

  p.add_language("javascript");
  t.add("John", std::move(p));

  std::cout << "Original way:\n";
  for (auto lang : t.get("Roger").original())
  {
    std::cout << lang << '\n';
  }

  std::cout << "\nNew way:\n";
  for (auto lang : t.get("Roger").languages())
  {
    std::cout << lang << '\n';
  }
}
			
Listing 5

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes: 

More fields may be available via dynamicdata ..