Journal Articles

CVu Journal Vol 30, #1 - March 2018 + ACCU Topics + Student Code Critiques from CVu journal.
Browse in : All > Journals > CVu > 301 (10)
All > Topics (2)
All > Journal Columns > Code Critique (70)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 110

Author: Bob Schmidt

Date: 04 March 2018 00:00:00 +00:00 or Sun, 04 March 2018 00:00:00 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issue’s code

Thanks to Jason Spencer for the random number generator suggestion leading to this issue’s critique!

I’m trying to write a very simple dice game where the computer simulates two players each throwing dice. The higher score wins and after a (selectable) number of turns the player who’s won most times wins the game. (I’m going to make the game cleverer once it’s working.) But the games always seem to be drawn and I can’t see why. Here is what the program produces:

    dice_game
    Let's play dice
    How many turns? 10
    Drawn!
    How many turns? 8
    Drawn!
    How many turns? ^D

What’s going wrong, and how might you help the programmer find the problem? As usual, there may be other suggestions you might make of some other possible things to (re-)consider about their program.

// Class to 'zip' together a pair of iterators
template <typename T>
class zipit : public std::pair<T, T>
{
  zipit &operator+=(std::pair<int,int> const &rhs)
  {
    this->first += rhs.first;
    this->second += rhs.second;
    return *this;
  }
			
Listing 1
public:
  using std::pair<T, T>::pair;
  zipit &operator+=(int n)
  {
    return *this += std::make_pair(n, n);
  }
  zipit &operator-=(int n)
  {
    return *this += std::make_pair(-n, -n);
  }
  zipit &operator++()
  {
    return *this += 1;
  }
  zipit &operator--()
  {
    return *this += -1;
  }
  auto operator*()
  {
    return std::make_pair(
      *this->first, *this->second);
  }
			
Listing 2

Critiques

James Holland

In main(), the user has created an object named generator of type randomize and then has sensibly passed it by reference to play(). Within play() the randomize object is then passed to the two generate() functions by value. Passing by value makes a copy of the object and this where problems begin. The randomize objects contain state that is used in a predetermined way to generate the pseudo random numbers. Two randomize objects that start in the same state will produce the same sequence of numbers. The two calls to generate() each make use of identical copies of randomize and so both will produce exactly the same sequence of random numbers. This is why all runs of the dice game end in a draw.

What we would like to happen is for both calls of generate() to use the same randomize object. In this way when one call to generate() has obtained a set of random numbers, the other call to generate() will continue to use the same randomize object and so will obtain a different sequence of numbers.

One way to achieve this is to pass the randomize object to generate() by reference thus ensuring there is only ever one randomize object. This can be done by employing the standard library reference wrapper ref(). This wrapper conveys references to function templates that take the parameters by value. Fortunately, generate() is such a function. The calls to generate() simply become as shown below.

  std::generate(player1.begin(), player1().end(),
    std::ref(generator);
  std::generate(player2.begin(), player2().end(),
    std::ref(generator);

While we are talking about random number generators, it is recommended not to use one of the standard library generators, such as MT19937, on its own but instead to use it with a distribution, also available from the standard library. There are problems associated with dividing the output from a generator by a constant and taking the reminder. Using a distribution will ensure a high quality series of random numbers.

Running dice_game will now probably produce the desired result. I say ‘probably’ because the student’s software contains a serious flaw. The free functions begin() and end() pass their values by value. This means their parameters are copied. The functions begin() and end() construct a zipit object from iterators that point to copies of player1 and player2 arrays. When the functions return, the copies of player1 and player2 will be destroyed. The zipit object will be returned to its caller but the iterators it contains will be invalid as they point to objects that no longer exist. It may be the case that the memory locations where the copies of player1 and player2 existed maintain their values for as long as the program requires them. It is in this way that the program appears to work correctly. The behaviour of software should not be left to chance and so this bug must be corrected. Fortunately, this is a relatively simple matter.

Instead of passing player1 and player2 to begin() and end() by value, they should be passed by reference. In this way no copies are made and the iterators returned in the zipit object will point to the original (and still existing) versions of player1 and player2. The program is now running with a firm footing!

Although the software now works correctly, it seems quite complicated for what it does. Perhaps there is another way of structuring the program that is shorter and easier to understand. The student’s software involves the creation of two vectors that contain the results of rolling the dice, one for each of the players. There is also quite an elaborate arrangement of operator functions that are required to manipulate iterators that point to the two vectors. Instead of having a pair of vectors, each element of which contains a single value, I propose having a single vector that contains a pair of dice values, one for each player. Iterating through a single vector is much simpler that trying to iterate through two vectors at the same time. There is no danger of getting out of step. I suggest my version of the software is also considerably simpler as none of the code within zipit.h is required. I have kept the interface to Play() the same as in the student’s code and so main() can remain the same. I have modified the Randomize class to take advantage of the standard library’s uniform integer distribution as shown below.

  #include <algorithm>
  #include <functional>
  #include <iostream>
  #include <random>
  #include <vector>
  class Randomize
  {
    public:
    Randomize():d(1, 6){}
    std::pair<int, int> operator()()
    {
      return {d(e), d(e)};
    }
    private:
    std::mt19937 e;
    std::uniform_int_distribution<> d;
  };
  void play(int turns, Randomize & generator)
  {
    std::vector<std::pair<int, int>>
      players(turns);
    std::generate(players.begin(),
      players.end(), std::ref(generator));
    int total{};
    for (const auto & turn : players)
    {
      const auto p1_score = turn.first;
      const auto p2_score = turn.second;
      if (p1_score != p2_score)
      {
        const auto diff = p1_score - p2_score;
        total += diff > 0 ? 1 : -1;
      }
    }
    if (total > 0)
    {
      std::cout << "Player 1 wins\n";
    }
    else if (total < 0)
    {
      std::cout << "Player 2 wins\n";
    }
    else
    {
      std::cout << "Drawn!\n";
    }
  }
  int main()
  {
    Randomize generator;
    int turns;
    std::cout << "Let's play dice\n";
    while (std::cout << "How many turns? ",
      std::cin >> turns)
    {
      play(turns, generator);
    }
  }

Randomize now simulates rolling the dice for both players within one call of operator()(). I felt that copysign() is a slightly strange function to use in this context as it converts its integer parameters to doubles before copying the sign of the second parameter to the first. The resulting double is then converted back to an integer when added to total. The same effect can be simply achieved by a conditional operator as I have used. No conversion to and from double is required.

Paul Floyd

There are two main problems with this code. The first is the cause of the draws, the second results in crashes.

For the cause of the draws, we have a misuse of the interface of std::generate:

  template< class ForwardIt, class Generator >
  void generate( ForwardIt first,
    ForwardIt last, Generator g );

(from http://en.cppreference.com/w/cpp/algorithm/generate)

So in the original code, where the call to this is

  std::generate(player1.begin(),
    player1.end(), generator);
  std::generate(player2.begin(),
    player2.end(), generator);

then the variable generator is passed to std::generate by value (the template deduces a type of randomize).

This is borne out by nm (output slightly reformatted for printing):

  nm -C cc109 | grep generate
  00000000004013d3 W void std::generate
  <__gnu_cxx::__normal_iterator<int*,
   std::vector<int, std::allocator<int>>>,
   randomize>(
    __gnu_cxx::__normal_iterator<int*,
    std::vector<int, std::allocator<int>>>,
    __gnu_cxx::__normal_iterator<int*,
    std::vector<int, std::allocator<int>>>,
    randomize)

This means that the original copy does not get updated by the first call, so the second call to std::generate gets a copy of the same object. The fix for this would be either to make the call to the template function explicit (not nice) or to just use std::ref so that the template will be initialised with a reference type rather than passing by value. Now with a reference the variable generator does have its state modified by the first call.

Here’s nm again when using std::ref:

  nm -C cc109 | grep generate
  00000000004013ca W void std::generate
  <__gnu_cxx::__normal_iterator<int*,
   std::vector<int, std::allocator<int>>>,
   std::reference_wrapper<randomize>>(
    __gnu_cxx::__normal_iterator<int*, 
    std::vector<int, std::allocator<int>>>,
    __gnu_cxx::__normal_iterator<int*, 
    std::vector<int, std::allocator<int>>>,
    std::reference_wrapper<randomize>)

At first I thought that was the end of the story. But then I ran the code through AddressSanitizer (compiled with clang++ and the -fsanitize=address option), and I saw:

Let's play dice
How many turns? 10
===================================================
==2657==ERROR: AddressSanitizer: heap-use-after-free on address 0x6040000003d0 at pc 0x0001085aeca3 bp 0x7ffee7654010 sp 0x7ffee7654008
READ of size 4 at 0x6040000003d0 thread T0
    #0 0x1085aeca2 in zipit<std::__1::__wrap_iter<int*> >::operator*() .zipit.h:32
    #1 0x1085ad4f0 in play(int, randomize&) dice_game.cpp:30
    #2 0x1085af3e6 in main dice_game.cpp:59
    #3 0x7fff5c710114 in start (libdyld.dylib:x86_64+0x1114)
0x6040000003d0 is located 0 bytes inside of 40-byte region [0x6040000003d0,0x6040000003f8)
freed by thread T0 here:
    #0 0x1086226ab in wrap__ZdlPv (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x646ab)
    #1 0x1085af8f4 in std::__1::__vector_base<int, std::__1::allocator<int> >::~__vector_base() vector:445
    #2 0x1085af4c4 in std::__1::vector<int, std::__1::allocator<int> >::~vector() iterator:1386
    #3 0x1085ae384 in std::__1::vector<int, std::__1::allocator<int> >::~vector() iterator:1386
    #4 0x1085ad0b5 in play(int, randomize&) dice_game.cpp:27
    #5 0x1085af3e6 in main dice_game.cpp:59
    #6 0x7fff5c710114 in start (libdyld.dylib:x86_64+0x1114)

previously allocated by thread T0 here:
    #0 0x1086220ab in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x640ab)
    #1 0x1085b0279 in std::__1::vector<int, std::__1::allocator<int> >::allocate(unsigned long) vector:925
    #2 0x1085b2631 in std::__1::vector<int, std::__1::allocator<int> >::vector(std::__1::vector<int, std::__1::allocator<int> > const&) vector:1200
    #3 0x1085ae35c in std::__1::vector<int, std::__1::allocator<int> >::vector(std::__1::vector<int, std::__1::allocator<int> > const&) vector:1193
    #4 0x1085acfcd in play(int, randomize&) dice_game.cpp:27
    #5 0x1085af3e6 in main dice_game.cpp:59
    #6 0x7fff5c710114 in start (libdyld.dylib:x86_64+0x1114)

So there is an error in the dereference operator for a vector that is created and destroyed on the same line. That smells like a local variable being created and destroyed.

The problem is with the zipit begin() and end() functions.

  template <typename T>
  auto begin(T one, T two)
    -> zipit<typename T::iterator>
  …

This is instantiated here

  for (auto it = begin(player1, player2);

Where player1 and player2 are the vectors of int. So T is a vector of int, and one and two are passed by copy. The local copy gets destroyed at the end of the call, leaving the returned pair of iterators dangling.

A third opinion from nm:

  nm -C cc109 | grep begin
  00000000004015a0 W zipit<std::vector<int,
    std::allocator<int>>::iterator>
  begin<std::vector<int, std::allocator<int>>>(
    std::vector<int, std::allocator<int>>,
    std::vector<int, std::allocator<int>>)
  0000000000401b9c W std::vector<int, 
    std::allocator<int>>::begin() const
  0000000000401356 W std::vector<int, 
    std::allocator<int>>::begin()

The fix for this is to make the begin/end arguments references:

  template <typename T>
  auto begin(T& one, T& two)
    -> zipit<typename T::iterator>
  {
    return {one.begin(), two.begin()};
  }
  template <typename T>
  auto end(T& one, T& two)
    -> zipit<typename T::iterator>
  {
    return {one.end(), two.end()};
  }

OK, now for the minor nits. I get a warning about the unused cout in the while condition. This can be silenced by a cast

  while((void)(std::cout << "How many turns? "),
    std::cin >> turns)
    {
      play(turns, generator);
    }
  }

A bit ugly.

There are no include guards, and the cpp file does not include zipit.h before the other files. This means that users that include zipit.h will have the misfortune of playing ‘guess the header dependency’ (or fixing the header).

Lastly, the read of the int for the number of turns does not prevent against reading negative values. This then gets converted to size_t in the vector constructor. If the allocation succeeds then it’s going to be a very long game.

Russel Winder

First and foremost why try to invent a custom zip capability, and risk getting it wrong? Where are the tests? I am sure there is a good criticism to be made of the code in zipit.h, but I shall just circumvent all that and say: use boost::combine to undertake the zip and delete zipit.h.

The original code also has output scattered around. OK so only in two functions, but still, the design principle applies: keep all input and output in one place and have functions return values to represent the result. Doing this enhances testability, so must be a good thing. In this case play should not do output but should return a ‘who won’ result. Seems like a good place for an enumerated type. The output of the result then happens in main.

So ignoring the time it took to make those changes and get the code to actually work (because the developer failed to read the documentation correctly), we find it still delivers a draw in all cases. A bit of a WTF moment, but surely one it was intended one would have. So what is the problem? If you print out the value of the two std::vectors of rolls, you find that you always get the same sequence of values. Always. Fairly obvious then the same start state is the case for all calls to the Mersenne Twister generator. It turns out that std::generate seems to clone the generator passed so the player1 generation and the player2 generation are identical since the randomize class creates a new generator for each instance initialised exactly the same. A cure for this is to make the mt instance shared by making it static. Since this is a single threaded application this does not seem to be a problem, even if global shared state is deemed anathema by some.

With this change in place, and resulting in the program shown in Listing 1, the program seems to work as originally intended.

Of course, one has yet again to ask the question whether C++ is the best programming language to use for this application. Let us assume no. So what to choose? Well to investigate this let’s look at Python, D and Rust.

Python, see Listing 2, has a built in zip function. Also the sum function acting over an iterable makes for a much nicer expression of the core part of the algorithm - which is rolling the dice and checking for who wins. Even though Python is a dynamic language at run time, type annotations have been used so that using the mypy command, we can be sure that all the function calls are correctly statically typed. The random numbers are provided via the random package, using a Mersenne Twister algorithm, which is ‘global’ so there are no problems about generating the same sequence of random numbers. It would seem that Python makes it harder to make errors in this sort of application.

D, see Listing 3, also has a builtin zip function, so no difficulties on that front. Comparing the C++, Python, and D, one gets the sneaking suspicion that D is the best language for writing this code: less faff. Well except maybe for the random number generator. Rather than using a random number generator creating integers from the set {1, 2, 3, 4, 5, 6} with each integer result assumed to have equal probability, The std.random dice function is used to create a random number generator of the values with an explicit probability. Equal probabilities are used here, but it allows for experimentation with ‘loaded dice’. Another ‘feature’ of the code worth noting is the function call chaining approach as opposed to function application as used in C++ and Python. This is blurring the ‘function application’ and ‘method call on an object’ that is quite rigid in C++ and Python. It is an aspect of trying to be as declarative as possible in what is an imperative language.

Of course Rust, see Listing 4, is reputedly the language of the moment, set to replace C, and perhaps C++, as the language of first choice for new projects. Immediately obvious is the very different way enumerations work. C++, Python, and D have very similar ways of defining and working with enumerations, Rust takes a very different approach. In particular an enumeration value can carry data of undefined value, just defined type. Rust, like D, emphasises being declarative, hence the creation of pipelines of data manipulation as seen in the D code. Of course Rust tries to go further and mandates the use of monads to create what is arguably a functional programming language (disguising the imperative language). This is seen most clearly in main. The code seems to a bit bigger, but there an appeal that error handling is enforced but in a quite pleasing way; especially highlighting the nice use of enumerations.. After the interesting dice emulation of D, in this Rust code returns to the generate an integer in a range assuming equal probabilities. The random number generator is not global, but it is shared, so no possibility of generating the same sequence for each player.

So is C++ the best language for this application? Probably not, but for something like this using the language you are most ‘at home’ with is likely the best choice. Unless you are challenging yourself as a programmer by using a programming language you are not entirely familiar with. Personally I quite favour the D code.

Why is Go not in this list of languages likely to be better than C++? It should be, but I ran out of time to complete that version of the code. Bad project management I admit.

Nota bene: the code presented here does not represent a reasonable formatting of the code, and indeed violates many de facto and de jure code formatting guidelines. However, the journal formatting structure requires badly formatted code to be presented. :-(

PS There is clearly a difference in the languages behaviour here since the different languages produce different frequencies of draws. Hummm…

[Editor: since the programs all model the same die throwing game then statistically significant differences imply at least one is buggy.]

Listing 1

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <boost/range/combine.hpp>
#include <boost/tuple/tuple.hpp>
class randomize {
  private:
    static std::mt19937 mt;
  public:
    int operator()() { return mt() % 6 + 1; }
};
std::mt19937 randomize::mt {
  (unsigned long int)std::chrono::system_clock::now()
  .time_since_epoch()
  .count()
};

enum class result {
  draw = 0,
  player1_win = 1,
  player2_win = 2,
};
typedef boost::tuple<int, int> int_pair;
result play(int const turns, randomize & generator) {
  std::vector<int> player1(turns);
  std::vector<int> player2(turns);
  std::generate(player1.begin(), player1.end(), generator);
  std::generate(player2.begin(), player2.end(), generator);
  auto pairs = boost::combine(player1, player2);
  int_pair const t_t = std::accumulate(
    pairs.begin(),
    pairs.end(),
    int_pair{0, 0},
    [](int_pair t, int_pair i) {
      int a, b;
      boost::tie(a, b) = i;
      return int_pair{
      boost::get<0>(t)
        + (a > b ? 1 : (a < b ? -1 : 0)),
                    0};
            });
    int const total = boost::get<0>(t_t);
    return total == 0
      ? result::draw
      : (total > 0 ? result::player1_win 
      : result::player2_win);
}
int main() {
    randomize generator;
    int turns;
    std::cout << "Let's play dice" << std::endl;
    while (
            std::cout << "How many turns? ",
            std::cin >> turns) {
        auto const result = play(turns, generator);
        if (result == result::draw) {
            std::cout << "Drawn!" << std::endl;
        } else {
            std::cout
              << "Player "
              << (int)result
              << " wins"
              << std::endl;
        }
    }
}

Listing 2

#!/usr/bin/env python3
from enum import Enum
import random
from typing import Tuple
class Result(Enum):
    draw = 0
    player1_win = 1
    player2_win = 2
def player1_win(p: Tuple[int, int]) -> int:
    if p[0] > p[1]:
        return 1
    elif p[0] < p[1]:
        return -1
    return 0
def play(turns: int, gen) -> Result:
    player1_wins = sum(
        player1_win(p) for p in zip(
            (gen() for _ in range(turns)),
            (gen() for _ in range(turns))
        ))
    if player1_wins > 0:
        return Result.player1_win
    elif player1_wins < 0:
        return Result.player2_win
    return Result.draw
def main() -> None:
    print("Let's play dice")
    random.seed()
    while True:
        try:
            turns = int(input('How many turns? '))
        except ValueError:
            return
        result = play(
            turns,
            lambda: random.randint(1, 6)
        )
        if result == Result.draw:
            print('Drawn!')
        else:
            print('Player {} wins.'.format(result.value))
if __name__ == '__main__':
    try:
        main()
    except (EOFError, KeyboardInterrupt):
        print()

Listing 3

#!/usr/bin/env rdmd
import std.algorithm: map, sum;
import std.array: array;
import std.conv: to;
import std.random: dice;
import std.range: iota, zip;
import std.stdio: readln, write, writefln, writeln;
import std.string: strip;
enum Result {
    draw = 0,
    player1_win = 1,
    player2_win = 2,
}
Result play(uint turns, uint delegate(uint) generator) {
    auto player1_wins =
      zip(
          iota(turns).map!generator(),
          iota(turns).map!generator())
      .map!"a[0] > a[1] ? 1 : (a[0] < a[1] ? -1 : 0)"
      .sum;
    return player1_wins == 0
      ? Result.draw
      :  player1_wins > 0
         ? Result.player1_win
         : Result.player2_win;
}
void main() {
    writeln("Let's play dice");
    try {
        while (true) {
            write("How many turns? ");
            uint turns = to!uint(readln().strip());
            auto result = play(turns, delegate uint(uint) {
                    return to!uint(
                            dice(0.0,
                                 16.67,
                                 16.67,
                                 16.67,
                                 16.67,
                                 16.67,
                                 16.67));
                });
            if (result == Result.draw) {
                writeln("Drawn!");
            } else {
                writefln("Player %d wins.", result);
            }
        }
    } catch (Exception) {
        writeln();
    }
}

Listing 4

extern crate itertools;
extern crate rand;
use std::io::{self, Write};
use itertools::Itertools;
use rand::distributions::{Range, Sample};
enum Result {
    PlayerWon(u8),
    Draw
}
fn play(
    turns: u32,
    dice_roll: &mut FnMut()->u8) -> Result
{
    let player1_wins: i32 =
        itertools::repeat_call(dice_roll)
        .tuples::<(_, _)>()
        .map(|p|
             if p.0 > p.1 { 1 }
             else if p.0 < p.1 { -1 }
             else { 0 }
        )
        .take(turns as usize)
        .sum();
    if player1_wins != 0 {
        Result::PlayerWon(
            if player1_wins > 0 { 1 }
            else { 2 }
        )
    }
    else { Result::Draw }
}
fn main() {
    println!("Let's play dice");
    let mut dice = Range::new(1u8, 7u8);
    let mut rng = rand::thread_rng();
    let mut buffer = String::new();
    loop {
        print!("How many turns? ");
        io::stdout()
            .flush()
            .expect("Could not flush stdout.");
        buffer.clear();
        match io::stdin().read_line(&mut buffer) {
            Ok(_) => match buffer.trim().parse::<u32>() {
                Ok(turns) => match play(
                    turns,
                    &mut ||{ dice.sample(&mut rng) }
                ) {
                    Result::PlayerWon(p) =>
                        println!("Player {} wins.", p),
                    Result::Draw =>
                        println!("Drawn!"),
                },
                Err(_) => break
            },
            Err(_) => break
        }
    }
}

Jason Spencer

The reason for all games ending in a draw is that the random number generator actually creates a deterministic sequence of numbers, and since the calls to std::generate copy the generator object, that is to say its internal state, the die throws generated for both players are identical. There is, however, another bug with invalidated iterators that potentially corrupts the result calculation, and may cause a crash or non-draw result.

The issues with the code are as follows:

zipit.h

    template <typename T>
    auto begin(T & one, T & two) -> 
    zipit<decltype(std::begin(std::declval<T&>()))>
    {
      return { std::begin(one), std::begin(two) };
    }

dice_game.cpp

So that should get the right result with the existing program design, but I’d make some design changes also.

Points for discussion

Random number generation

Random number generators fall into two categories – deterministic and non-deterministic. Since CPUs are deterministic state machines they cannot, without special hardware or external input, produce true entropy. ‘Random’ numbers are therefore typically produced by pseudo-random algorithms that are actually deterministic. If you know the internal state and the algorithm, you can predict the next number exactly. This is how the rand() library call works (that’s typically a linear congruential generator [4]), as well as mt19937 which is available since C++11. To change the sequence produced every time the program is run it is typical (although perhaps not always correct) to bootstrap the generator from the current time. Since C++11 std::random_device can be used to seed the deterministic generator. std::random_device takes entropy from an implementation dependant source (for example RDRAND on Intel/AMD, or /dev/urandom under linux – both use external events such as interrupts to accumulate entropy) to try and generate non-deterministic random numbers, but the non-determinism is not guaranteed by std::random_device (26.5.6.2 in [5]).

Creating custom iterators

We’re not so much creating a custom iterator here, as we are an iterator adaptor. And we also cannot assume the underlying iterator type – it may be a forward input, forward output, bidirectional, or random iterator [6]. And then there are the const and reverse variants. But I think the intention of zipit is still to behave like an STL iterator. One thing is for sure though, writing iterators isn’t easy. There’s a fair amount of responsibility that goes into writing one.

If we try to model an input iterator then according to [6] zipit is still missing a post-increment operator and an operator->. The post-increment operator isn’t that difficult and can be expressed in terms of the pre-increment operator:

  zipit operator++(int)
  {
    auto temp = *this;
    ++*this;
    return temp;
  }

operator-> makes my head hurt, however. It should allow iter->member ... but how do we do that while managing two iterators?

  zipit<Iter> const * operator->() const
  {
    return this;
  }

works, but then usage is ugly:

  std::vector<std::string> sv1 = {"a","b","c"};
  std::vector<std::string> sv2 = {"c","b","a"};
  auto sv_zip = begin(sv1,sv2);
  std::cout << sv_zip->first->length() << '\n';

and possibly incorrect. Should usage be sv_zip->first.length()? Or sv_zip.first->length()? The latter won’t call operator-> and is available for free as long as the inheritance from std::pair is public.

Alas it’s hard to know what to return from a zip iterator when pointers or references are involved.

If we want to take it further, and model an output iterator then we must make the iterator dereferenceable as an lvalue (i.e. operator* returns a type convertible to T &). This leads to a similar problem.

Bear in mind also that the iterator we’re storing is not necessarily a pointer, it’s something that behaves like a pointer, but may not be a raw pointer. Case in point – vector<bool>::iterator is a proxy class that accesses the underlying storage through something sufficiently advanced (magic?). Dereferencing it returns a vector<bool>::reference.

Using this for inspiration we can make an attempt at operator* for output iterators:

  // references cannot be stored in std::pair, 
  // so we create a wrapper
  template <typename T> class ref_wrap {
    private:
      T & p;
    public:
      explicit ref_wrap (T & t) : p{t} {}
      operator T & () const noexcept {
        return p; }
      T & operator= (const T & t) const {
        p = t; return p; }
  };
  // convert the iterator type to a reference 
  // type
  template <typename T> struct
    iter_reference_type { typedef typename
    std::iterator_traits<T>::reference type; };
  template <typename T> struct
    iter_reference_type < T * > {
      typedef T & type; };
  // optionally wrap the reference in our 
  // wrapper
  template <typename T> struct
    wrapped_reference_type { typedef T type; };
  template <typename T> struct
    wrapped_reference_type < T & > { typedef 
      ref_wrap< T > type; };
  template <typename Iter> struct
    wrapped_iter_reference_type {
    typedef typename wrapped_reference_type< \
      typename iter_reference_type<Iter>::type \
      >::type type; };
  // a second template parameter is added to 
  // zipit in case the reference deduction 
  // doesn't work for some custom iterator
  template <typename Iter,
    typename wrapped_reference = typename 
  wrapped_iter_reference_type<Iter>::type >
  class zipit : public std::pair<Iter, Iter>
  {
  ...
  public:
    typedef wrapped_reference reference;
    std::pair<reference, reference> operator*()
      const {
      return std::make_pair(
        reference(*this->first), 
        reference(*this->second) );
    }
  ...
  };

Here we have a simple class to wrap a reference, which allows us to store it in an std::pair. std::reference_wrapper cannot be used since it is implicitly convertible from T and any assignment then simply changes the reference, not the referred to value.

The SFINAE is required to extract the true value reference type. T * converts to T &, std::vector<T>::iterator converts to either the proxy object, or if it is a raw pointer to T &. Raw references then get wrapped and packed in a pair. Proxy objects don’t get wrapped, they just get put into a pair.

The same wrapper could be used to implement operator-> but since we have to return a pointer to the std::pair<reference, reference> we might have to allocate the pair object on the heap and return as a unique_ptr. And that’s really ugly. I want no part of it.

…and of course plenty more features are needed if the iterator is to model one of the other iterator types.

…and don’t forget to specialise std::iterator_traits (deriving from std::iterator was deprecated in C++17).

…and since C++11 the iterators must be swappable through std::swap… so specialise that too.

All in all, implementing iterators can lead to bouts of anxiety and heavy drinking. So drop zipit from the code – just use Boost.ZipIterator and accept its semantics, or std::inner_product without a zip iterator, or std::vector<std::pair<int,int>>, or don’t store the dice throws.

References

[1] http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf

[2] https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017

[3] http://libcxx.llvm.org/cxx1z_status.html

[4] https://en.wikipedia.org/wiki/Linear_congruential_generator

[5] The C++11 Programming Language standard – ISO/IEC 14882:2011(E)

[6] http://www.cplusplus.com/reference/iterator/

Balog Pal

This entry starts with a really good teaser. Dice game that is always drawn? In real life I would ask “did you check your RNG sequence? I’d bet you either keep generating the same number or the exact same sequence for the two players.” It’s worth a look just to prove my gut right or wrong.

Looking at the code I see some iterator magic that I save for later. Then more magic with the new <random> that I’m not up to date with. However, in the past I read a lot of problems related to RNG. It reminds me an old story from around the millennium. I had to write an application that used winsock sockets for communication. When it was ‘ready’ I went to the forums to ask some peers’ opinion. And the people, instead of commenting on the code just pointed me to fetch the ‘winsock lame list’ and serve myself. As anyone would guess I found a good amount of matches… And random looks like a similar can of worms, maybe someone compiled a similar list? Let’s try a web search for ‘lame list for rng’. Wow, guess what, first hit is titled ‘The C++ <random> Lame List – Elbeno’ [1]. It even refers to the ws lame list as the muse. Unfortunately, it lacks the explanations but even just the points can help a lot.

Matching the code against the list I see hits on 6, 7, 11, 14 and we get half-dozen good hints on what to avoid as we try to fix. And especially 14 looks like a killer that may match my first clue. But let’s stash that too, and put the code in a compiler so we can instrument it with some output at interesting points. And while at it, try some static analysis.

In Visual Studio 2017 (15.5.5) default W3 compile gives one warning on total += – conversion double to int. Maybe Wall tells more. Huh, indeed. 900 warnings! Too bad all but 4 come from the standard headers. I wonder if the VS team ever tried the dogfood method that others use to create actually usable stuff. The 3 warnings on our code are int signed/unsigned mismatch that are not very interesting till we not press the bounds. But the int turns that comes directly from user input is just used to size the vector… we’ll try to play with -1 later. <evil grin>

Now let’s see analyze. We’re warned that begin() and end() can be declared noexcept. Also ‘The reference argument generator for function play can be marked as const”. That in general we treat by adding the const, but for this case it is an indication of a problem: we expect the generator to mutate, if it could be const, we messed up. Must be that missing ref() from point 14.

Lastly, a warning “Do not dereference a invalid pointer (lifetimes rule 1). return of += was invalidated by end of scope for allotemp.2” that honestly beats me, especially as op -= has identical code without warning.

Clang invoked through the ‘clang power tools’ extension also picked that ref should be const [see the clang-tidy check google-runtime-references] and said that in call play() the first arg is not initialized, what is not really true for our case, if we got there, cin reported okay, so it did complete >> properly.

Let’s just run it as is to see those draws. No luck here, we hit an assert: “vector iterator not dereferencable” in operator*, debugger shows this->first being null. Wow. While at it, I take a peek into the sequences sitting in player 1: 3 1 3 6, and 2: same. Was I right or was I right? But I really want to see the output, maybe in release build that assert will not stand in the way? Indeed, now the program executes, and I get … ‘Player 1 wins’ on all my tries. Let’s hope we didn’t also launch some nukes on another continent.

Originally, I did not want to bother with the code before class randomize at all, but must push back the related rant even further, to at least get UB out of the way. We don’t have too much code before the assert, just the begin() so let’s look at that: guess what, it takes out vectors by value, gets the iterator, stores it, then the copies are destroyed for good leaving us with the invalid iterators. Same story with end(). Let’s add the missing &s and see what happens. Yey, no more assert, and we’re finally drawn for all attempts. And if we use std::ref(generator) in the two generate() calls, then we get various results too.

So, once we have the program working as desired, what’s left besides to throw it ALL away and create properly? From scratch, as we learnt from the food industry, that if you already mixed the bad things with the good, it is hopeless to separate them later, instead we only add what is good. It does not stop us from reusing what we like from the original source be it idea or a code fragment.

For start we paste the original main(). It looks okay-ish for the original aim. We just need a bounds check on turns before passing it on. Call to play() is fine, just make sure generator is passed by ref. And we will need the random source. Having it on the stack will trigger #11. We might ignore that reasoning it is a single case and we are positive to have enough stack space. Or use some alternative, like making it static or have a dynamic instance with make_unique. This program does not want threads, so we can ignore the related points, but commit them to memory.

Now let’s make a proper random source. While at it, let’s also rename it accordingly, random_source is way better description than randomize. The list approves mt19937 as a good start, I have to look up how to use it. Along with other hints that we’ll need random_device and uniform_int_distribution. The latter on cppreference.com has the code for our case. All together I came up with this:

  class random_source
  {
    static unsigned int get_seed()
    {
      try {
        std::random_device rd;
        return rd();
      }
      catch (std::exception const&)
      {
        return static_cast<unsigned int> (time(
          nullptr));
      }
    }
    std::mt19937 mt{get_seed()};
    std::uniform_int_distribution<> dis{1, 6};
  public:
    int operator()() {
      return static_cast<int>(dis(mt)); }
  };

The description of random_device is pretty vague, most of it is left to the implementation, but it is intended to do the best to summon some entropy. If it can’t and we get the exception the OP should decide what to do, I just dumped in an alternative source that is less lame as the 2nd violin. Time returns unspecified type, but it is numeric and for unsigned anything goes. The other cast in the op() is benign as the number is in the known range. I was thinking to change the return type instead to unsigned, but for the use plain int is the natural thing. Or is it? Maybe byte is more fitting. That can be rearranged later. Moving on to play().

The aim was to do play N rounds, in each make both player throw dice, decide the round outcome and accumulate the result. One way to do that is

  int turn(random_source &generator)
  // returns score for player 1: -1 0 or +1
  {
  // return sign( generator() - generator() );  
  // too bad no stock sign function....
    auto const d1 = generator();
    auto const d2 = generator();
    return d1 < d2 ? -1 : d1 > d2;
  }
 int game(int turns, random_source &generator) 
  // the difference between player 1 and 2's wins
  {
    int total = 0;
    for (int i = 0; i < turns; ++i)
      total += turn(generator);
    return total;
  }

And in the original play() function we just

  auto const total = game(turns, generator);

and print the result. And we are done. For good. No kidding. Fun thing that it now even works with the original main without bound check and prints draw for -1 instead of crash or assert.

Someone may object that we were supposed to criticize or fix zipit… Were we actually? It breaks one of my fundamental rule for reviews (and dev process): application programmers are forbidden to write blacklisted stuff, that starts with known timewasters like ‘logger class’, ‘string class’ ‘refcounting pointer’, and also contains ‘collection’ and ‘iterator’. For the collection I can be convinced in theory if someone has a problem that requires a really new type of collection not covered by either standard or the many libraries we work with. And for iterator if one opens with “I have this code full of algorithm use, I needed an iterator for that special collection to drive them.” Not the case here. Even for that case I’d ask whether it is really needed right now or we may wait a little till ranges finally arrive.

If the more advanced version of the program have actual need to keep record of turns, it can be simply done in a pair<int, int> per round, collection of those and running whatever processing with tools that work (WELL!) out of the box.

And those who really need to implement their own, can start with web search for ‘how to write c++ iterator’, the first hit [2] is more than promising.

An extra stab at header separation: this simple code does not need a header at all, now, or probably ever. But if some parts are extracted, the header shall be self-containing, have all needed #include lines, not rely on the client.

[Web Editor's Note: Article is continued here.]

Notes: 

More fields may be available via dynamicdata ..