Journal Articles
Browse in : |
All
> Journals
> CVu
> 301
(10)
All > Topics (2) All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 110
Author: Bob Schmidt
Date: 04 March 2018 00:00:00 +00:00 or Sun, 04 March 2018 00:00:00 +00:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)
Last issue’s code
Thanks to Jason Spencer for the random number generator suggestion leading to this issue’s critique!
I’m trying to write a very simple dice game where the computer simulates two players each throwing dice. The higher score wins and after a (selectable) number of turns the player who’s won most times wins the game. (I’m going to make the game cleverer once it’s working.) But the games always seem to be drawn and I can’t see why. Here is what the program produces:
dice_game Let's play dice How many turns? 10 Drawn! How many turns? 8 Drawn! How many turns? ^D
What’s going wrong, and how might you help the programmer find the problem? As usual, there may be other suggestions you might make of some other possible things to (re-)consider about their program.
- Listing 1 contains zipit.h
- Listing 2 contains dice game.cpp
// Class to 'zip' together a pair of iterators template <typename T> class zipit : public std::pair<T, T> { zipit &operator+=(std::pair<int,int> const &rhs) { this->first += rhs.first; this->second += rhs.second; return *this; } |
Listing 1 |
public: using std::pair<T, T>::pair; zipit &operator+=(int n) { return *this += std::make_pair(n, n); } zipit &operator-=(int n) { return *this += std::make_pair(-n, -n); } zipit &operator++() { return *this += 1; } zipit &operator--() { return *this += -1; } auto operator*() { return std::make_pair( *this->first, *this->second); } |
Listing 2 |
Critiques
James Holland
In main()
, the user has created an object named generator
of type randomize
and then has sensibly passed it by reference to play()
. Within play()
the randomize
object is then passed to the two generate()
functions by value. Passing by value makes a copy of the object and this where problems begin. The randomize
objects contain state that is used in a predetermined way to generate the pseudo random numbers. Two randomize
objects that start in the same state will produce the same sequence of numbers. The two calls to generate()
each make use of identical copies of randomize
and so both will produce exactly the same sequence of random numbers. This is why all runs of the dice game end in a draw.
What we would like to happen is for both calls of generate()
to use the same randomize
object. In this way when one call to generate()
has obtained a set of random numbers, the other call to generate()
will continue to use the same randomize
object and so will obtain a different sequence of numbers.
One way to achieve this is to pass the randomize
object to generate()
by reference thus ensuring there is only ever one randomize
object. This can be done by employing the standard library reference wrapper ref()
. This wrapper conveys references to function templates that take the parameters by value. Fortunately, generate()
is such a function. The calls to generate()
simply become as shown below.
std::generate(player1.begin(), player1().end(), std::ref(generator); std::generate(player2.begin(), player2().end(), std::ref(generator);
While we are talking about random number generators, it is recommended not to use one of the standard library generators, such as MT19937
, on its own but instead to use it with a distribution, also available from the standard library. There are problems associated with dividing the output from a generator by a constant and taking the reminder. Using a distribution will ensure a high quality series of random numbers.
Running dice_game
will now probably produce the desired result. I say ‘probably’ because the student’s software contains a serious flaw. The free functions begin()
and end()
pass their values by value. This means their parameters are copied. The functions begin()
and end()
construct a zipit
object from iterators that point to copies of player1
and player2
arrays. When the functions return, the copies of player1
and player2
will be destroyed. The zipit
object will be returned to its caller but the iterators it contains will be invalid as they point to objects that no longer exist. It may be the case that the memory locations where the copies of player1
and player2
existed maintain their values for as long as the program requires them. It is in this way that the program appears to work correctly. The behaviour of software should not be left to chance and so this bug must be corrected. Fortunately, this is a relatively simple matter.
Instead of passing player1
and player2
to begin()
and end()
by value, they should be passed by reference. In this way no copies are made and the iterators returned in the zipit
object will point to the original (and still existing) versions of player1
and player2
. The program is now running with a firm footing!
Although the software now works correctly, it seems quite complicated for what it does. Perhaps there is another way of structuring the program that is shorter and easier to understand. The student’s software involves the creation of two vectors that contain the results of rolling the dice, one for each of the players. There is also quite an elaborate arrangement of operator functions that are required to manipulate iterators that point to the two vectors. Instead of having a pair of vectors, each element of which contains a single value, I propose having a single vector that contains a pair of dice values, one for each player. Iterating through a single vector is much simpler that trying to iterate through two vectors at the same time. There is no danger of getting out of step. I suggest my version of the software is also considerably simpler as none of the code within zipit.h is required. I have kept the interface to Play()
the same as in the student’s code and so main()
can remain the same. I have modified the Randomize
class to take advantage of the standard library’s uniform integer distribution as shown below.
#include <algorithm> #include <functional> #include <iostream> #include <random> #include <vector> class Randomize { public: Randomize():d(1, 6){} std::pair<int, int> operator()() { return {d(e), d(e)}; } private: std::mt19937 e; std::uniform_int_distribution<> d; }; void play(int turns, Randomize & generator) { std::vector<std::pair<int, int>> players(turns); std::generate(players.begin(), players.end(), std::ref(generator)); int total{}; for (const auto & turn : players) { const auto p1_score = turn.first; const auto p2_score = turn.second; if (p1_score != p2_score) { const auto diff = p1_score - p2_score; total += diff > 0 ? 1 : -1; } } if (total > 0) { std::cout << "Player 1 wins\n"; } else if (total < 0) { std::cout << "Player 2 wins\n"; } else { std::cout << "Drawn!\n"; } } int main() { Randomize generator; int turns; std::cout << "Let's play dice\n"; while (std::cout << "How many turns? ", std::cin >> turns) { play(turns, generator); } }
Randomize
now simulates rolling the dice for both players within one call of operator()()
. I felt that copysign()
is a slightly strange function to use in this context as it converts its integer parameters to double
s before copying the sign of the second parameter to the first. The resulting double is then converted back to an integer when added to total
. The same effect can be simply achieved by a conditional operator as I have used. No conversion to and from double
is required.
Paul Floyd
There are two main problems with this code. The first is the cause of the draws, the second results in crashes.
For the cause of the draws, we have a misuse of the interface of std::generate
:
template< class ForwardIt, class Generator > void generate( ForwardIt first, ForwardIt last, Generator g );
(from http://en.cppreference.com/w/cpp/algorithm/generate)
So in the original code, where the call to this is
std::generate(player1.begin(), player1.end(), generator); std::generate(player2.begin(), player2.end(), generator);
then the variable generator
is passed to std::generate
by value (the template deduces a type of randomize
).
This is borne out by nm
(output slightly reformatted for printing):
nm -C cc109 | grep generate 00000000004013d3 W void std::generate <__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>>>, randomize>( __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>>>, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>>>, randomize)
This means that the original copy does not get updated by the first call, so the second call to std::generate
gets a copy of the same object. The fix for this would be either to make the call to the template function explicit (not nice) or to just use std::ref
so that the template will be initialised with a reference type rather than passing by value. Now with a reference the variable generator
does have its state modified by the first call.
Here’s nm
again when using std::ref
:
nm -C cc109 | grep generate 00000000004013ca W void std::generate <__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>>>, std::reference_wrapper<randomize>>( __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>>>, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>>>, std::reference_wrapper<randomize>)
At first I thought that was the end of the story. But then I ran the code through AddressSanitizer (compiled with clang++ and the -fsanitize=address
option), and I saw:
Let's play dice How many turns? 10 =================================================== ==2657==ERROR: AddressSanitizer: heap-use-after-free on address 0x6040000003d0 at pc 0x0001085aeca3 bp 0x7ffee7654010 sp 0x7ffee7654008 READ of size 4 at 0x6040000003d0 thread T0 #0 0x1085aeca2 in zipit<std::__1::__wrap_iter<int*> >::operator*() .zipit.h:32 #1 0x1085ad4f0 in play(int, randomize&) dice_game.cpp:30 #2 0x1085af3e6 in main dice_game.cpp:59 #3 0x7fff5c710114 in start (libdyld.dylib:x86_64+0x1114) 0x6040000003d0 is located 0 bytes inside of 40-byte region [0x6040000003d0,0x6040000003f8) freed by thread T0 here: #0 0x1086226ab in wrap__ZdlPv (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x646ab) #1 0x1085af8f4 in std::__1::__vector_base<int, std::__1::allocator<int> >::~__vector_base() vector:445 #2 0x1085af4c4 in std::__1::vector<int, std::__1::allocator<int> >::~vector() iterator:1386 #3 0x1085ae384 in std::__1::vector<int, std::__1::allocator<int> >::~vector() iterator:1386 #4 0x1085ad0b5 in play(int, randomize&) dice_game.cpp:27 #5 0x1085af3e6 in main dice_game.cpp:59 #6 0x7fff5c710114 in start (libdyld.dylib:x86_64+0x1114) previously allocated by thread T0 here: #0 0x1086220ab in wrap__Znwm (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x640ab) #1 0x1085b0279 in std::__1::vector<int, std::__1::allocator<int> >::allocate(unsigned long) vector:925 #2 0x1085b2631 in std::__1::vector<int, std::__1::allocator<int> >::vector(std::__1::vector<int, std::__1::allocator<int> > const&) vector:1200 #3 0x1085ae35c in std::__1::vector<int, std::__1::allocator<int> >::vector(std::__1::vector<int, std::__1::allocator<int> > const&) vector:1193 #4 0x1085acfcd in play(int, randomize&) dice_game.cpp:27 #5 0x1085af3e6 in main dice_game.cpp:59 #6 0x7fff5c710114 in start (libdyld.dylib:x86_64+0x1114)
So there is an error in the dereference operator for a vector that is created and destroyed on the same line. That smells like a local variable being created and destroyed.
The problem is with the zipit begin()
and end()
functions.
template <typename T> auto begin(T one, T two) -> zipit<typename T::iterator> …
This is instantiated here
for (auto it = begin(player1, player2);
Where player1
and player2
are the vectors of int
. So T
is a vector of int
, and one
and two
are passed by copy. The local copy gets destroyed at the end of the call, leaving the returned pair of iterators dangling.
A third opinion from nm
:
nm -C cc109 | grep begin 00000000004015a0 W zipit<std::vector<int, std::allocator<int>>::iterator> begin<std::vector<int, std::allocator<int>>>( std::vector<int, std::allocator<int>>, std::vector<int, std::allocator<int>>) 0000000000401b9c W std::vector<int, std::allocator<int>>::begin() const 0000000000401356 W std::vector<int, std::allocator<int>>::begin()
The fix for this is to make the begin
/end
arguments references:
template <typename T> auto begin(T& one, T& two) -> zipit<typename T::iterator> { return {one.begin(), two.begin()}; } template <typename T> auto end(T& one, T& two) -> zipit<typename T::iterator> { return {one.end(), two.end()}; }
OK, now for the minor nits. I get a warning about the unused cout
in the while
condition. This can be silenced by a cast
while((void)(std::cout << "How many turns? "), std::cin >> turns) { play(turns, generator); } }
A bit ugly.
There are no include guards, and the cpp file does not include zipit.h before the other files. This means that users that include zipit.h will have the misfortune of playing ‘guess the header dependency’ (or fixing the header).
Lastly, the read of the int
for the number of turns does not prevent against reading negative values. This then gets converted to size_t
in the vector constructor. If the allocation succeeds then it’s going to be a very long game.
Russel Winder
First and foremost why try to invent a custom zip capability, and risk getting it wrong? Where are the tests? I am sure there is a good criticism to be made of the code in zipit.h, but I shall just circumvent all that and say: use boost::combine
to undertake the zip and delete zipit.h.
The original code also has output scattered around. OK so only in two functions, but still, the design principle applies: keep all input and output in one place and have functions return values to represent the result. Doing this enhances testability, so must be a good thing. In this case play
should not do output but should return a ‘who won’ result. Seems like a good place for an enumerated type. The output of the result then happens in main
.
So ignoring the time it took to make those changes and get the code to actually work (because the developer failed to read the documentation correctly), we find it still delivers a draw in all cases. A bit of a WTF moment, but surely one it was intended one would have. So what is the problem? If you print out the value of the two std::vector
s of rolls, you find that you always get the same sequence of values. Always. Fairly obvious then the same start state is the case for all calls to the Mersenne Twister generator. It turns out that std::generate
seems to clone the generator passed so the player1 generation and the player2 generation are identical since the randomize
class creates a new generator for each instance initialised exactly the same. A cure for this is to make the mt
instance shared by making it static. Since this is a single threaded application this does not seem to be a problem, even if global shared state is deemed anathema by some.
With this change in place, and resulting in the program shown in Listing 1, the program seems to work as originally intended.
Of course, one has yet again to ask the question whether C++ is the best programming language to use for this application. Let us assume no. So what to choose? Well to investigate this let’s look at Python, D and Rust.
Python, see Listing 2, has a built in zip
function. Also the sum
function acting over an iterable makes for a much nicer expression of the core part of the algorithm - which is rolling the dice and checking for who wins. Even though Python is a dynamic language at run time, type annotations have been used so that using the mypy command, we can be sure that all the function calls are correctly statically typed. The random numbers are provided via the random
package, using a Mersenne Twister algorithm, which is ‘global’ so there are no problems about generating the same sequence of random numbers. It would seem that Python makes it harder to make errors in this sort of application.
D, see Listing 3, also has a builtin zip
function, so no difficulties on that front. Comparing the C++, Python, and D, one gets the sneaking suspicion that D is the best language for writing this code: less faff. Well except maybe for the random number generator. Rather than using a random number generator creating integers from the set {1, 2, 3, 4, 5, 6} with each integer result assumed to have equal probability, The std.random
dice
function is used to create a random number generator of the values with an explicit probability. Equal probabilities are used here, but it allows for experimentation with ‘loaded dice’. Another ‘feature’ of the code worth noting is the function call chaining approach as opposed to function application as used in C++ and Python. This is blurring the ‘function application’ and ‘method call on an object’ that is quite rigid in C++ and Python. It is an aspect of trying to be as declarative as possible in what is an imperative language.
Of course Rust, see Listing 4, is reputedly the language of the moment, set to replace C, and perhaps C++, as the language of first choice for new projects. Immediately obvious is the very different way enumerations work. C++, Python, and D have very similar ways of defining and working with enumerations, Rust takes a very different approach. In particular an enumeration value can carry data of undefined value, just defined type. Rust, like D, emphasises being declarative, hence the creation of pipelines of data manipulation as seen in the D code. Of course Rust tries to go further and mandates the use of monads to create what is arguably a functional programming language (disguising the imperative language). This is seen most clearly in main
. The code seems to a bit bigger, but there an appeal that error handling is enforced but in a quite pleasing way; especially highlighting the nice use of enumerations.. After the interesting dice emulation of D, in this Rust code returns to the generate an integer in a range assuming equal probabilities. The random number generator is not global, but it is shared, so no possibility of generating the same sequence for each player.
So is C++ the best language for this application? Probably not, but for something like this using the language you are most ‘at home’ with is likely the best choice. Unless you are challenging yourself as a programmer by using a programming language you are not entirely familiar with. Personally I quite favour the D code.
Why is Go not in this list of languages likely to be better than C++? It should be, but I ran out of time to complete that version of the code. Bad project management I admit.
Nota bene: the code presented here does not represent a reasonable formatting of the code, and indeed violates many de facto and de jure code formatting guidelines. However, the journal formatting structure requires badly formatted code to be presented. :-(
PS There is clearly a difference in the languages behaviour here since the different languages produce different frequencies of draws. Hummm…
[Editor: since the programs all model the same die throwing game then statistically significant differences imply at least one is buggy.]
Listing 1
#include <algorithm> #include <chrono> #include <iostream> #include <numeric> #include <random> #include <boost/range/combine.hpp> #include <boost/tuple/tuple.hpp> class randomize { private: static std::mt19937 mt; public: int operator()() { return mt() % 6 + 1; } }; std::mt19937 randomize::mt { (unsigned long int)std::chrono::system_clock::now() .time_since_epoch() .count() }; enum class result { draw = 0, player1_win = 1, player2_win = 2, }; typedef boost::tuple<int, int> int_pair; result play(int const turns, randomize & generator) { std::vector<int> player1(turns); std::vector<int> player2(turns); std::generate(player1.begin(), player1.end(), generator); std::generate(player2.begin(), player2.end(), generator); auto pairs = boost::combine(player1, player2); int_pair const t_t = std::accumulate( pairs.begin(), pairs.end(), int_pair{0, 0}, [](int_pair t, int_pair i) { int a, b; boost::tie(a, b) = i; return int_pair{ boost::get<0>(t) + (a > b ? 1 : (a < b ? -1 : 0)), 0}; }); int const total = boost::get<0>(t_t); return total == 0 ? result::draw : (total > 0 ? result::player1_win : result::player2_win); } int main() { randomize generator; int turns; std::cout << "Let's play dice" << std::endl; while ( std::cout << "How many turns? ", std::cin >> turns) { auto const result = play(turns, generator); if (result == result::draw) { std::cout << "Drawn!" << std::endl; } else { std::cout << "Player " << (int)result << " wins" << std::endl; } } }
Listing 2
#!/usr/bin/env python3 from enum import Enum import random from typing import Tuple class Result(Enum): draw = 0 player1_win = 1 player2_win = 2 def player1_win(p: Tuple[int, int]) -> int: if p[0] > p[1]: return 1 elif p[0] < p[1]: return -1 return 0 def play(turns: int, gen) -> Result: player1_wins = sum( player1_win(p) for p in zip( (gen() for _ in range(turns)), (gen() for _ in range(turns)) )) if player1_wins > 0: return Result.player1_win elif player1_wins < 0: return Result.player2_win return Result.draw def main() -> None: print("Let's play dice") random.seed() while True: try: turns = int(input('How many turns? ')) except ValueError: return result = play( turns, lambda: random.randint(1, 6) ) if result == Result.draw: print('Drawn!') else: print('Player {} wins.'.format(result.value)) if __name__ == '__main__': try: main() except (EOFError, KeyboardInterrupt): print()
Listing 3
#!/usr/bin/env rdmd import std.algorithm: map, sum; import std.array: array; import std.conv: to; import std.random: dice; import std.range: iota, zip; import std.stdio: readln, write, writefln, writeln; import std.string: strip; enum Result { draw = 0, player1_win = 1, player2_win = 2, } Result play(uint turns, uint delegate(uint) generator) { auto player1_wins = zip( iota(turns).map!generator(), iota(turns).map!generator()) .map!"a[0] > a[1] ? 1 : (a[0] < a[1] ? -1 : 0)" .sum; return player1_wins == 0 ? Result.draw : player1_wins > 0 ? Result.player1_win : Result.player2_win; } void main() { writeln("Let's play dice"); try { while (true) { write("How many turns? "); uint turns = to!uint(readln().strip()); auto result = play(turns, delegate uint(uint) { return to!uint( dice(0.0, 16.67, 16.67, 16.67, 16.67, 16.67, 16.67)); }); if (result == Result.draw) { writeln("Drawn!"); } else { writefln("Player %d wins.", result); } } } catch (Exception) { writeln(); } }
Listing 4
extern crate itertools; extern crate rand; use std::io::{self, Write}; use itertools::Itertools; use rand::distributions::{Range, Sample}; enum Result { PlayerWon(u8), Draw } fn play( turns: u32, dice_roll: &mut FnMut()->u8) -> Result { let player1_wins: i32 = itertools::repeat_call(dice_roll) .tuples::<(_, _)>() .map(|p| if p.0 > p.1 { 1 } else if p.0 < p.1 { -1 } else { 0 } ) .take(turns as usize) .sum(); if player1_wins != 0 { Result::PlayerWon( if player1_wins > 0 { 1 } else { 2 } ) } else { Result::Draw } } fn main() { println!("Let's play dice"); let mut dice = Range::new(1u8, 7u8); let mut rng = rand::thread_rng(); let mut buffer = String::new(); loop { print!("How many turns? "); io::stdout() .flush() .expect("Could not flush stdout."); buffer.clear(); match io::stdin().read_line(&mut buffer) { Ok(_) => match buffer.trim().parse::<u32>() { Ok(turns) => match play( turns, &mut ||{ dice.sample(&mut rng) } ) { Result::PlayerWon(p) => println!("Player {} wins.", p), Result::Draw => println!("Drawn!"), }, Err(_) => break }, Err(_) => break } } }
Jason Spencer
The reason for all games ending in a draw is that the random number generator actually creates a deterministic sequence of numbers, and since the calls to std::generate
copy the generator object, that is to say its internal state, the die throws generated for both players are identical. There is, however, another bug with invalidated iterators that potentially corrupts the result calculation, and may cause a crash or non-draw result.
The issues with the code are as follows:
zipit.h
- The file requires the following includes:
<utility>
forstd::pair
andstd::make_pair
. Also<iterator>
forstd::advance
,std::begin
andstd::end
, which will be added next. An include guard for the whole file is also a good idea. - class
zipit
is derived fromstd::pair<T,T>
with public access. Consider making it protected, otherwise a user can access.first
and.second
and change their values, putting them out of sync. This might be a wanted behaviour, however. I think it breaks encapsulation and the inheritance should be protected. - Consider renaming the template parameter from
T
toIter
, to express the meaning.T
is used so often that it can get confusing. - Two
operator*
functions (one const, one non-const) aren’t required, only one – the function doesn’t change thezipit
instance – although dereferencing may then lead to a change in the pointed to variable, but that doesn’t mean thezipit
instance changes (compareconst char * ptr
andchar * const ptr
). So only the const version is needed, and the non-const should be removed. operator*
at the moment returns an rvalue, sozipit
only models an input iterator. We may want to return an lvalue to model an output iterator. See the discussion and implementation later.zipit &operator+=(std::pair<int,int> const &rhs)
should usestd::advance
instead of+=
to advancethis->first
andthis->second
. If the underlying iterator is not a random access iterator it may not have anoperator+=
overload.std::advance
will either useoperator++
oroperator+=
, preferring the latter as it has constant time complexity, while the former is linear.operator->
is indeed required to at least satisfy the requirements ofInputIterator
(see later for discussion of iterator types).- The
begin(T,T)
andend(T,T)
free functions should both have their arguments passed by non-const reference. Both of the functions extract iterators from the arguments passed and store them in azipit
instance and return that. However, since the arguments passed to these functions are currently temporary copies the iterators are invalid as soon as the function returns. Since the score is later calculated from dice throws stored in memory that has been deallocated the program either crashes or has invalid results. - In
begin(T,T)
andend(T,T)
the use ofT::iterator
assumes that the typedef exists inT
. By usingstd::begin(one)
instead ofone.begin()
and changing the return type of these functions they can be adapted to also support C-style arrays, thusly (just s/begin/end/g for theend
function):
template <typename T> auto begin(T & one, T & two) -> zipit<decltype(std::begin(std::declval<T&>()))> { return { std::begin(one), std::begin(two) }; }
dice_game.cpp
- In class
randomize
themt19937
instance should not be left un-seeded. Either pass the seed as an argument to the constructor or callmt19937::seed
in therandomize
constructor, otherwise every time the program is executed the numbers returned by generator will be the same (and therefore the game results will be the same for the same number of games played). I propose two constructors for classrandomize
– a default constructor which will use a source of entropy (some OS or hardware source, but worst case the current time) to seed, and another constructor that takes the seed as an argument. See below for a sample implementation and later for a discussion on seeding. - In
randomize::operator()
the use of%6+1
to scale the output ofmt()
may produce a non-uniform distribution. While it is good enough for this use, there’s a very tiny bias. Ifmt()
were to generate numbers uniformly distributed over the interval [0,7] then taking modulo 6 of these numbers 0 would map to 0, 1=>1, 2=>2, 3=>3, 4=>4, 5=>5, 6=>0, 7=>1, so the probability of getting a 0 or 1 output is twice the probability of the other numbers. In our case, however,mt()
will produce numbers in the range 0–232-1, so the bias is unnoticeable with a 1.4 * 10-7% higher probability of output die rolls 1,2,3. Even so, prefer usingstd::uniform_int_ distribution
to change the distribution of themt19937
instance output – it doesn’t have the bias issue and is more descriptive in terms of intent.The randomize class then would look something like this:
class randomize { std::mt19937 rng; std::uniform_int_distribution<int> D6; public: randomize(decltype(rng)::result_type seed) : rng(seed), D6(1,6) { } randomize () : randomize ( std::random_device()() ) { } int operator()() { return D6(rng); } };
It might be an idea to log the seed value generated by
std::random_device
, in case the program has to be debugged later. - In
play(...)
the last argument of bothstd::generate
calls should bestd::ref(generator)
. Sincestd::generate
takes the third argument by copystd::ref
should be used to wrap a reference to generator in anstd::reference_wrapper
instance. As discussed later the random number generator actually produces a deterministic sequence, rather than true unpredictable random numbers. Without the wrapper the first call tostd::generate
makes a temporary copy of the generator object for use within the function, and all the internal state is copied with it, afterwards inplay(...)
the internal state of the generator hasn’t changed, so in the second call tostd::generate
the sequence of numbers is repeated. The result is that both players get the exact same die throws, hence the repeated draws for the headline bug. - In
play(...)
this is personal choice, but I preferint total = 0
overint total{}
as the expected value is clear. - In
play(...)
the use of anif
and then acopysign
call is perhaps a little confusing. The intent is to do a three way comparison. While there’s a proposal for such a feature in C++ [1] it doesn’t currently exist. There are a number of ways to do this but my preferred approach is a ternary conditional expression operator – it is the most expressive, and simple to execute:total += (diff<0) ? -1 : ( (diff==0)? 0 : 1 );
Here the intent and the result values are explicit (-1,0,1) and appear in order. If
diff
is more likely to be in one of the states, 0 for example, then the two ternary conditionals could be re-ordered for performance, so the most common case is dealt with first and the second comparison doesn’t have to be executed, but that’s not the case here. copysign
is presumably from math.h, but the header isn’t included, yet this compiles (g++ 7.3), so I’m not sure where it’s from. Consider usingstd::copysign
and includecmath
ifcopysign
must be used. Otherwise I’d recommend the ternary operator approach.- In
play(...)
consider replacing the wholefor
loop over the zip iterator withstd::accumulate
:auto get_score_from_throws = [](auto p1, auto p2) { auto diff = p1 - p2; return (diff<0) ? -1 : ( (diff==0)? 0 : 1 ); }; total = std::accumulate ( begin(player1, player2), end(player1, player2), 0, [&get_score_from_throws](auto sum, const auto & it) { return sum += get_score_from_throws(it.first, it.second); } );
or use
std::inner_product
:total = std::inner_product ( std::begin(player1), std::end(player1), std::begin(player2), 0, std::plus<>(), get_score_from_throws );
The accumulate version uses the zip
iterator, and the function name is more self explanatory. Theinner_product
, while confusingly named, achieves the same thing, and doesn’t even need the zip iterator.Alternatively
std::reduce
orstd::transform_reduce
could be used, but neither libstdc++ (the STL used by g++) [2], nor libc++ (the STL used by CLang) [3] have yet to support these C++17 features, so I couldn’t produce sample code. - In
play(...)
consider making the turns argument unsigned instead ofint
. You cannot play a negative number of games. - In
main(...)
turns should be bounds checked
So that should get the right result with the existing program design, but I’d make some design changes also.
- If a zip iterator is still required consider using
Boost. ZipIterator
instead – it supports the zipping of more than two iterators and is tested and known to work. zipit::operator==
is a potential source of errors. Since there isn’t an explicit comparison operator the inherited one is used. This is typically the correct behaviour, however if the vectors passed tobegin(T,T)
andend(T,T)
are different lengths there will never be a time when the end zipit iterator is reached, since bothfirst
andsecond
have to match. Since they’re incremented together they’ll never both match when there are different length vectors. Consider tweakingend(T,T)
to set.first
and.second
to the same offset from the beginning of the vectors. The offset should be the length of the shorter vector.- With its current functionality the
randomize
class isn’t actually needed at all. Assuming no other features are required the following is a drop-in replacement:auto generator = std::bind ( std::uniform_int_distribution<unsigned>(1,6), std::mt19937(rng_seed) );
For fun, you could also create a loaded die that has double the chance of rolling a six:
auto loadedD6 = std::bind( std::discrete_distribution<unsigned>{ 0,1,1,1,1,1,2}, std::mt19937(rng_seed));
Of course, for one player to have the advantage you’d need each player to have different dice types.
With respect to seeding – it might be an idea to make the seed a command line argument, so that you can have reproducible results in testing, something along the lines of:
void usage(char * execname) { std::cerr << "Usage: " << execname << "[--seed <unsigned integer>]\n"; } int main(int argc, char *argv[]) { using namespace std::string_literals; using rng_t = std::mt19937; rng_t::result_type rng_seed = std::random_device()(); if ( argc == 3 ) { if("--seed"s != argv[1]) { usage(argv[0]); return -1; } // could throw invalid_exception or // out_of_range int seed = std::stol(argv[2]); if(seed < 0) throw std::out_of_range( "Seed value must be a positive" " integer"); rng_seed = seed; } else if(argc!=1) { usage(argv[0]); return -1; } std::cout << "RNG seed = " << rng_seed << '\n'; auto D6 = std::bind( std::uniform_int_distribution<unsigned> (1,6), rng_t(rng_seed)); ...
- Something to consider is whether we actually need to keep the die throws in memory. No game is dependant on the result of any other, so there’s no need to store the throws or even the score per round. A loop could make die throws, calculating the score and keeping a running score, without any vectors.
Of course it can be shown that for two N sided fair dice the chance of throwing the same value is 1/N, the chance the first die throws a smaller number is (N-1)/2N, same for the second die. So we could get the game result with the following generator:
const unsigned die_sides = 6; auto game_point_generator = std::bind ( \ std::discrete_distribution<unsigned>{ 1.0, (static_cast<double>(die_sides)-1.0)/2.0, (static_cast<double>(die_sides)-1.0)/2.0 }, \ std::mt19937(std::random_device()()) \ );
A call to
game_point_generator()
will generate 0 if the game is a draw, a 1 if player1 wins, and 2 if player 2 wins. - If the die throws do have to be kept in memory then consider a vector of
std::pair
s to store the results – this way there is no chance of one vector in the zip iterator being smaller than the other. We also don't need the zip iterator then. zipit
could be generalised by using atuple
to allow more than two iterators to be zipped; and there’s no reason for all iterator types to be the same.zipit
should consider SFINAE/enable_if
to enable the operator overloads based on the type of the zipped iterators.std::iterator_traits<zipit>
should be implemented so thatzipit
can be better integrated into the STL (for example if it happens to model a random access iterator and states so viaiterator_traits
thenstd::advance
will be optimised).- In the spirit of C++20
zip_it
can be implemented with constexpr modifiers.
Points for discussion
Random number generation
Random number generators fall into two categories – deterministic and non-deterministic. Since CPUs are deterministic state machines they cannot, without special hardware or external input, produce true entropy. ‘Random’ numbers are therefore typically produced by pseudo-random algorithms that are actually deterministic. If you know the internal state and the algorithm, you can predict the next number exactly. This is how the rand()
library call works (that’s typically a linear congruential generator [4]), as well as mt19937
which is available since C++11. To change the sequence produced every time the program is run it is typical (although perhaps not always correct) to bootstrap the generator from the current time. Since C++11 std::random_device
can be used to seed the deterministic generator. std::random_device
takes entropy from an implementation dependant source (for example RDRAND on Intel/AMD, or /dev/urandom under linux – both use external events such as interrupts to accumulate entropy) to try and generate non-deterministic random numbers, but the non-determinism is not guaranteed by std::random_device
(26.5.6.2 in [5]).
Creating custom iterators
We’re not so much creating a custom iterator here, as we are an iterator adaptor. And we also cannot assume the underlying iterator type – it may be a forward input, forward output, bidirectional, or random iterator [6]. And then there are the const and reverse variants. But I think the intention of zipit
is still to behave like an STL iterator. One thing is for sure though, writing iterators isn’t easy. There’s a fair amount of responsibility that goes into writing one.
If we try to model an input iterator then according to [6] zipit
is still missing a post-increment operator and an operator->
. The post-increment operator isn’t that difficult and can be expressed in terms of the pre-increment operator:
zipit operator++(int) { auto temp = *this; ++*this; return temp; }
operator->
makes my head hurt, however. It should allow iter->member
... but how do we do that while managing two iterators?
zipit<Iter> const * operator->() const { return this; }
works, but then usage is ugly:
std::vector<std::string> sv1 = {"a","b","c"}; std::vector<std::string> sv2 = {"c","b","a"}; auto sv_zip = begin(sv1,sv2); std::cout << sv_zip->first->length() << '\n';
and possibly incorrect. Should usage be sv_zip->first.length()
? Or sv_zip.first->length()
? The latter won’t call operator->
and is available for free as long as the inheritance from std::pair
is public.
Alas it’s hard to know what to return from a zip iterator when pointers or references are involved.
If we want to take it further, and model an output iterator then we must make the iterator dereferenceable as an lvalue (i.e. operator*
returns a type convertible to T &
). This leads to a similar problem.
Bear in mind also that the iterator we’re storing is not necessarily a pointer, it’s something that behaves like a pointer, but may not be a raw pointer. Case in point – vector<bool>::iterator
is a proxy class that accesses the underlying storage through something sufficiently advanced (magic?). Dereferencing it returns a vector<bool>::reference
.
Using this for inspiration we can make an attempt at operator*
for output iterators:
// references cannot be stored in std::pair, // so we create a wrapper template <typename T> class ref_wrap { private: T & p; public: explicit ref_wrap (T & t) : p{t} {} operator T & () const noexcept { return p; } T & operator= (const T & t) const { p = t; return p; } }; // convert the iterator type to a reference // type template <typename T> struct iter_reference_type { typedef typename std::iterator_traits<T>::reference type; }; template <typename T> struct iter_reference_type < T * > { typedef T & type; }; // optionally wrap the reference in our // wrapper template <typename T> struct wrapped_reference_type { typedef T type; }; template <typename T> struct wrapped_reference_type < T & > { typedef ref_wrap< T > type; }; template <typename Iter> struct wrapped_iter_reference_type { typedef typename wrapped_reference_type< \ typename iter_reference_type<Iter>::type \ >::type type; }; // a second template parameter is added to // zipit in case the reference deduction // doesn't work for some custom iterator template <typename Iter, typename wrapped_reference = typename wrapped_iter_reference_type<Iter>::type > class zipit : public std::pair<Iter, Iter> { ... public: typedef wrapped_reference reference; std::pair<reference, reference> operator*() const { return std::make_pair( reference(*this->first), reference(*this->second) ); } ... };
Here we have a simple class to wrap a reference, which allows us to store it in an std::pair
. std::reference_wrapper
cannot be used since it is implicitly convertible from T
and any assignment then simply changes the reference, not the referred to value.
The SFINAE is required to extract the true value reference type. T *
converts to T &
, std::vector<T>::iterator
converts to either the proxy object, or if it is a raw pointer to T &
. Raw references then get wrapped and packed in a pair. Proxy objects don’t get wrapped, they just get put into a pair.
The same wrapper could be used to implement operator->
but since we have to return a pointer to the std::pair<reference, reference>
we might have to allocate the pair object on the heap and return as a unique_ptr
. And that’s really ugly. I want no part of it.
…and of course plenty more features are needed if the iterator is to model one of the other iterator types.
…and don’t forget to specialise std::iterator_traits
(deriving from std::iterator
was deprecated in C++17).
…and since C++11 the iterators must be swappable through std::swap
… so specialise that too.
All in all, implementing iterators can lead to bouts of anxiety and heavy drinking. So drop zipit
from the code – just use Boost.ZipIterator
and accept its semantics, or std::inner_product
without a zip iterator, or std::vector<std::pair<int,int>>
, or don’t store the dice throws.
References
[1] http://open-std.org/JTC1/SC22/WG21/docs/papers/2017/p0515r0.pdf
[2] https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017
[3] http://libcxx.llvm.org/cxx1z_status.html
[4] https://en.wikipedia.org/wiki/Linear_congruential_generator
[5] The C++11 Programming Language standard – ISO/IEC 14882:2011(E)
[6] http://www.cplusplus.com/reference/iterator/
Balog Pal
This entry starts with a really good teaser. Dice game that is always drawn? In real life I would ask “did you check your RNG sequence? I’d bet you either keep generating the same number or the exact same sequence for the two players.†It’s worth a look just to prove my gut right or wrong.
Looking at the code I see some iterator magic that I save for later. Then more magic with the new <random>
that I’m not up to date with. However, in the past I read a lot of problems related to RNG. It reminds me an old story from around the millennium. I had to write an application that used winsock sockets for communication. When it was ‘ready’ I went to the forums to ask some peers’ opinion. And the people, instead of commenting on the code just pointed me to fetch the ‘winsock lame list’ and serve myself. As anyone would guess I found a good amount of matches… And random
looks like a similar can of worms, maybe someone compiled a similar list? Let’s try a web search for ‘lame list for rng’. Wow, guess what, first hit is titled ‘The C++ <random> Lame List – Elbeno’ [1]. It even refers to the ws lame list as the muse. Unfortunately, it lacks the explanations but even just the points can help a lot.
Matching the code against the list I see hits on 6, 7, 11, 14 and we get half-dozen good hints on what to avoid as we try to fix. And especially 14 looks like a killer that may match my first clue. But let’s stash that too, and put the code in a compiler so we can instrument it with some output at interesting points. And while at it, try some static analysis.
In Visual Studio 2017 (15.5.5) default W3 compile gives one warning on total +=
– conversion double
to int
. Maybe Wall tells more. Huh, indeed. 900 warnings! Too bad all but 4 come from the standard headers. I wonder if the VS team ever tried the dogfood method that others use to create actually usable stuff. The 3 warnings on our code are int signed/unsigned
mismatch that are not very interesting till we not press the bounds. But the int turns
that comes directly from user input is just used to size the vector… we’ll try to play with -1 later. <evil grin>
Now let’s see analyze. We’re warned that begin()
and end()
can be declared noexcept
. Also ‘The reference argument generator
for function play
can be marked as constâ€. That in general we treat by adding the const
, but for this case it is an indication of a problem: we expect the generator to mutate, if it could be const
, we messed up. Must be that missing ref()
from point 14.
Lastly, a warning “Do not dereference a invalid pointer (lifetimes rule 1). return of +=
was invalidated by end of scope for allotemp.2
†that honestly beats me, especially as op -=
has identical code without warning.
Clang invoked through the ‘clang power tools’ extension also picked that ref should be const
[see the clang-tidy check google-runtime-references] and said that in call play()
the first arg is not initialized, what is not really true for our case, if we got there, cin
reported okay, so it did complete >>
properly.
Let’s just run it as is to see those draws. No luck here, we hit an assert: “vector iterator not dereferencable†in operator*
, debugger shows this->first
being null
. Wow. While at it, I take a peek into the sequences sitting in player 1: 3 1 3 6, and 2: same. Was I right or was I right? But I really want to see the output, maybe in release build that assert will not stand in the way? Indeed, now the program executes, and I get … ‘Player 1 wins’ on all my tries. Let’s hope we didn’t also launch some nukes on another continent.
Originally, I did not want to bother with the code before class randomize
at all, but must push back the related rant even further, to at least get UB out of the way. We don’t have too much code before the assert, just the begin()
so let’s look at that: guess what, it takes out vectors by value, gets the iterator, stores it, then the copies are destroyed for good leaving us with the invalid iterators. Same story with end()
. Let’s add the missing &
s and see what happens. Yey, no more assert
, and we’re finally drawn for all attempts. And if we use std::ref(generator)
in the two generate()
calls, then we get various results too.
So, once we have the program working as desired, what’s left besides to throw it ALL away and create properly? From scratch, as we learnt from the food industry, that if you already mixed the bad things with the good, it is hopeless to separate them later, instead we only add what is good. It does not stop us from reusing what we like from the original source be it idea or a code fragment.
For start we paste the original main()
. It looks okay-ish for the original aim. We just need a bounds check on turns
before passing it on. Call to play()
is fine, just make sure generator
is passed by ref. And we will need the random source. Having it on the stack will trigger #11. We might ignore that reasoning it is a single case and we are positive to have enough stack space. Or use some alternative, like making it static or have a dynamic instance with make_unique
. This program does not want threads, so we can ignore the related points, but commit them to memory.
Now let’s make a proper random source. While at it, let’s also rename it accordingly, random_source
is way better description than randomize
. The list approves mt19937
as a good start, I have to look up how to use it. Along with other hints that we’ll need random_device
and uniform_int_distribution
. The latter on cppreference.com has the code for our case. All together I came up with this:
class random_source { static unsigned int get_seed() { try { std::random_device rd; return rd(); } catch (std::exception const&) { return static_cast<unsigned int> (time( nullptr)); } } std::mt19937 mt{get_seed()}; std::uniform_int_distribution<> dis{1, 6}; public: int operator()() { return static_cast<int>(dis(mt)); } };
The description of random_device
is pretty vague, most of it is left to the implementation, but it is intended to do the best to summon some entropy. If it can’t and we get the exception the OP should decide what to do, I just dumped in an alternative source that is less lame as the 2nd violin. Time returns unspecified type, but it is numeric and for unsigned anything goes. The other cast in the op()
is benign as the number is in the known range. I was thinking to change the return type instead to unsigned
, but for the use plain int
is the natural thing. Or is it? Maybe byte
is more fitting. That can be rearranged later. Moving on to play()
.
The aim was to do play N rounds, in each make both player throw dice, decide the round outcome and accumulate the result. One way to do that is
int turn(random_source &generator) // returns score for player 1: -1 0 or +1 { // return sign( generator() - generator() ); // too bad no stock sign function.... auto const d1 = generator(); auto const d2 = generator(); return d1 < d2 ? -1 : d1 > d2; } int game(int turns, random_source &generator) // the difference between player 1 and 2's wins { int total = 0; for (int i = 0; i < turns; ++i) total += turn(generator); return total; }
And in the original play()
function we just
auto const total = game(turns, generator);
and print the result. And we are done. For good. No kidding. Fun thing that it now even works with the original main
without bound check and prints draw for -1 instead of crash or assert.
Someone may object that we were supposed to criticize or fix zipit
… Were we actually? It breaks one of my fundamental rule for reviews (and dev process): application programmers are forbidden to write blacklisted stuff, that starts with known timewasters like ‘logger class’, ‘string class’ ‘refcounting pointer’, and also contains ‘collection’ and ‘iterator’. For the collection I can be convinced in theory if someone has a problem that requires a really new type of collection not covered by either standard or the many libraries we work with. And for iterator if one opens with “I have this code full of algorithm use, I needed an iterator for that special collection to drive them.†Not the case here. Even for that case I’d ask whether it is really needed right now or we may wait a little till ranges finally arrive.
If the more advanced version of the program have actual need to keep record of turns, it can be simply done in a pair<int, int>
per round, collection of those and running whatever processing with tools that work (WELL!) out of the box.
And those who really need to implement their own, can start with web search for ‘how to write c++ iterator’, the first hit [2] is more than promising.
An extra stab at header separation: this simple code does not need a header at all, now, or probably ever. But if some parts are extracted, the header shall be self-containing, have all needed #include
lines, not rely on the client.
Notes:
More fields may be available via dynamicdata ..