Journal Articles
Browse in : |
All
> Journals
> CVu
> 291
(7)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 104
Author: Martin Moene
Date: 03 March 2017 18:30:29 +00:00 or Fri, 03 March 2017 18:30:29 +00:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I am trying to keep track of a set of people’s scores at a game and print out the highest scores in order at the end: it seems to work most of the time but occasionally odd things happen...
Can you see what’s wrong?
The code – scores.cpp – is in Listing 1.
#include <functional> #include <iostream> #include <map> #include <sstream> #include <unordered_map> // Best scores std::multimap<int, std::string, std::less<>> best_scores; // Map people to their best score so far std::multimap<int, std::string>::iterator typedef entry; std::unordered_map<std::string, entry> peoples_scores; entry none; void add_score(std::string name, int score) { entry& current = peoples_scores[name]; if (current != none) { if (score <= current->first) { return; // retain the best score } best_scores.erase(current); } current = best_scores.insert({score, name}); } void print_scores() { // top down for (auto it = best_scores.end(); it-- != best_scores.begin(); ) { std::cout << it->second << ": " << it->first << '\n'; } } int main() { for (;;) { std::cout << "Enter name and score: "; std::string lbufr; if (!std::getline(std::cin, lbufr)) break; std::string name; int score; std::istringstream(lbufr) >> name >> score; add_score(name, score); } std::cout << "\nBest scores\n"; print_scores(); } |
Listing 1 |
Critique
Chris Main
This is a tricky problem as I haven’t been able to come up with any valid test data that makes ‘odd things happen’ on my platform, which is:
g++ 5.4 on Ubuntu 16.04, compiling with -std=c++11.
My first point is therefore the importance of a precise statement of the issue. Ideally this should state the actual input, the expected result and the actual result. Even if this is not possible because the issue cannot be repeatedly reproduced, the description should at least state the kind of issue, for example:
- the program crashes
- the program outputs spurious values
- the program outputs incorrect values
- the program outputs lines in the wrong order
- the program does not output as many lines as it should
Compilation failed for me because of the std::less<>
in the declaration of best_scores
. This can be fixed by changing it to std::less<int>
, or better still by removing it altogether as that is the default. I assume this was just a typo.
There is no validation of input lines, so if something other than a name and a score is entered that might cause strange results, but the way the question is posed suggests that the problem is less obvious than that.
My first thought was iterator invalidation. Multimap iterators are being held as the mapped_type
in an unordered_map
. These iterators are associated with entries in best_scores
, so I wondered whether they could become invalid when other entries are added to or removed from best_scores
. However, after some research I found that multimap iterators are not invalidated in these cases.
In add_score()
, the test for whether a person has a previous score is done by calling operator[]
and comparing the result with a default constructed iterator. Although this works, it is more idiomatic to call find()
and compare the result with the end()
iterator. The function would then look like:
auto current = peoples_scores.find(name); if (current == peoples_scores.end()) { peoples_scores.insert({name, best_scores.insert({score, name})}); } else if (current->second->first < score) { best_scores.erase(current->second); current->second = best_scores.insert({score, name}); }
I don’t think this makes any functional difference to the implementation.
In print_scores()
, the for
loop is not idiomatic, and this is where I think the problem may be. best_scores
needs to be iterated in reverse order. The correct way to do this is to use the reverse iterators from begin to end, and in fact the const
versions of these can be used:
const auto end_it = best_scores.crend(); for (auto it = best_scores.crbegin(); it != end_it; ++it) { std::cout << it->second << ": " << it->first << '\n'; }
There is no range based for
loop for reverse iterators in the C++ language itself yet. With a bit of refactoring, std::for_each
could be used instead:
#include <algorithm> struct output_score { void operator()( const std::pair<int, std::string>& score) { std::cout << score.second << ": " << score.first << '\n'; } }; void print_scores() { std::for_each(best_scores.crbegin(), best_scores.crend(), output_score()); }
The original implementation uses forward iterators from end to begin, with a post decrement in the condition. As observed, this does in fact work, but there is a potential problem with the terminating step of the loop. This decrements the iterator to a position before best_scores.begin()
, which does not exist, and technically this is undefined behaviour. Because the iterator is never dereferenced when it reaches this invalid position, I expect you would usually get away with this implementation, but maybe this is the cause of “occasionally odd things happen ...â€. I haven’t been able to find anything else.
James Holland
It may be the case that the student’s code runs without error on a particular system. This would be unfortunate as there are two features of the program that rely on undefined behaviour. The program may well fail on other platforms. The first feature that results in undefined behaviour is located in add_score()
and is the comparison of default-constructed iterators. One iterator is created by operator[]()
belonging to peoples_scores
. The other is the global none
.
The fact that the standard does not permit the comparison of default constructed iterators is a pity as it renders invalid what would otherwise be a perfectly good function. Perhaps later versions of the standard will permit such comparisons. I could not think of a reasonable way of modifying the student’s code while still employing operator[]()
. Instead, I adopted a fairly straightforward approach that first searches peoples_scores
for a record with a key of name and then adds a record or modifies the existing record accordingly.
The second undefined feature is located in print_scores()
and results in an iterator pointing to one element before the start of best_scores
. This is also not permitted under the standard. Probably the simplest way of resolving this problem is to rewrite the loop using a range-based for
loop. The original loop was designed to print best_scores
starting with the last record. The range-based for
loop can only iterate through the container starting from the first record. The desired result can be obtained by storing the elements of best_scores
with the greatest key value at the start of the container. This is achieved by changing the third template parameter of the best_scores
type to std::greater<int>
. The best scores will now be printed starting with the highest value.
One slight deficiency with the program that still exists is that people’s names are stored twice; once in peoples_scores
and once in best_scores
. It would be more efficient for best_scores
to refer the names already stored in peoples_scores
. This is a little fiddly to achieve and I do not attempt the modification here.
Finally, I present my version of the student’s code below.
#include <iostream> #include <map> #include <sstream> #include <unordered_map> using best_scores_type = std::multimap< const int, const std::string, std::greater<int>>; best_scores_type best_scores; std::unordered_map<std::string, best_scores_type::const_iterator> peoples_scores; void add_score(const std::string & name, const int score) { const auto people_position = peoples_scores.find(name); if (people_position != peoples_scores.end()) { if (score > people_position-> second->first) { best_scores.erase( people_position->second); people_position->second = best_scores.insert({score, name}); } } else { const auto best_scores_position = best_scores.insert({score, name}); peoples_scores.insert({name, best_scores_position}); } } void print_scores() { for (const auto & person : best_scores) { std::cout << person.second << ": " << person.first << '\n'; } } int main() { for (;;) { std::cout << "Enter name and score: "; std::string lbufr; if (!std::getline(std::cin, lbufr)) break; std::string name; int score; std::istringstream(lbufr) >> name >> score; add_score(name, score); } std::cout << "\nBest scores\n"; print_scores(); }
Herman Pijl
Apparently the developer wants to keep track of the high scores and simultaneously maintain a lookup that maps the user to an entry in the high score table.
The print_scores
function looks a bit suspicious.
I decide to run the program, and after the prompt appears, I press Ctrl-d to end the input. The result is a segmentation fault (I am running on Cygwin). Iterating over the elements of a container in reverse order should not be an adventure. Just use the reverse iterators that were designed exactly for this purpose (rbegin()
and rend()
) to avoid problems or even crashes as the one I had.
Reverse iteration doesn’t look natural at first sight. The ordered containers allow to specify an ordering criterion. The default ordering criterion happens to be std::less
, but nothing keeps you from using another ordering criterion from the algorithm library, e.g. std::greater
. The given code explicitly mentions the default ordering criterion and this explicit (default) choice is probably meant to be a hint to change the program. Ordering from high to low values allows to iterate over the container with the usual begin()
and end()
boundaries.
Let’s move to the next problem in this code. The peoples_scores
map attempts to keep track of the position in the best_scores
map. Unfortunately it is not such a good idea to keep a copy of an iterator in a table. Stroustrup (4th Ed. $31.2) mentions that the ordered [multi]map/sets are usually implemented as balanced binary trees (mostly red-black trees). To keep such a tree balanced (and therefore performant O(log(n))) some pruning is needed causing the traversal (=iteration) to be different after insertions/deletions.
So if you cannot use an iterator, then what can you use? Remarkably the answer is: a pointer! There is no such thing as a commandment that says: “Thou shalt not use pointers!†Some people try to avoid them dogmatically, but sometimes you have no choice.
The standard library does this too in some cases, e.g. when you construct an istream_iterator
, you typically pass an istream
to the constructor as reference, BUT the istream
iterator takes the address of that istream
(thus a pointer) and it keeps that as a private member. One of the good reasons to keep a pointer is that the istream_iterator
is an input iterator ($33.1.2) and as such it has to be copyable.
I am going to keep the const_pointer
type of the multimap, i.e. std::multimap<int, std::string>::const_pointer
will become my entry type.
The value_type
of a map is typically a (K, V) pair.
I assume that the standard allocator will allocate the nodes on the heap, so that the addresses of the value_type
will not change when a red-black tree gets reorganised.
The best_scores
and peoples_scores
maps have to be maintained ‘transactionally’, so that there is always exactly one entry in both for a particular user.
In order to add an entry to the best_scores
map, I use the emplace
method. As it returns an iterator, I deference it and use the address-of operator to find my const_pointer
:
&*best_scores.emplace(score, name);
When there is no entry for the user I can just do:
peoples_scores.emplace(name, &*best_scores.emplace(score, name));
When there is already an entry for the user, then I have to check whether the score is better than the previous score. If that is the case, then I have to erase the previous score. Unfortunately, multiple users can have the same score. So in order to find the right entry and erase it I use
// bs is reference to best_scores bs.erase(std::find_if( bs.lower_bound(prevScore), bs.upper_bound(prevScore), [&](auto & it)-> bool{ return it.second == name; }));
After that I need to add a new entry in the best_scores
table and update the reference to it in the peoples_scores
map.
itFind->second = &*bs.emplace(score, name);
It seems to work fine.
Further enhancements could be the introduction of a good old fashioned struct containing name and score and defining the extraction operator >>
on it so that so that you can write the processing as a for_each
call.
A complete solution is shown below:
#include <functional> #include <iostream> #include <map> #include <sstream> #include <unordered_map> // added #include <algorithm> #include <cassert> #include <iterator> // Best scores /*added*/ std::multimap<int, std::string, std::less<>> typedef best_scores_type; std::multimap<int, std::string, std::less<>> best_scores; auto & bs = best_scores; // new unordered map std::unordered_map<std::string, best_scores_type::const_pointer> nps; //new_peoples_scores void new_add_score(std::string name, int score) { auto itFind = nps.find(name); if (itFind != nps.end()){ // player already present int prevScore = itFind->second->first; if (prevScore < score){ bs.erase(std::find_if( bs.lower_bound(prevScore), bs.upper_bound(prevScore), [&](auto & it)-> bool{ return it.second == name;})); itFind->second = &*bs.emplace(score, name); } } else { // new player nps.emplace(name, &*bs.emplace(score, name)); } } void print_scores() { std::for_each(best_scores.rbegin(), best_scores.rend(), [](auto & it){ std::cout << it.second << ": " << it.first << '\n';}); } struct NameScoreEntry{ std::string name; int score; }; std::istream & operator>>(std::istream& is, NameScoreEntry& entry) { is >> entry.name >> entry.score; return is; } void extract(std::istream & is){ std::istream_iterator<NameScoreEntry> ist(is), eos; std::for_each(ist, eos, [&](auto i){ new_add_score(i.name, i.score);}); } int main(){ std::cout << "Enter name and score (multiple lines): "; extract(std::cin); print_scores(); }
Commentary
This is a critique that is principally about undefined behaviour – often abbreviated to UB. There are in fact three different examples of this in the code presented. You might want to stop now and see if you can find them with this hint before reading on!
The first problem is the use of the default-constructed entry
object named none
. As James noted, the standard does not allow comparisons between default-constructed iterators and so the comparison
if (current != none)
is not valid. As it happens, the code appears to run successfully with a number of combinations of compilers and flags: one of the problems with undefined behaviour is that it can be quite hard to detect.
The second problem is that the loop in print_scores
counting backwards through best_scores
decrements the iterator it
to point before the beginning of the collection. This is also not valid, and may cause runtime failures depending on exactly how multimap
is implemented in the standard library being used. Some years ago I encountered a similar problem with code that removed items from a std::list
and decremented the iterator referring to the item being removed. This had worked for years with one implementation of the standard library, but when ported to a different environment decrementing the iterator when the item being deleted was at begin()
caused failures at runtime.
The third problem is that the typedef
for entry
is incorrect. It refers to an iterator into a multimap
with a defaulted comparator but the actual map has an explicitly specified comparator of std::less<>
. This subtle difference means the resultant iterators are of different types and adds some more possibly troubling behaviour.
(Note: std::less<>
was added in C++ 14, which is why Chris Main couldn’t compile the code using -std=c++11. It was proposed by Stephan Lavavej and his proposed wording was voted into the standard without needing any editing, which is very unusual!)
There are several ways to attack these problems; but identifying them is often the hardest step. Fortunately, many compilers provide extra validation when using the standard library that can make this easy.
For example, compiling the sample code using gcc and adding the flag -D_GLIBCXX_DEBUG
will use an instrumented version of the standard library. The code as presented won’t even compile with this flag specified because of the mismatched iterator types.
Similar instrumentation is available with MSVC when using the debugging runtime library (eg adding -MDd
).
Incidentally, this is another reason to prefer using the standard library to your own hand-written code…
As the entries showed, solving the problem caused by trying to use none as a sentinel value can be resolved by the use of find
rather than the simple use of the square bracket operator.
The over-enthusiastic decrementing of the iterator can be resolved by using a more idiomatic loop: either by changing the comparator to std::greater<>
or by changing the loop to use crbegin
and crend
. The first option has the additional benefit of allowing use of range-for.
I was expecting someone to comment about the placement of typedef
in the middle of the line: placing typedef
at the beginning of the line is an extremely common coding convention. In modern C++ code though I would recommend using
over typedef
as I think it is more readable.
The winner of CC 103
I was (deliberately) a little vague in setting the critique about what exactly the symptoms were (“occasionally odd things happenâ€) – perhaps I should have been a little more forthcoming! Unfortunately, of course, when undefined behaviour occurs the symptoms are often just like this…
While Chris did not identify the undefined behaviour, his instincts for writing idiomatic code were sound and so his modifications did in fact deal with the two main pieces of UB.
James identified the two main sources of UB and he also, by changing the comparator, was able to change the code in print_scores
to use range-for, which makes it clearer.
Herman was very close to identifying the problem; but in fact the iterators are not invalidated when the map is re-balanced. His replacement of iterators with pointers was valid and did have the side-effect of removing the undefined behaviour. I did like his introduction of a simple helper struct NameScoreEntry
to assist with reading the data in: it is very easy to create such structs in C++ and there can be significant benefits for readability with having named fields.
Overall, I think James provided the best critique, so I have awarded him the prize.
Code critique 104
(Submissions to scc@accu.org by April 1st)
I was trying to write a simple template that gets the unique values from a range of values (rather like the Unix program uniq) but my simple test program throws up a problem.
test with strings a a b b c c => a b c test with ints 1 1 2 2 3 3 => 1 1 2 3
Why is the duplicate 1 not being removed?
The code is in Listing 2 (unique.h) and Listing 3 (unique.cpp).
#include <iterator> #include <vector> // get unique values in the range [one, two) template <typename iterator> std::vector<typename iterator::value_type> unique(iterator one, iterator two) { if (distance(one, two) < 2) { // no duplicates return {one, two}; } // first one can't be a duplicate std::vector<typename iterator::value_type> result{1, *one}; while (++one != two) { auto next = *one; bool is_unique = (*result.rbegin() != next); if (is_unique) { result.push_back(next); } } return result; } |
Listing 2 |
#include <iostream> #include <string> #include <vector> #include "unique.h" template <typename T> void test(std::ostream &os, std::vector<T> const &vector) { auto result = unique(vector.begin(), vector.end()); auto out = std::ostream_iterator<T>(os, " "); copy(vector.begin(), vector.end(), out); os << "=> "; copy(result.begin(), result.end(), out); os << "\n"; } int main() { std::cout << "test with strings\n"; std::vector<std::string> ptrs; ptrs.push_back("a"); ptrs.push_back("a"); ptrs.push_back("b"); ptrs.push_back("b"); ptrs.push_back("c"); ptrs.push_back("c"); test(std::cout, ptrs); std::cout << "test with ints\n"; std::vector<int> ints; ints.push_back(1); ints.push_back(1); ints.push_back(2); ints.push_back(2); ints.push_back(3); ints.push_back(3); test(std::cout, ints); } |
Listing 3 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Notes:
More fields may be available via dynamicdata ..