Journal Articles
Browse in : |
All
> Journals
> CVu
> 282
(9)
All > Topics > Programming (877) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 99
Author: Martin Moene
Date: 03 May 2016 09:41:42 +01:00 or Tue, 03 May 2016 09:41:42 +01:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I am writing a simple program to index written text but it doesn’t quite work. I want to print out every word in the document with a list of each line it appears on. I’m only getting the first occurrence listed but can’t work out why.
Please explain why they have this problem... and suggest some other possible improvements to the program. The program is in Listing 1.
#include <iostream> #include <map> #include <sstream> #include <vector> int main() { using namespace std; map<string, vector<int>> index; // read and index standard input string line; int lineno{}; while (getline(cin, line)) { ++lineno; istringstream iss(line); string word; while (iss >> word) { auto start = word.find_first_not_of(":;.,'\"?!-"); auto end = word.find_last_not_of(":;.,'\"?!-"); if (start != end) word.replace(end + 1, end, ""); word.replace(0, start, ""); if (word.empty()) continue; auto iter = index.find(word); if (iter == index.end()) { index[word].push_back(lineno); } else { auto lines = iter->second; if (lines.back() == lineno) ; // ignore dups else lines.push_back(lineno); } } } // print the index for (auto entry : index) { cout << entry.first << ": "; string delim; for (auto line : entry.second) { cout << delim << line; delim = ", "; } cout << '\n'; } } |
Listing 1 |
Critiques
Paul Floyd <m >
At first I thought that this was a fairly simple problem and that there would be very little to say.
Let’s start with the reason why only the first occurrences are being detected. This is due to the code that handles the detection of the second or subsequent occurrence:
auto iter = index.find(word); if (iter == index.end()) { index[word].push_back(lineno); } else { auto lines = iter->second; if (lines.back() == lineno) ; // ignore dups else lines.push_back(lineno); }
Specifically, auto lines
infers a vector, so it makes a copy of the existing vector containing one entry. It then either does nothing or appends a line number to the end of this copy. The copy goes out of scope at the end of the closing brace, leaving the original unchanged.
This can be fairly easily fixed by making lines
a reference. Simply declaring it as auto& lines
will do the trick, or alternatively using an explicit declaration:
using MyMap = std::map<std::string, std::vector<int>>;
then
MyMap::mapped_type& lines = iter->second;
I don’t like the use of the variable index
. index
is so commonly used to mean loop index that I’d prefer to see something that doesn’t overload the meaning, like word_index
.
The next code smell was the punctuation detection. This looks like it should be in a function. The choice of punctuation seems fairly arbitrary. Personally I would use ispunct()
, and only use something different if necessary. There are a couple more things that I would avoid:
start
andend
variable names, too confusing when there are functions and members with the same name in the standard libraryreplace
with an empty string;erase
does the same thing and is more idiomatic (at least if you think ofstd::string
as being a container rather than a string)
I would write this as:
const char* myPunct = ":;.,'\"?!-"; word.erase(0, word.find_first_not_of(myPunct)); word.erase(word.find_last_not_of(myPunct)+1);
This avoids the need for local variables for the start and end of the string.
James Holland <James.Holland@babcockinternational.com>
The student has made a fair attempt at the design of this program. However, there are several problems which prevent the software working as required.
- Braces should surround the two statements following the first
if
statement. Bothreplace()
functions need to be dependant on theif
statement, not just the first as is the case with the student’s code. To avoid this type of error it is best to get into the habit of always enclosing conditional parts of anif
statement in braces even if they consist of only one statement. - The first statement of the
else
block of the thirdif
statement defines a variable namedlines
. This should be a reference type so that the vector within the map can be modified. As it stands, additional line numbers are added to a local copy of thelines vector
and not thelines vector
withinindex
. - The program does not select words that are separated only by the delimiting characters (
":;.,"\?!-"
) and not a white space. Presumably, the program should interpret a line containing"name::space"
, for example, as two words. Unfortunately, this problem requires some fairly major restructuring of the code to solve and will be discussed later.
In addition, there are some issues that are more to do with style that the correct working of the program. I consider them worth discussing, nonetheless.
- The delimiting characters have been defined twice. It would be better to declare a
const string
that is initialised to the required value and used instead of the repeated literal strings. Any required change to the delimiting characters need then only be done in one place. - The student makes use of
string
’sreplace()
member function. This is confusing because it gives the impression that some characters of thestring
are being replaced by other characters. What is really happening is that characters are being erased. It would be better to use one of the overloadederase()
functions to clarify the code. - The variables
entry
andline
that are used to print the index need not be copies; they need only beconst
references. This is unlikely to make a significant speed increase when writing to the console, but it does make it clear that no modifications are being made toindex
.
As mentioned above, the student’s program does not isolate words that are only separated by the delimiting characters. It may well be the case that there are many such words in a line of text. This suggests that a loop is required that isolates the words and only exits when there are no more in the line. Writing the code for a loop (in this case a while
loop) can be tricky but after a little thought and some trial and error, I came up with the following program.
#include <iostream> #include <map> #include <vector> int main() { using namespace std; const string delims(" \t:;.,'\"?!-"); map<string, vector<int>> index; string line; int line_number{}; while (getline(cin, line)) { ++line_number; auto start_of_word = line.find_first_not_of(delims); while (start_of_word != string::npos) { auto end_of_word = line.find_first_of(delims, start_of_word); if (end_of_word == string::npos) { end_of_word = line.length(); } const string word(line.substr( start_of_word, end_of_word - start_of_word)); auto position_of_word = index.find(word); if (position_of_word == index.end()) { index[word].push_back(line_number); } else { auto & lines = position_of_word->second; if (lines.back() != line_number) { lines.push_back(line_number); } } start_of_word = line.find_first_not_of( delims, end_of_word); } } for (const auto & entry : index) { cout << entry.first << ": "; string delim; for (const auto & line_number : entry.second) { cout << delim << line_number; delim = ", "; } cout << '\n'; } }
As well as correctly selecting words from the line of text, this program has the advantage that there is no need to ‘top and tail’ words using replace()
(or erase()
). Also, experiments show that the revised program is about 20% faster that the student’s (ignoring inputting the data and outputting the results).
Commentary
While it is very nice to have two keen and regular supporters of the code critique, can I encourage you to have a go even if you’ve never entered the competition before? You can see your name in print and it is good practice for real code reviews!
There were, as Paul noted, a fair number of problems in a relatively simple-looking piece of code…
I think between them the entrants covered most of the points pretty well. The original presenting problem was due to naive use of auto
. The design principle to bear in mind is that C++ uses value semantics in very many places (see for example Andrzej Krzemieński’s blog post at https://akrzemi1.wordpress.com/2012/02/03/value-semantics/). Hence the default behaviour of plain auto
is to make a new value even when the original item is a reference. For reasons that are unclear to me, this behaviour seems unintuitive, at least initially, to many programmers who assume the compiler will give them the same type they would have written without the presence of auto
.
When I use auto
, I find myself writing auto const &, auto *
, etc., a significant proportion of the time to either enforce or highlight (or both) the semantics of the generated variable.
The final bug was that the first call to replace uses the wrong value for the second argument (the number of characters to erase) – the code is currently written as:
word.replace(end + 1, end, "");
but the actual number of characters that need to be removed is from position end+1
to the end of the string. However, passing end
to the replace function will not cause any undefined behaviour, it just may not remove enough characters in some pathological cases. James’ solution side-steps this problem completely as using erase
means the number of trailing characters does not need to be supplied.
The winner of CC 98
Both critiques were good but I think James covered a bit more ground, and also uncovered the design flaw that the original program does not handle embedded delimiters, so he wins the prize for this issue’s critique.
Code critique 99
(Submissions to scc@accu.org by Jun 1st)
I wanted to learn a bit about C++ threading so I tried writing a thread pool example. But it sometimes crashes – I’ve managed to get it down to a small example. Sometimes I get what I expected as output, for example:
Worker done Worker done Ending thread #2 Ending thread #0 Worker done Ending thread #1 Worker done Ending thread #3 Worker done All done
But other times I get a failure, for example:
Worker done Ending thread #0 Worker done Awaiting thread #1 Worker done W <crash>
I’m not sure what to do next – can you help?
The program is in Listing 2.
#include <algorithm> using namespace std; #include <array> #include <chrono> using namespace chrono; #include <cstdlib> #include <iostream> #include <thread> static const int POOL_SIZE = 4; // Allow up to 4 active threads array<thread, POOL_SIZE> pool; // Example 'worker' -- would in practice // perform some, potentially slow, calculation void worker() { this_thread::sleep_for( milliseconds(rand() % 1000)); cout << "Worker done\n"; } // Launch the thread functoid 't' in a new // thread, if there's room for one template <typename T> bool launch(T t) { auto it = find_if(pool.begin(), pool.end(), [](thread const &thr) { return thr.get_id() == thread::id(); } ); if (it == pool.end()) { // everyone is busy return false; } *it = thread([=]() { t(); thread self; swap(*it, self); self.detach(); cout << "Ending thread #" << (it - pool.begin()) << "\n"; }); return true; } int main() { while (launch(worker)) {} // And finally run one in this thread as an // example of what we do when the pool is full worker(); for (auto & it : pool) { thread thread; swap(thread, it); if (thread.joinable()) { cout << "Awaiting thread #" << (&it - &*pool.begin()) << "\n"; thread.join(); } } cout << "All done\n"; } |
Listing 2 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://www.accu.org/journals/). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Notes:
More fields may be available via dynamicdata ..