Journal Articles

CVu Journal Vol 28, #2 - May 2016 + Programming Topics
Browse in : All > Journals > CVu > 282 (9)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 99

Author: Martin Moene

Date: 03 May 2016 09:41:42 +01:00 or Tue, 03 May 2016 09:41:42 +01:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issue’s code

I am writing a simple program to index written text but it doesn’t quite work. I want to print out every word in the document with a list of each line it appears on. I’m only getting the first occurrence listed but can’t work out why.

Please explain why they have this problem... and suggest some other possible improvements to the program. The program is in Listing 1.

#include <iostream>
#include <map>
#include <sstream>
#include <vector>
int main()
{
  using namespace std;
  map<string, vector<int>> index;
  // read and index standard input
  string line;
  int lineno{};
  while (getline(cin, line))
  {
    ++lineno;
    istringstream iss(line);
    string word;

    while (iss >> word)
    {
      auto start = 
        word.find_first_not_of(":;.,'\"?!-");
      auto end = 
        word.find_last_not_of(":;.,'\"?!-");
      if (start != end)
        word.replace(end + 1, end, "");
        word.replace(0, start, "");
      if (word.empty()) continue;
      auto iter = index.find(word);
      if (iter == index.end())
      {
        index[word].push_back(lineno);
      }
      else
      {
        auto lines = iter->second;
        if (lines.back() == lineno)
          ; // ignore dups
        else
          lines.push_back(lineno);
      }
    }
  }
  // print the index
  for (auto entry : index)
  {
    cout << entry.first << ": ";
    string delim;
    for (auto line : entry.second)
    {
      cout << delim << line;
      delim = ", ";
    }
    cout << '\n';
  }
}
			
Listing 1

Critiques

Paul Floyd <m >

At first I thought that this was a fairly simple problem and that there would be very little to say.

Let’s start with the reason why only the first occurrences are being detected. This is due to the code that handles the detection of the second or subsequent occurrence:

  auto iter = index.find(word);
  if (iter == index.end())
  {
    index[word].push_back(lineno);
  }
  else
  {
    auto lines = iter->second;
    if (lines.back() == lineno)
      ; // ignore dups
    else
      lines.push_back(lineno);
  }

Specifically, auto lines infers a vector, so it makes a copy of the existing vector containing one entry. It then either does nothing or appends a line number to the end of this copy. The copy goes out of scope at the end of the closing brace, leaving the original unchanged.

This can be fairly easily fixed by making lines a reference. Simply declaring it as auto& lines will do the trick, or alternatively using an explicit declaration:

  using MyMap = std::map<std::string,
                std::vector<int>>;

then

  MyMap::mapped_type& lines = iter->second;

I don’t like the use of the variable index. index is so commonly used to mean loop index that I’d prefer to see something that doesn’t overload the meaning, like word_index.

The next code smell was the punctuation detection. This looks like it should be in a function. The choice of punctuation seems fairly arbitrary. Personally I would use ispunct(), and only use something different if necessary. There are a couple more things that I would avoid:

I would write this as:

  const char* myPunct = ":;.,'\"?!-";
  word.erase(0, word.find_first_not_of(myPunct));
  word.erase(word.find_last_not_of(myPunct)+1);

This avoids the need for local variables for the start and end of the string.

James Holland <James.Holland@babcockinternational.com>

The student has made a fair attempt at the design of this program. However, there are several problems which prevent the software working as required.

In addition, there are some issues that are more to do with style that the correct working of the program. I consider them worth discussing, nonetheless.

As mentioned above, the student’s program does not isolate words that are only separated by the delimiting characters. It may well be the case that there are many such words in a line of text. This suggests that a loop is required that isolates the words and only exits when there are no more in the line. Writing the code for a loop (in this case a while loop) can be tricky but after a little thought and some trial and error, I came up with the following program.

  #include <iostream>
  #include <map>
  #include <vector>
  int main()
  {
    using namespace std;
    const string delims(" \t:;.,'\"?!-");
    map<string, vector<int>> index;
    string line;
    int line_number{};
    while (getline(cin, line))
    {
      ++line_number;
      auto start_of_word = 
        line.find_first_not_of(delims);
      while (start_of_word != string::npos)
      {
        auto end_of_word = 
          line.find_first_of(delims,
            start_of_word);
        if (end_of_word == string::npos)
        {
          end_of_word = line.length();
        }
        const string word(line.substr(
          start_of_word,
          end_of_word - start_of_word));
        auto position_of_word = index.find(word);

        if (position_of_word == index.end())
        {
          index[word].push_back(line_number);
        }
        else
        {
          auto & lines = position_of_word->second;
          if (lines.back() != line_number)
          {
            lines.push_back(line_number);
          }
        }
        start_of_word = line.find_first_not_of(
          delims, end_of_word);
      }
    }
    for (const auto & entry : index)
    {
      cout << entry.first << ": ";
      string delim;
      for (const auto & line_number :
        entry.second)
      {
        cout << delim << line_number;
        delim = ", ";
      }
      cout << '\n';
    }
  }

As well as correctly selecting words from the line of text, this program has the advantage that there is no need to ‘top and tail’ words using replace() (or erase()). Also, experiments show that the revised program is about 20% faster that the student’s (ignoring inputting the data and outputting the results).

Commentary

While it is very nice to have two keen and regular supporters of the code critique, can I encourage you to have a go even if you’ve never entered the competition before? You can see your name in print and it is good practice for real code reviews!

There were, as Paul noted, a fair number of problems in a relatively simple-looking piece of code…

I think between them the entrants covered most of the points pretty well. The original presenting problem was due to naive use of auto. The design principle to bear in mind is that C++ uses value semantics in very many places (see for example Andrzej Krzemieński’s blog post at https://akrzemi1.wordpress.com/2012/02/03/value-semantics/). Hence the default behaviour of plain auto is to make a new value even when the original item is a reference. For reasons that are unclear to me, this behaviour seems unintuitive, at least initially, to many programmers who assume the compiler will give them the same type they would have written without the presence of auto.

When I use auto, I find myself writing auto const &, auto *, etc., a significant proportion of the time to either enforce or highlight (or both) the semantics of the generated variable.

The final bug was that the first call to replace uses the wrong value for the second argument (the number of characters to erase) – the code is currently written as:

  word.replace(end + 1, end, "");

but the actual number of characters that need to be removed is from position end+1 to the end of the string. However, passing end to the replace function will not cause any undefined behaviour, it just may not remove enough characters in some pathological cases. James’ solution side-steps this problem completely as using erase means the number of trailing characters does not need to be supplied.

The winner of CC 98

Both critiques were good but I think James covered a bit more ground, and also uncovered the design flaw that the original program does not handle embedded delimiters, so he wins the prize for this issue’s critique.

Code critique 99

(Submissions to scc@accu.org by Jun 1st)

I wanted to learn a bit about C++ threading so I tried writing a thread pool example. But it sometimes crashes – I’ve managed to get it down to a small example. Sometimes I get what I expected as output, for example:

     Worker done
     Worker done
     Ending thread #2
     Ending thread #0
     Worker done
     Ending thread #1
     Worker done
     Ending thread #3
     Worker done
     All done

But other times I get a failure, for example:

     Worker done
     Ending thread #0
     Worker done
     Awaiting thread #1
     Worker done
     W
     <crash>

I’m not sure what to do next – can you help?

The program is in Listing 2.

#include <algorithm>
using namespace std;
#include <array>
#include <chrono>
using namespace chrono;
#include <cstdlib>
#include <iostream>
#include <thread>

static const int POOL_SIZE = 4;

// Allow up to 4 active threads
array<thread, POOL_SIZE> pool;

// Example 'worker' -- would in practice
// perform some, potentially slow, calculation
void worker()
{
  this_thread::sleep_for(
    milliseconds(rand() % 1000));

  cout << "Worker done\n";
}
// Launch the thread functoid 't' in a new
// thread, if there's room for one
template <typename T>
bool launch(T t)
{
  auto it = find_if(pool.begin(), pool.end(),
    [](thread const &thr)
    { return thr.get_id() == thread::id(); }
  );
  if (it == pool.end())
  {
    // everyone is busy
    return false;
  }
  *it = thread([=]()
  {
    t();
    thread self;
    swap(*it, self);
    self.detach();
    cout << "Ending thread #"
      << (it - pool.begin()) << "\n";
  });
  return true;
}

int main()
{
  while (launch(worker))
  {}
  // And finally run one in this thread as an
  // example of what we do when the pool is full
  worker();

  for (auto & it : pool)
  {
    thread thread;
    swap(thread, it);
    if (thread.joinable())
    {
      cout << "Awaiting thread #"
        << (&it - &*pool.begin()) << "\n";
      thread.join();
    }
  }
  cout << "All done\n";
}
			
Listing 2

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://www.accu.org/journals/). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes: 

More fields may be available via dynamicdata ..