Journal Articles

CVu Journal Vol 28, #2 - May 2016 + Programming Topics

Browse in :

All > Journals > CVu > 282 (9)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 99

Author: Martin Moene

Date: 03 May 2016 09:41:42 +01:00 or Tue, 03 May 2016 09:41:42 +01:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body:

Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issueâ€™s code

I am writing a simple program to index written text but it doesnâ€™t quite work. I want to print out every word in the document with a list of each line it appears on. Iâ€™m only getting the first occurrence listed but canâ€™t work out why.

Please explain why they have this problem... and suggest some other possible improvements to the program. The program is in Listing 1.

#include <iostream>
#include <map>
#include <sstream>
#include <vector>
int main()
{
  using namespace std;
  map<string, vector<int>> index;
  // read and index standard input
  string line;
  int lineno{};
  while (getline(cin, line))
  {
    ++lineno;
    istringstream iss(line);
    string word;

    while (iss >> word)
    {
      auto start = 
        word.find_first_not_of(":;.,'\"?!-");
      auto end = 
        word.find_last_not_of(":;.,'\"?!-");
      if (start != end)
        word.replace(end + 1, end, "");
        word.replace(0, start, "");
      if (word.empty()) continue;
      auto iter = index.find(word);
      if (iter == index.end())
      {
        index[word].push_back(lineno);
      }
      else
      {
        auto lines = iter->second;
        if (lines.back() == lineno)
          ; // ignore dups
        else
          lines.push_back(lineno);
      }
    }
  }
  // print the index
  for (auto entry : index)
  {
    cout << entry.first << ": ";
    string delim;
    for (auto line : entry.second)
    {
      cout << delim << line;
      delim = ", ";
    }
    cout << '\n';
  }
}

Listing 1

Critiques

Paul Floyd <m >

At first I thought that this was a fairly simple problem and that there would be very little to say.

Letâ€™s start with the reason why only the first occurrences are being detected. This is due to the code that handles the detection of the second or subsequent occurrence:

  auto iter = index.find(word);
  if (iter == index.end())
  {
    index[word].push_back(lineno);
  }
  else
  {
    auto lines = iter->second;
    if (lines.back() == lineno)
      ; // ignore dups
    else
      lines.push_back(lineno);
  }

Specifically, auto lines infers a vector, so it makes a copy of the existing vector containing one entry. It then either does nothing or appends a line number to the end of this copy. The copy goes out of scope at the end of the closing brace, leaving the original unchanged.

This can be fairly easily fixed by making lines a reference. Simply declaring it as auto& lines will do the trick, or alternatively using an explicit declaration:

  using MyMap = std::map<std::string,
                std::vector<int>>;

then

  MyMap::mapped_type& lines = iter->second;

I donâ€™t like the use of the variable index. index is so commonly used to mean loop index that Iâ€™d prefer to see something that doesnâ€™t overload the meaning, like word_index.

The next code smell was the punctuation detection. This looks like it should be in a function. The choice of punctuation seems fairly arbitrary. Personally I would use ispunct(), and only use something different if necessary. There are a couple more things that I would avoid:

start and end variable names, too confusing when there are functions and members with the same name in the standard library
replace with an empty string; erase does the same thing and is more idiomatic (at least if you think of std::string as being a container rather than a string)

I would write this as:

  const char* myPunct = ":;.,'\"?!-";
  word.erase(0, word.find_first_not_of(myPunct));
  word.erase(word.find_last_not_of(myPunct)+1);

This avoids the need for local variables for the start and end of the string.

James Holland <James.Holland@babcockinternational.com>

The student has made a fair attempt at the design of this program. However, there are several problems which prevent the software working as required.

Braces should surround the two statements following the first if statement. Both replace() functions need to be dependant on the if statement, not just the first as is the case with the studentâ€™s code. To avoid this type of error it is best to get into the habit of always enclosing conditional parts of an if statement in braces even if they consist of only one statement.
The first statement of the else block of the third if statement defines a variable named lines. This should be a reference type so that the vector within the map can be modified. As it stands, additional line numbers are added to a local copy of the lines vector and not the lines vector within index.
The program does not select words that are separated only by the delimiting characters (":;.,"\?!-") and not a white space. Presumably, the program should interpret a line containing "name::space", for example, as two words. Unfortunately, this problem requires some fairly major restructuring of the code to solve and will be discussed later.

In addition, there are some issues that are more to do with style that the correct working of the program. I consider them worth discussing, nonetheless.

The delimiting characters have been defined twice. It would be better to declare a const string that is initialised to the required value and used instead of the repeated literal strings. Any required change to the delimiting characters need then only be done in one place.
The student makes use of stringâ€™s replace() member function. This is confusing because it gives the impression that some characters of the string are being replaced by other characters. What is really happening is that characters are being erased. It would be better to use one of the overloaded erase() functions to clarify the code.
The variables entry and line that are used to print the index need not be copies; they need only be const references. This is unlikely to make a significant speed increase when writing to the console, but it does make it clear that no modifications are being made to index.

As mentioned above, the studentâ€™s program does not isolate words that are only separated by the delimiting characters. It may well be the case that there are many such words in a line of text. This suggests that a loop is required that isolates the words and only exits when there are no more in the line. Writing the code for a loop (in this case a while loop) can be tricky but after a little thought and some trial and error, I came up with the following program.

  #include <iostream>
  #include <map>
  #include <vector>
  int main()
  {
    using namespace std;
    const string delims(" \t:;.,'\"?!-");
    map<string, vector<int>> index;
    string line;
    int line_number{};
    while (getline(cin, line))
    {
      ++line_number;
      auto start_of_word = 
        line.find_first_not_of(delims);
      while (start_of_word != string::npos)
      {
        auto end_of_word = 
          line.find_first_of(delims,
            start_of_word);
        if (end_of_word == string::npos)
        {
          end_of_word = line.length();
        }
        const string word(line.substr(
          start_of_word,
          end_of_word - start_of_word));
        auto position_of_word = index.find(word);

        if (position_of_word == index.end())
        {
          index[word].push_back(line_number);
        }
        else
        {
          auto & lines = position_of_word->second;
          if (lines.back() != line_number)
          {
            lines.push_back(line_number);
          }
        }
        start_of_word = line.find_first_not_of(
          delims, end_of_word);
      }
    }
    for (const auto & entry : index)
    {
      cout << entry.first << ": ";
      string delim;
      for (const auto & line_number :
        entry.second)
      {
        cout << delim << line_number;
        delim = ", ";
      }
      cout << '\n';
    }
  }

As well as correctly selecting words from the line of text, this program has the advantage that there is no need to â€˜top and tailâ€™ words using replace() (or erase()). Also, experiments show that the revised program is about 20% faster that the studentâ€™s (ignoring inputting the data and outputting the results).

Commentary

While it is very nice to have two keen and regular supporters of the code critique, can I encourage you to have a go even if youâ€™ve never entered the competition before? You can see your name in print and it is good practice for real code reviews!

There were, as Paul noted, a fair number of problems in a relatively simple-looking piece of codeâ€¦

I think between them the entrants covered most of the points pretty well. The original presenting problem was due to naive use of auto. The design principle to bear in mind is that C++ uses value semantics in very many places (see for example Andrzej KrzemieÅ„skiâ€™s blog post at https://akrzemi1.wordpress.com/2012/02/03/value-semantics/). Hence the default behaviour of plain auto is to make a new value even when the original item is a reference. For reasons that are unclear to me, this behaviour seems unintuitive, at least initially, to many programmers who assume the compiler will give them the same type they would have written without the presence of auto.

When I use auto, I find myself writing auto const &, auto *, etc., a significant proportion of the time to either enforce or highlight (or both) the semantics of the generated variable.

The final bug was that the first call to replace uses the wrong value for the second argument (the number of characters to erase) â€“ the code is currently written as:

  word.replace(end + 1, end, "");

but the actual number of characters that need to be removed is from position end+1 to the end of the string. However, passing end to the replace function will not cause any undefined behaviour, it just may not remove enough characters in some pathological cases. Jamesâ€™ solution side-steps this problem completely as using erase means the number of trailing characters does not need to be supplied.

The winner of CC 98

Both critiques were good but I think James covered a bit more ground, and also uncovered the design flaw that the original program does not handle embedded delimiters, so he wins the prize for this issueâ€™s critique.

Code critique 99

(Submissions to scc@accu.org by Jun 1st)

I wanted to learn a bit about C++ threading so I tried writing a thread pool example. But it sometimes crashes â€“ Iâ€™ve managed to get it down to a small example. Sometimes I get what I expected as output, for example:

     Worker done
     Worker done
     Ending thread #2
     Ending thread #0
     Worker done
     Ending thread #1
     Worker done
     Ending thread #3
     Worker done
     All done

But other times I get a failure, for example:

     Worker done
     Ending thread #0
     Worker done
     Awaiting thread #1
     Worker done
     W
     <crash>

Iâ€™m not sure what to do next â€“ can you help?

The program is in Listing 2.

#include <algorithm>
using namespace std;
#include <array>
#include <chrono>
using namespace chrono;
#include <cstdlib>
#include <iostream>
#include <thread>

static const int POOL_SIZE = 4;

// Allow up to 4 active threads
array<thread, POOL_SIZE> pool;

// Example 'worker' -- would in practice
// perform some, potentially slow, calculation
void worker()
{
  this_thread::sleep_for(
    milliseconds(rand() % 1000));

  cout << "Worker done\n";
}
// Launch the thread functoid 't' in a new
// thread, if there's room for one
template <typename T>
bool launch(T t)
{
  auto it = find_if(pool.begin(), pool.end(),
    [](thread const &thr)
    { return thr.get_id() == thread::id(); }
  );
  if (it == pool.end())
  {
    // everyone is busy
    return false;
  }
  *it = thread([=]()
  {
    t();
    thread self;
    swap(*it, self);
    self.detach();
    cout << "Ending thread #"
      << (it - pool.begin()) << "\n";
  });
  return true;
}

int main()
{
  while (launch(worker))
  {}
  // And finally run one in this thread as an
  // example of what we do when the pool is full
  worker();

  for (auto & it : pool)
  {
    thread thread;
    swap(thread, it);
    if (thread.joinable())
    {
      cout << "Awaiting thread #"
        << (&it - &*pool.begin()) << "\n";
      thread.join();
    }
  }
  cout << "All done\n";
}

Listing 2

You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (http://www.accu.org/journals/). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes:

More fields may be available via dynamicdata ..