CVu Journal Vol 28, #6 - January 2017

Title: Code Critique Competition 103

Author: Martin Moene

Date: Tue, 03 January 2017 21:20:42 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.


Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issue’s code

I am trying to parse a simple text document by treating it as a list of sentences, each of which is a list of words. I'm getting a stray period when I try to write the document out:

-- sample.txt --

This is an example.

It contains two sentences.

     $ parse < sample.txt
     This is an example.
     It contains two sentences.
     .

Can you help fix this?

Listing 1 contains parse.cpp.

#include <iostream>
#include <sstream>
#include <memory>
#include <string>

// pointer type
template<typename t> using ptr =
  std::shared_ptr<t>;

// forward declarations
struct document;
struct sentence;
struct word;
document read_document(std::istream & is);
sentence read_sentence(std::istream & is);
void write_document(std::ostream & os,
  document const & d);
void write_sentence(std::ostream & os,
  sentence const & s);

// A document is a list of sentences
struct document
{
  ptr<sentence> first_sentence;
};
// A sentence is a list of words
struct sentence
{
  ptr<sentence> next_sentence;
  ptr<word> first_word;
};
struct word
{
  ptr<word> next_word;
  std::string contents;
};

// read a document a sentence at a time
document read_document(std::istream & is)
{
  sentence head;
  auto next = &head;
  std::string str;
  while (std::getline(is, str, '.'))
  {
    std::istringstream is(str);
    ptr<sentence> s(new sentence(
      read_sentence(is)));
    next->next_sentence = s;
    next = s.get();
  }
  document d;
  d.first_sentence = head.next_sentence;
  return d;
}

// read a sentence a word at a time
sentence read_sentence(std::istream & is)
{
  word head;
  auto next = &head;
  std::string str;
  while (is >> str)
  {
    ptr<word> w(new word{nullptr,str});
    next->next_word = w;
    next = w.get();
  } 
  sentence s;
  s.first_word = head.next_word;
  return s;
}

// write document a sentence at a time
void write_document(std::ostream & os,
  document const & d)
{
  for (auto s = d.first_sentence; s;
       s = s->next_sentence)
  {
    write_sentence(os, *s);
  }
}
// write sentence a word at a time
void write_sentence(std::ostream & os, sentence const & s)
{
  std::string delim;
  for (auto w = s.first_word; w;
       w = w->next_word)
  {
    std::cout << delim << w->contents;
    delim = ' ';
  }
  std::cout << '.' << std::endl;
}

int main()
{
  document d(read_document(std::cin));
  write_document(std::cout, d);
}
			
Listing 1

Critique

Paul Floyd

First, let’s see what happens when the code is compiled, starting with clang++:

clang++ -g -Wall -Wextra -std=c++11 -stdlib=libc++ -pedantic -o cc102 cc102.cpp 
cc102.cpp:89:36: warning: unused parameter 'os' [-Wunused-parameter]
void write_sentence(std::ostream & os, sentence const & s)
                                   ^

Next, Oracle CC 12.5:

CC +w2 -g -std=c++14 -o cc102 cc102.cpp
"cc102.cpp", line 48: Warning: is hides the same name in an outer scope.
"cc102.cpp", line 88: Warning: os is defined but not used.
2 Warning(s) detected.

Both of these are relatively harmless. The ‘os’ warning can be fixed by streaming to os rather than cout, i.e., change

  std::cout << delim << w->contents;

to

  os << delim << w->contents;

In this case it makes no difference, since write_sentence is called from write_document, which is passed cout. However, if write_document were ever called with another ostream, the program would encounter a ‘discovered bug’ (similar to ‘discovered check’ in chess).

The second warning can be removed by renaming the second is variable, e.g. to iss.

Neither of these affects the behaviour of the program as it stands. The issue is due to how std::getline works in the following loop:

 while (std::getline(is, str, '.'))
  {
    std::istringstream is(str);
    ptr<sentence> s(new sentence(
      read_sentence(is)));
    next->next_sentence = s;
    next = s.get();
  }

Basically, std::getline has two overloads (if I ignore the rvalue reference overloads). The first takes just an istream and a string and reads to the next newline (or eof). The second adds a character delimiter, and getline reads to this delimiter (or eof). Here we have the second overload. Before the first call, the input looks like this:

  This is an example.\n
  ^ Stream pointer here

Then the read takes place and we read to the . delimiter:

  This is an example.\n
                     ^ Stream pointer here (under the newline)

So far so good. Or is it? The stream pointer is not positioned past the end of the line but rather at the newline character.

On the second call to getline, we read "\nIt contains two sentences" into str. The leading newline is then skipped as whitespace when the text is streamed into the str variable in read_sentence, which breaks the line into words.

After the second call to getline, the stream is as follows:

  It contains two sentences.\n
                            ^ Stream pointer again under the newline

There is then a third call to getline. This just reads the second newline up to eof, and this is the source of the spurious extra line in the output.
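The behaviour is easy to demonstrate in isolation; the following standalone sketch (not part of the submitted code) prints each token that getline extracts from the sample input when ‘.’ is used as the delimiter:

  #include <iostream>
  #include <sstream>
  #include <string>

  int main()
  {
    std::istringstream input(
      "This is an example.\nIt contains two sentences.\n");
    std::string str;
    // bracket each extracted token so the embedded newlines are visible
    while (std::getline(input, str, '.'))
    {
      std::cout << '[' << str << "]\n";
    }
  }

This prints [This is an example], then a token that begins with the leftover newline followed by It contains two sentences, and finally a token containing only the trailing newline – the empty ‘sentence’ that becomes the stray full stop.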

There are many ways that this could be fixed. One would be to use the getline overload without the character delimiter and to then extract the sentence up to the full stop. As a quick and simple solution, I just added an extra call to getline to eat the newline characters:

  while (std::getline(is, str, '.'))
  {
    std::istringstream iss(str);
    ptr<sentence> s(new sentence(
      read_sentence(iss)));
    next->next_sentence = s;
    next = s.get();
    std::getline(is, str);
  }

In terms of design, this code needs a lot more work to be able to handle other punctuation like question and exclamation marks, more than one sentence on a line, and sentences that span several lines.

Jim Segrave

The error in this program is trivial – the while condition std::getline(is, str, '.') still evaluates as true on the final read that reaches end of file. Adding a simple if (is.eof()) { break; } at the top of the loop body, as sketched below, will cause the while loop to terminate without trying to make a sentence from an empty read.
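Applied to the loop in read_document, the change looks like this (a sketch; only the eof check is new, the rest of the loop body is as submitted):

  while (std::getline(is, str, '.'))
  {
    if (is.eof())
    {
      // for the sample input, this last successful read only
      // consumed the trailing newline
      break;
    }
    std::istringstream is(str);
    ptr<sentence> s(new sentence(
      read_sentence(is)));
    next->next_sentence = s;
    next = s.get();
  }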

But there are other issues:

A minor one:

The code uses a large number of forward declarations. Rearranging the code so that struct word is defined before struct sentence, struct sentence before struct document, read_sentence() before read_document(), and write_sentence() before write_document() allows all the forward declarations to be removed. If this were a multi-module program, the forward declarations would belong in a header file and would have some value. As it’s a single file, removing them removes duplication, so changing any of the struct definitions or function signatures has a smaller impact (DRY – don’t repeat yourself).

Another minor one:

write_document() and write_sentence() are passed an ostream parameter. write_document() simply passes it on to write_sentence(), but it’s not actually used there. It should be removed from both functions, or the functions should be altered to actually use the ostream rather than cout.

Most importantly, the author has chosen to build his own singly-linked list. One has to ask ‘Why?’ Linked lists are good when you need to do inserts and deletes at arbitrary locations within the list, but otherwise they waste performance and memory (pointers which aren’t needed, lack of contiguous allocation).

This program simply needs to maintain the order of words encountered in sentences and sentences encountered in documents. For that, the std::vector class is ideal – it performs well, maintains the insert order and keeps the data contiguous.

If you are going to use a linked list in spite of this, there’s already a container type for this purpose which is likely to be implemented better and tested more thoroughly than a one-off version. But as I said, there’s no reason to use a linked list here.

Finally, why not use the make_shared library function when you want to create an object and get a shared pointer to it? To me at least, the resulting code is clearer.

I attach a version using vectors and make_shared (with heavy use of auto to save typing, and shared pointers for all the robustness they bring with them).

  #include <iostream>
  #include <sstream>
  #include <memory>
  #include <string>
  #include <vector>

  using word = std::string;
  using word_shp = std::shared_ptr<word>;
  using sentence_vec = std::vector<word_shp>;
  using sentence_vec_shp =
    std::shared_ptr<sentence_vec>;
  using doc_vec = std::vector<sentence_vec_shp>;
  using doc_shp = std::shared_ptr<doc_vec>;

  // read a sentence a word at a time, return a
  // vector of the words in it
  sentence_vec_shp read_sentence(
      std::istream & is) {
    std::string str;
    auto  s = std::make_shared<sentence_vec>();
    while (is >> str) {
      s->push_back(std::make_shared<word>(str));
    }
    if(s->size()) {
      return s;
    }
    // if no words found, don't return an empty 
    // vector, let it die here
    return nullptr;
  }

  // read a document a sentence at a time,
  // return a vector of the sentences in it 
  doc_shp read_document(std::istream & is) {
    std::string str;
    auto d = std::make_shared<doc_vec>();
    while (std::getline(is, str, '.')) {
      if(is.eof()) {
        break;
      }
      std::istringstream is(str);
      auto s = read_sentence(is);
      if(s) {
        d->push_back(s);
      }
    }
    if(d->size()) {
      return d;
    }
    // if no sentences found, don't return an 
    // empty vector, let it die here
    return nullptr;
  }

  // write a sentence one word at a time
  void write_sentence(
    const sentence_vec_shp & s) {
    std::string delim;
    for (auto wp: *s) {
      std::cout << delim << *wp;
      delim = ' ';
    }
    std::cout << '.' << std::endl;
  }

  // write document one sentence at a time
  void write_document(const doc_shp & d) {
    for (auto s : *d) {
      write_sentence(s);
    }
  }

  int main() {
    auto dp = read_document(std::cin);
    write_document(dp);
  }

James Holland

At first glance, the problem could be anywhere within the student’s code. In such a situation it is best to divide the code into roughly two equal parts and to attempt to discover which half contains the bug. This process is repeated until the defect is found. Fortunately, the student’s code is already divided into two parts: one that puts information into the linked list and another that reads it out.

An investigation revealed that the unwanted full stop is displayed because there are three sentences in the linked list, the last of which is empty. It would appear that the linked list is being displayed properly, and so it must be the part of the code that assembles the list that is at fault. Therefore, read_document() requires further investigation.

Closer inspection shows that the body of the while loop within read_document() is executing three times. This is despite there being only two sentences in the sample document. I think we are getting close to the root of the problem. It all hinges on what keeps the while loop executing.

The controlling clause of the while loop consists solely of the function std::getline() and so the loop will keep executing for as long as the value returned by std::getline() can be evaluated as true. std::getline() returns its first parameter which, in our case, is of type std::istream. A Boolean conversion is defined for std::istream that returns true if the stream has not failed. This conversion does not consider the end of file being encountered as a failure. This has the effect that when std::getline() is attempting to read the third sentence (that does not exist) no fault is reported (despite eof being set) and the loop body is executed for a third time. Only the remaining newline is read into str, with the result that a blank sentence is added to the linked list. This explains why an unwanted full stop appears when the content of the linked list is displayed.

What is needed is for read_document()’s while loop to stop executing as soon as eof is encountered. Possibly the simplest way to achieve this is to modify the while loop controlling statement by calling good() as shown below.

  while (std::getline(is, str, '.').good())

good() returns false when the stream has failed or when eof is encountered. This is exactly what is required. The student’s code will now behave as expected.

There are some aspects of the use of std::shared_ptr that should be drawn to the student’s attention. When using a shared pointer to point to a newly constructed object, it is best to use std::make_shared<>() as it has the advantages of exception safety and executes more quickly. I suggest the second line of read_document()’s while loop should be replaced by

  auto s(std::make_shared<sentence>(
    sentence(read_sentence(is)))); 

and the first statement of read_sentence()’s while loop should be replaced by

  auto w(std::make_shared<word>(
    word{nullptr, str}));

It is, however, questionable as to whether std::shared_ptr is required in the student’s program. std::unique_ptr is an alternative that should be considered as it is faster and occupies less memory. Unfortunately, it is not possible to simply replace std::shared_ptr by std::unique_ptr within the student’s code, mainly because copying std::unique_ptr is not permitted. Some restructuring of the code would be necessary, which I have not undertaken. Instead, I propose a different approach.

From reading the student’s code, it is clear that extensive use of linked lists is made. While it is an interesting challenge to construct linked lists from first principles, there is no need, as the C++ standard library provides such containers ready to use. Doubly linked lists have been available since C++98 and singly linked lists since C++11. The advantages of using the library containers include not having to design, write and test the list management code yourself.

I have rewritten the student’s program to use standard library linked lists. In keeping with the student’s approach I have used std::forward_list, a singly linked list.

It is a simple matter to add elements to the front of a std::forward_list; push_front() is provided for that. Unfortunately, that results in the list being populated in the reverse order. The last sentence of the document would be printed first and the last word of a sentence would be printed first. What is needed is for elements to be added to the back of the list. This is not quite so easy as there is no such function as push_back() for std::forward_list.

It may seem slightly odd that std::forward_list does not provide a function to add an item to the back of the list, but this omission results from the desire for std::forward_list to be as memory efficient as possible. All is not lost. We simply have to keep track of where in the list to insert the next element. When there are no elements in the list, before_begin() provides the appropriate location. When there are elements in the list, it is always the location of the element at the back of the list that is required. This location is conveniently returned by insert_after(), the function used to insert elements at the back of the list. Given this, a simple while loop can be constructed that adds the required elements to the std::forward_list. I use this technique twice in my version of the program, once in read_sentence() and once in read_document() as shown below.

  #include <forward_list>
  #include <iostream>
  #include <sstream>
  #include <string>
  using Sentence =
    std::forward_list<std::string>;
  using Document = std::forward_list<Sentence>;
  Sentence read_sentence(std::istream & is)
  {
    Sentence sentences;
    auto position = sentences.before_begin();
    std::string str;
    while (is >> str)
    {
      position =
        sentences.insert_after(position, str);
    }
    return sentences;
  }
  Document read_document(std::istream & is)
  {
    Document document;
    auto position = document.before_begin();
    std::string str;
    while (std::getline(is, str, '.').good())
    {
      std::istringstream is(str);
      position = document.insert_after(position, 
        read_sentence(is));
    }
    return document;
  }
  void write_document(const std::ostream & os,
    const Document & document)
  {
    for (const auto & sentence : document)
    {
      std::string delimiter;
      for (const auto & word : sentence)
      {
        std::cout << delimiter << word;
        delimiter = ' ';
      }
      std::cout << '.' << std::endl;
    }
  }
  int main()
  {
    Document d(read_document(std::cin));
    write_document(std::cout, d);
  }

Commentary

There were a number of different problems with the code presented, and I think that between them the authors of the critiques covered most of the issues.

As both Jim and James noted, the writer of the code has implemented their own singly linked list; there is no need to do this nowadays! Additionally, there is no separation of concerns with the implementation embedded in the code – if a special sort of list were required it would probably be better to write a separate class to do the list management, templatized on the contained type.
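For example, a minimal sketch of what such a class might look like (the name slist and its interface are purely illustrative, not taken from any of the entries):

  #include <memory>
  #include <utility>

  // Illustrative sketch: a tiny singly-linked list class template so that
  // document and sentence no longer manage next-pointers themselves.
  // (The destruction issue discussed below would still need addressing.)
  template <typename T>
  class slist
  {
    struct node { std::shared_ptr<node> next; T value; };
    std::shared_ptr<node> head;
    node * tail = nullptr;
  public:
    void push_back(T value)
    {
      auto n = std::make_shared<node>(
        node{nullptr, std::move(value)});
      if (tail) { tail->next = n; } else { head = n; }
      tail = n.get();
    }
    // visit each stored value in insertion order
    template <typename Fn>
    void for_each(Fn fn) const
    {
      for (auto n = head.get(); n; n = n->next.get()) { fn(n->value); }
    }
  };

With something like this, document could simply hold an slist of sentences and sentence an slist of words, keeping all the pointer handling in one place.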

One problem that no-one commented on is that the code contains a potential bug when exiting main. Naive use of smart pointers to build up complex data structures can cause stack overflow on destruction as the destruction of the tree of objects can consume a large amount of stack. In this case the document object d is the root object, its destruction results in calling the destructor of first_sentence, which then destroys the next_sentence member of this object, and so on, recursively down the list of objects.

The bug might be avoided if the compiler is able to perform some tail-call optimisation in the destructor, but this is not possible in all cases.

For example, I found the program crashed for large input when compiled without optimisation with g++ (64-bit) and MSVC (both 32-bit and 64-bit). Enabling optimisation resolved the crash for gcc, and for one of the MSVC builds. This is not desirable behaviour!
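Reproducing this needs nothing more than a long enough document; a throwaway generator along these lines (purely illustrative – the count needed will depend on the platform’s stack limit) can be piped into parse:

  #include <iostream>

  // Emit a large number of one-word sentences; destroying the resulting
  // chain of shared_ptr nodes recursively is what exhausts the stack.
  int main()
  {
    for (int i = 0; i != 500000; ++i)
    {
      std::cout << "word.\n";
    }
  }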

So, how might you solve this problem?

In order to resolve the stack overflow you need to write an explicit destructor that iterates through the objects rather than using recursion.

For example, in this case:

document::~document()
{
  while (first_sentence)
  {
    first_sentence =
      first_sentence->next_sentence;
  }
}

We also ought to make a similar change for the list of words in sentence, in case we try processing James Joyce’s Ulysses!
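A minimal sketch of that change, mirroring the destructor above (with a matching declaration added to struct sentence):

  sentence::~sentence()
  {
    // release the word list iteratively rather than recursively
    while (first_word)
    {
      first_word = first_word->next_word;
    }
  }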

This is an unfortunate issue with using smart pointers for managing object graphs, especially as debugging the root cause of a stack overflow can be quite hard.

(Herb Sutter’s talk at CppCon this year touches on some other options that solve the same sort of problem.)

The winner of CC 102

I was amused to note that while all three critiques found and fixed the problem they used three slightly different techniques. It is surprisingly hard to use C++ standard input correctly and the range of possible solutions makes it less obvious when a given piece of code is correct.

Paul gave a fairly detailed explanation of the presenting problem with the original code, which would hopefully make the problem and its solution clear to the writer of the code.

Each critique went on to cover additional problems; these included the unused os parameter, the shadowed is variable, the choice of a hand-crafted linked list rather than a standard container, and the case for using std::make_shared.

I liked James’ direction in using std::forward_list; this solution has the additional benefit of (silently) solving the stack overflow problem I discuss in my commentary. Hence, by a short head, I have awarded him the prize for this critique.

As always, thank you to all those who entered the competition!

Code critique 103

(Submissions to scc@accu.org by Dec 1st)

I am trying to keep track of a set of people’s scores at a game and print out the highest scores in order at the end: it seems to work most of the time but occasionally odd things happen...

Can you see what’s wrong?

The code – scores.cpp – is in Listing 2.

#include <functional>
#include <iostream>
#include <map>
#include <sstream>
#include <unordered_map>

// Best scores
std::multimap<int, std::string, std::less<>> best_scores;

// Map people to their best score so far
std::multimap<int, std::string>::iterator typedef entry;
std::unordered_map<std::string, entry> peoples_scores;
entry none;

void add_score(std::string name, int score)
{
  entry& current = peoples_scores[name];
  if (current != none)
  {
     if (score <= current->first)
     {
       return; // retain the best score
     }
     best_scores.erase(current);
  }
  current = best_scores.insert({score, name});
}

void print_scores()
{
   // top down
   for (auto it = best_scores.end();
        it-- != best_scores.begin(); )
   {
      std::cout << it->second << ": "
        << it->first << '\n';
   }
}

int main()
{
  for (;;)
  {
    std::cout << "Enter name and score: ";
    std::string lbufr;
    if (!std::getline(std::cin, lbufr)) break;
    std::string name;
    int score;
    std::istringstream(lbufr)
      >> name >> score;
    add_score(name, score);
  }
  std::cout << "\nBest scores\n";
  print_scores();
}
			
Listing 2

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
