Journal Articles

CVu Journal Vol 27, #1 - March 2015 + Programming Topics + Student Code Critiques from CVu journal.
Browse in : All > Journals > CVu > 271 (11)
All > Topics > Programming (877)
All > Journal Columns > Code Critique (70)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 92

Author: Martin Moene

Date: 02 March 2015 21:16:46 +00:00 or Mon, 02 March 2015 21:16:46 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (We will remove email addresses!)

Last issue’s code

I’m trying to migrate my skill set from C to C++ so thought I’d get started with a simple program to fill in a set of strings and print them. But I’m getting a compilation problem on the call to std::copy that makes no sense to me, although I thought I’d copied it from some working code on the Internet. Can you help me get it to compile?

Can you help this programmer to get past this presenting problem and help them to identify any other issues with their code? The code is in Listing 1.

#include <iostream>
#include <iterator>
#include <set>

// compare *contents* not raw pointers
bool string_less(char const *, char const *);

template <class T, class U>
void test(std::set<T, U> s, T p)
{
  if (s.insert(p).second)
    std::cout << "Added " << p << std::endl;
  else
    std::cout << p << " already present";
}

int main()
{
  std::set<char const *,
    decltype(&string_less)> s(string_less);

  test(s, "A");
  test(s, "B");
  test(s, "AB");

  std::copy(s.begin, s.end,
    std::ostream_iterator<
      char const *>(std::cout , " "));
  return 0;
}

bool string_less(const char *s, const char *t)
{
  while (*s == *t)
  {
    s++;
    t++;
  }
  return(*s < *t);
}
			
Listing 1

Critiques

Jim Segrave

There are a few problems with this code:

The test() function is passed the set by value, rather than by reference, so the result of inserting a new string is lost when the function returns.

Every invocation of test() is done with a copy of the empty set as it was first constructed in main(). Changing this to take the set by reference:

   void test(std::set<T, U>& s, T p)
                           ^

will cause the set s in main() to be updated as each string is inserted.

A minor point – there’s no std::endl on the print when the string passed to test() is already present, so the following invocation of test() appends its output on the same line.

In main(), the invocation of copy() should be passed the return values of the member functions begin() and end(). The member name, standing alone is a pointer to the function, not an invocation of the function to get the iterators for begin and end. clang++ prints a reasonably clear pair of error messages:

x.cpp:22:15: error: reference to non-static member function must be called; did you mean to call it with no arguments?
  std::copy(s.begin, s.end, 
            ~~^~~~~
                   ()
x.cpp:22:24: error: reference to non-static member function must be called; did you mean to call it with no arguments?
  std::copy(s.begin, s.end, 
                     ~~^~~

clang++ then generates a working executable by treating it as though the parentheses had been present. It is arguable whether it should be this helpful or should treat this as a fatal error and not produce an executable.

g++ produces an accurate but rather opaque pair of error messages. Within them is the information you need, but it can be hard to see what it’s trying to tell you is wrong:

/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/include/g++-v4/bits/stl_algobase.h:450:5: note: _OI std::copy(_II, _II, _OI) [with _II = std::_Rb_tree_const_iterator<const char*> (std::set<const char*, bool (*)(const char*, const char*)>::*)()const noexcept (true); _OI = std::ostream_iterator<const char*>]
     copy(_II __first, _II __last, _OI __result)
     ^
/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/include/g++-v4/bits/stl_algobase.h:450:5: note:   no known conversion for argument 1 from '<unresolved overloaded function type>' to 'std::_Rb_tree_const_iterator<const char*> (std::set<const char*, bool (*)(const char*, const char*)>::*)()const noexcept (true)'

Which does tell you that it can’t convert the pointer to function end into an iterator. The compile fails.

When this is fixed and the print succeeds, there is no std::endl, so the list of set members will spill onto the command line prompt at program exit.

Finally, the string_less() function has a severe problem:

  bool string_less(const char *s, const char *t)
  {
    while (*s == *t)
    {
      s++;
      t++;
    }
    return(*s < *t);
  }

What happens when the strings s and t are identical? The while loop carries on past the terminating nul character and your program is off into undefined behaviour. After fixing the parameter to test() to be a reference, try duplicating the first invocation of test(s, "A"). On my machine this led to an immediate SIG_SEGV and core dump.

The fix is simple – before checking if *s == *t, check that *s is not zero:

  while(*s && (*s == *t))

And to close, not a critique, but whenever I see something like:

  test(s, "A");
  test(s, "AB");
  tets(s, "B");

I think to myself that there ought to be a better way to do this so that adding or removing tests is less work.

If we use an array of pointers to const char, we can iterate through the array running the test on each char *:

  const char * test_vec[] = {"A", "A", "AB",
   "B", "", "ABC", "", "foo", "\xe2\x82\xac"};
  for(auto sp: test_vec) {
    test(s, sp);
  }

Now adding or removing test cases involves only adding or removing a quoted string in test_vec’s initialiser list.

My resulting version of this program:

  #include <iostream>
  #include <iterator>
  #include <set>
  
  // compare *contents* not raw pointers
  bool string_less(char const *, char const *);
  
  template <class T, class U>
  void test(std::set<T, U>& s, T p)
  {
    if (s.insert(p).second)
      std::cout << "Added " << p << std::endl;
    else
      std::cout << p << " already present"
        << std::endl;
  }
  
  int main()
  {
    std::set<char const *,
      decltype(&string_less)> s(string_less);
    const char * test_vec[] = {"A", "A", "AB",
     "B", "", "ABC", "", "foo", "\xe2\x82\xac"};
    for(auto sp: test_vec) {
      test(s, sp);
    }
    std::copy(s.begin(), s.end(),
              std::ostream_iterator<
                char const *>(std::cout , " "));
    std::cout << std::endl;
    return 0;
  }
  
  bool string_less(const char *s, const char *t)
  {
    while (*s && (*s == *t)) {
      s++;
      t++;
    }
    return(*s < *t);
  }

Tom Björkholm

When reading the code of Code Critique 91 there are five problems in the code that immediately come to mind: 1. The function test() inserts into a local copy of the set 2. There is not much use having test() as a template here 3. The set stores pointers not objects. 4. The parenthesis are missing on s.begin() and s.end() 5. string_less() comparison will fail for identical strings

Let’s look into them in order.

1. The function test() inserts into a local copy of the set

As the function test is defined

  void test(std::set<T, U> s, T p)

the arguments are passed by value. I.e. inside the function test the variables s and p hold copies of the values in the calling function. Adding a value to the copy of the set inside test, does not affect the set in main. The set must be passed by reference void test(std::set<T, U> & s, T p)

2. There is not much use having test() as a template here

The function test is only called with one type. Templates are used if you want to express code so that it can be used with many types. As there is only one type here, you can just as well write it as normal function:

  void test(std::set<const char *,
    decltype(&string_less)> & s, const char * p)

Especially for a novice C++ programmer it is easier to understand the C++ compiler errors in a normal function than in a template function.

We will get back to this function as changes below will introduce more changes to it.

3. The set stores pointers not objects

The C++ standard containers (like set) are intended to store values. Here the set store pointer (i.e. pointers to characters interpreted as C style strings). This is not good. If the pointed to objects (characters) goes out of scope, we will be left with dangling pointers (pointing to the memory location where the objects once was).

In this particular case the code does not crash (when using the clang compiler on OS X), because the C style strings happen to be compile time constants that remain at their memory location.

To fix this use the C++ std::string to store the strings (real values), instead of just pointers. This is easy.

Just change the first line of main() to

  std::set<std::string> s;

Add the line

  #include <string>

change the test function to

  void test(std::set<std::string> & s,
    const std::string & p)

and change the ostream_iterator to

  std::ostream_iterator<std::string>

You will see the complete code together with other fixes below.

4. The parenthesis are missing on s.begin() and s.end()

The algorithm std::copy takes iterators as arguments. The member functions begin() and end() of the set return an iterator to the beginning and to the end (one past the last element) of the set. Thus, you have to call begin() and end() to get the iterators to pass to std::copy. When you leave out the parenthesis you pass the function pointers to copy instead of the iterators. Write that line as

  std::copy(s.begin(), s.end(),
    std::ostream_iterator<std::string>(std::cout,
    " "));

5. string_less() comparison will fail for identical strings

If you have followed my advice to let the set store string objects instead of character pointers, then the function string_less will not be needed at all. If you for some reason want to have a string_less for C style strings, then the loop must also handle the case of identical strings. Add a test for terminating null byte.

  while ((*s == *t) && (*s != 0))

However, I do recommend that the set should store std::string instead of pointers. Using std::set<std::string> the complete code will be:

  #include <iostream>
  #include <iterator>
  #include <set>
  #include <string>
  void test(std::set<std::string> & s,
    const std::string & p)
  {
      if (s.insert(p).second) {
          std::cout << "Added " << p << std::endl;
      } else {
          std::cout << p << " already present"
            << std::endl;
      }
  }
  int main()
  {
    std::set<std::string> s;
    test(s, "A");
    test(s, "B");
    test(s, "AB");
    test(s, "A");
    std::copy(s.begin(), s.end(),
      std::ostream_iterator<std::string>(
        std::cout, " "));
    return 0;
  }

Or if you want to have test() as a template function (so that you can brag that you have written a template function yourself ;-)

  template <typename T , typename U> void test(
    T & s, const U & p) {
    if (s.insert(p).second) {
      std::cout << "Added " << p << std::endl;
    } else {
      std::cout << p << " already present"
        << std::endl;
    }
  }

James Holland

I think either the student made a mistake when copying (by retyping) the code or the Internet code was of very poor quality. Either way, the problem with the std::copy function lies in the first two parameters. The names are correct but they should be member functions of s not data members. This is simply corrected by adding an empty parameter list to the names as shown below.

  std::copy(s.begin(), s.end(),
    std::ostream_iterator<char const *>
      (std::cout, " "));

One can spend ages staring at the code in the hope of finding this type of error. As the student suggested, the error message was not all that helpful. At least now the code compiles without error and runs. There are, however, some remaining problems.

One thing I found disconcerting is the definition of string_less(). If two identical strings are compared, the string_less() will attempt to access memory beyond the end of the strings. When attempting to confirm this, I was surprised to find that it was never called. So I gave up, for the time being, worrying about correcting string_less() and started to track down why it was never called. The problem lies with the test() function or, more specifically, with its first parameter. The set is being passed into test() by value. In other words, a copy of the set is being made and it is the copy into which test() inserts the string. After test() displays whether or not the string was inserted, the copy of the set is destroyed, leaving the original set without the string being inserted. In fact the original set always remains empty. This has two consequences. Firstly, test() will always report that the string is being added. This is because, as the original set never has a string inserted, the copy will always be empty on entry to test(). Secondly, and more importantly, it explains why string_less() never gets called. When inserting a string into an empty set, there is no need to compare it with existing strings, as there are none, and so string_less() is not called.

To correct the problem with test(), the set should be passed by reference, not by value. This will result in the set (and now there is only one set) being inserted with strings as expected. The signature of test() is now as shown below.

  void test(std::set<T, U> & s, T p)

This brings me back to my concerns I had regarding the behaviour of string_less(). Correctly writing this function from first principles is a bit tricky and so I won’t attempt it here. Fortunately, there is a standard function that almost does what we want, namely strcmp(). This function compares two strings and returns a negative integer if the first string comes before the second string, zero if the two strings are equal and a positive number if the first string comes after the second string. I am sure the student knows that, being a C programmer. All that needs to be done is to determine whether the value returned from strcmp() is less than zero. string_less() can, therefore, be rewritten as shown below.

  bool string_less(const char *s, const char *t)
  {
    return strcmp(s, t) < 0;
  }

The test() function is still not quite right as the statement that informs us that the string is already present should print a ‘new line’ after printing the text. Once this is corrected, the program is in a state where it will work as expected.

The code is, however, quite involved and not all that intuitive. The main reason for this is that text is represented by C-style strings. This causes difficulties, especially when storing them in a container, such as an std::set, as is done here. It is far better to use C++’s std::string for representing text. The main advantage is that std::string has the < operator built in and so std::strings can be easily compared. This removed the need to write a bespoke comparison function as was the case in the original code.

As the student is just beginning to learn C++, it is best to start with a simple example. I feel that defining test() as a function template only serves to complicate matters at this stage. Templates can be tackled at a later date. It would appear from the code example that the student is interested in the use and storage of strings. This is a good place to start and the example can be modified to demonstrate this. All that remains to do is to remove the template clause from test(), to change template parameter of the ostream_iterator object passed to the copy function from char const * to const std::string, and to add the #include directive for std::string. The resultant program is listed below.

  #include <string>
  #include <iostream>
  #include <iterator>
  #include <set>
  void test(std::set<std::string> & s,
    std::string p)
  {
    if (s.insert(p).second)
      std::cout << "Added " << p << std::endl;
    else
      std::cout << p << " already present"
        << std::endl;
  }
  int main()
  {
    std::set<std::string> s;
    test(s, "A");
    test(s, "B");
    test(s, "AB");
    test(s, "AB");
    std::copy(s.begin(), s.end(),
      std::ostream_iterator<std::string>
        (std::cout, " "));
    return 0;
  }

Getting code examples from questionable websites is probably not the best way to learn C++. One site that is a good starting point is isocpp.org from where excellent books, for novices and for those with more experience, are suggested. Everyone should join ACCU, of course, for news of conferences, book reviews and much more.

Alex Paterson

Quick solution: fixing the code

Ok, moving quickly on this one, because solving the coding errors isn’t the real issue here:

The call to s.begin should be s.begin() in the std::copy line. Ditto for s.end.

The implementation of string_less does not check for the end of the string. To be correct it should check for null, assuming that we are dealing with null-terminated strings. E.g.

  bool string_less(const char *s, const char *t)
  {
    while ((*s == *t) && (*s != 0) && (*t != 0))
    {
      s++;
      t++;
    }
    return(*s < *t);
  }

However, there is already a method that does this for us, strcmp, so we can reduce string_less to:

  bool string_less(const char *s, const char *t)
    { return strcmp(s,t) < 0; }

Finally, the test() method should take its first parameter (the container) by reference (&) and the second parameter (the value) by either const-reference (const&) or rvalue (&&). As it currently stands, the method is taking a copy of the container and adding the string to the copy, which has no effect on the container declared in main().

Long solution: fixing the design

Right, now let’s get onto the real problem. I know this is probably only a piece of code for experimental purposes, but in the C++ world, we really should be taking advantage of RAII (Resource Acquisition Is Initialisation) and moving beyond dealing with naked pointers (a.k.a. raw or dumb pointers) – they should only be used as a last resort, say when we need to interface with an API for an external library or operating system call. Don’t get me wrong, syntactically it’s C++ alright, but dealing with naked pointers like this means that it is not quite in the true spirit of C++.

What’s wrong with naked pointers?

The main issue with naked pointers is making sure that the memory is freed only after we’re finished with it and not before; freeing memory before we’re finished using it is an error and will result in undefined behaviour, whilst not freeing the memory at all leaks memory. Whilst this might sound trivial when considering memory allocation within a single function, it becomes more of a problem when memory allocation and deallocation is split across different functions, classes or libraries.

Moving away from pointers is a crucial concept to move from the low-level world of assembly and C into a the higher level abstraction of C++ and object-oriented programming in general. In the case of strings, the standard C++ library allows us to switch from const char * to std::string to automatically manage the dynamic allocation and deallocation of strings. Similarly, it also allows us to deal with strings in a more abstract sense, so we can iterate over the characters of a string until we reach the ‘end’ rather than previously checking for a ‘null character’. Specifically to this case, we can take advantage of the fact that operator< is defined for the std::string class, so we don’t need to write our own function.

Encapsulation

Encapsulation is a concept that is more prevalent in C++ than C, I guess mainly due to C++ being an object-oriented language. In the problem code extract, the test method is separated from the container, but really in an object-oriented domain, it should be tied to the collection type that it refers to, as shown below.

Whilst this adds many extra lines, it helps to provide encapsulation, where implementation detail is hidden and provides the key concepts of high cohesion and low coupling. The cohesion comes from having the test method and print output encapsulated in the TestContainer type. The low coupling comes from removing the container type information from the main method as well as the detail of how to print out the container to std::cout.

To aid readability (and therefore maintainability), the conditional in the test method could be replaced with a separate method, making the logic clear. I must admit that I wasn’t aware that set::insert returned a pair<> that could be used to determine whether the value was added or not, so it is probably beneficial to highlight this in the code to aid other programmers who read it (and ourselves when we read it again months or years later).

Improved code, OO style

  #include <iostream>
  #include <iterator>
  #include <set>
  template<typename ValueType>
  class TestContainer
  {
  public:
    TestContainer& insert(const ValueType& v)
    {
      if (WasInserted(m_set.insert(v)))
        std::cout << "Added " << v << std::endl;
      else
        std::cout << v << " already present"
          << std::endl;
      return *this;
    }
    template<typename StreamType>
    void print_to_stream(StreamType& s) const
    {
      std::copy(
        m_set.begin(),
        m_set.end(),
        std::ostream_iterator<std::string>
          (s, " "));
    }
  private:
    //! Simply extracts the result of the set
    //! insertion, which indicates whether the
    //! value was added to the set or not.
    template<typename T>
    bool WasInserted(const T& insertResult)
    { return insertResult.second; }
    std::set<ValueType> m_set;
  };
  int main()
  {
    TestContainer<std::string> s;
    s.insert("A").insert("B").insert("AB");
    s.print_to_stream(std::cout);
    return 0;
  }

Improved code, functional style

Of course, a more pure functional approach would keep the container type and the methods related to it separate.

  #include <iostream>
  #include <iterator>
  #include <set>

  //! Return the second element from the pair<>
  //! value that was returned from a call to
  //! set::insert. This indicates whether the
  //! value passed to insert was added or not.
  template <typename T>
  bool WasInserted(const T& t)
  { return t.second; }
  template <class T, class V>
  T test(const T& tOldContainer, const V& val)
  {
    T tNewContainer(tOldContainer);
    if (WasInserted(tNewContainer.insert(val)))
      std::cout << "Added " << val << std::endl;
    else
      std::cout << val << " already present"
        << std::endl;
    return tNewContainer;
  }

  //! Copy the elements of a container to a
  //! stream. One use is to print the elements
  //! of a container to std::out.
  template<typename StreamType,
    typename ContainerType>
  void print_to_stream(StreamType& s,
    const ContainerType& c)
  {
    std::copy(
      begin(c),
    end(c),
    std::ostream_iterator<std::string>
      (s, " "));
  }
  int main()
  {
    std::set<std::string> s;
    s = test(s, "A");
    s = test(s, "B");
    s = test(s, "AB");
    print_to_stream(std::cout, s);
    return 0;
  }

Commentary

This problem hinges on the difference between a pointer to a member function and the value resulting from calling a member function. As Jim found, clang++ tries to be helpful and guesses what was meant. While this is correct, in this case, I share Jim’s concern that this may not be helpful in general.

Of course, there were other problems with the code. Passing the set into the test function by value not by reference was wrong here – but the program is correct code in both cases and it is hard for automated tools to detect this sort of semantic issue. One of the slight concerns over the use of auto in recent C++ code is the way it can hide whether the actual type is a value or a reference in cases where this matters a great deal.

What no-one really challenged about the program is that it isn’t actually a useful test. The program writes output and returns 0 – whether or not the bug is fixed. A more useful test would report success or failure, so it is obvious without needing to read the test in detail whether or not the code is behaving as expected.

James did point out that the test did not exercise the string_less function in the original code, and his solution does try testing "AB" twice – but without testable success criteria it is not obvious what output from the program is actually correct.

I would encourage the programmer to decide what it means to pass the test here and to ensure the code unambiguously reports success only if this occurs.

The final niggle I have with the code is the use of std::copy to print the set. The ostream_iterator will place a copy of the delimiter after each element output, including the last one, and in some environments this trailing space can cause problems. This becomes a bigger problem when the delimiter is visible, for example trying to use a comma.

Unfortunately C++ doesn’t yet have a standard way to fix this although there are some active proposals for delimited iterators so watch this space!

The winner of CC91

All four critiques identified the major problem and suggested how to fix it and all also noted the problem with string_less on equal strings.

Two people suggested implementing string_less with strcmp (although no-one pointed out this may also be more efficient than the hand-rolled string code, especially for larger strings, as library writers are likely to deliver an optimised implementation).

Several people recommended using std::string instead of character pointers and I would generally follow this recommendation myself too. Alex gave a good summary of why this is generally better.

Jim improved the test generation by making it data driven, which makes it very easy to change. It is all too easy to write test code using copy and paste resulting in a lot of unnecessary duplication.

In addition to the critique itself, James also pointed the programmer to some further useful sources of information; so I have awarded him the prize for this critique by a short head.

Code Critique 92

(Submissions to scc@accu.org by April 1st)

I’m trying to use the new(ish) unordered_map. and my simple test program works with some compilers and options but not with others. I can’t see why it wouldn’t work consistently – are some of the compilers just broken? I do get a compiler warning about conversion from string constant to cstring – is that what’s wrong?

Can you help this programmer to get past this presenting problem and help them to identify any other issues with their code? The code is in Listing 2.

#include <iostream>
#include <string>
#include <unordered_map>
typedef char *cstring;
typedef std::unordered_map<cstring, cstring> AddressMap;
// Hold email addresses
class Addresses
{
public:
  // add addr for name
  void add(cstring name, cstring addr)
  {
    addresses[name] = addr;
  }
  // get addr from name or null if not found
  cstring get(cstring name)
  {
    if (addresses.count(name))
    {
      return addresses[name];
    }
    return 0;
  }

private:
  AddressMap addresses;
};

int main()
{
  Addresses addr;

  addr.add("roger",
    "rogero@howzatt.demon.co.uk");
  addr.add("chris",
    "chris@gmail.com");

  cstring  myadd = addr.get("roger");
  if (myadd == 0)
  {
    std::cout << "Didn't find details"
      << std::endl;
    return 1;
  }
  std::cout << "Found " << myadd << std::endl;
}
			
Listing 2

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://www.accu.org/journals/). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes: 

More fields may be available via dynamicdata ..