Journal Articles

CVu Journal Vol 29, #6 - January 2018 + Student Code Critiques from CVu journal.
Browse in : All > Journals > CVu > 296 (7)
All > Journal Columns > Code Critique (70)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 109

Author: Bob Schmidt

Date: 03 January 2018 16:52:09 +00:00 or Wed, 03 January 2018 16:52:09 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body: 

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)

Last issue’s code

I’ve got a problem with an extra colon being produced in the output from a program. I’ve stripped down the files involved in the program a fair bit to this smaller example. Here is what the program produces:

    test_program Example "With space"
    1:: 1001:Example
    2:: "1002:With space"

I can’t see where the two colons after each initial number come from as I only ask for one.

Please can you help the programmer find the cause of their problem and suggest some other possible things to consider about their program.

namespace util
{
  class Record {
  public:
    Record(uint64_t id,
      std::string value = {});
    std::string to_string() const;
    // other methods elided
  private:
    uint64_t const id;
    std::string value;
  };
  inline
  std::string to_string(Record const &r)
  {
    return r.to_string();
  }
}
			
Listing 1
#include <cstdint>
#include <sstream>
#include <string>
#include "record.h"
util::Record::Record(uint64_t id, 
  std::string value)
  : id(id), value(value) {}
std::string util::Record::to_string() const
{
  std::ostringstream oss;
  oss << id << ":" << value;
  return oss.str();
}
			
Listing 2
#pragma once
#include <string>

namespace util
{
  // provide 'escaped' textual representation
  // of value
  // - any double quotes need escaping with \    
  // - wrap in double quotes if has any spaces
  template <typename T>
  std::string escaped_text(T t)
  {
    using namespace std;
    
    auto ret = to_string(t);
    for (std::size_t idx = 0;
      (idx = ret.find(idx, '"')) !=
        std::string::npos;
      idx += 2)
    {
      ret.insert(idx, "\\", 1);
    }
    if (ret.find(' ') != std::string::npos)
    {
      ret = '"' + ret + '"';
    }
    return ret;
  }
}
			
Listing 3
#include <cstdint>
#include <iostream>
#include "record.h"
#include "escaped.h"

using namespace util;
template <typename K, typename V>
void output(K key, V value)
{
  std::cout << escaped_text(key) << ": "
    << escaped_text(value) << '\n';
}
int main(int argc, char **argv)
{
  static uint64_t first_id{1000};
  for (int idx = 1; idx != argc; ++idx)
  {
    Record r{++first_id, argv[idx]};
    output(idx, r);
  }
}
			
Listing 4

Critiques

Paul Floyd

The issue presented in this Code Critique all centres around one line.

  auto ret = to_string(t);

This is called from templatized escaped_text. This is itself called, with both of the templatized arguments, from the double templatized output.

This means that escaped_text gets instantiated with t having type int and also with t having type Record. What the author of the code probably intended was for the int specialization to call std::to_string and for the Record specialization to call util::to_string. The arguments match so it should work, right? No, wrong. This is not the full picture of how function call overload resolution works.

I won’t go into the whole gory details (though if you are interested, almost everything I ever learnt about overload resolution I got from a set of fantastic videos by Stephan T. Lavajev on the Microsoft Channel 9 site [1], and there are the usual cppreferences [2] [3]). In short, overload resolution performs the following steps

As you can see, name lookup occurs before selecting the best match. In this case it is ‘unqualified lookup’ since there is no scope operator indicating which scope to search for to_string. The lookup searches scopes from the current to the global scope until it finds one or more names. Once it has found a name in a scope it stops.

Obviously the lookup finds util::to_string which is in the same namespace as the call. But what about std::to_string? From the cppreference description:

For the purpose of unqualified name lookup, all declarations from a namespace nominated by a using directive appear as if declared in the nearest enclosing namespace which contains, directly or indirectly, both the using-directive and the nominated namespace.

The nearest enclosing namespace that contains both std:: and util:: is the global namespace, i.e., std::to_string is considered to be in the global namespace. This means that the lookup stops in the util namespace and only considers util::to_string.

Unfortunately, Record has a non-explicit constructor that can convert from uint64_t

  Record(uint64_t id,
    std::string value = {});

This means that for the second call to escaped_text, a Record is passed and it produces a string of the format "id:name". For the first call to escaped_text, instead of std::to_string generating the string representation of the index, the index is implicitly converted to a Record with a default value for the name, the empty string. This produces the output "index:[empty_string]" or just simply "index:". And here we have the extra colon.

To fix the code, it would be possible to remove the util namespace so that the overload resolution really does take place between std::to_string and util::to_string. I don’t like that, and I’d rather suggest just simplifying output to call std::to_string and also to make the Record constructor explicit.

References

[1] https://channel9.msdn.com/Series/C9-Lectures-Stephan-T-Lavavej-Core-C-

[2] http://en.cppreference.com/w/cpp/language/lookup

[3] http://en.cppreference.com/w/cpp/language/overload_resolution

James Holland

The student should not be surprised that two colons appear in the output. The first colon comes from Record’s member function to_string(). The second colon comes from the output() function called directly from main(). If only one colon is required, I suggest removing the one in output().

Although the code now works correctly with the supplied examples, there are some features that need attention. From reading the code it is clear that should a supplied argument contain a quotation mark, a backslash is to be inserted just before it. It is a pity that this feature has not been tested, as far as is known, because there is a flaw in the mechanism that is responsible for doing this. It transpires that the arguments of find() (within escaped_text()) are swapped. The first parameter of find() is of type char, the second is of type size_t. The function call should, therefore, be ret.find('"', idx). The student’s code is attempting to find a NUL character starting at character position 34 (the ASCII code for the quotation mark character) as opposed to a quotation mark starting at position 0. It is, perhaps, unfortunate that the compiler is willing to cast a size_t to a char and a char to a size_t without comment. Incidentally, it is my understanding that starting a search beyond the end of the string is permissible, find() will simply return std::string::npos to indicate that the character was not found. Testing code to show that it meets its requirements is always important.

Some other features of the student’s code that are immediately apparent include the following.

I am sure there are many other improvements to be made but I think this is enough to be going on with.

Jason Spencer

The reason for the two colons being printed is to do with function name lookup of to_string in escaped_text.

The output on each line is "<okey>: <ovalue>" (as per output<K,V> in test_program.cc), where ovalue is what is returned by Record::to_string(value) (via util::to_string via escaped_text) (after some formatting changes to make it ‘escaped’ if necessary). What is printed for okey, on the other hand, is expected by the student to be the result of std::to_string(unsigned long long int) (via escaped_text), which is expected to be chosen since T = uint64_t in escaped_text<T>(T t), and probably the reason why using namespace std was included.

However, due to the function lookup rules util::to_string is first checked as a suitable function, and is found to be suitable as uint64_t can be implicitly converted to a Record thanks to the Record ctor’s default second argument. The output is therefore:

  <id_of_okey>:<value_of_okey>:
  <id_of_ovalue>:<value_of_ovalue>

value_of_okey is an empty string as per the default second argument in the ctor, and the printed output is now

  <id_of_okey>:: <id_of_ovalue>:<key_of_ovalue>

The solution is discussed below.

There are also other issues with the program:

Aside from the corrections above I’d recommend bigger changes to the code – and they are all to do with escaped_text.

Firstly, move the conversion of type T to std::string out of escaped_text. Let escaped_text have one responsibility escaped_text escaping the text. The conversion can then be much more flexible. Consider using a std::stringstream and the stream operators for the conversion, rather than to_string, as they are supported by more STL types (e.g. std::bitset), and more likely to be supported by user defined types.

The output of std::to_string can also often be unexpected – usually in terms of precision. Streams also have I/O manipulators, so there’s a lot of flexibility in formatting (the base of output integers, capitalisation, number of decimal points, padding). On the subject of I/O manipulators, there’s actually one in C++14/17 that will do the escaping for us:

  template <typename K, typename V> void 
  output(const K & key, const V & value) {
    using std::to_string;
    std::cout << std::quoted(to_string(key))
      << ": " << std::quoted(to_string(value))
      << '\n';
  }

Note however, that std::quoted doesn’t return an std::string, so it isn’t exactly equivalent, but it is designed to be very efficient, and it can also be used on input streams to unquote an incoming string.

Secondly, the original escaped_text has a worst-case time complexity of O(N^2). The loop iterates through the entire input string, and potentially, if the string were composed entirely of double quotes, could call std::string::insert(...) that many times, which itself is linear in time complexity wrt string length. This gets worse when you consider insert might trigger a dynamic memory allocation in the string. This probably isn’t too bad if the string is small (and fits within the SSO [1] buffer), but if it is expected to be long then a two-pass algorithm could be considered:

1) First count the number of double quotes appearing in the input string (as unsigned number_of_quotes), and test for the presence of spaces (as bool has_whitespace)

2a) Create a new empty string that reserves input_string_length + number_of_quotes + (has_whitespace ? 2 : 0) chars.

2b) Iterate over the input string again, copying chars to the new string and inserting quotes and slashes as required.

This in now linear in time and should have at most a single allocation.

A sample implementation might look like:

  std::string escaped_text(
    const std::string & in) {
    unsigned number_of_quotes = 0;
    bool has_whitespace = false;
    const std::string escape_prefix { "\\" };
    for ( const char c : in ) {
      if(c=='"') ++number_of_quotes;
      if(c==' ') has_whitespace = true;
    }
    std::string out;
    out.reserve( in.length() 
      + escape_prefix.length()*number_of_quotes
      + (has_whitespace?2:0) );
    if ( has_whitespace ) out.push_back('"');
    for ( const char c : in ) {
      if(c=='"') out.append ( escape_prefix );
      out.push_back(c);
    }
    if ( has_whitespace ) out.push_back('"');
      return out;
  }

There are, of course other, more generic ways to do this (see regular expressions later), but this is an O(N) solution that has no C++11 requirements (aside from the range for loop which can be factored out).

There are a number of points of further investigation when considering the student’s program:

  1. Name lookup in C++

    Name lookup in C++ has a number of nuances depending on the exact calling conditions. The call to to_string here is an unqualified lookup – that is there is no scope specifier preceding it (ie class name or namespace). See basic.lookup.unqual in [2] for the details behind the selection criteria and namespace.udecl in [2] describes the use of using to bring std::to_string into the lookup space.

    Bear in mind also that you cannot just add your own to_string implementation of your UDT to the std namespace. The C++ specification allows adding specialisations of templates to the std namespace (e.g. std::swap, std::hash), but adding overloads of existing functions is considered undefined behaviour (see namespace.std in [2]). std::to_string is not a template but a series of overloads for arithmetic plain-old datatypes (int, float and variations).

  2. Whitespace, escape sequences and regular expressions. In ASCII, there are more whitespace characters [3] than just space (' '), tab ('\t') and newline ('\n') – there are also the less commonly seen carriage return ('\r'), vertical tab ('\v') and formfeed ('\f'). UTF-8 and UTF-16 add a whole lot more. You could check for all known whitespace directly, or you could rely on a regex (regular expression)[4] class to match the whitespace (single byte char example below, use std::wregex instead to match std::wstring contents):
          std::string escaped_text(
            const std::string & in) {
            std::regex doublequote_regex(R"(")");
            std::regex whitespace_regex("[[:space:]]");
            const char * 
              prefix_pattern_with_backslash_fmt =
              "\\$&";
            std::string ret = std::regex_replace(in,
              doublequote_regex, 
              prefix_pattern_with_backslash_fmt);
            if ( std::regex_search(ret,
              whitespace_regex) )
              ret = '"' + ret + '"';
            return ret;
          }

    This example uses ECMAScript[5] type regexes (the default type for C++ regexes), but there are other types[6].

    There are many ways to escape a string – which method is used typically depends on which elements of the char set are reserved or otherwise have a special meaning. You wouldn’t want to see a newline in a URL, for example, or a colon, or a bell, null or otherwise unprintable character – but they could appear as part of the data in a GET HTTP request URL (whether you should include such data in a GET URL is a discussion for another time). The escaping being done by the student here is reminiscent of CSV escaping, but there is no formal spec for CSV escaping, and some may argue that this isn’t proper CSV escaping; for example, a comma should also be treated as special. I’d argue all non-printable and whitespace characters should also be escaped, as well as the backslash.

    Irrespective of the type of escaping, regular expressions are a great way to match patterns in strings and potentially perform some operation on them.

    Have a look also at Boost.Xpressive for an even more sophisticated library of string manipulation. It allows lambdas to be called where a regex matches so you can do hex conversion, for example, and spaces can be URL encoded to %20.

  3. Testing

    If I may be blunt: there are many gaping holes in the testing here. There’s no test for the escaping of quotes in the input, for example, which would have caught the swapped args in ret.find(idx, '"') in the for loop in escaped_text. The functionality in the student’s program has a relatively good separation, apart from the conversion being in escaped_text. This makes it prime for unit testing [7] [11], and the related Test Driven Development [8] [11]. Using the header-only Catch2 [9] testing framework it’s trivial to check expected output against a given input:

      #define CATCH_CONFIG_MAIN
      #include <catch.hpp>
      #include <string>
      
      const char * no_changes_tests [] = {
        "",
        "abcdef", "a_b_c_d_e_f", "_a_b_c_d_e_f_", 
        "123456", "1_2_3_4_5_6", "_1_2_3_4_5_6_", 
        "a1b1c", "_a_1_b_1_c_",
        "_", "1_", "_1", "_1_", "__", "11"
        };
      const char * quotes_added_tests [] = {
        " ", "  ", "   ",
        "a b c d e f", " a b c d e f ",
        "1 2 3 4 5 6", " 1 2 3 4 5 6 ",
        "a 1 b 1 c", " a 1 b 1 c", "a 1 b 1 c ",
        " a 1 b 1 c "
        };
    
      TEST_CASE( "correctness", "[escaped_text]" ) {
        SECTION( "No change to string" ) {
          for ( const char * test_string :
            no_changes_tests )
              REQUIRE ( escaped_text(test_string) == \
                test_string );
      }
      SECTION( "Quotes should be added" ) {
        for ( const char * test_string : \
          quotes_added_tests )
          REQUIRE ( escaped_text(test_string) == \
          std::string("\"") + test_string + '"' );
      }
      SECTION("Existing quotes should be escaped") 
      {
        REQUIRE( escaped_text( R"(")" ) == \
          R"(\")");
        REQUIRE( escaped_text( R"("")" ) == \
          R"(\"\")");
        REQUIRE( escaped_text( R"(" ")" ) == \
          R"("\" \"")");
        REQUIRE( escaped_text( R"( ")" ) == \
          R"(" \"")");
        REQUIRE( escaped_text( R"(" )" ) == \
          R"("\" ")");
        REQUIRE( escaped_text( R"( " )" ) == \
          R"(" \" ")");
      }
    }

For increased coverage, tests for non space whitespace escaping should also be added. Long string testing also, etc. Writing good tests can be somewhat of an art [11].

I think that’s it for now. I hope I don’t sound too nit-picky, but the compiler can mis-understand intent, and the CPU is unforgiving, and even if that’s all fine, the user (API user or end user) will find a way of breaking or abusing things. There’s also the whole broken windows argument [10].

References

[1] https://stackoverflow.com/questions/10315041/meaning-of-acronym-sso-in-the-context-of-stdstring/10319672#10319672

[2] The C++11 Programming Language standard - ISO/IEC 14882:2011(E)

[3] https://en.wikipedia.org/wiki/Whitespace_character

[4] https://www.regular-expressions.info/

[5] http://ecma-international.org/ecma-262/5.1/#sec-15.10

[6] http://www.cplusplus.com/reference/regex/regex_constants/#syntax_option_type

[7] https://en.wikipedia.org/wiki/Unit_testing and https://martinfowler.com/bliki/UnitTest.html

[8] https://www.agilealliance.org/glossary/tdd/

[9] http://catch-lib.net/

[10] https://pragprog.com/the-pragmatic-programmer/extracts/software-entropy

[11] The Clean Coder: A Code of Conduct for Professional Programmers by Robert C. Martin ISBN 978-0137081073

Commentary

The presenting problem is caused by an interaction between two different things: C++ name lookup rules and implicit constructors.

The idiomatic way to ensure the correct version of swap, for instance, is used in a template is write code like this:

  {
    using std::swap;
    swap(a, b);
  }

As the critiques pointed out, this has the right characteristics when there is a swap visible in another namespace whereas using namespace std does not.

There is good reason for this: bringing in the whole namespace allows access to a potentially large number of identifiers and if the added symbosl were all added into the immediate scope there is a strong likelihood of accidentally hijacking a call to a function in the current namespace in favour of one in the referenced namespace that happens to be a better match. So the rule for using namespaces adds the names to an enclosing scope, where they will be found if necessary when no closer symbol matches.

The second problem that went into the issue experienced by the user was that a constructor with defaulted arguments can provide an implicit conversion from one type to another. This can easily be prevented by making this constructor explicit so preventing most of the places where an implicit conversion occurs – notably for function arguments. It is worth considering the habit of making at least single argument constructors explicit by default – some static analysis tools will recommend this automatically. (It can of course be argued that making the second argument to the Record object optional might not make sense from a design perspective, but single argument constructors are common.)

Header files generally ought to have some kind of include guard to prevent errors caused by duplicate definitions. While #pragma once is a non-standard pragma, as James pointed out, it is in practice supported by the vast majority of modern compilers and does have two main advantages over ‘traditional’ include guards when the environment allows its use. Firstly it avoids adding additional macro identifiers into the program’s scope and secondly the macro identifiers used for include guards sometimes end up duplicating each other (or are used incorrectly), which can cause some very confusing problems.

One problem in the example that no-one commented on was that the eighth line in the file escaped.h contains a comment ending with a backslash, hence turning it and the following line into a multi-line comment. While benign in the current code, this does occasionally have the consequence of accidentally commenting out a line of code, for example:

  // check for \
  if (s.find('\\') != std::string::npos)
  {
    // code
  }

where the if statement is commented out by the trailing backslash and so the following code is executed unconditionally. (Fortunately, many IDEs will show the affected line in comment markup, giving a visual cue.)

The Winner of CC 108

I liked Paul’s links to information about name lookup: both the introduction and the reference. There are a lot of useful resources available to help with C++ programming.

James made several simplification suggestions – such as not requiring the output function to be a template. It is often a good thing to pass through code, once it is believed to be functionally complete, and remove some of the ‘cruft’ that tends to creep in. One of the advantages of code reviews is that additional pairs of eyes, seeing the code for the first time, often notice little details like these which those familiar with code are no longer surprised by.

Jason also discusses a number of improvements to the program – including a small design change to separate the escaping of text and the conversion of the types to a string. This sort of low-level refactoring enables each piece of code to focus on a single task and, when used well, can result in code that is much simpler to understand and test.

There is a bit of an open question over the best type for argument passing. Is it best to pass by value or by const reference? As with many things in C++, ‘it depends’. What are some of the issues we must consider? On the one hand, passing by value requires an accessible copy/move constructor and, depending on whether the argument is a temporary or not, may require an actual copy. On the other hand passing by value is simpler for scalar types and, when passing temporary objects, can end up being more efficient than passing the temporary by reference.

Overall the entrants found a lot of issues to discuss in a relatively small critique, but I think that Jason provided the best set of answer to this issue’s problem.

Code Critique 109

(Submissions to scc@accu.org by Feb 1st)

I’m trying to write a very simple dice game where the computer simulates two players each throwing dice. The higher score wins and after a (selectable) number of turns the player who’s won most times wins the game. (I’m going to make the game cleverer once it’s working.) But the games always seem to be drawn and I can’t see why. Here is what the program produces:

    dice_game
    Let's play dice
    How many turns? 10
    Drawn!
    How many turns? 8
    Drawn!
    How many turns? ^D

What’s going wrong, and how might you help the programmer find the problem? As usual, there may be other suggestions you might make of some other possible things to (re-)consider about their program.

// Class to 'zip' together a pair of iterators
template <typename T>
class zipit : public std::pair<T, T>
{
  zipit &operator+=(std::pair<int,int> const &rhs)
  {
    this->first += rhs.first;
    this->second += rhs.second;
    return *this;
  }
public:
  using std::pair<T, T>::pair;
  zipit &operator+=(int n)
  {
    return *this += std::make_pair(n, n);
  }
  zipit &operator-=(int n)
  {
    return *this += std::make_pair(-n, -n);
  }
  zipit &operator++()
  {
    return *this += 1;
  }
  zipit &operator--()
  {
    return *this += -1;
  }
  auto operator*()
  {
    return std::make_pair(
      *this->first, *this->second);
  }
  auto operator*() const
  {
    return std::make_pair(
      *this->first, *this->second);
  }
  // Hmm, operator-> ??
};
template <typename T>
auto begin(T one, T two)
  -> zipit<typename T::iterator>
{
  return {one.begin(), two.begin()};
}
template <typename T>
auto end(T one, T two)
  -> zipit<typename T::iterator>
{
  return {one.end(), two.end()};
}
			
Listing 5
#include <algorithm>
#include <iostream>
#include <random>
#include "zipit.h"

class randomize
{
  std::mt19937 mt;
public:
  int operator()() { return mt() % 6 + 1; }
};
void play(int turns, randomize &generator)
{
  std::vector<int> player1(turns);
  std::vector<int> player2(turns);

  std::generate(player1.begin(),
    player1.end(), generator);
  std::generate(player2.begin(),
    player2.end(), generator);

  int total{};

  for (auto it = begin(player1, player2);
       it != end(player1, player2); ++it)
  {
    if ((*it).first != (*it).second)
    {
      auto diff = *it.first - *it.second;
      total += copysign(1.0, diff);
     }
  }
  if (total > 0)
  {
    std::cout << "Player 1 wins\n";
  }
  else if (total < 0)
  {
    std::cout << "Player2 wins\n";
  }
  else
  {
    std::cout << "Drawn!\n";
  }
}
int main()
{
  randomize generator;

  int turns;
  std::cout << "Let's play dice\n";
  while (std::cout << "How many turns? ",
    std::cin >> turns)
  {
    play(turns, generator);
  }
}
			
Listing 6

You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Notes: 

More fields may be available via dynamicdata ..