Title: Code Critique Competition 117

Author: Bob Schmidt

Date: 03 May 2019 23:24:43 +01:00 or Fri, 03 May 2019 23:24:43 +01:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body:

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)

Last issueâ€™s code

Iâ€™m trying to extract the count of letters in a file and while the program itself seems to work OK, Iâ€™m not getting the debugging output I expected. I was expecting to see a debug line of each letter being checked during the final loop, but I only get some of the letters shown.

For example, using sample.txt that consists of â€œThis is a some sample letter inputâ€ I get:

    letter sample.txt
    Seen T
    Seen h
    Seen n
    Seen o
    Seen r
    Seen u
    Commonest letter: e (4)
    Rarest letter: T (1)

Why are only some of the letters being shown as Seen?

Can you solve the problem â€“ and then give some further advice?

Listing 1 contains logging.h and Listing 2 is letter.cpp.

#include <iostream>
namespace
{
  class null_stream_buf
  : public std::streambuf
  {
  } nullbuf;
  std::ostream null(&nullbuf);
}
#ifdef NDEBUG
#define WRITE_IF(X) if (false) null
#else
#define WRITE_IF(X) ((X == true) ? std::cout 
  : null)
#endif

Listing 1

#include "logging.h"
#include <fstream>
#include <map>

// get the most and least popular chars
// in a file
int main(int argc, char **argv)
{
  WRITE_IF(argc < 2)
    << "No arguments provided";
  if (argc > 1)
  {
    std::ifstream ifs(argv[1]);
    std::map<char, int> letters;
    char ch;
    while (ifs >> ch)
      ++letters[ch];
     
    std::pair<char, int> most =
      *letters.begin();
    std::pair<char, int> least =
      *letters.begin();
    for (auto pair : letters)
    {
      WRITE_IF(pair.second) << "Seen "
        << pair.first << std::endl;
      if (pair.second > most.second)
      {
        most = pair;
      }
      else if (pair.second < least.second)
      {
        least = pair;
      }
    }
    std::cout << "Commonest letter: "
      << most.first << " ("
      << most.second << ")\n";
    if (most != least)
      std::cout << "Rarest letter: "
        << least.first << " ("
        << least.second << ")\n";
  }
}

Listing 2

Critique

Hans Vredeveld

I can see a couple of issues with this code, one of them resulting in only part of the letters being shown as Seen. Letâ€™s go through the code starting with logging.h.

The first thing we encounter is a class null_stream_buf and variables nullbuf and null that are defined in an unnamed namespace. The idea behind an unnamed namespace is that you can define things in a translation unit for use in that translation unit only without having to think about whether a name is already used in another translation unit. The compiler will create unique names for them in each translation unit, so there is no ODR violation. The result of this is that we will have as many instances of null and nullbuf in our binary as there are translation units. A waste of space in the binary.

If null_stream_buf were a buffer for some device (or file), the result would be a messy output. Messages written to null in different translation units would end up in different buffers and only when the underlying library code decides to write the buffer to the device would the data be written. With different, independent, buffers, the order the data is written will appear random and the data will typically contain partial messages. This is similar to when two console applications write simultaneously to the console (or a single log file).

To get the expected result of the messages appearing in the output in the order they were written to the stream, we have to get rid of the unnamed namespace. The class null_stream_buf has to be defined in the global namespace (or in a named namespace), and the variables nullbuf and null have to be declared extern in the global namespace (or in a named namespace). Also, nullbuf and null each have to be defined in just one implementation file.

Now we have only one output stream null and one buffer nullbuf in our application, but there is still an issue with null_steam_buf. It doesnâ€™t override any of the virtual functions of std::streambuf. In particular, it doesnâ€™t override int_type overflow(int_type ch) in std::streambuf. The default implementation in std::streambuf always returns end-of-file. As a result, as soon as a single character is written to null, the fail-bit and bad-bit will be set. Basically, null will tell us that itâ€™s full and canâ€™t accept anything any more, but we happily continue to stream data to it. For a null stream, this is very strange. We expect it to always accept data and just discard it. To get this behaviour, we have to implement a public override for the overflow member function:

  int_type overflow(int_type ch) { return ch; }

As an aside, note that, while the code is legal C++, clang++ (version 5.0.1) with all warnings enabled warns that nullbuf and null both require global and exit-time destructors. The warnings tell us that we may have a problem when there are multiple global variables in different translation units. We could get rid of these warnings by removing the global variables.

Moving to the next lines in logging.h, we come to the WRITE_IF macro. When NDEBUG is defined, nothing is logged because of the if (false). It doesnâ€™t matter what ostream object follows. It could be std::cout instead of null.

When NDEBUG is not defined, compiling with g++ -Wall gives us a warning on WRITE_IF(argc < 2), suggesting we put parentheses around the argument. The parentheses can best be put around X in the definition of WRITE_IF. The real problem with this macro can be seen in its other use in the application, WRITE_IF(pair.second). After macro substitution, we have the comparison (pair.second == true). However, pair.second is of type int and that means that we do an integer comparison and the bool constant true gets promoted to the int 1. The result is that we only print the characters that occur only once in the input file. What we need is a conversion to bool of pair.second, which we get with

  #define WRITE_IF(X) ((X) ? std::cout : null)

With this modification, we will stream to std::cout when X evaluates to true and to null when it evaluates to false. But, why stream to a null output stream that discards everything that is streamed to it at all? When we change the definition to

  #define WRITE_IF(X) if (X) std::cout

we only send data to a stream when we want to see it. If we now change the definition in the case that NDEBUG is defined to

  #define WRITE_IF(X) if (false) std::cout

we can remove the global variables null and nullbuf, and the class null_stream_buf.

Now that we have WRITE_IFâ€™s definition right, letâ€™s have a quick look at the statements where it is used in letter.cpp. The first time it is used, some text is output without a newline at the end. The second time, the newline is output through std::endl. While I prefer that the logging framework takes care of outputting the newline at the end, WRITE_IF is such a simple logging macro that this is not realistic. In this case it is up to the user to end each statement with outputting a '\n', but not through std::endl, as Chris Sharpe described so well in Overload 149.

Letâ€™s continue with letter.cpp and in particular the while-statement that reads the file. I keep on wondering whether this while-statement does what the codeâ€™s author intended. The condition of the while-statement, ifs >> ch, uses formatted input to read a character from the file. The effect of this is that it will skip over whitespace, but not over e.g. punctuation. Do we really want to ignore spaces, but count commas and full stops? As the map containing the data is called letters, I suspect that punctuation should also be ignored. Then an extra check with std::isalpha is in place:

  while (ifs >> ch)
    if (isalpha(ch)) ++letters[ch];

Next is the for-loop. I donâ€™t like the use of pair as a variable name, as it can cause confusion with the well-known std::pair. My first suggestion would be to name it letter, but whether that is a good name is also open for debate. Instead we can replace the entire for-loop with a call to std::minmax_element:

  auto [least, most] =
   std::minmax_element(letters.begin(), 
     letters.end(),
     [](const auto& a, const auto& b) {
       return a.second < b.second;
     });

For this to work, we need to #include <algorithm>. With this change we lose the debug output of which letter we have seen, but in my opinion it has also lost its meaning.

Finally a note about the includes. The std::cout in letter.cpp is only made known through the include of logging.h. I strongly believe that any identifier used in a file should be declared in the file itself or brought in by an include in the file itself or, for an implementation file, its associated header file. This means that letter.cpp should #include <iostream> directly. For the same reason I would prefer to #include <utility> for std::pair (not needed any more when using std::minmax_element) in letter.cpp, and to #include <ostream> and #include <streambuf> in logging.h for std::ostream and std::streambuf (not needed any more after the rewrite of WRITE_IF).

James Holland

There are two features to the software than I thought could be the cause of the problem and, therefore, worth investigating. The first is the null_stream_buff class; the second is the macro WRITE_IF.

Although null_stream_buff may be a somewhat novel mechanism, it appeared to be working as desired, namely to produce no output when called. My attention then turned to the macro. The use of all but the simplest macros is discouraged as they can lead to some very obscure bugs that can be very difficult to diagnose. I was hopeful that the problem with the software debug output had something to do with the use of such macros.

I decided that the best way to proceed was to discover what the source code looked like after being preprocessed. This was accomplished by using the -E compiler switch and observing the resulting output. The statement of interest is the WRITE_IF macro within the for-loop of main() and is shown below.

  WRITE_IF(pair.second) << "Seen "
    << pair.first << std::endl;

After preprocessing, the code the compiler sees is as follow.

  ((pair.second == true) ? std::cout:null)
    << "Seen " << pair.first << std::endl;

It can now be seen that whether std::cout or null is called depends on the comparison of pair.second with true. The type of pair.second is int and represents the number of occurrences of a particular letter within the text file. The type of true is bool. The problem now boils down to how the C++ compares for equality an int and a bool. From observing the program output, it can be seen that debug information is produced only when the value of pair.second is 1. This suggests that, in effect, the boolean value of true is converted to an int of value 1. The comparison, therefore, consists of comparing the value of pair.second with 1. Output will, therefore, occur only when pair.second is equal to 1. This is consistent with observations.

The above analysis suggests ways of correcting the problem. One way is to convert pair.second to a boolean value with a static_cast before the comparison is made. A value 0 will be converted to false and any other value will be converted to true. Now, false == true has a value of false and null will be called; true == true has a value of true and std::cout will be called. This is as required and is borne out in practice by compiling and running the modified code shown below.

  WRITE_IF(static_cast<bool>(pair.second))
    << "Seen " << pair.first << std::endl;

The comparison of a boolean value with true is a redundant operation. This suggests a simpler way of correcting the program. Instead of casting pair.second to bool, the macro defined in logging.h can be changed from

  WRITE_IF(X)((X == true) ? std::cout:null)

  WRITE_IF(X)((X) ? std::cout:null).

With this arrangement pair.second (an int) will be inherently converted to bool with a value true if pair.second is anything other than 0. This is as required and produces the required result.

Taking a slightly wider view of the program, it would seem preferable to modify the if statement that warns that pair.second is zero. The modification would add an else clause that prints the letter and the number seen. This would remove the need for the macro.

It is always worth considering whether a â€˜hand-rolledâ€™ loop can be replaced with an existing algorithm from the standard library. In the code presented, a for-loop scans through a map to determine the minimum and maximum of mapâ€™s values. There is a standard library algorithm named minmax_element that sounds as if it could to the same job as the for-loop. With the use of a lambda, as shown below, it can.

  const auto [least, most] =
    std::minmax_element(letters.cbegin(), 
      letters.cend(),
      [](const auto & a, const auto & b) {
        return a.second < b.second;
      });

It should be noted that if more than one minimum or maximum element exists, minmax_element() returns the first minimum and the last maximum element. The elements are stored alphabetically within letters and therefore, in this case, minmax_element() returns T as the rarest and s as the commonest. This is different from the code presented which returns the first commonest, namely e. If the software is to find the first commonest, two other algorithms (both in the standard library) can be used in place of minmax_element() (with a slight loss in efficiency) namely min_element() and max_element(). I have assumed the first commonest letter is to be returned as in the original code resulting in my version of the program shown below.

  #include <algorithm>
  #include <fstream>
  #include <iostream>
  #include <map>
  int main(int argc, char **argv)
  {
    if (argc < 2)
    {
      std::cout << "No arguments provided";
    }
    else if (argc > 1)
    {
      std::fstream ifs(argv[1]);
      std::map<char, int> letters;
      char ch;
      while (ifs >> ch)
        ++letters[ch];
      const auto compare_function =
        [](const auto & a, const auto & b) {
          return a.second < b.second;
        };
      const auto least =
        std::min_element(letters.cbegin(), 
          letters.cend(), compare_function);
      const auto most =
        std::max_element(letters.cbegin(),
          letters.cend(), compare_function);
      if (most != letters.cend())
      {
        std::cout << "Commonest letter: " <<
          most->first << " (" << most->second <<
          ")\n";
        if (most != least)
        {
         std::cout << "Rarest letter: " <<
          least->first << " (" << least->second
          << ")\n";
        }
      }
    }
  }

I have also added some protection against the input file being empty.

Balog PÃ¡l

Eh, stream-based logging. I hope this is one of the last specimens, we finally get a sensible text formatting library in C++ standard and can leave this monster behind along with all the related quirks. But that is besides the point, letâ€™s read the code.

We have a logging.h that looks like a header-only library designed for this specific purpose
namespace {} in header: this is normally frowned upon, but fits with the purpose. We can live with this much bloat
Iâ€™m puzzled on the empty subclassing. On a review, Iâ€™d require a good explanation on why we have that null_stream_buf class instead of just using std::streambuf. If it has some ADL-related reason to exist, that must appear in comments
3 names are used up for no good reason, internal purpose only. Especially null stands out. A snap-in like this should use only non-clashing names beyond its public interface
We got to the main feature, that is a macro. I postpone discussion in the idea.
The name WRITE_IF looks suboptimal too
The macro is conditional on NDEBUG. That is a fishy choice, as NDEBUG controls assert, and this feature is not related.
The first version could be just null, the if(false) part serves to save runtime processing of the <<s that follow. The condition is completely eliminated
The second version has a == true comparison that is a major blunder â€“ later we will see that it the actual source of misbehavior
The main code starts in weird way, we have a WRITE_IF followed by another check of the same condition â€“ carefully phrased differently. The normal way would emit message on the else. And not even with this conditional facility.
I accept the reading of the file into the map though it could be arranged with some algorithm without a loop.
The stream-related error handling is missing, but on problem we just result an empty map that should be okay.
We run ahead dereferencing *begin without first checking for empty map that is a no-no.
A manual search for element is presented instead of using std::minmax_element that would just do the job.

Certainly the last would kill our chance to use WRITE_IF and notice the misbehavior too... not sure to count that as pro or a con. The way we use it in the loop, we pass the int with the count. Honestly it beats me why we do that, as the method of our creating the map ensures it will not be zero.

The int is fine as a boolean expression and would work as intended if the macro just used it that way, just having (X). With the == true added, the promotions work â€˜upwardsâ€™, instead of forcing the count to a bool and compare with true, it stays the int and true is promoted to integer 1. Then == will pass only letters with exactly one instance in the input. Which matches the behavior we see.

Beyond that the loop looks correct though massively wasting resources: for has auto instead of const& making copy of every entry in the map and then making copies into most and least. With a pair<char, int> it might not be noticeable, but should be avoided. And as usual, using the stock algorithm would prevent this kind of mess-up and many others.

Finally, the deferred discussion on WRITE_IF. The design goes half-way in many points with respect to compile and runtime. Iâ€™d prefer an all-or-nothing approach. Iâ€™d rather work with a facility like

  TRACE_IF( bool_cond, message_string );

that, if arranged with a macro can compile to nothing. Getting rid of both the condition and the message instructions. Or if we prefer to always have the code, can be a simple function. No switcheroo with streams and whatnot.

The â€˜weakâ€™ point is the string part, that in the presented version is assembled with the << method. That some people still consider a good idea, while others not so much. My practice showed that even a Sprintf() function that uses the traditional sprintf interface with all the potentially related problems (I think we can ask a warning on all of them), and returns the favorite flavor of string is much better. But there are modern ways that combine the benefits of type safety and formatting with extras. One of them is supposedly getting into the standard too.

Peter Sommerlad

I will first comment the code directly line by line and then provide a more idiomatic solution.

  logging.h

This header file misses an include guard. If this file is included in the same translation unit twice, the global variables with internal linkage nullbuf and null will be doubly-defined and thus the code will no longer compile.

  #include <iostream>

This include is not recommended, the code in this file requires the definitions of std::streambuf and std::ostream, both are available through the <ostream> header. However, the macro will require std::cout in case of debug-level compilation. But the macro definition will not require the definition of this global variable, but the macro expansion eventually. I teach my students to only include <iostream> in the file where the main() function is defined and only use the global variables std::cin and std::cout within the main function and write all code that does I/O taking a parameter of std::istream& or std::ostream& respectively. That principle allows writing unit tests for these functions, which are almost impossible to formulate, when the global stream variables are used directly.

  namespace

Defining an anonymous namespace in a header file is not recommended, because wherever that header file is included all definitions exist with internal linkage. If done within the implementation file, I could live with that.

  {
    class null_stream_buf
    : public std::streambuf
    {
    } nullbuf;
    std::ostream null(&nullbuf);

This is an interesting attempt to introduce the null-object pattern for ostream. However, using global variables to achieve that might not be the best way. Especially since formatting operations still will be performed, just no output actually happens.

For modern code formatting, I would always use braced-initialization for initializing variables, this removes the mental gymnastics to ensure that a variable is really initialized, since the default for trivial types without braces might mean uninitialized (for automatic storage duration variables), using uninitialized variables can lead to undefined behavior. While we have global variables here, it is still a practice recommended.

On the other side, there is some problem in the macro logic below.

  }
  #ifdef NDEBUG
  #define WRITE_IF(X) if (false) null

defining a macro taking a parameter (X) and not using it can be strange. Also having a statement that is incomplete and not using braces for a block is very brittle, especially when surrounding code contains other conditional statements and a following else might not be attached to the if in the surrounding code but to the if from the macro.

On the other hand, having that if statement should guarantee that the output will actually never be generated, because the code path is not taken and the optimizer will eliminate that code.

  #else
  #define WRITE_IF(X) ((X == true) ? std::cout 
  : null)

Here is the bug that leads to only outputting letters with the count 1 (that is the integral conversion value of true). Whenever I see code that compares a (potential bool) value with either true or false that code is unprofessional (regardless of the language).

If the macro parameter is meant to be a bool value, then it should read (X)?..:..

Note that a macro parameter should always be in parenthesis when expanded to avoid passing â€˜interestingâ€™ strings that, when expanded, change the syntax of the expression or statement.

Again, we have a partial expression of type ostream&

If we want to stick with a macro for the debugging, I would suggest to pass all the output to be logged as parameter and not to introduce a conditional even in the debug case.

  #endif
  
  --- letter.cpp ---
  #include "logging.h"

It is good practice to include the â€˜ownâ€™ header first, because that guarantees that the own header is self-contained.

  #include <fstream>
  #include <map>
  
  // get the most and least popular chars in a file
  int main(int argc, char **argv)
  {

If we follow the unix pipes and filters practice, if no file is given standard input (std::cin) should be processed. This will make the program logic not more complex, but will actually provide a more useful program.

A big problem of this program is that almost all code resides within the main function and is thus not unit-testable. BAD! For very simple example programs that are obviously correct, that might be OK, but as we see from the bugs, this program is not simple enough.

  WRITE_IF(argc < 2)
    << "No arguments provided";
  if (argc > 1)
  {

There is no check that the file named argv[1] actually exists. From C++17 on we have std::filesystem that allows us to have such a sanity check. This can still cause a race condition when a file is deleted between the check and the actually opening, but it usually allows providing better error messages. A side effect is, that on filesystems with interesting character encodings for file names the string represented by argv[1] can actually be converted into std::path to provide platform independent (unicode!).

  std::ifstream ifs(argv[1]);

no error checking, but OK, just no input here.

  std::map<char, int> letters;
  char ch;
  while (ifs >> ch)
    ++letters[ch];

This is quite idiomatic and automatically skips whitespace by using operator>>.

However, I would extract filling the map into a separate function, that returns the map by value. Such a function can be independently tested with unit tests.

When there is no input, and thus the map is empty, the following code dereferencing the begin() iterator of the map is undefined behavior.

It might be better to use iterators instead of the pair values for keeping track of the found elements. But using iterators then requires the check, if the iterators are valid (for example not equal to the mapâ€™s end() iterator.

  std::pair<char, int> most =
    *letters.begin();
  std::pair<char, int> least =
    *letters.begin();

Later on, writing your own loop for a perfectly ready-made linear search algorithm is a code smell and it can be wrong. The algorithm in the standard library to use here would be minmax_element taking a comparison function object (a lambda) that compares the counters in the mapâ€™s pair::second member.

  for (auto pair : letters)
  {
    WRITE_IF(pair.second) << "Seen "
      << pair.first << std::endl;

Here the debug macro is used and passing the actual count that is then compared with the value of true, results in only those letters to be printed that have a count of one. Using std::endl instead of '\n' is a smell, since in some cases it will result in unneeded operating system interaction, because of flushing the buffer prematurely.

Also test output is bad for a filter program in a pipeline, since that extra output will taint the processing of potentially later filters.

  if (pair.second > most.second)
    {
      most = pair;
    }
    else if (pair.second < least.second)
    {
      least = pair;
    }
  }

we will get rid of the whole loop above, so that we do not need to have the debug output at all.

  std::cout << "Commonest letter: "
    << most.first << " ("
    << most.second << ")\n";
  if (most != least)

An if statement where the body lasts over several lines without braces is dangerous. if the single statement is split for whatever reason, the logic might be wrong. Also the

    std::cout << "Rarest letter: "
      << least.first << " ("
      << least.second << ")\n";
    }
  }

In contrast to my usual practice, I provide a solution without accompanying unit tests, but I have refactored the code, so that unit tests might be added easily, by splitting the code besides main into a separate compilation unit that can be linked to test cases.

As we will see, main() contains (almost) no branching logic and thus can not be wrong. All the other functions do just one thing, that should be testable and they are not dependent on global variables.

I will address the logging macro separately, since I believe it is no longer needed.

  #include <fstream>
  #include <string>
  #include <map>
  #include <algorithm>
  using charmap=std::map<char,int>;
  charmap countChars(std::istream &in) {
    std::map<char, int> letters { };
    char ch { };
    while (in >> ch) {
      ++letters[ch];
    }
    return letters;
  }
  using mapiter=charmap::const_iterator;
  void printresult(
    std::ostream &out,
    std::string text,
    mapiter res,
    mapiter noresult) {
    if (res != noresult) {
      out << text << res->first << " ("
        << res->second << ")\n";
    }
  }
  #include <iostream>
  // get the most and least popular chars in a file
  int main(int argc, char **argv) {
    std::ifstream ifs {
      argc > 1 ?
      std::ifstream { argv[1] }
      : std::ifstream { 0 } };
    auto const letters { countChars(ifs) };
    auto const least_most {
      minmax_element(letters.begin(),
        letters.end(),
        [](auto l, auto r) {
          return l.second < r.second;
        }) };
    printresult(std::cout,
      "Commonest letter: ",
      least_most.second, letters.end());
    printresult(std::cout, "Rarest letter: ",
      least_most.first, letters.end());
  }

What needs to be explained?

I included <iostream> directly in front of main, to demonstrate that the other functions are independent of the global stream objects. While this is not very idiomatic, it was my test to make sure that I did not leave std::cout somewhere in the extracted code and thus make it untestable.

Let us start with main(). I extended the functionality to be able to operate the program as a filter. The first statement initializing ifs is using a trick. Since iostream objects in general are neither copy-able nor move-able it is tricky to provide a factory that either returns std::cin (or a reference to it) or otherwise a std::ifstream that opens the file given as an optional argument. File stream objects can be moved, so it is possible to initialize an std::ifstream variable from a temporary. Here I use the trick, that an ifstream can either open a file, given as a string or path argument or binds to an already open file descriptor. Since 0 is the file descriptor of the standard input of a program, I create an ifstream object using that file descriptor when no file name is given. (I should have implemented a factory with my filesystem readability check first, but I refrained from it for brevity. This is left as an exercise to the student.)

The code reading the stream is extracted into a separate function and thus becomes testable.

Instead of searching for the letter with the lowest and highest count, I use the minmax_element algorithm available since C++11. To avoid having to spell std::pair<char,int> I use a generic lambda doing the comparison on the counts. Note that we get a slight deviation from the logic, because the â€˜commonestâ€™ (largest) letter count is the last one of potentially similar counts. The original algorithm behaved slightly different. Nevertheless, the logic is OK, because we do not have a specification (or test case) demonstrating what should happen if we have multiple â€˜commonestâ€™ letters.

Since the output needs to check for validity (the input might have been empty) and do so twice, I extracted the output code into the printresult function.

For the logging functionality, I refer to a stackoverflow entry, that demonstrates different approaches to achieve logging: https://stackoverflow.com/questions/19415845/a-better-log-macro-using-template-metaprogramming

But my guideline, especially for beginners, when you need logging or interactive debugging to understand what your code does, your code is usually very badly structured. So, simplify it into smaller parts, write unit tests against these parts, or even better write the tests first, this will help you to grow your software according to your needs instead of guesswork and wasting time (see Pete Goodliffeâ€™s article in CVu that presented that problem).

Thatâ€™s it. I hope I did not forget anything crucial, because I was too lazy to write test cases. Shame on me.

Silas S. Brown

The reason why only some of the letters are being shown as Seen is that your WRITE_IF(X) macro tests for (X == true) when X is an integer. The integer value of true is 1, so only letters that occurred exactly once (and no more than once) are being printed as Seen.

Instead of checking that X is equal to true, check that X is not equal to false, i.e. (X != false).

But you really need to put extra parentheses around that X, i.e. ((X) != false), because the macro is expanded without regard to C++ syntax and you could break it if you ever wrote, for example, WRITE_IF(A & B) expecting to get the bitwise AND of A and B: you would instead get the bitwise AND of A and either a 0 or a 1 depending on whether B differs from false, since != has a higher precedence than the bitwise & operator. Ouch. (This particular problem would have been flagged up by GCC if you had passed the -Wall (warnings all) parameter to the g++ command. I really donâ€™t understand why -Wall is not the default.)

In this particular case, we could side-step the need for extra parentheses by noting that if ((X) != false) is equivalent to if (X) (since C++, unlike languages such as Java, does let you put non-boolean types into an if expression, and automatically applies an != 0 when it sees a numeric type), but I still made the point about the extra parentheses because youâ€™ll need to know that if you do any more things with macros.

But ideally I donâ€™t want you to be doing any more with macros at all. To misquote Doc Brown from Back To The Future: â€œwhere weâ€™re going, we donâ€™t need macros.â€ Try this:

  static inline auto& WRITE_IF(bool X) {
  #ifdef NDEBUG
    return null;
  #else
    return X ? std::cout : null;
  #endif
  }

If your compiler is too old for auto& to work, then write std::ostream& instead, but do consider upgrading your compiler because auto is a really useful feature of modern C++ that saves us from having to figure out exactly what type somethingâ€™s going to be when the compiler can do that job.

(The usual convention is that a name written in all capitals implies a macro, so really we should be changing it to lower case now that itâ€™s a function. But Iâ€™ve left it as capitals for now so you can drop the above into the existing code without having to change anything else and itâ€™ll still work.)

Here are just some of the advantages of that function-based approach:

The compiler can check that our function makes sense, even before we try to use it. (Macros are not checked until they are used, and if thereâ€™s a problem the error messages are usually harder to understand, that is if theyâ€™re not missed altogether as in your case.)
We can specify a type for the parameter X. In this case, we say we want a bool, so anything else will be converted to a bool. If you had written X==true in this version, it would not have been a problem.
We donâ€™t have to worry about extra parentheses (and I havenâ€™t even mentioned multiple evaluations of the same variable).
Itâ€™s easier to span multiple lines, and as the above shows, putting the #ifdef NDEBUG inside the function saves us from writing the first line twice (which is good, because writing something twice means there are two places to fix it if it turns out to be wrong, and if one of those is not even reached by the compiler due to an #ifdef then it could easily be missed).

In the old days, a counterpoint to all of this might have been â€˜but what about the overhead of having a function callâ€™. But with modern optimising compilers, especially as we said â€˜static inlineâ€™, I would be amazed if the actual machine code generated were any different from doing it with a macro, whether or not NDEBUG is defined.

Another improvement that is possible (now that we have a function) is to hide the null stream inside that function, by making it a static instead of a global. What I mean is to change the start of the function to:

  static inline auto& WRITE_IF(bool X) {
    static std::ostream null(&nullbuf);

and remove the std::ostream null(&nullbuf); from your namespace block. This has the advantage of not creating a thing called null thatâ€™s just â€˜floating aroundâ€™ your fileâ€™s global namespace, which could be accidentally referred to in the wrong context. After all thereâ€™s nothing in the name to reinforce the idea that itâ€™s a null stream, rather than a null something else. Yes, you could change the name, but Iâ€™d rather show you how to encapsulate it in the function so you no longer have to worry about a good global name.

Chris Trobridge

Considering the output of the program:

The program is only outputting letters that occur exactly once;
The program counting capitals separately. Thereâ€™s no reason to count each case separately;
There is more than one most common and least common (case insensitive) letter and it is misleading to just print the first one.

Considering logging.hâ€¦

Line 15:

  #define WRITE_IF(X) ((X == true) ? std::cout 
  : null)

Comparison against true is tautology at best but dangerous as bool true is promoted to an integer value of 1, when compared against an int, while any non-zero integer converts to bool true.

I donâ€™t like an object being named as a (conditional) verb.

Considering letter.cppâ€¦

Lines 10â€“11:

  WRITE_IF(argc < 2)
  << "No arguments provided";

This is an error message and would typically be output unconditionally to std::cerr.

Line 18:

  ++letters[ch];

This is case sensitive and ch should be converted to lowercase first.

Line 26:

  WRITE_IF(pair.second ) << "Seen "

As noted previously WRITE_IF compares its argument against true, which is promoted to 1 and hence only letters that have a frequency of one are displayed.

This may be fixed by removing the comparison in the definition of WRITE_IF above by replacing (X == true) with (X).

Replacing the macro invocation with WRITE_IF(pair.second > 0) would also fix the issue and express the programmerâ€™s intention better.

The rest of the for loop looks for the most and least common letters. This most and least could be replaced with vectors of char to capture all the most and least popular letters.

Finally line 40:

  if (most != least)
    std::cout << "Rarest letter: "
      << least.first << " ("
      << least.second << ")\n";

This is interesting in as much as it only prints out the rarest letter if it is not equal to the most common. Both the most common and least common letters are set to the first letter in the map and are only changed if there is at least one other letter more common/rarer than the initial letter. The rarest letter will be printed unless all the letters in the file appear with equal frequency.

Making the changes I have suggested I get the following output:

  Seen a 2
  Seen e 4
  Seen h 1
  Seen i 3
  Seen l 2
  Seen m 2
  Seen n 1
  Seen o 1
  Seen p 2
  Seen r 1
  Seen s 4
  Seen t 4
  Seen u 1
  Most common letters: e s t (4)
  Rarest letters: h n o r u (1)

Amended letter.cpp:

  #include 'logging.h'
  #include <fstream>
  #include <map>
  #include <vector>

  // get the most and least popular chars in a file
  int main(int argc, char **argv)
  {
  if (argc < 2)
    {
      // Require at least one argument
      std::cerr << 'No arguments provided\n';
    }
    if (argc > 1)
    {
      std::ifstream ifs(argv[1]);
      std::map<char, int> letters;
      char ch;
      while (ifs >> ch)
        // Ignore case of letters when counting
        ++letters[tolower(ch)];
      // The most common letter must have a
      // frequency >= 0
      int mostFrequency = 0;
      std::vector<char> most;

      // The rarest letter must have a
      // frequency <= INT_MAX
      int leastFrequency = INT_MAX;
      std::vector<char> least;

      for (auto pair : letters)
      {
        // Diagnostics: display the frequency of
        // all letters seen in the file
        WRITE_IF(pair.second > 0) << 'Seen '
          << pair.first << ' ' << pair.second
          << std::endl;

        if (pair.second > mostFrequency)
        {
          // This letter is more common than any
          // seen before so reset
          // the most common list to this letter
          mostFrequency = pair.second;
          most = { pair.first };
        }
        else if (pair.second == mostFrequency)
        {
          // This letter is as common as the most
          // common seen so far
          // so add to the most common list
          most.push_back(pair.first);
        }

        if (pair.second < leastFrequency)
        {
          // This letter is less common than any
          // seen before so reset the most rarest
          // list to this letter
          leastFrequency = pair.second;
          least = { pair.first };
        }
        else if (pair.second == leastFrequency)
        {
          // This letter is as rare as the rarest
          // seen so far so add to the rarest list
          least.push_back(pair.first);
        }
      }

      if (!most.empty())
      {
        // There is at least one letter so print
        // out the list of most common letters and
        // their frequency
        std::cout << 'Most common letters: ';
        for (auto ch : most)
          std::cout << ch << ' ';
        std::cout << '(' << mostFrequency
          <<  ')\n';
      }
      if (!least.empty() &&
        mostFrequency != leastFrequency)
      {
        // There is at least one letter that is not
        // the most common so print out the list of
        // rarest letters and their frequency
        std::cout << 'Rarest letters: ';
        for (auto ch : least)
          std::cout << ch << ' ';
        std::cout << '(' << leastFrequency
          << ')\n';
      }
    }
  }

Commentary

I think the six entrants between them covered pretty well all the ground, so there seems little left for me to add!

One thing that wasnâ€™t explicitly discussed was the performance downside of using a map where the input is a small discrete set of values. I suggest an array indexed by the (unsigned) char value might be simpler and faster or, in a more general case, perhaps an unordered_map.

Tying the logging to NDEBUG, as PÃ¡l suggested, may be a mistake: especially for the error message written if the argument count is wrong. While some of the logging frameworks out there do this, others use other ways to enable/disable the logging.

Note that the empty class null_stream_buf is required since the default constructor of std::streambuf is protected.

It might also be worth a little more explanation of the problem with an anonymous namespace in a header: everything inside the namespace will have a unique name for each translation unit that includes the header. The result can be that multiple copies of entities from the header may end up in the resultant binary, differing only in their namespace.

If this sort of thing is needed, in C++17 you can use a named namespace and inline variable declarations to ensure only one definition will be included in the built program.

While I like Peterâ€™s solution that uses std::cin if no filename is provided please note that his solution relies on a non-standard extension that allows construction of an ifstream from a file handle of 0.

Finally, in correspondence received that was not an actual critique as such, it was suggested you could write the below command in less time it takes to compile the C++ program:

  cat sample.txt |fold -w1|grep -v ' ' \
   |sort|uniq -c|sort -n -k 1 \
   |gawk 'NR==1; END{print}'

The winner of CC 116

All the entrants correctly identified the bug in the WRITE_IF macro with the expression X == true. Explicitly checking against true or false is, as Chris said, a tautology at best. While some languages require this, C++ does not.

I liked Silasâ€™ approach of fixing the problem by replacing the macro with an inline function â€“ in C++17 there are even more places where this may be possible.

A couple of people ensured the code was valid if the file failed to open, Peter also suggested checking the file exists using std::filesystem.

Most of the critiques addressed the â€˜C++â€™ issues in the code; Hans noticed that the code incorrectly treated punctuation as letters and Chris recognised in addition to this that the code treated upper and lower case letters differently. (I suspect an international solution might need to deal with Unicode rather than the char-based code presented...).

While many of the critiques suggested using minmax_element to replace a hand-written loop (I was pleased to see this!) and some people noted that this might change which element was found where there was a duplication, only Chris suggested that this reflected a problem with the code itself in that it incorrectly failed to handle the case where multiple letters tied as most or least frequent.

After reading again through each critique and weighing the various slightly different approaches each took, in my opinion Chris gave the most helpful answers to the writer of the code and so I have awarded him the prize for this issueâ€™s critique. But thanks to all who took part; one of the things I appreciate is looking at the different ways that different programmers can view the same code, and the variety of possible approaches they suggest to improving it.

Code Critique 117

(Submissions to scc@accu.org by June 1st)

Iâ€™m trying to do some simple statistics for some experimental data but I seem to have got an error: I donâ€™t quite get the results I am expecting for the standard deviation. I canâ€™t see whatâ€™s wrong: Iâ€™ve checked and I donâ€™t get any compiler warnings. I ran a test with a simple data set as follows:

     echo 1 2 3 4 5 6 7 8 9 | statistics
     mean: 5, sd: 1.73205

The mean is right but I think I should be getting a standard deviation of something like 2.16 for the set (without outliers this is [2,8].)

Can you solve the problem â€“ and then give some further advice?

Listing 3 contains statistics.h and Listing 4 is statistics.cpp.

namespace statistics
{
  // get rid of the biggest and smallest
  template <typename T>
  void remove_outliers(T& v)
  {
    using namespace std;
    auto min = min_element(v.begin(), v.end());
    auto max = max_element(v.begin(), v.end());
    v.erase(remove_if(v.begin(), v.end(),
      [min, max](auto v) {
        return v == *min || v == *max; 
      }),
      v.end());
  }
  template <typename T>
  auto get_sums(T v)
  {
    typename T::value_type sum{}, sumsq{};
    for (auto i : v)
    {
      sum += i;
      sumsq += i * i;
    }
    return std::make_pair(sum, sumsq);
  }
  template <typename T>
  auto calc(T v)
  {
    remove_outliers(v);
    auto sums = get_sums(v);
    auto n = v.size();
    double mean = sums.first / n;
    double sd = sqrt((sums.second -
      sums.first * sums.first / n) / n - 1);
    return std::make_pair(mean, sd);
  }
}

Listing 3

#include <algorithm>
#include <cmath>
#include <iostream>
#include <iterator>
#include <vector>
#include "statistics.h"
void read(std::vector<int>& v)
{
  using iter = std::istream_iterator<int>;
  std::copy(iter(std::cin), iter(),
    std::back_inserter(v));
}
int main()
{
  std::vector<int> v;
  read(v);
  auto result = statistics::calc(v);
  std::cout << "mean: " << result.first
    << ", sd: " << result.second << '\n';
}

Listing 4

You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (http://accu.org/index.php/journal).

This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Roger Orr Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.

Notes:

More fields may be available via dynamicdata ..

Journal Articles