Journal Articles
Browse in : |
All
> Journals
> CVu
> 312
(7)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 117
Author: Bob Schmidt
Date: 03 May 2019 23:24:43 +01:00 or Fri, 03 May 2019 23:24:43 +01:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I’m trying to extract the count of letters in a file and while the program itself seems to work OK, I’m not getting the debugging output I expected. I was expecting to see a debug line of each letter being checked during the final loop, but I only get some of the letters shown.
For example, using sample.txt that consists of “This is a some sample letter input†I get:
letter sample.txt Seen T Seen h Seen n Seen o Seen r Seen u Commonest letter: e (4) Rarest letter: T (1)
Why are only some of the letters being shown as Seen
?
Can you solve the problem – and then give some further advice?
Listing 1 contains logging.h and Listing 2 is letter.cpp.
#include <iostream> namespace { class null_stream_buf : public std::streambuf { } nullbuf; std::ostream null(&nullbuf); } #ifdef NDEBUG #define WRITE_IF(X) if (false) null #else #define WRITE_IF(X) ((X == true) ? std::cout : null) #endif |
Listing 1 |
#include "logging.h" #include <fstream> #include <map> // get the most and least popular chars // in a file int main(int argc, char **argv) { WRITE_IF(argc < 2) << "No arguments provided"; if (argc > 1) { std::ifstream ifs(argv[1]); std::map<char, int> letters; char ch; while (ifs >> ch) ++letters[ch]; std::pair<char, int> most = *letters.begin(); std::pair<char, int> least = *letters.begin(); for (auto pair : letters) { WRITE_IF(pair.second) << "Seen " << pair.first << std::endl; if (pair.second > most.second) { most = pair; } else if (pair.second < least.second) { least = pair; } } std::cout << "Commonest letter: " << most.first << " (" << most.second << ")\n"; if (most != least) std::cout << "Rarest letter: " << least.first << " (" << least.second << ")\n"; } } |
Listing 2 |
Critique
Hans Vredeveld
I can see a couple of issues with this code, one of them resulting in only part of the letters being shown as Seen
. Let’s go through the code starting with logging.h.
The first thing we encounter is a class null_stream_buf
and variables nullbuf
and null
that are defined in an unnamed namespace. The idea behind an unnamed namespace is that you can define things in a translation unit for use in that translation unit only without having to think about whether a name is already used in another translation unit. The compiler will create unique names for them in each translation unit, so there is no ODR violation. The result of this is that we will have as many instances of null
and nullbuf
in our binary as there are translation units. A waste of space in the binary.
If null_stream_buf
were a buffer for some device (or file), the result would be a messy output. Messages written to null
in different translation units would end up in different buffers and only when the underlying library code decides to write the buffer to the device would the data be written. With different, independent, buffers, the order the data is written will appear random and the data will typically contain partial messages. This is similar to when two console applications write simultaneously to the console (or a single log file).
To get the expected result of the messages appearing in the output in the order they were written to the stream, we have to get rid of the unnamed namespace. The class null_stream_buf
has to be defined in the global namespace (or in a named namespace), and the variables nullbuf
and null
have to be declared extern
in the global namespace (or in a named namespace). Also, nullbuf
and null
each have to be defined in just one implementation file.
Now we have only one output stream null
and one buffer nullbuf
in our application, but there is still an issue with null_steam_buf
. It doesn’t override any of the virtual functions of std::streambuf
. In particular, it doesn’t override int_type overflow(int_type ch)
in std::streambuf
. The default implementation in std::streambuf
always returns end-of-file. As a result, as soon as a single character is written to null
, the fail-bit and bad-bit will be set. Basically, null
will tell us that it’s full and can’t accept anything any more, but we happily continue to stream data to it. For a null stream, this is very strange. We expect it to always accept data and just discard it. To get this behaviour, we have to implement a public override for the overflow member function:
int_type overflow(int_type ch) { return ch; }
As an aside, note that, while the code is legal C++, clang++ (version 5.0.1) with all warnings enabled warns that nullbuf
and null
both require global and exit-time destructors. The warnings tell us that we may have a problem when there are multiple global variables in different translation units. We could get rid of these warnings by removing the global variables.
Moving to the next lines in logging.h, we come to the WRITE_IF
macro. When NDEBUG
is defined, nothing is logged because of the if (false)
. It doesn’t matter what ostream
object follows. It could be std::cout
instead of null
.
When NDEBUG
is not defined, compiling with g++ -Wall gives us a warning on WRITE_IF(argc < 2)
, suggesting we put parentheses around the argument. The parentheses can best be put around X
in the definition of WRITE_IF
. The real problem with this macro can be seen in its other use in the application, WRITE_IF(pair.second)
. After macro substitution, we have the comparison (pair.second == true)
. However, pair.second
is of type int
and that means that we do an integer comparison and the bool
constant true
gets promoted to the int
1. The result is that we only print the characters that occur only once in the input file. What we need is a conversion to bool
of pair.second
, which we get with
#define WRITE_IF(X) ((X) ? std::cout : null)
With this modification, we will stream to std::cout
when X
evaluates to true
and to null
when it evaluates to false
. But, why stream to a null output stream that discards everything that is streamed to it at all? When we change the definition to
#define WRITE_IF(X) if (X) std::cout
we only send data to a stream when we want to see it. If we now change the definition in the case that NDEBUG
is defined to
#define WRITE_IF(X) if (false) std::cout
we can remove the global variables null
and nullbuf
, and the class null_stream_buf
.
Now that we have WRITE_IF
’s definition right, let’s have a quick look at the statements where it is used in letter.cpp. The first time it is used, some text is output without a newline at the end. The second time, the newline is output through std::endl
. While I prefer that the logging framework takes care of outputting the newline at the end, WRITE_IF
is such a simple logging macro that this is not realistic. In this case it is up to the user to end each statement with outputting a '\n'
, but not through std::endl
, as Chris Sharpe described so well in Overload 149.
Let’s continue with letter.cpp and in particular the while
-statement that reads the file. I keep on wondering whether this while
-statement does what the code’s author intended. The condition of the while
-statement, ifs >> ch
, uses formatted input to read a character from the file. The effect of this is that it will skip over whitespace, but not over e.g. punctuation. Do we really want to ignore spaces, but count commas and full stops? As the map containing the data is called letters, I suspect that punctuation should also be ignored. Then an extra check with std::isalpha
is in place:
while (ifs >> ch) if (isalpha(ch)) ++letters[ch];
Next is the for
-loop. I don’t like the use of pair
as a variable name, as it can cause confusion with the well-known std::pair
. My first suggestion would be to name it letter
, but whether that is a good name is also open for debate. Instead we can replace the entire for
-loop with a call to std::minmax_element
:
auto [least, most] = std::minmax_element(letters.begin(), letters.end(), [](const auto& a, const auto& b) { return a.second < b.second; });
For this to work, we need to #include <algorithm>
. With this change we lose the debug output of which letter we have seen, but in my opinion it has also lost its meaning.
Finally a note about the includes. The std::cout
in letter.cpp is only made known through the include of logging.h. I strongly believe that any identifier used in a file should be declared in the file itself or brought in by an include in the file itself or, for an implementation file, its associated header file. This means that letter.cpp should #include <iostream>
directly. For the same reason I would prefer to #include <utility>
for std::pair
(not needed any more when using std::minmax_element
) in letter.cpp, and to #include <ostream>
and #include <streambuf>
in logging.h for std::ostream
and std::streambuf
(not needed any more after the rewrite of WRITE_IF
).
James Holland
There are two features to the software than I thought could be the cause of the problem and, therefore, worth investigating. The first is the null_stream_buff
class; the second is the macro WRITE_IF
.
Although null_stream_buff
may be a somewhat novel mechanism, it appeared to be working as desired, namely to produce no output when called. My attention then turned to the macro. The use of all but the simplest macros is discouraged as they can lead to some very obscure bugs that can be very difficult to diagnose. I was hopeful that the problem with the software debug output had something to do with the use of such macros.
I decided that the best way to proceed was to discover what the source code looked like after being preprocessed. This was accomplished by using the -E
compiler switch and observing the resulting output. The statement of interest is the WRITE_IF
macro within the for
-loop of main()
and is shown below.
WRITE_IF(pair.second) << "Seen " << pair.first << std::endl;
After preprocessing, the code the compiler sees is as follow.
((pair.second == true) ? std::cout:null) << "Seen " << pair.first << std::endl;
It can now be seen that whether std::cout
or null
is called depends on the comparison of pair.second
with true
. The type of pair.second
is int
and represents the number of occurrences of a particular letter within the text file. The type of true
is bool
. The problem now boils down to how the C++ compares for equality an int
and a bool
. From observing the program output, it can be seen that debug information is produced only when the value of pair.second
is 1. This suggests that, in effect, the boolean value of true
is converted to an int
of value 1. The comparison, therefore, consists of comparing the value of pair.second
with 1. Output will, therefore, occur only when pair.second
is equal to 1. This is consistent with observations.
The above analysis suggests ways of correcting the problem. One way is to convert pair.second
to a boolean value with a static_cast
before the comparison is made. A value 0 will be converted to false
and any other value will be converted to true
. Now, false == true
has a value of false
and null
will be called; true == true
has a value of true
and std::cout
will be called. This is as required and is borne out in practice by compiling and running the modified code shown below.
WRITE_IF(static_cast<bool>(pair.second)) << "Seen " << pair.first << std::endl;
The comparison of a boolean value with true
is a redundant operation. This suggests a simpler way of correcting the program. Instead of casting pair.second
to bool
, the macro defined in logging.h can be changed from
WRITE_IF(X)((X == true) ? std::cout:null)
to
WRITE_IF(X)((X) ? std::cout:null).
With this arrangement pair.second
(an int
) will be inherently converted to bool
with a value true
if pair.second
is anything other than 0. This is as required and produces the required result.
Taking a slightly wider view of the program, it would seem preferable to modify the if
statement that warns that pair.second
is zero. The modification would add an else
clause that prints the letter and the number seen. This would remove the need for the macro.
It is always worth considering whether a ‘hand-rolled’ loop can be replaced with an existing algorithm from the standard library. In the code presented, a for
-loop scans through a map
to determine the minimum and maximum of map
’s values. There is a standard library algorithm named minmax_element
that sounds as if it could to the same job as the for
-loop. With the use of a lambda, as shown below, it can.
const auto [least, most] = std::minmax_element(letters.cbegin(), letters.cend(), [](const auto & a, const auto & b) { return a.second < b.second; });
It should be noted that if more than one minimum or maximum element exists, minmax_element()
returns the first minimum and the last maximum element. The elements are stored alphabetically within letters and therefore, in this case, minmax_element()
returns T
as the rarest and s
as the commonest. This is different from the code presented which returns the first commonest, namely e
. If the software is to find the first commonest, two other algorithms (both in the standard library) can be used in place of minmax_element()
(with a slight loss in efficiency) namely min_element()
and max_element()
. I have assumed the first commonest letter is to be returned as in the original code resulting in my version of the program shown below.
#include <algorithm> #include <fstream> #include <iostream> #include <map> int main(int argc, char **argv) { if (argc < 2) { std::cout << "No arguments provided"; } else if (argc > 1) { std::fstream ifs(argv[1]); std::map<char, int> letters; char ch; while (ifs >> ch) ++letters[ch]; const auto compare_function = [](const auto & a, const auto & b) { return a.second < b.second; }; const auto least = std::min_element(letters.cbegin(), letters.cend(), compare_function); const auto most = std::max_element(letters.cbegin(), letters.cend(), compare_function); if (most != letters.cend()) { std::cout << "Commonest letter: " << most->first << " (" << most->second << ")\n"; if (most != least) { std::cout << "Rarest letter: " << least->first << " (" << least->second << ")\n"; } } } }
I have also added some protection against the input file being empty.
Balog Pál
Eh, stream-based logging. I hope this is one of the last specimens, we finally get a sensible text formatting library in C++ standard and can leave this monster behind along with all the related quirks. But that is besides the point, let’s read the code.
- We have a logging.h that looks like a header-only library designed for this specific purpose
- namespace
{}
in header: this is normally frowned upon, but fits with the purpose. We can live with this much bloat - I’m puzzled on the empty subclassing. On a review, I’d require a good explanation on why we have that
null_stream_buf
class instead of just usingstd::streambuf
. If it has some ADL-related reason to exist, that must appear in comments - 3 names are used up for no good reason, internal purpose only. Especially
null
stands out. A snap-in like this should use only non-clashing names beyond its public interface - We got to the main feature, that is a macro. I postpone discussion in the idea.
- The name
WRITE_IF
looks suboptimal too - The macro is conditional on
NDEBUG
. That is a fishy choice, asNDEBUG
controls assert, and this feature is not related. - The first version could be just
null
, theif(false)
part serves to save runtime processing of the<<
s that follow. The condition is completely eliminated - The second version has a
== true
comparison that is a major blunder – later we will see that it the actual source of misbehavior - The main code starts in weird way, we have a
WRITE_IF
followed by another check of the same condition – carefully phrased differently. The normal way would emit message on theelse
. And not even with this conditional facility. - I accept the reading of the file into the map though it could be arranged with some algorithm without a loop.
- The stream-related error handling is missing, but on problem we just result an empty map that should be okay.
- We run ahead dereferencing
*begin
without first checking for empty map that is a no-no. - A manual search for element is presented instead of using
std::minmax_element
that would just do the job.
Certainly the last would kill our chance to use WRITE_IF
and notice the misbehavior too... not sure to count that as pro or a con. The way we use it in the loop, we pass the int
with the count. Honestly it beats me why we do that, as the method of our creating the map ensures it will not be zero.
The int
is fine as a boolean expression and would work as intended if the macro just used it that way, just having (X)
. With the == true
added, the promotions work ‘upwards’, instead of forcing the count to a bool
and compare with true
, it stays the int
and true
is promoted to integer 1. Then ==
will pass only letters with exactly one instance in the input. Which matches the behavior we see.
Beyond that the loop looks correct though massively wasting resources: for has auto
instead of const&
making copy of every entry in the map and then making copies into most
and least
. With a pair<char, int>
it might not be noticeable, but should be avoided. And as usual, using the stock algorithm would prevent this kind of mess-up and many others.
Finally, the deferred discussion on WRITE_IF
. The design goes half-way in many points with respect to compile and runtime. I’d prefer an all-or-nothing approach. I’d rather work with a facility like
TRACE_IF( bool_cond, message_string );
that, if arranged with a macro can compile to nothing. Getting rid of both the condition and the message instructions. Or if we prefer to always have the code, can be a simple function. No switcheroo with streams and whatnot.
The ‘weak’ point is the string part, that in the presented version is assembled with the <<
method. That some people still consider a good idea, while others not so much. My practice showed that even a Sprintf()
function that uses the traditional sprintf
interface with all the potentially related problems (I think we can ask a warning on all of them), and returns the favorite flavor of string is much better. But there are modern ways that combine the benefits of type safety and formatting with extras. One of them is supposedly getting into the standard too.
Peter Sommerlad
I will first comment the code directly line by line and then provide a more idiomatic solution.
logging.h
This header file misses an include guard. If this file is included in the same translation unit twice, the global variables with internal linkage nullbuf
and null
will be doubly-defined and thus the code will no longer compile.
#include <iostream>
This include is not recommended, the code in this file requires the definitions of std::streambuf
and std::ostream
, both are available through the <ostream>
header. However, the macro will require std::cout
in case of debug-level compilation. But the macro definition will not require the definition of this global variable, but the macro expansion eventually. I teach my students to only include <iostream>
in the file where the main()
function is defined and only use the global variables std::cin
and std::cout
within the main
function and write all code that does I/O taking a parameter of std::istream&
or std::ostream&
respectively. That principle allows writing unit tests for these functions, which are almost impossible to formulate, when the global stream variables are used directly.
namespace
Defining an anonymous namespace in a header file is not recommended, because wherever that header file is included all definitions exist with internal linkage. If done within the implementation file, I could live with that.
{ class null_stream_buf : public std::streambuf { } nullbuf; std::ostream null(&nullbuf);
This is an interesting attempt to introduce the null-object pattern for ostream
. However, using global variables to achieve that might not be the best way. Especially since formatting operations still will be performed, just no output actually happens.
For modern code formatting, I would always use braced-initialization for initializing variables, this removes the mental gymnastics to ensure that a variable is really initialized, since the default for trivial types without braces might mean uninitialized (for automatic storage duration variables), using uninitialized variables can lead to undefined behavior. While we have global variables here, it is still a practice recommended.
On the other side, there is some problem in the macro logic below.
} #ifdef NDEBUG #define WRITE_IF(X) if (false) null
defining a macro taking a parameter (X)
and not using it can be strange. Also having a statement that is incomplete and not using braces for a block is very brittle, especially when surrounding code contains other conditional statements and a following else
might not be attached to the if
in the surrounding code but to the if
from the macro.
On the other hand, having that if
statement should guarantee that the output will actually never be generated, because the code path is not taken and the optimizer will eliminate that code.
#else #define WRITE_IF(X) ((X == true) ? std::cout : null)
Here is the bug that leads to only outputting letters with the count 1 (that is the integral conversion value of true
). Whenever I see code that compares a (potential bool
) value with either true
or false
that code is unprofessional (regardless of the language).
If the macro parameter is meant to be a bool
value, then it should read (X)?..:..
Note that a macro parameter should always be in parenthesis when expanded to avoid passing ‘interesting’ strings that, when expanded, change the syntax of the expression or statement.
Again, we have a partial expression of type ostream&
If we want to stick with a macro for the debugging, I would suggest to pass all the output to be logged as parameter and not to introduce a conditional even in the debug case.
#endif --- letter.cpp --- #include "logging.h"
It is good practice to include the ‘own’ header first, because that guarantees that the own header is self-contained.
#include <fstream> #include <map> // get the most and least popular chars in a file int main(int argc, char **argv) {
If we follow the unix pipes and filters practice, if no file is given standard input (std::cin
) should be processed. This will make the program logic not more complex, but will actually provide a more useful program.
A big problem of this program is that almost all code resides within the main
function and is thus not unit-testable. BAD! For very simple example programs that are obviously correct, that might be OK, but as we see from the bugs, this program is not simple enough.
WRITE_IF(argc < 2) << "No arguments provided"; if (argc > 1) {
There is no check that the file named argv[1]
actually exists. From C++17 on we have std::filesystem
that allows us to have such a sanity check. This can still cause a race condition when a file is deleted between the check and the actually opening, but it usually allows providing better error messages. A side effect is, that on filesystems with interesting character encodings for file names the string represented by argv[1]
can actually be converted into std::path
to provide platform independent (unicode!).
std::ifstream ifs(argv[1]);
no error checking, but OK, just no input here.
std::map<char, int> letters; char ch; while (ifs >> ch) ++letters[ch];
This is quite idiomatic and automatically skips whitespace by using operator>>
.
However, I would extract filling the map into a separate function, that returns the map by value. Such a function can be independently tested with unit tests.
When there is no input, and thus the map is empty, the following code dereferencing the begin()
iterator of the map is undefined behavior.
It might be better to use iterators instead of the pair values for keeping track of the found elements. But using iterators then requires the check, if the iterators are valid (for example not equal to the map’s end()
iterator.
std::pair<char, int> most = *letters.begin(); std::pair<char, int> least = *letters.begin();
Later on, writing your own loop for a perfectly ready-made linear search algorithm is a code smell and it can be wrong. The algorithm in the standard library to use here would be minmax_element
taking a comparison function object (a lambda) that compares the counters in the map’s pair::second
member.
for (auto pair : letters) { WRITE_IF(pair.second) << "Seen " << pair.first << std::endl;
Here the debug macro is used and passing the actual count that is then compared with the value of true
, results in only those letters to be printed that have a count of one. Using std::endl
instead of '\n'
is a smell, since in some cases it will result in unneeded operating system interaction, because of flushing the buffer prematurely.
Also test output is bad for a filter program in a pipeline, since that extra output will taint the processing of potentially later filters.
if (pair.second > most.second) { most = pair; } else if (pair.second < least.second) { least = pair; } }
we will get rid of the whole loop above, so that we do not need to have the debug output at all.
std::cout << "Commonest letter: " << most.first << " (" << most.second << ")\n"; if (most != least)
An if
statement where the body lasts over several lines without braces is dangerous. if the single statement is split for whatever reason, the logic might be wrong. Also the
std::cout << "Rarest letter: " << least.first << " (" << least.second << ")\n"; } }
In contrast to my usual practice, I provide a solution without accompanying unit tests, but I have refactored the code, so that unit tests might be added easily, by splitting the code besides main
into a separate compilation unit that can be linked to test cases.
As we will see, main()
contains (almost) no branching logic and thus can not be wrong. All the other functions do just one thing, that should be testable and they are not dependent on global variables.
I will address the logging macro separately, since I believe it is no longer needed.
#include <fstream> #include <string> #include <map> #include <algorithm> using charmap=std::map<char,int>; charmap countChars(std::istream &in) { std::map<char, int> letters { }; char ch { }; while (in >> ch) { ++letters[ch]; } return letters; } using mapiter=charmap::const_iterator; void printresult( std::ostream &out, std::string text, mapiter res, mapiter noresult) { if (res != noresult) { out << text << res->first << " (" << res->second << ")\n"; } } #include <iostream> // get the most and least popular chars in a file int main(int argc, char **argv) { std::ifstream ifs { argc > 1 ? std::ifstream { argv[1] } : std::ifstream { 0 } }; auto const letters { countChars(ifs) }; auto const least_most { minmax_element(letters.begin(), letters.end(), [](auto l, auto r) { return l.second < r.second; }) }; printresult(std::cout, "Commonest letter: ", least_most.second, letters.end()); printresult(std::cout, "Rarest letter: ", least_most.first, letters.end()); }
What needs to be explained?
I included <iostream>
directly in front of main
, to demonstrate that the other functions are independent of the global stream objects. While this is not very idiomatic, it was my test to make sure that I did not leave std::cout
somewhere in the extracted code and thus make it untestable.
Let us start with main()
. I extended the functionality to be able to operate the program as a filter. The first statement initializing ifs
is using a trick. Since iostream
objects in general are neither copy-able nor move-able it is tricky to provide a factory that either returns std::cin
(or a reference to it) or otherwise a std::ifstream
that opens the file given as an optional argument. File stream objects can be moved, so it is possible to initialize an std::ifstream
variable from a temporary. Here I use the trick, that an ifstream
can either open a file, given as a string or path argument or binds to an already open file descriptor. Since 0 is the file descriptor of the standard input of a program, I create an ifstream
object using that file descriptor when no file name is given. (I should have implemented a factory with my filesystem readability check first, but I refrained from it for brevity. This is left as an exercise to the student.)
The code reading the stream is extracted into a separate function and thus becomes testable.
Instead of searching for the letter with the lowest and highest count, I use the minmax_element
algorithm available since C++11. To avoid having to spell std::pair<char,int>
I use a generic lambda doing the comparison on the counts. Note that we get a slight deviation from the logic, because the ‘commonest’ (largest) letter count is the last one of potentially similar counts. The original algorithm behaved slightly different. Nevertheless, the logic is OK, because we do not have a specification (or test case) demonstrating what should happen if we have multiple ‘commonest’ letters.
Since the output needs to check for validity (the input might have been empty) and do so twice, I extracted the output code into the printresult
function.
For the logging functionality, I refer to a stackoverflow entry, that demonstrates different approaches to achieve logging: https://stackoverflow.com/questions/19415845/a-better-log-macro-using-template-metaprogramming
But my guideline, especially for beginners, when you need logging or interactive debugging to understand what your code does, your code is usually very badly structured. So, simplify it into smaller parts, write unit tests against these parts, or even better write the tests first, this will help you to grow your software according to your needs instead of guesswork and wasting time (see Pete Goodliffe’s article in CVu that presented that problem).
That’s it. I hope I did not forget anything crucial, because I was too lazy to write test cases. Shame on me.
Silas S. Brown
The reason why only some of the letters are being shown as Seen
is that your WRITE_IF(X)
macro tests for (X == true)
when X
is an integer. The integer value of true
is 1, so only letters that occurred exactly once (and no more than once) are being printed as Seen
.
Instead of checking that X
is equal to true
, check that X
is not equal to false
, i.e. (X != false)
.
But you really need to put extra parentheses around that X
, i.e. ((X) != false)
, because the macro is expanded without regard to C++ syntax and you could break it if you ever wrote, for example, WRITE_IF(A & B)
expecting to get the bitwise AND of A
and B
: you would instead get the bitwise AND of A
and either a 0 or a 1 depending on whether B
differs from false
, since !=
has a higher precedence than the bitwise &
operator. Ouch. (This particular problem would have been flagged up by GCC if you had passed the -Wall
(warnings all) parameter to the g++ command. I really don’t understand why -Wall
is not the default.)
In this particular case, we could side-step the need for extra parentheses by noting that if ((X) != false)
is equivalent to if (X)
(since C++, unlike languages such as Java, does let you put non-boolean types into an if
expression, and automatically applies an != 0
when it sees a numeric type), but I still made the point about the extra parentheses because you’ll need to know that if you do any more things with macros.
But ideally I don’t want you to be doing any more with macros at all. To misquote Doc Brown from Back To The Future: “where we’re going, we don’t need macros.†Try this:
static inline auto& WRITE_IF(bool X) { #ifdef NDEBUG return null; #else return X ? std::cout : null; #endif }
If your compiler is too old for auto&
to work, then write std::ostream&
instead, but do consider upgrading your compiler because auto
is a really useful feature of modern C++ that saves us from having to figure out exactly what type something’s going to be when the compiler can do that job.
(The usual convention is that a name written in all capitals implies a macro, so really we should be changing it to lower case now that it’s a function. But I’ve left it as capitals for now so you can drop the above into the existing code without having to change anything else and it’ll still work.)
Here are just some of the advantages of that function-based approach:
- The compiler can check that our function makes sense, even before we try to use it. (Macros are not checked until they are used, and if there’s a problem the error messages are usually harder to understand, that is if they’re not missed altogether as in your case.)
- We can specify a type for the parameter
X
. In this case, we say we want abool
, so anything else will be converted to abool
. If you had writtenX==true
in this version, it would not have been a problem. - We don’t have to worry about extra parentheses (and I haven’t even mentioned multiple evaluations of the same variable).
- It’s easier to span multiple lines, and as the above shows, putting the
#ifdef NDEBUG
inside the function saves us from writing the first line twice (which is good, because writing something twice means there are two places to fix it if it turns out to be wrong, and if one of those is not even reached by the compiler due to an#ifdef
then it could easily be missed).
In the old days, a counterpoint to all of this might have been ‘but what about the overhead of having a function call’. But with modern optimising compilers, especially as we said ‘static inline’, I would be amazed if the actual machine code generated were any different from doing it with a macro, whether or not NDEBUG
is defined.
Another improvement that is possible (now that we have a function) is to hide the null
stream inside that function, by making it a static instead of a global. What I mean is to change the start of the function to:
static inline auto& WRITE_IF(bool X) { static std::ostream null(&nullbuf);
and remove the std::ostream null(&nullbuf);
from your namespace block. This has the advantage of not creating a thing called null
that’s just ‘floating around’ your file’s global namespace, which could be accidentally referred to in the wrong context. After all there’s nothing in the name to reinforce the idea that it’s a null stream, rather than a null something else. Yes, you could change the name, but I’d rather show you how to encapsulate it in the function so you no longer have to worry about a good global name.
Chris Trobridge
Considering the output of the program:
- The program is only outputting letters that occur exactly once;
- The program counting capitals separately. There’s no reason to count each case separately;
- There is more than one most common and least common (case insensitive) letter and it is misleading to just print the first one.
Considering logging.h…
Line 15:
#define WRITE_IF(X) ((X == true) ? std::cout : null)
Comparison against true
is tautology at best but dangerous as bool true
is promoted to an integer value of 1, when compared against an int
, while any non-zero integer converts to bool true
.
I don’t like an object being named as a (conditional) verb.
Considering letter.cpp…
Lines 10–11:
WRITE_IF(argc < 2) << "No arguments provided";
This is an error message and would typically be output unconditionally to std::cerr
.
Line 18:
++letters[ch];
This is case sensitive and ch
should be converted to lowercase first.
Line 26:
WRITE_IF(pair.second ) << "Seen "
As noted previously WRITE_IF
compares its argument against true
, which is promoted to 1 and hence only letters that have a frequency of one are displayed.
This may be fixed by removing the comparison in the definition of WRITE_IF
above by replacing (X == true)
with (X)
.
Replacing the macro invocation with WRITE_IF(pair.second > 0)
would also fix the issue and express the programmer’s intention better.
The rest of the for
loop looks for the most and least common letters. This most and least could be replaced with vectors of char
to capture all the most and least popular letters.
Finally line 40:
if (most != least) std::cout << "Rarest letter: " << least.first << " (" << least.second << ")\n";
This is interesting in as much as it only prints out the rarest letter if it is not equal to the most common. Both the most common and least common letters are set to the first letter in the map and are only changed if there is at least one other letter more common/rarer than the initial letter. The rarest letter will be printed unless all the letters in the file appear with equal frequency.
Making the changes I have suggested I get the following output:
Seen a 2 Seen e 4 Seen h 1 Seen i 3 Seen l 2 Seen m 2 Seen n 1 Seen o 1 Seen p 2 Seen r 1 Seen s 4 Seen t 4 Seen u 1 Most common letters: e s t (4) Rarest letters: h n o r u (1)
Amended letter.cpp:
#include 'logging.h' #include <fstream> #include <map> #include <vector> // get the most and least popular chars in a file int main(int argc, char **argv) { if (argc < 2) { // Require at least one argument std::cerr << 'No arguments provided\n'; } if (argc > 1) { std::ifstream ifs(argv[1]); std::map<char, int> letters; char ch; while (ifs >> ch) // Ignore case of letters when counting ++letters[tolower(ch)]; // The most common letter must have a // frequency >= 0 int mostFrequency = 0; std::vector<char> most; // The rarest letter must have a // frequency <= INT_MAX int leastFrequency = INT_MAX; std::vector<char> least; for (auto pair : letters) { // Diagnostics: display the frequency of // all letters seen in the file WRITE_IF(pair.second > 0) << 'Seen ' << pair.first << ' ' << pair.second << std::endl; if (pair.second > mostFrequency) { // This letter is more common than any // seen before so reset // the most common list to this letter mostFrequency = pair.second; most = { pair.first }; } else if (pair.second == mostFrequency) { // This letter is as common as the most // common seen so far // so add to the most common list most.push_back(pair.first); } if (pair.second < leastFrequency) { // This letter is less common than any // seen before so reset the most rarest // list to this letter leastFrequency = pair.second; least = { pair.first }; } else if (pair.second == leastFrequency) { // This letter is as rare as the rarest // seen so far so add to the rarest list least.push_back(pair.first); } } if (!most.empty()) { // There is at least one letter so print // out the list of most common letters and // their frequency std::cout << 'Most common letters: '; for (auto ch : most) std::cout << ch << ' '; std::cout << '(' << mostFrequency << ')\n'; } if (!least.empty() && mostFrequency != leastFrequency) { // There is at least one letter that is not // the most common so print out the list of // rarest letters and their frequency std::cout << 'Rarest letters: '; for (auto ch : least) std::cout << ch << ' '; std::cout << '(' << leastFrequency << ')\n'; } } }
Commentary
I think the six entrants between them covered pretty well all the ground, so there seems little left for me to add!
One thing that wasn’t explicitly discussed was the performance downside of using a map where the input is a small discrete set of values. I suggest an array
indexed by the (unsigned) char
value might be simpler and faster or, in a more general case, perhaps an unordered_map
.
Tying the logging to NDEBUG
, as Pál suggested, may be a mistake: especially for the error message written if the argument count is wrong. While some of the logging frameworks out there do this, others use other ways to enable/disable the logging.
Note that the empty class null_stream_buf
is required since the default constructor of std::streambuf
is protected.
It might also be worth a little more explanation of the problem with an anonymous namespace in a header: everything inside the namespace will have a unique name for each translation unit that includes the header. The result can be that multiple copies of entities from the header may end up in the resultant binary, differing only in their namespace.
If this sort of thing is needed, in C++17 you can use a named namespace and inline
variable declarations to ensure only one definition will be included in the built program.
While I like Peter’s solution that uses std::cin
if no filename is provided please note that his solution relies on a non-standard extension that allows construction of an ifstream
from a file handle of 0.
Finally, in correspondence received that was not an actual critique as such, it was suggested you could write the below command in less time it takes to compile the C++ program:
cat sample.txt |fold -w1|grep -v ' ' \ |sort|uniq -c|sort -n -k 1 \ |gawk 'NR==1; END{print}'
The winner of CC 116
All the entrants correctly identified the bug in the WRITE_IF
macro with the expression X == true
. Explicitly checking against true
or false
is, as Chris said, a tautology at best. While some languages require this, C++ does not.
I liked Silas’ approach of fixing the problem by replacing the macro with an inline
function – in C++17 there are even more places where this may be possible.
A couple of people ensured the code was valid if the file failed to open, Peter also suggested checking the file exists using std::filesystem
.
Most of the critiques addressed the ‘C++’ issues in the code; Hans noticed that the code incorrectly treated punctuation as letters and Chris recognised in addition to this that the code treated upper and lower case letters differently. (I suspect an international solution might need to deal with Unicode rather than the char
-based code presented...).
While many of the critiques suggested using minmax_element
to replace a hand-written loop (I was pleased to see this!) and some people noted that this might change which element was found where there was a duplication, only Chris suggested that this reflected a problem with the code itself in that it incorrectly failed to handle the case where multiple letters tied as most or least frequent.
After reading again through each critique and weighing the various slightly different approaches each took, in my opinion Chris gave the most helpful answers to the writer of the code and so I have awarded him the prize for this issue’s critique. But thanks to all who took part; one of the things I appreciate is looking at the different ways that different programmers can view the same code, and the variety of possible approaches they suggest to improving it.
Code Critique 117
(Submissions to scc@accu.org by June 1st)
I’m trying to do some simple statistics for some experimental data but I seem to have got an error: I don’t quite get the results I am expecting for the standard deviation. I can’t see what’s wrong: I’ve checked and I don’t get any compiler warnings. I ran a test with a simple data set as follows:
echo 1 2 3 4 5 6 7 8 9 | statistics mean: 5, sd: 1.73205
The mean is right but I think I should be getting a standard deviation of something like 2.16 for the set (without outliers this is [2,8].)
Can you solve the problem – and then give some further advice?
Listing 3 contains statistics.h and Listing 4 is statistics.cpp.
namespace statistics { // get rid of the biggest and smallest template <typename T> void remove_outliers(T& v) { using namespace std; auto min = min_element(v.begin(), v.end()); auto max = max_element(v.begin(), v.end()); v.erase(remove_if(v.begin(), v.end(), [min, max](auto v) { return v == *min || v == *max; }), v.end()); } template <typename T> auto get_sums(T v) { typename T::value_type sum{}, sumsq{}; for (auto i : v) { sum += i; sumsq += i * i; } return std::make_pair(sum, sumsq); } template <typename T> auto calc(T v) { remove_outliers(v); auto sums = get_sums(v); auto n = v.size(); double mean = sums.first / n; double sd = sqrt((sums.second - sums.first * sums.first / n) / n - 1); return std::make_pair(mean, sd); } } |
Listing 3 |
#include <algorithm> #include <cmath> #include <iostream> #include <iterator> #include <vector> #include "statistics.h" void read(std::vector<int>& v) { using iter = std::istream_iterator<int>; std::copy(iter(std::cin), iter(), std::back_inserter(v)); } int main() { std::vector<int> v; read(v); auto result = statistics::calc(v); std::cout << "mean: " << result.first << ", sd: " << result.second << '\n'; } |
Listing 4 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal).
This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.
Notes:
More fields may be available via dynamicdata ..