Journal Articles
Browse in : |
All
> Journals
> CVu
> 285
(9)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 102
Author: Martin Moene
Date: 02 November 2016 17:08:43 +00:00 or Wed, 02 November 2016 17:08:43 +00:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: If you would rather not have your critique visible online, please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I’m trying to read a list of test scores and names and print them in order.
I wanted to use exceptions to handle bad input as I don’t want to have to check after every use of the >>
operator.
However, it’s not doing what I expect.
I seem to need to put a trailing /something/ on each line, or I get a spurious failure logged, and it’s not detecting invalid (non-numeric) scores properly: I get a random line when I’d expect to see the line ignored.
The scores I was sorting:
-- sort scores.txt -- 34 Alison Day 45 John Smith 32 Roger Orr XX Alex Brown
What I expect:
$ sort_scores < sort_scores.txt Line 4 ignored 32: Roger Orr 34: Alison Day 45: John Smith
What I got:
$ sort_scores < sort_scores.txt Line 2 ignored Line 3 ignored 0: 32: Roger Orr 34: Alison Day 45: John Smith
I tried to test it on another compiler but gcc didn’t like
iss.exceptions(true)
I tried
iss.exceptions(~iss.exceptions())
to fix the problem.
Can you help me understand what I’m doing wrong?
Listing 1 contains sort_scores.cpp.
#include <iostream> #include <map> #include <sstream> #include <string> using pair = std::pair<std::string, std::string>; void read_line(std::string lbufr, std::map<int, pair> & mmap) { std::istringstream iss(lbufr); iss.exceptions #ifdef __GNUC__ (~iss.exceptions()); #else (true); // yes, we want exceptions #endif int score; iss >> score; auto & name = mmap[score]; iss >> name.first; iss >> name.second; } int main() { std::map<int, pair> mmap; std::string lbufr; int line = 0; while (std::getline(std::cin, lbufr)) try { ++line; read_line(lbufr, mmap); } catch (...) { std::cout << "Line " << line << " ignored\n"; } for (auto && entry : mmap) { std::cout << entry.first << ": " << entry.second.first << ' ' << entry.second.second << '\n'; } } |
Listing 1 |
Critique
Felix Petriconi
The main problem is that iss.exceptions()
has a parameter of type std::ios_base::iostate
. The underlying values are implementation defined, as stated here [1]. So it depends on the used library, if the ~iss.exceptions()
or the conversion of "true
" works as expected.
In [2] Howard Hinnant explained that the correct parameter combination would be:
iss.exceptions(std::ios::failbit | std::ios::badbit)
and then your program works as expected.
Beside this, here are my further observations:
The type definition of pair as std::pair<std::string, std::string>
is from my point of view not something that I would do, because on one hand, as soon as one includes std
into the current namespace, one gets a name collision with std::pair
. And on the other hand, a type with the name pair says nothing about its purpose. In this case I would use a type-name like fullname
. The first parameter of the function read_line
is defined as call by value. There is no need to copy the value for parsing. So a call by const reference would be sufficient and would spare the copy of the complete line. Since we have today move semantics (and most compilers implement even longer return value optimization (RVO)), the readability of the code would be better if the parsed values would be returned as std::pair<int, fullname>
. Then composing the code in a functional way would be much easier.
The current implementation of the function read_line
has the problem that partly successful parsing of a line leads to incomplete data records. So if e.g. parsing the score value was successful, but then reading of the first name fails, the map would contain a probably not usable data set. If parsing of the line would be separated from storing the values in the map, one could avoid this kind of error.
The code uses std::map
for two purposes as far as I can see: The data sets are ordered by the score value and in case of double entries the last entry wins. Here the problem of not separating the parsing from storing the data becomes more prominent. Think about the situation that a first data set for score value 42 was read successfully and then later an other data set with the same score value 42 from the input stream is malformed in a way that the 2nd string could not be read. The record in the map would contain information from the 1st successful read action and the first name of the second read attempt. If only fully read data sets would be inserted into the map, then this could not occur. The std::map
has another problem: It is a data structure with very bad cache locality which results in large collections in bad performance. Each insert is a candidate for re-balancing the underlying tree which is costly. Alex Stepanov goes into details in his course at A9, that is available under [3]. So I would follow the general advice, use std::vector
with std::pair<int, fullname>
as value type and append all values. If you know that all score values appear just once, then simply use std::sort
with a less predicate on the score value. If score value may appear multiple times, then I would use first a std::stable_sort
, that keeps the order of equivalent values, and then remove all duplicate values, either keep the first occurrence or the last occurrence of equivalent entries.
The catch(...)
catches all exceptions. We normally use this ‘wildcard’ only in main()
as a fallback. In this case you know that std::ios_base::failure
is thrown and so I would just catch this one. If any other exception is thrown, e.g. std::bad_alloc
, I would either handle it separately, if possible or let it go and let the application terminate. If std::bad_alloc
would be thrown in this example, you would catch it with ...
and then continue parsing, even the probability is high, that the next parsed line throws the same exception again.
In the last range-based for
loop I would not take entry
as an auto&&
. auto&&
is a forwarding reference, which does not make sense in this context. I would write the loop as for (const auto& entry : mmap)
.
For better readability I would indent the try
- catch
block below the while
loop. At the first glance there is the while
loop and after it, the try
-catch
block. But the try
-catch
block actually is inside the loop.
References
[1] http://en.cppreference.com/w/cpp/io/ios_base/iostate
[2] http://stackoverflow.com/a/16473878
[3] https://www.youtube.com/playlist?list=PLHxtyCq_WDLXryyw91lahwdtpZsmo4BGD
James Holland
The student was very close to getting the program working. The problem lies with the function that was causing the student so much trouble, namely exceptions()
. The comment next to the parameter of exceptions()
would suggest that the student thought that the parameter type was of type bool
; we either want exceptions to be thrown, or we do not. This is not the case. The parameter type is iostate
and provides more flexibility in controlling stream exceptions.
The C++ standard does not define the underlying type of iostate
; it is left to the compiler implementer. This accounts for different behaviours of the various compilers used by the student when attempting to pass true
to exceptions()
. The gcc compiler uses an underlying type to which the type bool
cannot be converted and so an error message is emitted. Some compilers may, conceivably, use an underlying type that permits a bool
to be used, such as an unsigned int
. A value of true
, when converted to an unsigned int
, will have a value of 1
. This will enable one of the stream states to throw an exception and it may not be the one required.
The student also experimented with passing to exceptions()
the bit-inverted value of the currently enabled exceptions. As the default state of the stream is to throw no exceptions, the effect of the call will be to enable all exceptions. Although this will compile without error, it is probably not what the student intended.
An exception can be thrown when the end-of-file is encountered, when a read or write operation fails, and when something more serious goes wrong with the file system. A set of constants is provided, of type iostate
, that can be used as parameters of exceptions()
to enable or disable the desired exceptions. The constants are listed below.
Constant |
---|
std::ios::goodbit std::ios::eofbit std::ios::failbit std::ios::badbit |
It is, perhaps, a little confusing that goodbit
is so named as it does not represent a bit position. It has a value of zero and so can be used to ‘represent’ the absence of the other bits. The constants can be combined, using the bitwise operators, to enable more than one exception, if required.
The student states that at least one character is required after the second name for the software to work as required. I suspect this is because the student’s program has the eof
exception enabled. When operator>>()
reads the second name it needs to be sure that it has read all the characters representing the name. It can only do this by attempting to read one character beyond the end of the name. If there are no characters beyond the name, the end of the buffer will be encountered and eofbit
set. Should the eof
exception be enabled, as is the case with the student’s code, an exception will be thrown. If there is a space, for example, after the second name, operator>>()
will read the space and conclude that all the characters of the second name have been read. There is no need to read any more characters and so no attempt is made to read beyond the record; eofbit
will not be set and no exceptions will be thrown. The record will be processed as required.
The student also stated that the program does not correctly handle records with invalid scores. In such cases, operator>>()
attempts to read a numerical value but fails, setting failbit
and writing zero into score
. As a result no exceptions will be thrown because only the eof
exception has been enabled. An entry in the map will then be made with a key of zero. The program then attempts to read the first and second names from the stream. As the stream is not in a good state, the read operations will not have any effect thus leaving the map entry with null value for the two names.
From this analysis, it can be seen that the program needs to be amended in two ways; one to prevent exceptions being thrown on encountering the eof
, and the other to handle invalid scores. Both can be achieved by allowing an exception to be thrown only when a formatting error occurs. This is done by passing std::ios::failbit
to exceptions()
.
Although this modification results in a working program, I suggest, some areas of the design could be improved. It would be better, for example, to read the score, first name and last name into their respective variables before entering them into the map. This will ensure the map does not contain records that are not properly formed. If the student is concerned that this will introduce inefficiencies, use could be made of the move semantics of std::string
(since C++11). This is achieved by using std::move()
in the assignments to the map. After the assignments, the strings first_name
and last_name
will be in a valid but unknown state. This is acceptable as the string variables will no longer be used. It should be noted, however, that some string implementations use what is called small string optimisation for strings of 15 characters or less. In such cases, moving short strings will not be any faster than copying them; but I digress.
It is noticed that the student has used an rvalue reference (&&
) in the for
loop that prints the content of the map. This is legal but not necessary for non-generic code. A simple reference (&
) will do.
Placing the incrementing of line
within the try
block is not required as incrementing an int
will not throw an exception. Simply moving the incrementing statement outside the try
block will alter the program logic as the while
loop does not have separate scope defining braces and, instead, relies on the fact that the try
and catch
blocks are, in effect, one compound statement. I have chosen to add braces to the while
statement and take the incrementing outside the try
block. I think this makes things a little clearer.
It might be worth making variable names a little more descriptive. It is not absolutely clear what lbufr
means, for example. Perhaps something like line_buffer
would be more appropriate. Also, I am not entirely sure what mmap
stands for.
Finally, I include my version of the program.
#include <iostream> #include <map> #include <sstream> #include <string> using pair = std::pair<std::string, std::string>; void read_line(std::string lbufr, std::map<int, pair> & mmap) { std::istringstream iss(lbufr); iss.exceptions(std::ios::failbit); int score; std::string first_name; std::string second_name; iss >> score; iss >> first_name; iss >> second_name; auto & name = mmap[score]; name.first = std::move(first_name); name.second = std::move(second_name); } int main() { std::map<int, pair> mmap; std::string lbufr; int line = 0; while (std::getline(std::cin, lbufr)) { ++line; try { read_line(lbufr, mmap); } catch(...) { std::cout << "Line " << line << " ignored\n"; } } for (auto & entry : mmap) { std::cout << entry.first << ": " << entry.second.first << ' ' << entry.second.second << '\n'; } }
Simon Sebright
It’s been a while since I had a go at the critique and a longer while since I have done any C++. The subtleties of the standard and the language will have to be sacrificed!
I compiled the code using the free Visual Studio edition. I got slightly different results from the student, but similar in form. My value for the score of XX was -858993460. This suggests we are in the realms of undefined behaviour. The first three lines all gave me an exception. Note that I pasted the scores.txt from the website and tidied up the result a bit – perhaps there were deliberately some extra characters hanging around there. I had each line end with the last character of the second name.
There are many levels to critique such a short piece of code; I’ll spew it out in the order it comes to me...
The first thing that strikes me is the lack of overall algorithm structure. I would have expected:
instantiate some collection to store results
for(each line in input)
add to the results
Sort Results
Output Results
The bad naming of the read_line
function contributes to this – it doesn’t read a line, it converts a line to a result and adds it to the collection. add_to_results(...)
would be a better option. Using a map keyed on integer to sort the results automatically is neat, but it is not explicitly mentioned in the code, that this is sorted. sorted_results
would be a better name than mmap
, which says nothing.
So, having dealt with the overall structure, let’s look at some more detail. First, the catch-all exception handling is a bad idea. The first thing I did to change the code was to add #include <exception>
to the top and to catch exception& e
, then output e.what()
in the result:
Line 1 ignored because of ios_base::eofbit set: iostream stream error
So, now we have some more information – it threw because it ran off the end of the stream whilst reading the second name (I debugged it a bit to see that, and also we note that the second names were gathered, so it must have got to the end of them). OK, well that’s not surprising because the string that was passed to read_line()
did indeed end abruptly with the last character of the second name. Odd behaviour that it sent the result back, but maybe that’s the standard (out of scope here).
So why did that occur? Well, the student attempted to get the input stream to throw exceptions when something went wrong – presumably because they wanted to trap the case where a non-number was passed as the score. Calling basic_ios::exceptions()
is a way to accomplish this, but what about that parameter – true
? If you look at the documentation you will see that the function is expecting an int
which is an ORed set of flags for the various exceptions to be thrown. But we are passing true
, which isn’t an int
, but can be implicitly converted to one. On my compiler, it is 1. This corresponds to the std::ios::eofbit
value, so in fact we are only asking the stream to throw when it runs off the end of the stream, which is not what we want. The values of this enumeration are: badbit
, eofbit
, failbit
, goodbit
.
If we change the iss.exceptions()
call to take ios::failbit
instead, then we get:
Line 4 ignored because ofios_base::failbit set: iostream stream error 32: Roger Orr 34: Alison Day 45: John Smith
This is the desired result, as it did indeed trap the case of XX
not being an integer.
So what can we learn here? First, structure and naming is important. Second, make sure you pass suitable parameters to functions avoiding implicit type conversion (I am out of date with compiler settings – perhaps this is a warning one can induce and prevent by building with warnings as errors set). Third, avoid catch-all exception handling.
As an aside, the decision to take a name as two strings – first and second names – is bad. Firstly, consider Mary Chapin Carpenter or Frank Lloyd Wright. Secondly, not all cultures have the ‘given name’ first – in China for example, this comes after the ‘family name’. A lot of work has been done to establish suitable standards for such concepts. In this case, a single string comprising the full name would have sufficed, but then of course the string parsing might have been trickier.
Indeed it should be said that piggy-backing the stream’s string parsing functionality was a good idea, but perhaps not as plain sailing as one hoped!
Paul Floyd
This is a small piece of code, barely over 50 lines long. What can I say about it? First a minor nit, as a matter of course (or habit) I would pass the string argument to read_line
as a const reference. Clearly the grist of the matter is the std::istringstream
exception specification. I don’t remember by heart what the parameters to the setter are, but it certainly looks like the author of the code was guessing at what to use.
When I compiled the code I got:
CC -std=c++14 +w2 -g -o cc101 cc101.cpp "cc101.cpp", line 17: Error: Could not find a match for std::ios::exceptions(std::istringstream, bool) needed in read_line(std::string, std::map<int,std::pair<std::string, std::string> >&). 1 Error(s) detected.
So I hit the context lookup of my IDE and I got the following 3 definitions:
void exceptions(iostate __except) { _M_exception = __except; this->clear(_M_streambuf_state); } typedef _Ios_Iostate iostate; enum _Ios_Iostate { _S_goodbit = 0, _S_badbit = 1L << 0, _S_eofbit = 1L << 1, _S_failbit = 1L << 2, _S_ios_iostate_end = 1L << 16 };
So it looks like the first compiler accepts a conversion from true
(bool
) to iostate
, presumably to the value 1
or badbit
. And it also looks like GCC accepts the bitwise not
which I expect sets all of the bits to 1
.
I guessed that the intention was to turn on all exceptions, so I tried:
iss.exceptions(std::ios_base::badbit | std::ios_base::eofbit | std::ios_base::failbit);
This then compiled and gave the behaviour described in the description. At first I didn’t pay attention to the trailing dot on first line of the input, so I couldn’t see why the first line wasn’t throwing and the others were. The moral is to always read carefully.
Then I had a go at debugging. I generally had a hard time after the first exception was thrown. Either the code would run to completion (dbx
) or would seem to get stuck reading lines (lldb
). To try to make things a bit clearer I decided to make the catch a bit more explicit:
{ std::cout << "Exception: " << ex.what() << " line:" << line << " ignored\n"; } catch (...) { std::cout << "Other exception, line:" << line << " ignored\n"; }
This helped a bit, but not as much as I’d hoped:
Exception: basic_ios::clear line:4 ignored
OK, but I was kind of hoping that it would be more explicit about the cause of the exception.
I also had a look online at http://en.cppreference.com/w/cpp/io/basic_ios/exceptions
Though it doesn’t seem to be the case here, the note on how different implementations set the state was quite interesting.
By now I’d figured out that the problem is that even for the ‘well-formed’ lines, an exception is being thrown when the end of the istringstream
buffer is reached. I don’t consider reading to the end of a buffer to be an exceptional circumstance, at least not in this case, so I removed eofbit
from the call to set the exceptions flags, and then everything worked as expected.
As a final check, I tried wrapping the body of read_line
in a try
/catch
to see the rdstate()
, here just the catch
clause:
catch (...) { std::cout << "iss mask " << iss.rdstate() << "\n"; }
This confirmed that the first two exceptions were eofbit
and the third a failbit.
Commentary
There seem to be few people who use exceptions with iostreams; perhaps this code critique helps to explain why this is so.
The first problem is with the definition of iostate
which the C++ standard defines to be a ‘bitmask type’. These can be implemented as
- an enumerated type that overloads certain operators,
- an integer type, or
- a bitset
It is important to restrict portable code to operations that are valid for all three types of implementations. In the original code, passing true
to iss.exceptions()
only compiles when iostate
is an integer type (as it is for MSVC, but not for g++). It sets the flag corresponding to 1
which will be eofbit
.
Secondly there are the semantics. Many people do not consider end of file to be unexpected (most files have an end!) and so do not want to enable exceptions with eofbit
. While the standard does attempt to specify the different behaviour of badbit
and failbit
the distinction seems quite subtle and the interaction between the two is messy. Typically, as Felix noted, you would set both these bits together.
Also note that goodbit
is not actual a bit value, as James pointed out. It is zero, so code testing for a ‘good’ state cannot use bitwise and, as the expression (ios.rdstate() & iostate::goodbit)
is always false
!
Finally, while four values are defined, the actual implementation of the bitmask type might contain other values with non-portable sematics, so it may be important to make sure you don’t accidentally set the extra bits. The original example did this in the code selected with __GNUC__
as the expression ~iss.exceptions()
may modify bits other than those defined by the standard values.
As the critique also shows, another problem with using exceptions with iostreams is that the operations do not provide the strong exception guarantee: if an exception is thrown when reading into name.second
the string value might already have been modified. In this case though the original code is already not strongly exception safe (as has already been observed in the critiques) as the entry is added to the map even if the streaming operations throw an exception.
One aspect of the code that no-one fixed was that the program is trying to sort people based on a score but if two people have the same score the second entry overwrites the first one. I had hoped the opening sentence implied that all the scores were important and should be retained! The easiest change to resolve this is to replace std::map
with std::multimap
throughout; the variable name mmap
was a bit of a hint (!) (Other than that it’s a poor name for the variable.)
The winner of CC 101
I think the four critiques covered the ground well – both the low level syntactic and semantic problems and also some discussion about the higher level design principles involved in this problem.
Felix and James both gave good explanations of the portability problems with the program’s use of iostate
and all the critiques fixed the main presenting problem.
Simon’s critique addresses some additional design issues explicitly, such as the poor naming of both methods and variables, as well as covering some of the syntactic problems. So, although other critiques may have covered the low level issues in slightly more detail, I have decided the award the prize for this critique to Simon.
Code critique 101
(Submissions to scc@accu.org by Dec 1st)
I am trying to parse a simple text document by treating it as a list of sentences, each of which is a list of words. I'm getting a stray period when I try to write the document out:
-- sample.txt --
This is an example.
It contains two sentences.
$ parse < sample.txt This is an example. It contains two sentences. .
Can you help fix this?
Listing 2 contains parse.cpp.
#include <iostream> #include <sstream> #include <memory> #include <string> // pointer type template<typename t> using ptr = std::shared_ptr<t>; // forward declarations struct document; struct sentence; struct word; document read_document(std::istream & is); sentence read_sentence(std::istream & is); void write_document(std::ostream & os, document const & d); void write_sentence(std::ostream & os, sentence const & s); // A document is a list of sentences struct document { ptr<sentence> first_sentence; }; // A sentence is a list of words struct sentence { ptr<sentence> next_sentence; ptr<word> first_word; }; struct word { ptr<word> next_word; std::string contents; }; // read a document a sentence at a time document read_document(std::istream & is) { sentence head; auto next = &head; std::string str; while (std::getline(is, str, '.')) { std::istringstream is(str); ptr<sentence> s(new sentence( read_sentence(is))); next->next_sentence = s; next = s.get(); } document d; d.first_sentence = head.next_sentence; return d; } // read a sentence a word at a time sentence read_sentence(std::istream & is) { word head; auto next = &head; std::string str; while (is >> str) { ptr<word> w(new word{nullptr,str}); next->next_word = w; next = w.get(); } sentence s; s.first_word = head.next_word; return s; } // write document a sentence at a time void write_document(std::ostream & os, document const & d) { for (auto s = d.first_sentence; s; s = s->next_sentence) { write_sentence(os, *s); } } // write sentence a word at a time void write_sentence(std::ostream & os, sentence const & s) { std::string delim; for (auto w = s.first_word; w; w = w->next_word) { std::cout << delim << w->contents; delim = ' '; } std::cout << '.' << std::endl; } int main() { document d(read_document(std::cin)); write_document(std::cout, d); } |
Listing 1 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Notes:
More fields may be available via dynamicdata ..