Browse in : |
All
> Journal Columns
> Code Critique
All > Journals > CVu > 322 Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 123
Author: Bob Schmidt
Date: 01 May 2020 17:43:17 +01:00 or Fri, 01 May 2020 17:43:17 +01:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I was set a challenge to read the ‘Top 100 books’ (from the BBC Big Read a few years ago) so thought I could start by seeing which ones I needed to get. Obviously this meant writing a program. Unfortunately, the output from the program isn’t sorted, although I was expecting it would be; and in addition, it crashed on Windows.
Can you help the author fix their program so they can begin to catch up with their reading? The program (the_big_read.cpp) is in Listing 1.
#include <algorithm> #include <iostream> template <int A, int B> void to_buy(const char *(&to_read)[A], const char *(&own)[B]) { const char** ptr = new const char *[A]; const char **first = ptr; for (int i = 0; i != A; ++i) { *ptr++ = to_read[i]; for (int j = 0; j != B; ++j) { if (!strcmp(to_read[i], own[j])) { ptr--; // already own it! } } } std::sort(first, ptr); for (const char **q = first; q != ptr; ++q) std::cout << *q << std::endl; delete ptr; } int main() { const char* top_100 [] = { "The Lord of the Rings", "Pride and Prejudice", "His Dark Materials", "The Hitchhiker's Guide to the Galaxy", // 93 missing ... "Girls In Love", "The Princess Diaries", "Midnight's Children", }; const char* my_library[] = { "The Hitchhiker's Guide to the Galaxy", "Vintage Jeeves", // and many more ... "His Dark Materials", "Pride and Prejudice", "Emma", "Where Eagles Dare", }; to_buy(top_100, my_library); } |
Listing 1 |
Critiques
Francis Glassborow
First impressions
This is a C program with a couple of C++ elements thrown in.
Worse, it is a C program that exhibits the worst aspects of C’s use of pointers.
This is a program where main()
is abused to hold data
There is no protection against accidental variation in titles such as extra spaces
Updating the list of books requires changing the program with all the potential for more bugs.
Second impressions
Why is the programmer using a template?
Why single letter capital variable names?
Why have we got an explicit use of new
and delete
? A sure indication that the programmer has not got hold of the way C++ is designed to work at a high level.
Third impression
Do I really want to try to unravel all those pointers? I am sure the problem lies in those pointers, and almost certainly in the pointers passed to the sort
function.
OK, time to put it into a compiler. Well, it compiles and runs without error. Well that means that the error is almost certainly buried in those pointers. Let us think about that call to sort
. Exactly what is it sorting? Aha! An array of pointers and, of course, they are pointers to string literals (i.e. non-modifiable arrays of char
) are already in numerical order. Had the programmer stuck with C and used qsort()
they would have to have provided a comparison function because C lacks the idea of a defaulted parameter. Unfortunately the default for comparison used by std::sort()
is wrong for this task.
So there is the clue, std::sort
has a third, defaulted, comparison operator. Writing one requires delving into the mysteries of operator overloading or, more recently, writing a lambda function.
So the first solution is to add:
struct { bool operator() (char const * & x, char const * & y) {return strcmp(x, y)< 0;} } custom_less;
I have deliberately used an anonymous struct
(C style) and then provided an instance at the end. This code is likely to look weird to the original programmer but it effectively provides an object (custom_less
) that behaves like a function that returns true
if the first string literal is before the second one in a standard collation sequence.
Now replace the call to sort
with:
std::sort(first, ptr, custom_less);
And the result will be the one desired.
Even more magical for C programmers writing C++ is to use a lambda function where the comparison function is written inline and the call to sort
is replaced by:
std::sort (first, ptr, [](char const *& x, char const *& y){ return strcmp(x, y) < 0;});
Rewrite
Having fixed the original program using arcane C++ magic it is time to rid ourselves of the arcane C magic used to write the original program. The following program is written in a fairly primitive version of C++ without much use of modern features so it should be reasonably easy for the original programmer to follow.
#include <string> #include <vector> #include <fstream> #include <algorithm> #include <iostream> const std::string get_title( std::ifstream & s){ std::string title; std::getline(s, title); return title; }
The function get_title()
is a place holder for a function that can extract titles from a file with other data such as ISBNs and authors. The test files contain one title per line and are terminated with END as the final entry. Any necessary complexity in getting a title can be placed in this function.
void load_data( std::vector<std::string> & data, std::string const & source){ std::ifstream s(source); if( !s ) throw source + " does NOT exist"; std::string title; std::string const END("END"); title = get_title(s); while(title != END) { data.push_back(title); title = get_title(s); } std::sort(data.begin(), data.end()); return; }
The function load_data()
uses get_title()
to extract titles from source and copy them into the provided vector<string>
. Before returning it sorts the vector so that the titles are now in alphabetical order. It is called twice, once to capture the reading list and once to capture a list of books owned.
Note that it checks that source
correctly names a file. If it does not the function aborts by throwing an exception.
int main(){ try{ std::vector<std::string> read_list; std::string file {"D:\\data_files\\to_read.txt"}; load_data(read_list, file); file = "D:\\data_files\\my_library.txt"; std::vector<std::string> my_books; load_data(my_books, file); for(auto&& title :read_list){ if(my_books.end() == std::find(my_books.begin(), my_books.end(), title)){ std::cout << title << '\n'; } } } catch(std::string error_message) { std::cerr << "***MISSING FILE***\n" << error_message << std::endl;} return 0; }
The only thing that may seem strange to our novice C++ programmer is that I have used the range for that came into the language in 2011. It works so nicely and produces code that is easier to read.
I hope the reader notices that there is not a single explicit use of a pointer (of course they are there but deep inside library code which is where they belong.) Real C++ rather than C using a couple of C++ features is so much more comfortable to use.
Ovidiu Parvu
The source code given in this Code Critique is platform independent. Therefore, I have analysed and updated it in my preferred environment, namely Ubuntu 18.04.4 using clang 10.0.0, and tested it on both Ubuntu and Windows. On Windows, the Visual C++ compiler 19.15.26732.1 for x86 was used.
There are two issues reported with the program compiled from the given source code. First of all, the program crashes on Windows. Secondly, the book titles printed to the standard output are not sorted in alphabetical order.
I was unable to reproduce the crash on Ubuntu or Windows. However, I assume that the program can crash. Due to the use of raw pointers and the manual memory (de)allocations, I hypothesized that the crashes are memory management related. To verify this hypothesis, I have compiled the source code with address sanitizer support enabled as follows:
clang++ -Wall -Werror -Wextra -std=c++17 –g\ -O3 -fsanitize=address the_big_read.cpp\ -o the_big_read
Running the resulting program revealed the first major issue in the source code, namely that the array of string literals (i.e. const char**
) created at the beginning of the to_buy()
function is deallocated incorrectly:
./the_big_read the_big_read.cpp:29:3: error: 'delete' applied to a pointer that was allocated with 'new[]'; did you mean 'delete[]'? [-Werror, -Wmismatched-new-delete] delete ptr; ...
There are two issues with the deallocation. Firstly, a delete
instead of delete []
expression is used to deallocate the array of string literals. Secondly, the pointer used in the delete expression is ptr
instead of first
, and ptr
may not point to the beginning of the array when the delete
expression is executed. To address this issue delete ptr
should be replaced with delete [] first
.
The second reported issue, namely that book titles are not printed to the standard output in alphabetical order, could be reproduced on both Ubuntu and Windows. The cause of the issue is that the book titles to be printed, represented as const char *
, are sorted by character string address rather than character string contents. This issue can be fixed by calling the std::sort()
overload that takes a comparator lambda function as input which compares character string contents rather than character string addresses.
Although the changes described above address the reported issues, the source code can be improved further.
There are three main outstanding issues. Firstly memory is managed manually, which is error prone. Secondly it is possible to inadvertently define duplicate book titles in the top_100
and my_library
collections. Thirdly, the time complexity of computing the collection of books to buy could be reduced. All of these issues can be addressed by changing the type of top_100
and my_library
to std::set<std::string>
and computing the collection of books to buy using std::set_difference()
.
Computing the collection of books to buy using std::set_difference()
rather than two nested loops reduces the time complexity of the to_buy()
function from O(N * M)
to O(N + M)
, where N
and M
are the sizes of the top_100
and my_library
collections, respectively.
The execution time impact of representing the collections of books as std::set
s and computing the collection of books to buy using std::set_difference()
was measured using [google benchmark] (https://github.com/google/benchmark). Results suggest that using std::set
and std::set_difference()
leads to an approximately 14x speedup compared to using const char*
arrays and nested loops.
$ ./benchmark 2020-03-29 19:19:52 Running ./benchmark Run on (4 X 3700 MHz CPU s) CPU Caches: L1 Data 32 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 256 KiB (x4) L3 Unified 6144 KiB (x1) Load Average: 0.34, 0.63, 1.12 ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead. ... UsingNestedLoops 36772 ns 36770 ns 18864 ... UsingStdSetDifference 2558 ns 2558 ns 271562
Note that the measurements included defining the collections of books.
Following on from replacing const char*
arrays with std::set<std::string>
collections, the string.h header include, which should have been cstring
, was replaced with a string
header include.
There are two additional minor issues that could be addressed. Firstly std::cout
does not need to be flushed after each book title is printed. Therefore std::endl
should be replaced with '\n'
. Secondly the safety and readability of the source code could be improved respectively by enclosing the std::cout
expression in curly braces, by renaming the to_buy()
function to print_to_buy()
.
Finally, given that book titles are known at compile time, the collection of books to buy could be computed at compile time as well. However, given that constexpr
support is not yet available for the std::set
type, the constexpr
idea was not explored further here.
The revised source code for this Code Critique is made available below.
#include <algorithm> #include <iostream> #include <iterator> #include <set> #include <string> void print_to_buy( const std::set<std::string>&to_read, const std::set<std::string>& own) { std::set<std::string> books_to_buy; std::set_difference(to_read.begin(), to_read.end(), own.begin(), own.end(), std::inserter(books_to_buy, books_to_buy.end())); for (const std::string& book : books_to_buy) { std::cout << book << '\n'; } } int main() { const std::set<std::string> top_100 = { "The Lord of the Rings", ... "Midnight's Children", }; const std::set<std::string> my_library = { "The Hitchhiker's Guide to the Galaxy", ... "Where Eagles Dare", }; print_to_buy(top_100, my_library); }
Hans Vredeveld
Looking at the code, I get the impression that the author is a C-developer that learned some C++. With a few changes to the program it could compile as a C-program, where the biggest change would be in the signature of to_buy
. It has to change from a function template to a normal function that takes the array sizes, A
and B
, as two additional function parameters.
Before we change the program into a modern C++-program, let’s have a look at the things that can be improved.
Except for loading the lists of books to read and books in the library, to_buy
does everything in its body: selecting the books to buy, sorting those books and printing that sorted list.
The meaning of ptr
and first
is a bit mixed up. When ptr
is declared, it becomes the owning pointer to an array, but later in the function it is used to move forward and backward in that array. At the end of the function it is used to delete the array, which fails because ptr
isn’t pointing to the beginning of the array any more. Swapping ptr
and first
in the first two statements and in the last statement would solve this.
In the last statement of to_buy
, delete
has to be delete[]
(together with the former point, this solves the program crash).
The statement *ptr++ = to_read[i];
does two things. It assigns to *ptr
and it increments ptr
. I prefer to have a statement do only one thing and do the assignment and increment in two separate statements.
The if
-statement if (!strcmp(to_read[i], own[j]))
abuses the implicit conversion of an int
to a bool
. It reads like ‘if not compares to_read[i]
to own[j]
’, or ‘if to_read[i]
is not equal to own[j]
’. The statement actually means ‘if to_read[i]
is equal to own[j]
’, that is better written as if (strcmp(to_read[i], own[j]) == 0)
. Even better would be to make to_read
and own
arrays of std::string
, in which case it would become if (to_read[i] == own[j]
).
If an entry of to_read
appears twice in own
, ptr
is decremented twice. In the best case, a book that should be bought is not bought. In the worst case, ptr
is moved to before the array and then the location it is pointing to is written to, resulting in data corruption or a program crash.
The purpose of the inner for-loop is to find if to_read[i]
appears in the array own
. A safer way to achieve this is by using std::find_if
, which also makes the code’s purpose immediately clear.
The statement std::sort(first, ptr);
doesn’t sort the titles, but the pointers to the titles. This is the reason that the output is not sorted.
In the last for
-loop in to_buy
, the list of titles to buy is printed. This binds the function to_buy
tightly to the standard output, making it unusable for other situations. To reduce this binding and make to_buy
more versatile, it should return the list of titles to buy and leave the printing to the calling function, main
.
In main
, top_100
and my_library
are arrays of const char*
and their content is hard-coded. Instead their type could be changed to std::array<std::string, A>
and std::array<std::string, B>
, but then we would have to hard-code the values of A
and B
(unless we can use C++20’s std::to_array
). When the program would work as expected, the first feature I would like to add, would be to read the contents for top_100
and my_library
from files. In that case we don’t know their sizes and it would be logical to make their type std::vector<std::string>
. So I advise doing that right away.
The author already found their way to the <algorithm>
library for std::sort
. But <algorithm>
contains more useful functions. With std::set_difference
we can easily select the titles to buy, under the condition that the top 100 list and the library are sorted and the lists don’t contain duplicate entries.
Putting it all together, the program becomes:
#include <algorithm> #include <iostream> #include <iterator> #include <string> #include <vector> using book_list = std::vector<std::string>; book_list to_buy(book_list const& to_read, book_list const& own) { book_list new_titles; std::set_difference( to_read.begin(), to_read.end(), own.begin(), own.end(), std::back_inserter(new_titles)); return new_titles; } void print(book_list const& titles) { for (auto const& title : titles) std::cout << title << '\n'; } int main() { book_list top_100 = { // titles removed for brevity }; book_list my_library = { // titles removed for brevity }; std::sort(top_100.begin(), top_100.end()); top_100.erase( std::unique(top_100.begin(), top_100.end()), top_100.end()); std::sort(std::begin(my_library), std::end(my_library)); my_library.erase( std::unique(my_library.begin(), my_library.end()),my_library.end()); auto titles_to_buy = to_buy(top_100, my_library); print(titles_to_buy); }
Balog Pal
I almost sent an entry to the previous one. That was starting with "Hmm, we need a full book to cover the problems related to these few lines, or at least several chapters." Then I decided to take the lazy way out, with maybe writing a standalone article if the others leave something important out. This one looks much better fitting. And I really like that it is plausible to come from real life. And it even has the bait with the crash… ☺
The first thing we notice is that we have the whole solution in a single file, no headers, no fluff, and probably would be just a main()
if the OP found a way around using template. That is great. This is meant as a single-use tool to be thrown away. The last thing we want is over-engineering. The problem can probably be solved in a text editor or using the unix shell tools, but those not familiar probably can solve it with python or C++ in a similar timeframe.
I start inspecting in main()
. We have the inputs embedded, instead of read from some outside source. Again, that is a good thing. What I miss is top-level const
on the collections, as we clearly have no intention to patching them up. Using const char *
is not incorrect, however I mark it for attention to avoid comparing pointers instead of strings. Use of old-style array is also suspect. With the latest C++ the old arrays are not that hated, we can even iterate on them without fiddling with the size. But excuses on not using std::array
are in short supply. And the rest is a single call to the template function, that seems to just serve to deduce the bounds. That could have been just done using size()
. Or we could say std::array top_100 const = {...};
and have everything deduced. (Finally… the need of the explicit bound was a serious impediment before C++17.)
Before inspecting how the material is processed, let’s take a detour and think about the problem and the solution. That is my usual method on code reviews, reading the supplied solution can bend the thinking process too much. While if I came up with the same solution increases the confidence.
When I read the problem the first thing that jumped to my mind was that it sounds awfully like an interview question. I kinda heard the whole conversation.
Interviewer: I’m interested how you write programs, I’ll fetch a keyboard, can you write some code for me?
Me: No.
I: Come again?
Me: You heard it right, I said no as in no way.
I (baffled): How come? Can’t you write code?
Me: Sure I can, and I do a lot. When the circumstances are right. Not in ad hoc way. And especially not starting with a keyboard. It has a level requirement of like 4 and we did not even see the dungeon entrance. It is not even supposed to be in the room serving as distraction. But I’m happy to talk about you problem, figure out a solution. It has some chance to be used at some later point.
I: (Should I just kick this weirdo? I have nothing to do for another 5 minutes so let’s roll with it for a while)
Ok, so suppose I need to complete reading a bunch of books listed on a website. I have a small library, but at a glance I miss plenty of those. I have an excel sheet with my inventory and want to get a shopping list for the missing items. Write me a C++ program that provides it.
Me: I think I understand the problem and believe it can be solved. Let me put it in formal terms and you verify that it matches your aim.
I: Ok.
Me: So, we have a set of books (I form a circle using my left thumb and index finger) that I already have, let’s call it H. We have a set of books (showing similar circle with my right hand) the website requires, and so I want to have, let’s call it W. So the intersection (I overlap the circles like an improvised Venn diagram) has the books that I already have. And the difference W-H has those I need to obtain. The goal is to have this latter.
I: Yeah, you got it right.
Me: Fine. That is definitely doable if we can extract the information. Can we? If so, what and in what format?
I: Yes. You can get a list of titles. You can choose the format within reason.
Me: Well, I can just export the Excel into CSV, and that I can even just paste right in the code. Can the website also give me a CSV too?
I: Yeah, that works.
Me: That settles the inputs and the processing, what do you want as the output?
I: Whatever goes, you can just dump the titles on the console one title per line.
Me: Anything else?
I: It would be nice to have it sorted.
Me: And you want a program in C++? I think I can just arrange it right in the Excel or using shell tools.
I: I agree it’s somewhat arbitrary, but let’s make C++ a hard requirement. Is that a problem? (Prepared for some really lame next excuse.)
Me: No, the client is entitled to have whatever requirements, my job is just to make sure he means them and that we have proper understanding. I certainly point out if it is possible but not feasible while better ways exist. For this case we can go ahead, as it doesn’t look a big deal.
I (relieved): I’m glad to hear that.
Me: So, good news, C++ happens to have kinda native support for this case. So it can be done in just a handful lines of code that almost matches the formal description. We have a function std::set_difference
. If we just feed it with our input lists, will provide the desired output. We can even directly make at as you described passing an ostream_iterator
looking at cout
and newline as the delimiter. It requires two sorted ranges with the content of sets W and H. We have luck with that too, as we have std::set
that is auto sorted. So we just construct the set<Book>
const using brace init with the text in the CSV files. And pass .begin()
and .end()
. That’s it, we’re done. By the way it happens to emit the output in its original order, that was sorted, so you have that part too.
I (tempted to ask to repeat the part after ‘good news’ to catch up): Are we?
Me: Well, almost, we have one open point on Book
. To work in the setup it needs three things (beyond being fit in a container): implicit construction from string literal, less operator creating an order that fits for us and the standard ordering, and usable with ostream
. Neither is a big deal. For details we need additional design discussion. Say in the constructor it could invoke some web service passing the title and maybe get an ISBN number. Or do some transformations on the title like losing the punctuation, the international characters, make all lowercase, maybe remove all the whitespace. I think I would be unhappy to buy a second copy of a book just because I had an extra space somewhere. Or a different apostrophe from the wide selection. We could store the original title and the converted/gathered info, using that for the compare. Unfortunately, that could mess with the order of the final output, say, by ISBN instead of title… We can compensate for that collecting in a vector then sort that using a different criteria before copying to cout
. Any specific requirements?
I (feeling overwhelmed): No, I’m fine with the simplest solution, just strcmp
is fine on the title as provided. So would you write a Book
class?
Me: In that case there is no reason, we can just use std::string
as Book
, it passes all our criteria out of the box.
I: Couldn’t we just use the const char *
s from the literals without extra steps?
Me: It’s possible too. Of the requirements stated earlier it passes all but the sorting one. For our desired semantics at least. But we can make up for that, both set and the algo can take a comparator as additional parameter. Easiest is to create a function like auto const cmp = [](char const* a, char const* b) { return strcmp(a, b) < 0;};
and feed it as the second parameter of the set and 6th parameter of the function. We even save some characters, as we can just say set W = {{"a", "b"}, cmp};
without template arguments and let it be deduced.
Me: I think we covered all the necessary ground. If you’re still interested, you can bring the keyboard now and observe that my fingers are indeed pretty lame using an unfamiliar specimen in a foreign environment.
I: No, thanks, we’re good. (Maybe this guy has a point there, all the other folks just jumped on the keyboard and the few who even provided something that works arrived at some arbitrary variant without alternatives considered or even a hint that the result will match the desires before a lot of time invested… Next time I’ll at least wait till the keyboard is demanded.)
After the detour we covered the solution, so let’s see what OP provided instead of that. In function to_buy
(that shows poor naming that would not be a problem if we never had the function in the first place…) we spot use of strcmp
and sort
that are good signs. We also spot use of new
and delete
that is a very bad sign: we found no reason whatsoever to deal with raw memory to start with, and if we had a reason wild delete is still on top of the forbidden list: allocated stuff should go to an owner like a unique_ptr
. To add injury to the insult we use array new
matched with plain delete
, that is undefined behavior – good enough reason for the observed crash. Even if we managed to pass the proper pointer, the one pointing to the allocation. In this case first
, not ptr
. Keep in mind to properly match your delete not using delete avoids this whole can of worms.
Now, what are we using the memory for? Looks we’re collecting the titles for the output. Like as we had vector<char const*>
reserved to size A, push_back
the item under scrutiny, then pop_back
if we found it in own
. An obvious improvement would have been doing just that and avoiding the crash while making the code more obvious (while probably doing the exact same thing under the hood.) An obvious chance to further improvement would be to avoid the jojoing, and just collect the item when found worthy. Maybe OP tried first to issue a continue
when strcmp
came out with 0. Then realized that he meant to continue
the outer loop, not the inner. Might even made some noises mentioning WG21 and evil killing all the proposals to have the labeled break
and continue
other languages have… Then resorted to use the bouncing instead of the prescribed good measure that sits in the language: label
and goto
. Or the more prescribed measures like replace the loop with the algorithm it tries to re-implement: that would be find_if
in this case. What would also make the code more readable, meanwhile avoiding the continue
of the scan after a hit for possible other instances of the title in the list, that are not there. Hopefully, as if they are we start popping other items from the collection, then hitting UB after passing first.
Suppose we fixed the mentioned issues, and get to sort with a proper state. Did I mention up front to look out for using strcmp
? Well, we did at one place, did not with this second. Sort works fine, certainly, but it will create the order by the pointer values, not strings, so the output may be whatever for practical purposes. We need to pass in a sorting function here too. Could have avoided the whole thing by just using std::string
as holder.
The rest is more like nitpicking, on using endl
without a good reason, not including all the required headers, not using auto
… probably boring details considering the amount of real problems and missing the simple and effective solution.
Effective? one could ask as objection. Doesn’t using string
, vector
, set
and the other goodies waste plenty of memory and processing cycles? Stupid question… NOT! (TANSTAASQ!) Performance is certainly a concern… WHEN it is. If we run this program and it reports out of memory, or crashes due to that, or takes hours to produce the output, then yes, we’re wasting stuff that we might be able to conserve. And in that case it is worth to look closer. However if the output is on the screen before we release Enter all the way? Then we can just acknowledge that we have our goal completed. (Provided the answer is correct that is). If it took a fill 80 extra nanoseconds to run, so be it. Or used an extra 10k out of our 6G pool of free memory. (left from the 14G pool, by the 312 open tabs in our browser and 2 dozen gizmos in the alert area we know nothing about).
Anyway, let’s give it some consideration. This solution has N2 scaling over the NlogN using sort
and set_difference
. So we can expect the latter to consistently win as we get to big numbers on the list size. We know that one list has 100 members, the other is unknown, probably in the 50–3000 range. Well above the 8 that is a popular cut-off point.
As the memory footprint, set
is fairly wasteful, if concerned we’re better off using array
and sort it (that could be better on time too.) std::string
certainly has an overhead over the plain literal, and we only used it to avoid passing the compare function. On the execution level it is a poor tradeoff. But it has benefits on much harder to mess up and easier to read/review.
And as an extra point tied to performance, the whole consideration may be moot. Let’s notice that our program is pretty shy on observable behavior. The only observable behavior is the output itself. So the compiler has freedom to just have the final output text in the image and dump that. On every variant (that we fixed to be UB-free and correct wrt. our goal). I would be surprised if none of the other entries have shown a solution that works consistently at compile time. It can be done. Sprinkling the code with constexpr
and arranging it to be compile-time friendly may help today. But a good optimizer is allowed to go ahead without all that help. And some might even do it already.
James Holland
The CVu magazine’s version of the program needs string.h included to provide a definition of strcmp()
. The [accu-general] email version has string.h
included.
There are two errors in the code that are causing undefined behaviour and probably causing the crash when run under Windows. Firstly, the wrong pointer is being used to delete the dynamically created array. The pointer first
should be used, not ptr
. Perhaps the mistake would not have happened if the first two lines of to_buy()
had been written as follows.
const char ** const first = new const char *[A]; const char ** ptr = first;
In this way it makes it clearer that first is meant to be the custodial pointer and would, therefore, be used to delete
the array. Also, by making first a const
pointer prevents its value from being changed. Secondly, the non-array form of delete
is used to destroy the array. The required form is delete[]
. The code now compiles and runs. The problem of the output not being sorted remains.
The sort
function is doing exactly what it is told. It is sorting the elements of the array pointed to by first
according to the value of the elements. The elements of the array are already in ascending order as they point to strings that are at successively higher locations in memory. This can be demonstrated by sorting the pointers in descending order by adding the greater()
predicate to std::sort
. The list of books to purchase now appears in the opposite order. In either case, the array is not sorted in alphabetical order of their strings. The sort function needs modification.
As indicated above, the sort criteria is incorrect. Instead of sorting by the values of the pointers in the array, we need to sort according to the alphabetical order of the strings pointed to be the pointers. The sort
function can be modified by adding a predicate in the form of a lambda expression.
std::sort(first, ptr, [](const char * const a, const char * const b){ return strcmp(a, b) < 0; });
The lambda makes use of strcmp
to compare the two strings and returns true
if the first string comes alphabetically before the second string. With this modification, the program produces the desired result.
There are a few ways in which the program can be improved. When it is discovered that a required book is already in the library, there is no need to continue searching the library for the same book. A break
statement can, therefore, be added after decrementing the ptr
pointer.
Although it does not make any difference when using built-in types, it is conventional to use the pre-decrement operator in preference to the post-decrement operator when decrementing a variable. With user-defined types, the pre-decrement operator is usually more efficient. In keeping with convention, the statement to decrement the ptr
pointer should be --pre;
. Also, there is no need to flush std::cout
when printing the books to purchase. A simple '\n'
will do instead of std::endl
.
The parameters of to_buy()
could be declares const
(as well as to what they point). This will help, to some extent, make the code self documenting. The pointer first
could also be declared const
for the same reason. If we want to be thorough about this, the pointer q
in the for
-loop that prints the result could be declared as a pointer that points to a const
pointer, i.e. const char * const * q
. The pointer itself cannot be declared const
as it will be incremented within the loop.
The names of the template parameters of to_buy()
could be a little more descriptive. Perhaps A
and B
could be renamed to_read_length
and own_length
respectively, or something similar.
The supplied program has been written using very low-level constructs. There are pointers and even pointers to pointers. The student has chosen raw pointers to be responsible for deleting objects created on the heap. A better choice would be to use std::unique_ptr
because the arrays would be automatically deleted when std::unique_ptr
goes out of scope, even when an exception is thrown. Better still would be to use std::vector
when relatively large arrays are required as they will delete themselves when they go out of scope. Low-level features may be required occasionally but they make the program difficult to write, understand and maintain.
Despite this, there are some interesting features to the program. One particularly intriguing feature is the use of a template to define the types of the parameters passed to to_buy()
. Without a feature like this the array type would decay to a pointer to a char
(in this case) and the size of the array would be lost. It is sometimes the case that ‘clever features’ are a sign that the program is too elaborate and complicated and that there is a simpler way to construct a program that performs the same function.
As mentioned above, the student’s software is written at the very low level. If it could be determined what the software is required to achieve in fundamental terms, then it might be possible to use some high-level constructs (preferably provided by a library) to write a program that is shorter and simpler to understand. After a little thought, it is realised that what is required is for the program to print the elements of one set that are not in another set. That is all! This is such a fundamental and useful idea that there must be existing code to do the job. Well, there is and it’s in the C++ standard library by the name of set_difference()
. This function takes two sets, calculates the difference (the elements that are in one set and not in the other) and then copies them, in sorted order, to an output stream. This is exactly what is required. My version of the program makes use of set_difference()
and consists of just three C++ statements. By making appropriate use of an existing implementation of an algorithm, the program has become considerably simpler.
#include <algorithm> #include <iostream> #include <iterator> #include <set> #include <string.h> int main() { const std::set<std::string> top_100 { //... }; const std::set<std::string> my_library { //... }; std::set_difference( top_100.cbegin(), top_100.cend(), my_library.cbegin(), my_library.cend(), std::ostream_iterator<std::string>( std::cout, "\n")); }
But what about the speed of execution? The student’s code is specially written to solve a specific problem, so surely a general purpose library function must be slower. The best way to find out is to perform some tests. I arranged for my_library
and top_100
to have the same number, n
, of random titles. I then added a further 50 randomly generated titles to top_100
. I then ran the two programs and timed their execution. The results for various values of n
are shown in Table 1.
Table 1: Program Performance Test Results | |||
---|---|---|---|
Number of titles (n) | Execution time / seconds | Performance factor | |
Student’s program | Using set_difference | ||
1 | 0.00029 | 0.000239 | 1.21 |
10 | 0.000249 | 0.000245 | 0.98 |
100 | 0.000358 | 0.000292 | 1.22 |
1000 | 0.00214 | 0.000203 | 10.55 |
10000 | 0.17435 | 0.000737 | 236.56 |
100000 | 35.194 | 0.01183 | 2975 |
1000000 | > 1800 | 0.135 | > 13333 |
After waiting half and hour for the student’s program to work its way through a million books, I gave up and put it out of its misery. For values of up to about 100 there is not much difference in the speed of execution between the student’s program and the one using set_difference
as the execution time is dominated by the time taken to print the result. As the number of books increase, it can be seen that the program using set_difference
gets significantly faster than the student’s code. The execution time of the student’s code increases exponentially with n
whereas the set_difference
program increases linearly with n
. The way set_difference
achieves this performance is by relying on the data sets being sorted. This is ensured because std::set
stores its data (in this case, the film titles) in sorted order. By spending a little time sorting the data at the beginning of the program allows set_difference
to have very good performance, especially with large data sets.
In summary, it is always worth thinking about what a program is required to do in abstract terms and then seeing whether there is an existing algorithm that can be used to perform the task. A great deal of thought by some clever people has gone into the design of library functions and it is often advantageous to make use of their labours. The advice is to familiarise oneself with the C++ standard library (and other libraries) to see what algorithms and functions are available and attempt to program at a higher level of abstraction.
Commentary
First, a comment about string.h which, as James noted, was in the version emailed to accu-general but not in that printed in the magazine… This is because needing this header depends on which compiler/library combination you are using. In C++ it’s unspecified which other things may be included when you use a standard library header and on one of the environments I tested the header was implicitly included. While there are some good reasons for this flexibility, it does make writing portable code is little harder (and even code that works with different versions of the same compiler). This thing to bear in mind is that you should list in your source the include files the code requires. So since the code in this critique uses strcmp
it ought to include <cstring>
.
This critique is an example of what Pal describes as ‘a single-use tool to be thrown away’. The context can matter: in this case in particular, the list of 100 top books from the BBC Good Read is a given, so having it in source code for a program of this kind may be perfectly fine; however, I suspect the list of books in ‘my library’ ought to be held separately as, presumably, I am going to be purchasing some of the books that I’m missing and so my library will change! However, unless I have very deep pockets, my library is not going to be more than a few thousand volumes, so worrying too much about the performance of larger collections is likely to be out of scope in this particular case. (Of course, programs (and fragments of programs) can get re-used in very different environments, but I think it is the person re-using the code that needs to consider the performance issues since only they know the actual constraints in the new use case.)
As several people noted, the code is a C++ program but with a strong flavour of C. There seems no reason for the use of C features (such as raw pointers and use of strcmp
) and I think it makes sense to try to provide the programmer with a more idiomatic C++ program. This includes making using other algorithms, such as set_difference
. Looking at James’s solution, for example, just above this commentary, the program consists of two statements for initialisation and one to actually do all the work of the program. The result, in my mind, is a program that is obviously correct. To quote Tony Hoare’s famous dictum:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.
We should encourage newcomers to C++ to use the algorithms! We have quite a few, and teaching people to know them, and how to recognise when they can be used, is very useful.
The winner of CC 122
All the critiques correctly identified that the sort order was broken because of sorting the pointer values of the string literals. Moving to a value-based type, such as std::string
(or, in this case, std::string_view
), provides an easy way to solve that problem.
It was good to see that Ovidiu, for one, reached for a tool (in this case address sanitizer) to help diagnose the crash. Tooling can be an easy win for finding problems like these (and even latent ones that are currently without symptoms.)
Francis noted the program has no protection against bad typing or spurious spaces and both Ovidiu and Hans noted that a duplicate title in one of the lists would cause problems, especially with the original code. Francis’s separation of the get_title
function as a place to add logic to do something cleverer with the title list was a good idea.
I liked Pal’s attempt to fill in a possible back-story to the critique, and thought this was a creative way to put across some of the points in his critique. I also liked his use of CTAD (class template argument deduction) for std::array
- as he says, this removes the pain point of having to provide the array bound explicitly.
The measurements of the performance costs of the algorithms over a range of input sizes from James were very interesting: they show how much difference the right algorithm can make at scale.
Overall there were some good solutions (and very little left uncovered) so thanks all for your entries… and I am awarding the prize for this critique to Pal, by a short head.
Code Critique 123
(Submissions to scc@accu.org by June 1st)
A shorter critique than usual this time, and in C, to give some variety.
Please find a bug in my code based on finding whether a positive integer can be expressed as pow(a, b) where a and b are non negative integers and b>1. My code outputs for 1000 (103), 0 where it should be 1. Please find the bug…
Thanks to Francis Glassborow for passing this example on to me, originally from alt.comp.lang.learn.c-c++. He also points out that this competition is now in its 21st year (he originated it). That means it has been running for longer than most software/developer magazines.
The code is in Listing 2 (pow.c) and Listing 3 (test.c). Can you help?
/** * @input A : Integer * * @Output Integer */ int isPower(int A) { if(A==1) return 1; else { int i; if(A%2==0) { for(i=2; i<=sqrt(A); i=i+2) if(pow(i, (int)(log(A)/log(i)))==A) return 1; } else { for(i=3; i<=sqrt(A); i=i+2) if(pow(i, (int)(log(A)/log(i)))==A) return 1; } return 0; } } |
Listing 2 |
#include <stdio.h> int main(void) { int A = 1000; printf("isPower(1000) => %i\n", isPower(A)); return 0; } |
Listing 3 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.
Notes:
More fields may be available via dynamicdata ..