Student Code Critiques from CVu journal. + CVu Journal Vol 32, #2 - May 2020

Browse in :

All > Journal Columns > Code Critique
All > Journals > CVu > 322
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Code Critique Competition 123

Author: Bob Schmidt

Date: 01 May 2020 17:43:17 +01:00 or Fri, 01 May 2020 17:43:17 +01:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body:

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)

Last issueâ€™s code

I was set a challenge to read the â€˜Top 100 booksâ€™ (from the BBC Big Read a few years ago) so thought I could start by seeing which ones I needed to get. Obviously this meant writing a program. Unfortunately, the output from the program isnâ€™t sorted, although I was expecting it would be; and in addition, it crashed on Windows.

Can you help the author fix their program so they can begin to catch up with their reading? The program (the_big_read.cpp) is in Listing 1.

#include <algorithm>
#include <iostream>
template <int A, int B>
void to_buy(const char *(&to_read)[A],
  const char *(&own)[B])
{
  const char** ptr = new const char *[A];
  const char **first = ptr;
  for (int i = 0; i != A; ++i)
  {
    *ptr++ = to_read[i];
    for (int j = 0; j != B; ++j)
    {
      if (!strcmp(to_read[i], own[j]))
      {
        ptr--; // already own it!
      }
    }
  }
  std::sort(first, ptr);
  for (const char **q = first; q != ptr; ++q)
    std::cout << *q << std::endl;
  delete ptr;
}
int main()
{
  const char* top_100 [] = {
    "The Lord of the Rings",
    "Pride and Prejudice",
    "His Dark Materials",
    "The Hitchhiker's Guide to the Galaxy",
    // 93 missing ...
    "Girls In Love",
    "The Princess Diaries",
    "Midnight's Children",
    };
  const char* my_library[] = {
    "The Hitchhiker's Guide to the Galaxy",
    "Vintage Jeeves",
    // and many more ...
    "His Dark Materials",
    "Pride and Prejudice",
    "Emma",
    "Where Eagles Dare",
    };
  to_buy(top_100, my_library);
}

Listing 1

Critiques

Francis Glassborow

First impressions

This is a C program with a couple of C++ elements thrown in.

Worse, it is a C program that exhibits the worst aspects of Câ€™s use of pointers.

This is a program where main() is abused to hold data

There is no protection against accidental variation in titles such as extra spaces

Updating the list of books requires changing the program with all the potential for more bugs.

Second impressions

Why is the programmer using a template?

Why single letter capital variable names?

Why have we got an explicit use of new and delete? A sure indication that the programmer has not got hold of the way C++ is designed to work at a high level.

Third impression

Do I really want to try to unravel all those pointers? I am sure the problem lies in those pointers, and almost certainly in the pointers passed to the sort function.

OK, time to put it into a compiler. Well, it compiles and runs without error. Well that means that the error is almost certainly buried in those pointers. Let us think about that call to sort. Exactly what is it sorting? Aha! An array of pointers and, of course, they are pointers to string literals (i.e. non-modifiable arrays of char) are already in numerical order. Had the programmer stuck with C and used qsort() they would have to have provided a comparison function because C lacks the idea of a defaulted parameter. Unfortunately the default for comparison used by std::sort() is wrong for this task.

So there is the clue, std::sort has a third, defaulted, comparison operator. Writing one requires delving into the mysteries of operator overloading or, more recently, writing a lambda function.

So the first solution is to add:

  struct {
  bool operator() (char const * & x,
                   char const * & y)
  {return strcmp(x, y)< 0;}
  } custom_less;

I have deliberately used an anonymous struct (C style) and then provided an instance at the end. This code is likely to look weird to the original programmer but it effectively provides an object (custom_less) that behaves like a function that returns true if the first string literal is before the second one in a standard collation sequence.

Now replace the call to sort with:

  std::sort(first, ptr, custom_less);

And the result will be the one desired.

Even more magical for C programmers writing C++ is to use a lambda function where the comparison function is written inline and the call to sort is replaced by:

  std::sort (first, ptr,
  [](char const *& x, char const *& y){
  return strcmp(x, y) < 0;});

Rewrite

Having fixed the original program using arcane C++ magic it is time to rid ourselves of the arcane C magic used to write the original program. The following program is written in a fairly primitive version of C++ without much use of modern features so it should be reasonably easy for the original programmer to follow.

  #include <string>
  #include <vector>
  #include <fstream>
  #include <algorithm>
  #include <iostream>
  const std::string get_title(
    std::ifstream & s){
    std::string title;
    std::getline(s, title);
    return title;
    }

The function get_title() is a place holder for a function that can extract titles from a file with other data such as ISBNs and authors. The test files contain one title per line and are terminated with END as the final entry. Any necessary complexity in getting a title can be placed in this function.

  void load_data(
    std::vector<std::string> & data,
    std::string const & source){
      std::ifstream s(source);
      if( !s ) throw source + " does NOT exist";
      std::string title;
      std::string const END("END");
      title = get_title(s);
      while(title != END) {
        data.push_back(title);
        title = get_title(s);
      }
      std::sort(data.begin(), data.end());
      return;
  }

The function load_data() uses get_title() to extract titles from source and copy them into the provided vector<string>. Before returning it sorts the vector so that the titles are now in alphabetical order. It is called twice, once to capture the reading list and once to capture a list of books owned.

Note that it checks that source correctly names a file. If it does not the function aborts by throwing an exception.

  int main(){
    try{
      std::vector<std::string> read_list;
      std::string file
        {"D:\\data_files\\to_read.txt"};
      load_data(read_list, file);
      file = "D:\\data_files\\my_library.txt";
      std::vector<std::string> my_books;
      load_data(my_books, file);
      for(auto&& title :read_list){
        if(my_books.end() ==
          std::find(my_books.begin(),
            my_books.end(), title)){
          std::cout << title << '\n';
       }
      }
    }
    catch(std::string error_message) {
      std::cerr << "***MISSING FILE***\n"
                << error_message << std::endl;}
  return 0;
  }

The only thing that may seem strange to our novice C++ programmer is that I have used the range for that came into the language in 2011. It works so nicely and produces code that is easier to read.

I hope the reader notices that there is not a single explicit use of a pointer (of course they are there but deep inside library code which is where they belong.) Real C++ rather than C using a couple of C++ features is so much more comfortable to use.

Ovidiu Parvu

The source code given in this Code Critique is platform independent. Therefore, I have analysed and updated it in my preferred environment, namely Ubuntu 18.04.4 using clang 10.0.0, and tested it on both Ubuntu and Windows. On Windows, the Visual C++ compiler 19.15.26732.1 for x86 was used.

There are two issues reported with the program compiled from the given source code. First of all, the program crashes on Windows. Secondly, the book titles printed to the standard output are not sorted in alphabetical order.

I was unable to reproduce the crash on Ubuntu or Windows. However, I assume that the program can crash. Due to the use of raw pointers and the manual memory (de)allocations, I hypothesized that the crashes are memory management related. To verify this hypothesis, I have compiled the source code with address sanitizer support enabled as follows:

  clang++ -Wall -Werror -Wextra -std=c++17 â€“g\
    -O3 -fsanitize=address the_big_read.cpp\
    -o the_big_read

Running the resulting program revealed the first major issue in the source code, namely that the array of string literals (i.e. const char**) created at the beginning of the to_buy() function is deallocated incorrectly:

  ./the_big_read
  the_big_read.cpp:29:3: error: 'delete' applied
  to a pointer that was allocated with 'new[]';
  did you mean 'delete[]'? [-Werror,
  -Wmismatched-new-delete]
    delete ptr;
  ...

There are two issues with the deallocation. Firstly, a delete instead of delete [] expression is used to deallocate the array of string literals. Secondly, the pointer used in the delete expression is ptr instead of first, and ptr may not point to the beginning of the array when the delete expression is executed. To address this issue delete ptr should be replaced with delete [] first.

The second reported issue, namely that book titles are not printed to the standard output in alphabetical order, could be reproduced on both Ubuntu and Windows. The cause of the issue is that the book titles to be printed, represented as const char *, are sorted by character string address rather than character string contents. This issue can be fixed by calling the std::sort() overload that takes a comparator lambda function as input which compares character string contents rather than character string addresses.

Although the changes described above address the reported issues, the source code can be improved further.

There are three main outstanding issues. Firstly memory is managed manually, which is error prone. Secondly it is possible to inadvertently define duplicate book titles in the top_100 and my_library collections. Thirdly, the time complexity of computing the collection of books to buy could be reduced. All of these issues can be addressed by changing the type of top_100 and my_library to std::set<std::string> and computing the collection of books to buy using std::set_difference().

Computing the collection of books to buy using std::set_difference() rather than two nested loops reduces the time complexity of the to_buy() function from O(N * M) to O(N + M), where N and M are the sizes of the top_100 and my_library collections, respectively.

The execution time impact of representing the collections of books as std::sets and computing the collection of books to buy using std::set_difference() was measured using [google benchmark] (https://github.com/google/benchmark). Results suggest that using std::set and std::set_difference() leads to an approximately 14x speedup compared to using const char* arrays and nested loops.

  $ ./benchmark
  2020-03-29 19:19:52
  Running ./benchmark
  Run on (4 X 3700 MHz CPU s)
  CPU Caches:
    L1 Data 32 KiB (x4)
    L1 Instruction 32 KiB (x4)
    L2 Unified 256 KiB (x4)
    L3 Unified 6144 KiB (x1)
  Load Average: 0.34, 0.63, 1.12
  ***WARNING*** CPU scaling is enabled, the
  benchmark real time measurements may be noisy and
  will incur extra overhead.
  ...
  UsingNestedLoops
       36772 ns        36770 ns        18864
  ...
  UsingStdSetDifference
        2558 ns         2558 ns       271562

Note that the measurements included defining the collections of books.

Following on from replacing const char* arrays with std::set<std::string> collections, the string.h header include, which should have been cstring, was replaced with a string header include.

There are two additional minor issues that could be addressed. Firstly std::cout does not need to be flushed after each book title is printed. Therefore std::endl should be replaced with '\n'. Secondly the safety and readability of the source code could be improved respectively by enclosing the std::cout expression in curly braces, by renaming the to_buy() function to print_to_buy().

Finally, given that book titles are known at compile time, the collection of books to buy could be computed at compile time as well. However, given that constexpr support is not yet available for the std::set type, the constexpr idea was not explored further here.

The revised source code for this Code Critique is made available below.

  #include <algorithm>
  #include <iostream>
  #include <iterator>
  #include <set>
  #include <string>
  void print_to_buy(
    const std::set<std::string>&to_read,
    const std::set<std::string>& own) {
    std::set<std::string> books_to_buy;
    std::set_difference(to_read.begin(),
      to_read.end(), own.begin(), own.end(),
      std::inserter(books_to_buy,
      books_to_buy.end()));
    for (const std::string& book : books_to_buy) {
      std::cout << book << '\n';
    }
  }
  int main() {
    const std::set<std::string> top_100 = {
      "The Lord of the Rings",
      ...
      "Midnight's Children",
    };
    const std::set<std::string> my_library = {
      "The Hitchhiker's Guide to the Galaxy",
      ...
      "Where Eagles Dare",
    };
    print_to_buy(top_100, my_library);
  }

Hans Vredeveld

Looking at the code, I get the impression that the author is a C-developer that learned some C++. With a few changes to the program it could compile as a C-program, where the biggest change would be in the signature of to_buy. It has to change from a function template to a normal function that takes the array sizes, A and B, as two additional function parameters.

Before we change the program into a modern C++-program, letâ€™s have a look at the things that can be improved.

Except for loading the lists of books to read and books in the library, to_buy does everything in its body: selecting the books to buy, sorting those books and printing that sorted list.

The meaning of ptr and first is a bit mixed up. When ptr is declared, it becomes the owning pointer to an array, but later in the function it is used to move forward and backward in that array. At the end of the function it is used to delete the array, which fails because ptr isnâ€™t pointing to the beginning of the array any more. Swapping ptr and first in the first two statements and in the last statement would solve this.

In the last statement of to_buy, delete has to be delete[] (together with the former point, this solves the program crash).

The statement *ptr++ = to_read[i]; does two things. It assigns to *ptr and it increments ptr. I prefer to have a statement do only one thing and do the assignment and increment in two separate statements.

The if-statement if (!strcmp(to_read[i], own[j])) abuses the implicit conversion of an int to a bool. It reads like â€˜if not compares to_read[i] to own[j]â€™, or â€˜if to_read[i] is not equal to own[j]â€™. The statement actually means â€˜if to_read[i] is equal to own[j]â€™, that is better written as if (strcmp(to_read[i], own[j]) == 0). Even better would be to make to_read and own arrays of std::string, in which case it would become if (to_read[i] == own[j]).

If an entry of to_read appears twice in own, ptr is decremented twice. In the best case, a book that should be bought is not bought. In the worst case, ptr is moved to before the array and then the location it is pointing to is written to, resulting in data corruption or a program crash.

The purpose of the inner for-loop is to find if to_read[i] appears in the array own. A safer way to achieve this is by using std::find_if, which also makes the codeâ€™s purpose immediately clear.

The statement std::sort(first, ptr); doesnâ€™t sort the titles, but the pointers to the titles. This is the reason that the output is not sorted.

In the last for-loop in to_buy, the list of titles to buy is printed. This binds the function to_buy tightly to the standard output, making it unusable for other situations. To reduce this binding and make to_buy more versatile, it should return the list of titles to buy and leave the printing to the calling function, main.

In main, top_100 and my_library are arrays of const char* and their content is hard-coded. Instead their type could be changed to std::array<std::string, A> and std::array<std::string, B>, but then we would have to hard-code the values of A and B (unless we can use C++20â€™s std::to_array). When the program would work as expected, the first feature I would like to add, would be to read the contents for top_100 and my_library from files. In that case we donâ€™t know their sizes and it would be logical to make their type std::vector<std::string>. So I advise doing that right away.

The author already found their way to the <algorithm> library for std::sort. But <algorithm> contains more useful functions. With std::set_difference we can easily select the titles to buy, under the condition that the top 100 list and the library are sorted and the lists donâ€™t contain duplicate entries.

Putting it all together, the program becomes:

  #include <algorithm>
  #include <iostream>
  #include <iterator>
  #include <string>
  #include <vector>
  using book_list = std::vector<std::string>;
  book_list to_buy(book_list const& to_read,
    book_list const& own)
  {
    book_list new_titles;
    std::set_difference(
      to_read.begin(), to_read.end(),
      own.begin(), own.end(),
      std::back_inserter(new_titles));
    return new_titles;
  }
  void print(book_list const& titles)
  {
    for (auto const& title : titles)
      std::cout << title << '\n';
  }
  int main()
  {
    book_list top_100 = {
      // titles removed for brevity
    };
    book_list my_library = {
      // titles removed for brevity
      };
    std::sort(top_100.begin(), top_100.end());
    top_100.erase(
      std::unique(top_100.begin(),
      top_100.end()), top_100.end());
    std::sort(std::begin(my_library),
      std::end(my_library));
    my_library.erase(
      std::unique(my_library.begin(),
      my_library.end()),my_library.end());
    auto titles_to_buy =
      to_buy(top_100, my_library);
    print(titles_to_buy);
  }

Balog Pal

I almost sent an entry to the previous one. That was starting with "Hmm, we need a full book to cover the problems related to these few lines, or at least several chapters." Then I decided to take the lazy way out, with maybe writing a standalone article if the others leave something important out. This one looks much better fitting. And I really like that it is plausible to come from real life. And it even has the bait with the crashâ€¦ ☺

The first thing we notice is that we have the whole solution in a single file, no headers, no fluff, and probably would be just a main() if the OP found a way around using template. That is great. This is meant as a single-use tool to be thrown away. The last thing we want is over-engineering. The problem can probably be solved in a text editor or using the unix shell tools, but those not familiar probably can solve it with python or C++ in a similar timeframe.

I start inspecting in main(). We have the inputs embedded, instead of read from some outside source. Again, that is a good thing. What I miss is top-level const on the collections, as we clearly have no intention to patching them up. Using const char * is not incorrect, however I mark it for attention to avoid comparing pointers instead of strings. Use of old-style array is also suspect. With the latest C++ the old arrays are not that hated, we can even iterate on them without fiddling with the size. But excuses on not using std::array are in short supply. And the rest is a single call to the template function, that seems to just serve to deduce the bounds. That could have been just done using size(). Or we could say std::array top_100 const = {...}; and have everything deduced. (Finallyâ€¦ the need of the explicit bound was a serious impediment before C++17.)

Before inspecting how the material is processed, letâ€™s take a detour and think about the problem and the solution. That is my usual method on code reviews, reading the supplied solution can bend the thinking process too much. While if I came up with the same solution increases the confidence.

When I read the problem the first thing that jumped to my mind was that it sounds awfully like an interview question. I kinda heard the whole conversation.

Interviewer: Iâ€™m interested how you write programs, Iâ€™ll fetch a keyboard, can you write some code for me?

Me: No.

I: Come again?

Me: You heard it right, I said no as in no way.

I (baffled): How come? Canâ€™t you write code?

Me: Sure I can, and I do a lot. When the circumstances are right. Not in ad hoc way. And especially not starting with a keyboard. It has a level requirement of like 4 and we did not even see the dungeon entrance. It is not even supposed to be in the room serving as distraction. But Iâ€™m happy to talk about you problem, figure out a solution. It has some chance to be used at some later point.

I: (Should I just kick this weirdo? I have nothing to do for another 5 minutes so letâ€™s roll with it for a while)

Ok, so suppose I need to complete reading a bunch of books listed on a website. I have a small library, but at a glance I miss plenty of those. I have an excel sheet with my inventory and want to get a shopping list for the missing items. Write me a C++ program that provides it.

Me: I think I understand the problem and believe it can be solved. Let me put it in formal terms and you verify that it matches your aim.

I: Ok.

Me: So, we have a set of books (I form a circle using my left thumb and index finger) that I already have, letâ€™s call it H. We have a set of books (showing similar circle with my right hand) the website requires, and so I want to have, letâ€™s call it W. So the intersection (I overlap the circles like an improvised Venn diagram) has the books that I already have. And the difference W-H has those I need to obtain. The goal is to have this latter.

I: Yeah, you got it right.

Me: Fine. That is definitely doable if we can extract the information. Can we? If so, what and in what format?

I: Yes. You can get a list of titles. You can choose the format within reason.

Me: Well, I can just export the Excel into CSV, and that I can even just paste right in the code. Can the website also give me a CSV too?

I: Yeah, that works.

Me: That settles the inputs and the processing, what do you want as the output?

I: Whatever goes, you can just dump the titles on the console one title per line.

Me: Anything else?

I: It would be nice to have it sorted.

Me: And you want a program in C++? I think I can just arrange it right in the Excel or using shell tools.

I: I agree itâ€™s somewhat arbitrary, but letâ€™s make C++ a hard requirement. Is that a problem? (Prepared for some really lame next excuse.)

Me: No, the client is entitled to have whatever requirements, my job is just to make sure he means them and that we have proper understanding. I certainly point out if it is possible but not feasible while better ways exist. For this case we can go ahead, as it doesnâ€™t look a big deal.

I (relieved): Iâ€™m glad to hear that.

Me: So, good news, C++ happens to have kinda native support for this case. So it can be done in just a handful lines of code that almost matches the formal description. We have a function std::set_difference. If we just feed it with our input lists, will provide the desired output. We can even directly make at as you described passing an ostream_iterator looking at cout and newline as the delimiter. It requires two sorted ranges with the content of sets W and H. We have luck with that too, as we have std::set that is auto sorted. So we just construct the set<Book> const using brace init with the text in the CSV files. And pass .begin() and .end(). Thatâ€™s it, weâ€™re done. By the way it happens to emit the output in its original order, that was sorted, so you have that part too.

I (tempted to ask to repeat the part after â€˜good newsâ€™ to catch up): Are we?

Me: Well, almost, we have one open point on Book. To work in the setup it needs three things (beyond being fit in a container): implicit construction from string literal, less operator creating an order that fits for us and the standard ordering, and usable with ostream. Neither is a big deal. For details we need additional design discussion. Say in the constructor it could invoke some web service passing the title and maybe get an ISBN number. Or do some transformations on the title like losing the punctuation, the international characters, make all lowercase, maybe remove all the whitespace. I think I would be unhappy to buy a second copy of a book just because I had an extra space somewhere. Or a different apostrophe from the wide selection. We could store the original title and the converted/gathered info, using that for the compare. Unfortunately, that could mess with the order of the final output, say, by ISBN instead of titleâ€¦ We can compensate for that collecting in a vector then sort that using a different criteria before copying to cout. Any specific requirements?

I (feeling overwhelmed): No, Iâ€™m fine with the simplest solution, just strcmp is fine on the title as provided. So would you write a Book class?

Me: In that case there is no reason, we can just use std::string as Book, it passes all our criteria out of the box.

I: Couldnâ€™t we just use the const char *s from the literals without extra steps?

Me: Itâ€™s possible too. Of the requirements stated earlier it passes all but the sorting one. For our desired semantics at least. But we can make up for that, both set and the algo can take a comparator as additional parameter. Easiest is to create a function like auto const cmp = [](char const* a, char const* b) { return strcmp(a, b) < 0;}; and feed it as the second parameter of the set and 6th parameter of the function. We even save some characters, as we can just say set W = {{"a", "b"}, cmp}; without template arguments and let it be deduced.

Me: I think we covered all the necessary ground. If youâ€™re still interested, you can bring the keyboard now and observe that my fingers are indeed pretty lame using an unfamiliar specimen in a foreign environment.

I: No, thanks, weâ€™re good. (Maybe this guy has a point there, all the other folks just jumped on the keyboard and the few who even provided something that works arrived at some arbitrary variant without alternatives considered or even a hint that the result will match the desires before a lot of time investedâ€¦ Next time Iâ€™ll at least wait till the keyboard is demanded.)

After the detour we covered the solution, so letâ€™s see what OP provided instead of that. In function to_buy (that shows poor naming that would not be a problem if we never had the function in the first placeâ€¦) we spot use of strcmp and sort that are good signs. We also spot use of new and delete that is a very bad sign: we found no reason whatsoever to deal with raw memory to start with, and if we had a reason wild delete is still on top of the forbidden list: allocated stuff should go to an owner like a unique_ptr. To add injury to the insult we use array new matched with plain delete, that is undefined behavior â€“ good enough reason for the observed crash. Even if we managed to pass the proper pointer, the one pointing to the allocation. In this case first, not ptr. Keep in mind to properly match your delete not using delete avoids this whole can of worms.

Now, what are we using the memory for? Looks weâ€™re collecting the titles for the output. Like as we had vector<char const*> reserved to size A, push_back the item under scrutiny, then pop_back if we found it in own. An obvious improvement would have been doing just that and avoiding the crash while making the code more obvious (while probably doing the exact same thing under the hood.) An obvious chance to further improvement would be to avoid the jojoing, and just collect the item when found worthy. Maybe OP tried first to issue a continue when strcmp came out with 0. Then realized that he meant to continue the outer loop, not the inner. Might even made some noises mentioning WG21 and evil killing all the proposals to have the labeled break and continue other languages haveâ€¦ Then resorted to use the bouncing instead of the prescribed good measure that sits in the language: label and goto. Or the more prescribed measures like replace the loop with the algorithm it tries to re-implement: that would be find_if in this case. What would also make the code more readable, meanwhile avoiding the continue of the scan after a hit for possible other instances of the title in the list, that are not there. Hopefully, as if they are we start popping other items from the collection, then hitting UB after passing first.

Suppose we fixed the mentioned issues, and get to sort with a proper state. Did I mention up front to look out for using strcmp? Well, we did at one place, did not with this second. Sort works fine, certainly, but it will create the order by the pointer values, not strings, so the output may be whatever for practical purposes. We need to pass in a sorting function here too. Could have avoided the whole thing by just using std::string as holder.

The rest is more like nitpicking, on using endl without a good reason, not including all the required headers, not using autoâ€¦ probably boring details considering the amount of real problems and missing the simple and effective solution.

Effective? one could ask as objection. Doesnâ€™t using string, vector, set and the other goodies waste plenty of memory and processing cycles? Stupid questionâ€¦ NOT! (TANSTAASQ!) Performance is certainly a concernâ€¦ WHEN it is. If we run this program and it reports out of memory, or crashes due to that, or takes hours to produce the output, then yes, weâ€™re wasting stuff that we might be able to conserve. And in that case it is worth to look closer. However if the output is on the screen before we release Enter all the way? Then we can just acknowledge that we have our goal completed. (Provided the answer is correct that is). If it took a fill 80 extra nanoseconds to run, so be it. Or used an extra 10k out of our 6G pool of free memory. (left from the 14G pool, by the 312 open tabs in our browser and 2 dozen gizmos in the alert area we know nothing about).

Anyway, letâ€™s give it some consideration. This solution has N² scaling over the NlogN using sort and set_difference. So we can expect the latter to consistently win as we get to big numbers on the list size. We know that one list has 100 members, the other is unknown, probably in the 50â€“3000 range. Well above the 8 that is a popular cut-off point.

As the memory footprint, set is fairly wasteful, if concerned weâ€™re better off using array and sort it (that could be better on time too.) std::string certainly has an overhead over the plain literal, and we only used it to avoid passing the compare function. On the execution level it is a poor tradeoff. But it has benefits on much harder to mess up and easier to read/review.

And as an extra point tied to performance, the whole consideration may be moot. Letâ€™s notice that our program is pretty shy on observable behavior. The only observable behavior is the output itself. So the compiler has freedom to just have the final output text in the image and dump that. On every variant (that we fixed to be UB-free and correct wrt. our goal). I would be surprised if none of the other entries have shown a solution that works consistently at compile time. It can be done. Sprinkling the code with constexpr and arranging it to be compile-time friendly may help today. But a good optimizer is allowed to go ahead without all that help. And some might even do it already.

James Holland

The CVu magazineâ€™s version of the program needs string.h included to provide a definition of strcmp(). The [accu-general] email version has string.h included.

There are two errors in the code that are causing undefined behaviour and probably causing the crash when run under Windows. Firstly, the wrong pointer is being used to delete the dynamically created array. The pointer first should be used, not ptr. Perhaps the mistake would not have happened if the first two lines of to_buy() had been written as follows.

  const char ** const first = new const char *[A];
  const char ** ptr = first;

In this way it makes it clearer that first is meant to be the custodial pointer and would, therefore, be used to delete the array. Also, by making first a const pointer prevents its value from being changed. Secondly, the non-array form of delete is used to destroy the array. The required form is delete[]. The code now compiles and runs. The problem of the output not being sorted remains.

The sort function is doing exactly what it is told. It is sorting the elements of the array pointed to by first according to the value of the elements. The elements of the array are already in ascending order as they point to strings that are at successively higher locations in memory. This can be demonstrated by sorting the pointers in descending order by adding the greater() predicate to std::sort. The list of books to purchase now appears in the opposite order. In either case, the array is not sorted in alphabetical order of their strings. The sort function needs modification.

As indicated above, the sort criteria is incorrect. Instead of sorting by the values of the pointers in the array, we need to sort according to the alphabetical order of the strings pointed to be the pointers. The sort function can be modified by adding a predicate in the form of a lambda expression.

  std::sort(first, ptr, [](const char * const a,
    const char * const b){
    return strcmp(a, b) < 0;
  });

The lambda makes use of strcmp to compare the two strings and returns true if the first string comes alphabetically before the second string. With this modification, the program produces the desired result.

There are a few ways in which the program can be improved. When it is discovered that a required book is already in the library, there is no need to continue searching the library for the same book. A break statement can, therefore, be added after decrementing the ptr pointer.

Although it does not make any difference when using built-in types, it is conventional to use the pre-decrement operator in preference to the post-decrement operator when decrementing a variable. With user-defined types, the pre-decrement operator is usually more efficient. In keeping with convention, the statement to decrement the ptr pointer should be --pre;. Also, there is no need to flush std::cout when printing the books to purchase. A simple '\n' will do instead of std::endl.

The parameters of to_buy() could be declares const (as well as to what they point). This will help, to some extent, make the code self documenting. The pointer first could also be declared const for the same reason. If we want to be thorough about this, the pointer q in the for-loop that prints the result could be declared as a pointer that points to a const pointer, i.e. const char * const * q. The pointer itself cannot be declared const as it will be incremented within the loop.

The names of the template parameters of to_buy() could be a little more descriptive. Perhaps A and B could be renamed to_read_length and own_length respectively, or something similar.

The supplied program has been written using very low-level constructs. There are pointers and even pointers to pointers. The student has chosen raw pointers to be responsible for deleting objects created on the heap. A better choice would be to use std::unique_ptr because the arrays would be automatically deleted when std::unique_ptr goes out of scope, even when an exception is thrown. Better still would be to use std::vector when relatively large arrays are required as they will delete themselves when they go out of scope. Low-level features may be required occasionally but they make the program difficult to write, understand and maintain.

Despite this, there are some interesting features to the program. One particularly intriguing feature is the use of a template to define the types of the parameters passed to to_buy(). Without a feature like this the array type would decay to a pointer to a char (in this case) and the size of the array would be lost. It is sometimes the case that â€˜clever featuresâ€™ are a sign that the program is too elaborate and complicated and that there is a simpler way to construct a program that performs the same function.

As mentioned above, the studentâ€™s software is written at the very low level. If it could be determined what the software is required to achieve in fundamental terms, then it might be possible to use some high-level constructs (preferably provided by a library) to write a program that is shorter and simpler to understand. After a little thought, it is realised that what is required is for the program to print the elements of one set that are not in another set. That is all! This is such a fundamental and useful idea that there must be existing code to do the job. Well, there is and itâ€™s in the C++ standard library by the name of set_difference(). This function takes two sets, calculates the difference (the elements that are in one set and not in the other) and then copies them, in sorted order, to an output stream. This is exactly what is required. My version of the program makes use of set_difference() and consists of just three C++ statements. By making appropriate use of an existing implementation of an algorithm, the program has become considerably simpler.

  #include <algorithm>
  #include <iostream>
  #include <iterator>
  #include <set>
  #include <string.h>
  int main()
  {
    const std::set<std::string> top_100 {
      //...
    };
    const std::set<std::string> my_library {
      //...
    };
    std::set_difference(
      top_100.cbegin(), top_100.cend(),
      my_library.cbegin(), my_library.cend(),
      std::ostream_iterator<std::string>(
        std::cout, "\n"));
  }

But what about the speed of execution? The studentâ€™s code is specially written to solve a specific problem, so surely a general purpose library function must be slower. The best way to find out is to perform some tests. I arranged for my_library and top_100 to have the same number, n, of random titles. I then added a further 50 randomly generated titles to top_100. I then ran the two programs and timed their execution. The results for various values of n are shown in Table 1.

Table 1: Program Performance Test Results
Number of titles (n)	Execution time / seconds		Performance factor
Number of titles (n)	Studentâ€™s program	Using set_difference	Performance factor
1	0.00029	0.000239	1.21
10	0.000249	0.000245	0.98
100	0.000358	0.000292	1.22
1000	0.00214	0.000203	10.55
10000	0.17435	0.000737	236.56
100000	35.194	0.01183	2975
1000000	> 1800	0.135	> 13333

After waiting half and hour for the studentâ€™s program to work its way through a million books, I gave up and put it out of its misery. For values of up to about 100 there is not much difference in the speed of execution between the studentâ€™s program and the one using set_difference as the execution time is dominated by the time taken to print the result. As the number of books increase, it can be seen that the program using set_difference gets significantly faster than the studentâ€™s code. The execution time of the studentâ€™s code increases exponentially with n whereas the set_difference program increases linearly with n. The way set_difference achieves this performance is by relying on the data sets being sorted. This is ensured because std::set stores its data (in this case, the film titles) in sorted order. By spending a little time sorting the data at the beginning of the program allows set_difference to have very good performance, especially with large data sets.

In summary, it is always worth thinking about what a program is required to do in abstract terms and then seeing whether there is an existing algorithm that can be used to perform the task. A great deal of thought by some clever people has gone into the design of library functions and it is often advantageous to make use of their labours. The advice is to familiarise oneself with the C++ standard library (and other libraries) to see what algorithms and functions are available and attempt to program at a higher level of abstraction.

Commentary

First, a comment about string.h which, as James noted, was in the version emailed to accu-general but not in that printed in the magazineâ€¦ This is because needing this header depends on which compiler/library combination you are using. In C++ itâ€™s unspecified which other things may be included when you use a standard library header and on one of the environments I tested the header was implicitly included. While there are some good reasons for this flexibility, it does make writing portable code is little harder (and even code that works with different versions of the same compiler). This thing to bear in mind is that you should list in your source the include files the code requires. So since the code in this critique uses strcmp it ought to include <cstring>.

This critique is an example of what Pal describes as â€˜a single-use tool to be thrown awayâ€™. The context can matter: in this case in particular, the list of 100 top books from the BBC Good Read is a given, so having it in source code for a program of this kind may be perfectly fine; however, I suspect the list of books in â€˜my libraryâ€™ ought to be held separately as, presumably, I am going to be purchasing some of the books that Iâ€™m missing and so my library will change! However, unless I have very deep pockets, my library is not going to be more than a few thousand volumes, so worrying too much about the performance of larger collections is likely to be out of scope in this particular case. (Of course, programs (and fragments of programs) can get re-used in very different environments, but I think it is the person re-using the code that needs to consider the performance issues since only they know the actual constraints in the new use case.)

As several people noted, the code is a C++ program but with a strong flavour of C. There seems no reason for the use of C features (such as raw pointers and use of strcmp) and I think it makes sense to try to provide the programmer with a more idiomatic C++ program. This includes making using other algorithms, such as set_difference. Looking at Jamesâ€™s solution, for example, just above this commentary, the program consists of two statements for initialisation and one to actually do all the work of the program. The result, in my mind, is a program that is obviously correct. To quote Tony Hoareâ€™s famous dictum:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.

We should encourage newcomers to C++ to use the algorithms! We have quite a few, and teaching people to know them, and how to recognise when they can be used, is very useful.

The winner of CC 122

All the critiques correctly identified that the sort order was broken because of sorting the pointer values of the string literals. Moving to a value-based type, such as std::string (or, in this case, std::string_view), provides an easy way to solve that problem.

It was good to see that Ovidiu, for one, reached for a tool (in this case address sanitizer) to help diagnose the crash. Tooling can be an easy win for finding problems like these (and even latent ones that are currently without symptoms.)

Francis noted the program has no protection against bad typing or spurious spaces and both Ovidiu and Hans noted that a duplicate title in one of the lists would cause problems, especially with the original code. Francisâ€™s separation of the get_title function as a place to add logic to do something cleverer with the title list was a good idea.

I liked Palâ€™s attempt to fill in a possible back-story to the critique, and thought this was a creative way to put across some of the points in his critique. I also liked his use of CTAD (class template argument deduction) for std::array - as he says, this removes the pain point of having to provide the array bound explicitly.

The measurements of the performance costs of the algorithms over a range of input sizes from James were very interesting: they show how much difference the right algorithm can make at scale.

Overall there were some good solutions (and very little left uncovered) so thanks all for your entriesâ€¦ and I am awarding the prize for this critique to Pal, by a short head.

Code Critique 123

(Submissions to scc@accu.org by June 1st)

A shorter critique than usual this time, and in C, to give some variety.

Please find a bug in my code based on finding whether a positive integer can be expressed as pow(a, b) where a and b are non negative integers and b>1. My code outputs for 1000 (10³), 0 where it should be 1. Please find the bugâ€¦

Thanks to Francis Glassborow for passing this example on to me, originally from alt.comp.lang.learn.c-c++. He also points out that this competition is now in its 21st year (he originated it). That means it has been running for longer than most software/developer magazines.

The code is in Listing 2 (pow.c) and Listing 3 (test.c). Can you help?

/**
  * @input A : Integer
  *  * @Output Integer
  */
int isPower(int A) {
  if(A==1) return 1;
  else
  {
    int i;
    if(A%2==0)
    {
      for(i=2; i<=sqrt(A); i=i+2)
      if(pow(i, (int)(log(A)/log(i)))==A)
      return 1;
    }
    else
    {
      for(i=3; i<=sqrt(A); i=i+2)
      if(pow(i, (int)(log(A)/log(i)))==A)
      return 1;
    }
    return 0;
  }
}

Listing 2

#include <stdio.h>
int main(void)
{
  int A = 1000;
  printf("isPower(1000) => %i\n", isPower(A));
  return 0;
}

Listing 3

You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (http://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Roger Orr has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.

Notes:

More fields may be available via dynamicdata ..