Title: Code Critique Competition 120

Author: Bob Schmidt

Date: 02 November 2019 18:07:28 +00:00 or Sat, 02 November 2019 18:07:28 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body:

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)

Last issueâ€™s code

Iâ€™m relatively new to C++ and Iâ€™m trying to build up a simple hierarchy of different entities, where their name and id and a few more characteristics are held in an Entity base class; and specific subclasses, such as Widget in the code below, hold the more specialised data â€“ in this case simulated by the member name data. Iâ€™ve been having problems with the program Iâ€™m writing crashing and so Iâ€™ve simplified it down to a short example which usually produces output like this:

     Test
     none
     Another widget
     <core dump>

Please can you suggest what might be causing the core dump? Iâ€™ve had to use -fpermissive with gcc (but not msvc) â€“ but surely thatâ€™s not causing the problem?

Listing 1 contains Entity.h, Listing 2 is Widget.h and Listing 3 is example.cpp. As always, there are a number of other things you might want to draw to the authorâ€™s attention as well as the presenting problem.

#pragma once
class Entity
{

private: 
  char* pName{ NULL }; 
  int idNumber{ 0 };

public:
  void Setup(char const* name)
  {
    pName = strdup(name);
  }
  virtual ~Entity()
  {
    delete pName;
  }
  // Delete the current object using the dtor,
  // and re-create as a blank object
  void Reset()
  {
    this->~Entity();
    new(this) Entity();
  }
  char *Name() 
  {
    return pName ? pName : "none";
  }
};

Listing 1

#pragma once
class Widget : public Entity
{
  int* data;
public:
  Widget() { data = new int[10]; }
  ~Widget() { delete data; }
};

Listing 2

#include <iostream>
#include <memory>
#include <string.h>

#include "Entity.h"
#include "Widget.h"

int main()
{
  Widget w;
  w.Setup("Test");
  std::cout << w.Name() << '\n';
  w.Reset();
  std::cout << w.Name() << '\n';
  w.Setup("Another widget");
  std::cout << w.Name() << '\n';
}

Listing 3

Critiques

Martin Janzen

Overly permissive?

The author is right about one thing: the error being masked by gccâ€™s -fpermissive flag is not the cause of the core dump. The compiler is unhappy with the Entity::Name() method, which can return a non-const pointer to a string literal. Changing the return type to const char* (or char const*, to be trendy) removes the warning and makes the code safer, as it prevents the caller from attempting to overwrite pNameâ€™s memory â€“ or the literal! â€“ through the returned char*.

The Name() method itself ought to be const too, as itâ€™s clearly not intended to allow modifications to the Entity.

Placement new?!

Instead, the core dump is caused by the bizarre behaviour of Entity::Reset(), which invokes the objectâ€™s dtor manually, then uses placement new to initialize a new Entity, for reasons known only to the author.

The problem is, Entity is meant to be used as a base class. Since its dtor is virtual, the call to this->~Entity() correctly causes the derived classâ€™s dtor to be called. But Reset() then has no way to re-create the derived class; it can initialize only the Entity slice of the object.

In this example, Widget has allocated an array pointed to by data, which is destroyed by the explicit dtor call. But only the Entity ctor is called, not the Widget ctor; so the value of data is not reset. Therefore, when Widget goes out of scope in main(), data is deleted a second time, resulting in the core dump.

(This is easy to diagnose by running the code under valgrind or, in gcc or clang, by enabling ASAN with the -fsanitize=address flag. Either tool will show the stack trace at the time of each delete call, making it clear what happened.)

To fix this, Entity::Reset() shouldnâ€™t try to get so fancy: it should just clear its data members. Also, it should be made virtual so that when the object is to be reset to its initial state, derived classes get a chance to reset their own data members too.

Or, we could follow Herb Sutterâ€™s â€˜Non-Virtual Interfaceâ€™ idiom, keeping the public Reset() method non-virtual in the base class, but calling a private virtual ResetImpl() which the derived classes can override.

(Some programmers will try to invent other clever alternatives, using CRTP and the like. That seems like overkill, and an invitation for trouble, especially given that the author is â€œrelatively new to C++â€.)

C vs. modern C++ style

The real question, though, is why all of this allocation and deletion is happening in the first place. As weâ€™ve seen, itâ€™s error-prone in even a simple example, and completely unnecessary in this enlightened age.

Most importantly, the Entity::pName pointer should be replaced with a std::string. This single change:

manages the memory for the name string safely and automatically;
eliminates the memory leak which now occurs if Setup() is called twice;
permits the dtor to be replaced by a simple =default; and,
if the name is short enough for the small string optimization to be used, may even make the code run faster, with no allocations at all.

(Thereâ€™s a whole discussion that could be had about how best to handle sink arguments, as for Setup(). If this method were modified to take a std::string argument rather than a char const*, the argument could be moved into Entity::name rather than copied, possibly allowing the compiler to do even more optimization. But for now, letâ€™s assume that the interface is to be changed as little as possible.)

In the derived class, the Widget::data array has a fixed size, so it can be replaced by a C-style array or std::array member rather than allocating it on the heap. This allows both the ctor and dtor to be replaced with =default. Also, removing the unnecessary indirection makes the code just a bit more cache-friendly.

Basic hygiene

The example.cpp test program shouldnâ€™t have to know that it needs to include <memory> and <string.h> in order to use Entity and Widget. Each classâ€™s header should #include all of its dependencies, so that a would-be user doesnâ€™t have to worry about the finer points of the implementation (which, as weâ€™ve seen, can and should change).

Since Entity appears to be intended as an abstract base class but has no pure virtual methods, its ctor could be given protected access so that it can be called only from derived classes; no standalone Entity instances can be created.

Speaking of access, Entity::idNumber is initialized to 0, but it has private access and is never touched in the Entity base class. It needs to be generated somehow by Entity, or given protected access, or simply removed; itâ€™s of no use to anyone right now. Similarly, Widget::data is also private but not used. But presumably this code is simply an early experiment, with more methods to be added later.

I wonâ€™t waste any time on coding style except to note that the capitalized method names are a bit unusual and archaic-looking. As weâ€™re trying to minimize changes to the interface, they can be left as is.

Learning the craft

Finally, I think itâ€™d be worthwhile to offer a couple of bits of general advice to the author.

The use of strdup(), etc. suggests a C programming background, and/or some 1990s C++. It would be worth spending some time with one or two more recent books, such as A Tour of C++ (the 2nd edition) and Effective Modern C++; also, at the C++ Core Guidelines, and at Bjarneâ€™s homepage at www.stroustrup.com, which has a couple of papers on the subject of how to write good modern C++.
Most C++ compilers today are very, very good. As a rule, itâ€™s best to never ignore their warnings unless you are absolutely sure you understand the reasons both for the warning and for why it should be ignored in a specific situation. There is hardly ever a good excuse for this; itâ€™s good practice to strive for zero warnings.
The -fpermissive flag, in particular, is a flagrant admission that your code is broken. It should never be needed when writing new code.
Learn to use tools like valgrind and ASAN: they can save you from hours or days of frustration, or much worse.
Finally, given how rapidly and substantially the C++ language is changing, do make sure to remain current by taking advantage of the endless resources available today: conferences, books, videos, web sites, courses, podcasts â€“ and of course ACCU publications!

[ED: Full source code was supplied, but I elided it to save space.]

Hans Vredeveld

Before we look into the core dump, letâ€™s look into the need for -fpermissive. With -fpermissive, you tell the compiler to ignore a group of errors and just compile it. Hopefully it will not bite you at runtime. In this case, -fpermissive is needed because the function Entity::Name can return the constant none (type char const*), while it advertises to return a (modifiable) char*. Just make it return a (non-modifiable) char const* and this problem is solved. Also, the function itself can be made const, giving the signature:

  char const* Name() const

On to the core dump. Running the program on my Linux system with the environment variable MALLOC_CHECK set to 3 gives some information about what is going wrong:

  *** Error in `./cc119': free(): invalid pointer:
  0x00000000014e5c20 ***
  Aborted (core dumped)

Apparently, we try to free some memory that canâ€™t be freed. For further investigation, letâ€™s build the program with address sanitizer (-fsanitizer=address in gcc/clang) and run it. The first thing address sanitizer complains about is a mismatch between operator new[] and operator delete. Indeed, in Widgetâ€™s constructor, we new[] an array, and in the destructor we delete a single object. Changing delete to delete[] in the destructor solves this issue.

Building the program again with address sanitizer and running it, shows that address sanitizer is not done with us. Now it complains that we used malloc to allocate memory and delete to deallocate it. The malloc is hidden in strdup in Entity::Setup. We can fix this by changing delete pName; in Entityâ€™s destructor to free pName;. (Even better: change the type of pName from char* to std::string. Also rename it to name in that case.)

Still, we havenâ€™t found the reason for the core dump. Build it a third time with address sanitizer and run it. Now address sanitizer complains that we are attempting a double-free. In other words, we try to free something that we freed before. To understand why, we have to look at what happens when we call w.Reset() on a Widget w. Before we call w.Reset() and after we constructed w, w.data is pointing to some allocated memory. Now, when we call w.Reset(), we first destruct w (via a call to the virtual destructor of Entity). The memory pointed to by w.data is deallocated, but the pointer itself is not changed. Next in Reset(), a new Entity object is constructed by the placement new. This placement new knows nothing about inheritance and does not construct a Widget. The memory previously occupied by w.data is left untouched and when interpreting the new object as a Widget, it contains the same w.data that points at unallocated memory. When w finally goes out of scope, we are going to destruct it as if it is a Widget, resulting in trying to delete the unallocated memory that w.data is pointing to.

We could solve this problem by making Reset a virtual function in Entity and implementing it in Widget, but it will be hard to create something that is robust and easy to understand and maintain. Before we continue, letâ€™s have a look at what the code really means.

In main() we begin with declaring and default constructing a Widget w. This w does not have a name, and conveys the meaning of no widget. It is just a placeholder for a widget. Next we call w.Setup("Test"), which converts w into â€˜Testâ€™ widget. This is the moment that we actually construct a Widget. Following this, we call w.Reset(). This destructs the widget w and changes it again into an empty placeholder. With the call to w.Setup("Another widget") we construct â€˜Anotherâ€™ widget at w. Finally, when w goes out of scope, we destroy the widget for the last time.

This analysis of the code makes clear that Widget has two functions:

It is a container of size at most one for Widgets.
It is a widget, but not a specific one. It represents widgets of different types.

Note that I talk about widgets here, but that everything I talk about is not implemented by the class Widget, but by Entity.

We can improve the code by separating the two functions. The container function can be implemented easily with an std::unique_ptr<Entity>. Now Entity no longer needs the functions Setup and Reset, but it will need a constructor:

  class Entity
  {
  private:
    std::string name;
    int idNumber{ 0 };
  public:
    explicit Entity(std::string name_) :
      name{name_}
    { }
    virtual ~Entity() = default;
    std::string const& Name() const
    {
      return name;
    }
  };

Instead of a single class Widget that has to represent both a test widget and another widget, I would create two classes TestWidget and AnotherWidget. In this test program they will have similar implementations. For TestWidget this is (also changing its data member, to simplify its use):

  class TestWidget : public Entity
  {
    int data[10];
  public:
    TestWidget() : Entity("Test")
    { }
  };

Some attention must be given to copying and moving Entitys and Widgets. This is left as an exercise for the reader.

We can now rewrite main() as:

  int main()
  {
    std::unique_ptr<Entity> w =
      std::make_unique<TestWidget>();
    std::cout << w->Name() << '\n';
    w.reset();
    //Error: std::cout << w->Name() << '\n';
    w = std::make_unique<AnotherWidget>();
    std::cout << w->Name() << '\n';
  }

Finally, we have to clean up the includes. Widget.h has to include "Entity.h", Entity.h has to include <string>, and example.cpp has to include "Entity.h", "Widget.h", <memory> (which was not needed in the original code) and <iostream>. The include for <string.h> is not needed any more (originally, it was needed in Entity.h, not in example.cpp).

Marcel MarrÃ© and Jan Eisenhauer

While the reason for gccâ€™s complaint that -fpermissive must be used is indeed not the cause for the core dump, any programmer is generally ill-advised to ignore warnings. In this case, Entity::Name() returns a non-const char*, but in case of a nullptr in pname, a pointer to the literal "none" is returned, which is by definition const. It is also good style to use const for getters, so the user knows an objectâ€™s state is not changed by calling it. There is no need to change the body of the method, but the signature is best changed to char const *Name() const.

Let us now look at the reason for the core dump. With Widget w; in main(), an object of class Widget is created and the compiler knows that it needs to call its destructor ~Widget when it goes out of scope. In Entity::Reset() the widget destructor is called for this object, because we have a virtual destructor and invoke it through a pointer. However, only an Entity object is created in the Widgetâ€™s place. When w goes out of scope at the end of main(), the compiler does not use the virtual destructor, however, because (it thinks) it knows the precise type of w, since it was created on the stack. (Polymorphism in C++ works only with references and pointers, not with concrete objects.) Hence, ~Widget is used on an object whose type is Entity, and in particular Widget::data is deleted twice, likely causing the core dump. (Incidentally, the code did not core dump on our machine, showing how insidious bugs due to undefined behaviour can be.)

There is a multitude of ways to fix this problem. With the code given it is obviously difficult to see why the author felt the need to re-use objects. If the main fear is that heap allocations and deallocations are slow, from C++17 onwards other memory allocators exist that can be used. Another possibility that eschews heap allocations is virtual Reset() and every class that introduces additional members overriding it with its own and calling their ancestorâ€™s Reset(). This can be error prone (both due to forgetting to implement Reset() for a child class requiring it and due to forgetting to call the immediate base classâ€™s Reset() and the author should make sure the effort is worth the payoff. Depending on the use case, there are other approaches that can be used, such as an Entity-Component-System (see https://en.wikipedia.org/wiki/Entity_component_system), that do not use inheritance and thus do not present the same problem.

There are a few further points on the code.

Both Entity and Widget have compiler-generated copy constructor and copy assignment, but when one makes a copy, a shallow copy is created, meaning that only the pointers are copied, but not the content pointed to (which would be a deep copy). Someone expecting to be able to make a copy of an object and then be able to change one without affecting the other will find that this works for idNumber, but not for pName or data. Additionally, the latter membersâ€™ destructors would be called twice. Note also that copying can be dangerous in the case of a hierarchy, because assigning, e.g., a Widget object to an Entity object will lead to object slicing (see https://en.wikipedia.org/wiki/Object_slicing).

It is good style to avoid raw pointers to benefit from automatically working copy- and move-members. Hence, use std::string instead of char* for pName and std::array<int, 10> for data. See CppCoreGuideline R.1 (Manage resources automatically using resource handles and RAII), R.5 (Prefer scoped objects; donâ€™t heap-allocate unnecessarily) and R.11 (Avoid calling new and delete explicitly).

It is a good idea for headers to include standard and other own headers they require; in the authorâ€™s code neither does Widget.h include the base class header, nor does Entity.h include the string header it needs, although the main code file does not itself require it directly.

As a matter of self-documenting code, change the signature of the subclass destructor to ~Widget() override.

Another minor problem is that the string literal "none" is not technically an invalid name for an Entity. With std::string, an empty string could indicate an unnamed Entity. Another method is to make the possibility of a missing value explicit by using std::optional (since C++17: boost::optional was available before).

James Holland

My first impressions are that the code seems to be a mixture of C and C++. Perhaps the student comes from a C background and has not yet learnt the C++ way of doing things. I shall say more about this later but for now letâ€™s persevere with the code presented.

When compiling the code on my system I get one error, namely that an object of type const char * cannot be used to initialise a return object of type char * in function Name(). This is because a string literal is considered const and allowing the conversion to non-const would enable the string literal to be modified which would not make sense; hence the error message. Perhaps the simplest way to overcome this problem is to make the return type of Name() into const char *. The code now compiles without error. There are still problems, however.

I notice the student is using a function named strdup(). After a little research I discover this function has recently been added to the C language. According to cppreference, the function duplicates the string pointed to by its parameter and returns a pointer that must be freed using the function free(). An inspection of the studentâ€™s code reveals that delete is being used. This will result in undefined behaviour.

There is a similar problem with the destructor of Widget. Widgetâ€™s constructor creates an array of 10 ints on the heap using new[]. The destructor, therefore, needs to call delete[], not delete. It should also be noted that the variable idNumber is not used in the sample code.

Running the revised code using Clang Address Sanitiser results in a message stating that data is being deleted twice. This would result in undefined behaviour. A quick and simple way of resolving this problem is to assign data the value nullptr after deleting data. The program now produces the required output. Unfortunately, the design of the program leaves a lot to be desired.

Some minor improvements could be made to the software. There is a superfluous set of braces placed on the end of the placement new statement. It is better to assign pName the value nullptr rather than NULL.

More seriously, the Reset() function creates a new version of Entity in the same memory location as the previous incarnation of Entity but not before calling Entityâ€™s destructor. This will result in undefined behaviour as the new object resides in unallocated memory and may be overwritten at any moment by some other part of the system. Perhaps the direct call to the destructor of Entity could be removed but this results in memory leaks. To cure the memory leak problem pName needs to be freed. This may well result in code that works perfectly but things are getting very complicated and difficult to understand.

The student states that he or she is relatively new to C++ and this shows in the construction of the code. In many ways, C++ is quite a simple language once certain rules and idioms have been studied and understood. The best way to develop a C++ program is to take advantage of the facilities the programming language provides. Advantage can be taken of constructors, for example, to that objects can be created and initialised in a single statement. This will make functions such as Setup() unnecessary. Text strings should employ std::string instead of creating and using strings based on pointers to chars. The main feature of std::string is that they look after themselves. They will allocate storage on the heap, perform any initialisation and relinquish heap memory when they go out of scope. They have many more features that make them simple and safe to use. Try to use a higher level of abstraction.

The student appears to be attempting to reuse memory when a new Widget objects are required. In such cases a new object should be created. The old object should be left to be destroyed when it goes out of scope. There is rarely a need to destroy objects explicitly. It is important not to be too clever when writing software. The simplest approach is usually the best.

Finally, I provide a simple program that operates in a similar way to the studentâ€™s but uses higher level features that are provided by the C++ standard.

  #include <array>
  #include <iostream>
  #include <string>
  class Entity
  {
    public:
    Entity(const char * new_name)
    : entity_name(new_name){}
    std::string entity_name;
    int ID_number{0};
  };
  class Widget : public Entity
  {
    public:
    Widget(const char * widget_name)
    : Entity(widget_name){}
    void reset()
    {
      entity_name.clear();
      ID_number = 0;
    }
    void setup(const char * new_name)
    {
      entity_name = new_name;
    }
    std::array<int, 10> data;
    std::string name() const
    {
      return entity_name.empty() ? "none" :
        entity_name;
    }
  };
  int main()
  {
    Widget w("Test");
    std::cout << w.name() << '\n';
    w.reset();
    std::cout << w.name() << '\n';
    w.setup("Another widget");
    std::cout << w.name() << '\n';
  }

Commentary

The fundamental issue with this code, as all the critiques explained, is the explicit invocation of the destructor in this->~Entity();. The comment for the Reset() method explains that the intent is to replace the current object with a blank one.

The trouble is that the static types of the object deleted and the new object created do not match. The destructor is virtual and so the call this->~Entity(); actually invokes Widget::~Widget() and so destroys the Widget object. When the variable w goes out of scope at the end of main, the Widget destructor is called but the actual type of the object being destroyed is Entity and so the behaviour is undefined.

One â€˜solutionâ€™ would be to just destroy the Entity part of the object, but while this can be done (this->Entity::~Entity();) it doesnâ€™t help achieve the goal of the Reset() method. The question is whether Reset() is intended to be used to create another object or simply to re-initialise the existing one. At this point youâ€™d probably need to talk to the original author to find out what their intended use cases are.

As James noted, this â€˜two-phaseâ€™ construction via a constructor and a Setup() method not idiomatic C++. It would usually be better to provide the name at construction and create a complete object.

As Hans pointed out, you can argue that we appear to have a single class (Entity) with two different responsibilities â€“ it is both a container for an optional object (hence the Name is "none" when the entity is not yet populated by a call to Setup()) and a base class for a class hierarchy

The winner of CC 119

All the critiques identified the root cause of the presenting problem â€“ the core dump â€“ and also gave an explanation of the compiler warning.

There were various suggestions for how to improve the design (as well as more detailed suggestions on places where the code was more C than C++)

Hans and James both pointed out the memory allocation/deallocation mismatches (malloc + new[] being incorrectly paired with delete) and every critique included suggestions to more use std::string and an array or std::array to remove the explicit memory management completely.

Several submissions also recommended using an address sanitizer or a memory checker as ways to help the person to solve their own problem. There are a wide variety of tools now available for this sort of assistance, and I suspect many newer programmers will need some guidance on how to pick the correct one for a given problem. Martinâ€™s recommendations under â€˜Learning the craftâ€™ would be a good place to start!

I found this a hard critique to judge since there were so many strong entries â€“ thanks to all for your work in providing our readers with food for thought. However, in my opinion Martin provided the best entry for this critique by a short head.

Code Critique 120

(Submissions to scc@accu.org by December 1st)

Iâ€™m doing some simple text processing to extract sentences from a text document and produce each one on a separate line. Iâ€™m getting a space at the start of the lines and wonder whatâ€™s the best way to change the regexes to remove it?"

    $ cat input.txt
    This is the first sentence. And the
    second. This is
    the
    last
    one.
    
    $ sentences input.txt
    This is the first sentence.
     And the second.
     This is the last one.

Listing 4 contains the code. As always, there are a number of other things you might want to draw to the authorâ€™s attention as well as the presenting problem.

#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <regex>
using namespace std;

int main(int argc, char** argv)
{
  std::ifstream ifs(argv[1]);
  std::stringstream ss;
  char ch;
  while (ifs >> std::noskipws >> ch)
    ss << ch;

  string s = ss.str();

  smatch m;

  while (regex_search(s, m,
    regex("[\\s\\S]*?\\.")))
  {
    std::string sentence(m[0]);
    std::regex whitepace("[\\s]+");
    std::regex_replace(
      std::ostream_iterator<char>(std::cout),
      sentence.begin(), sentence.end(),
      whitepace, " ");
    std::cout << std::endl;
    s = m.suffix().str();
  }
}

Listing 4

You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (https://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Roger Orr Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.

Notes:

More fields may be available via dynamicdata ..

Journal Articles