Journal Articles
Browse in : |
All
> Journals
> CVu
> 315
(7)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 120
Author: Bob Schmidt
Date: 02 November 2019 18:07:28 +00:00 or Sat, 02 November 2019 18:07:28 +00:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I’m relatively new to C++ and I’m trying to build up a simple hierarchy of different entities, where their name and id and a few more characteristics are held in an Entity
base class; and specific subclasses, such as Widget
in the code below, hold the more specialised data – in this case simulated by the member name data
. I’ve been having problems with the program I’m writing crashing and so I’ve simplified it down to a short example which usually produces output like this:
Test none Another widget <core dump>
Please can you suggest what might be causing the core dump? I’ve had to use -fpermissive
with gcc (but not msvc) – but surely that’s not causing the problem?
Listing 1 contains Entity.h, Listing 2 is Widget.h and Listing 3 is example.cpp. As always, there are a number of other things you might want to draw to the author’s attention as well as the presenting problem.
#pragma once class Entity { private: char* pName{ NULL }; int idNumber{ 0 }; public: void Setup(char const* name) { pName = strdup(name); } virtual ~Entity() { delete pName; } // Delete the current object using the dtor, // and re-create as a blank object void Reset() { this->~Entity(); new(this) Entity(); } char *Name() { return pName ? pName : "none"; } }; |
Listing 1 |
#pragma once class Widget : public Entity { int* data; public: Widget() { data = new int[10]; } ~Widget() { delete data; } }; |
Listing 2 |
#include <iostream> #include <memory> #include <string.h> #include "Entity.h" #include "Widget.h" int main() { Widget w; w.Setup("Test"); std::cout << w.Name() << '\n'; w.Reset(); std::cout << w.Name() << '\n'; w.Setup("Another widget"); std::cout << w.Name() << '\n'; } |
Listing 3 |
Critiques
Martin Janzen
Overly permissive?
The author is right about one thing: the error being masked by gcc’s -fpermissive
flag is not the cause of the core dump. The compiler is unhappy with the Entity::Name()
method, which can return a non-const pointer to a string literal. Changing the return type to const char*
(or char const*
, to be trendy) removes the warning and makes the code safer, as it prevents the caller from attempting to overwrite pName
’s memory – or the literal! – through the returned char*
.
The Name()
method itself ought to be const too, as it’s clearly not intended to allow modifications to the Entity.
Placement new?!
Instead, the core dump is caused by the bizarre behaviour of Entity::Reset()
, which invokes the object’s dtor manually, then uses placement new
to initialize a new Entity
, for reasons known only to the author.
The problem is, Entity
is meant to be used as a base class. Since its dtor is virtual, the call to this->~Entity()
correctly causes the derived class’s dtor to be called. But Reset()
then has no way to re-create the derived class; it can initialize only the Entity
slice of the object.
In this example, Widget
has allocated an array pointed to by data
, which is destroyed by the explicit dtor call. But only the Entity
ctor is called, not the Widget
ctor; so the value of data
is not reset. Therefore, when Widget
goes out of scope in main()
, data
is deleted a second time, resulting in the core dump.
(This is easy to diagnose by running the code under valgrind or, in gcc or clang, by enabling ASAN with the -fsanitize=address
flag. Either tool will show the stack trace at the time of each delete call, making it clear what happened.)
To fix this, Entity::Reset()
shouldn’t try to get so fancy: it should just clear its data members. Also, it should be made virtual
so that when the object is to be reset to its initial state, derived classes get a chance to reset their own data members too.
Or, we could follow Herb Sutter’s ‘Non-Virtual Interface’ idiom, keeping the public Reset()
method non-virtual in the base class, but calling a private virtual ResetImpl()
which the derived classes can override.
(Some programmers will try to invent other clever alternatives, using CRTP and the like. That seems like overkill, and an invitation for trouble, especially given that the author is “relatively new to C++â€.)
C vs. modern C++ style
The real question, though, is why all of this allocation and deletion is happening in the first place. As we’ve seen, it’s error-prone in even a simple example, and completely unnecessary in this enlightened age.
Most importantly, the Entity::pName
pointer should be replaced with a std::string
. This single change:
- manages the memory for the name string safely and automatically;
- eliminates the memory leak which now occurs if
Setup()
is called twice; - permits the dtor to be replaced by a simple
=default
; and, - if the name is short enough for the small string optimization to be used, may even make the code run faster, with no allocations at all.
(There’s a whole discussion that could be had about how best to handle sink arguments, as for Setup()
. If this method were modified to take a std::string
argument rather than a char const*
, the argument could be moved into Entity::name
rather than copied, possibly allowing the compiler to do even more optimization. But for now, let’s assume that the interface is to be changed as little as possible.)
In the derived class, the Widget::data
array has a fixed size, so it can be replaced by a C-style array or std::array
member rather than allocating it on the heap. This allows both the ctor and dtor to be replaced with =default
. Also, removing the unnecessary indirection makes the code just a bit more cache-friendly.
Basic hygiene
The example.cpp test program shouldn’t have to know that it needs to include <memory>
and <string.h>
in order to use Entity
and Widget
. Each class’s header should #include
all of its dependencies, so that a would-be user doesn’t have to worry about the finer points of the implementation (which, as we’ve seen, can and should change).
Since Entity
appears to be intended as an abstract base class but has no pure virtual methods, its ctor could be given protected access so that it can be called only from derived classes; no standalone Entity
instances can be created.
Speaking of access, Entity::idNumber
is initialized to 0, but it has private access and is never touched in the Entity
base class. It needs to be generated somehow by Entity
, or given protected access, or simply removed; it’s of no use to anyone right now. Similarly, Widget::data
is also private but not used. But presumably this code is simply an early experiment, with more methods to be added later.
I won’t waste any time on coding style except to note that the capitalized method names are a bit unusual and archaic-looking. As we’re trying to minimize changes to the interface, they can be left as is.
Learning the craft
Finally, I think it’d be worthwhile to offer a couple of bits of general advice to the author.
- The use of
strdup()
, etc. suggests a C programming background, and/or some 1990s C++. It would be worth spending some time with one or two more recent books, such as A Tour of C++ (the 2nd edition) and Effective Modern C++; also, at the C++ Core Guidelines, and at Bjarne’s homepage at www.stroustrup.com, which has a couple of papers on the subject of how to write good modern C++. - Most C++ compilers today are very, very good. As a rule, it’s best to never ignore their warnings unless you are absolutely sure you understand the reasons both for the warning and for why it should be ignored in a specific situation. There is hardly ever a good excuse for this; it’s good practice to strive for zero warnings.
- The
-fpermissive
flag, in particular, is a flagrant admission that your code is broken. It should never be needed when writing new code. - Learn to use tools like valgrind and ASAN: they can save you from hours or days of frustration, or much worse.
- Finally, given how rapidly and substantially the C++ language is changing, do make sure to remain current by taking advantage of the endless resources available today: conferences, books, videos, web sites, courses, podcasts – and of course ACCU publications!
[ED: Full source code was supplied, but I elided it to save space.]
Hans Vredeveld
Before we look into the core dump, let’s look into the need for -fpermissive
. With -fpermissive
, you tell the compiler to ignore a group of errors and just compile it. Hopefully it will not bite you at runtime. In this case, -fpermissive
is needed because the function Entity::Name
can return the constant none
(type char const*
), while it advertises to return a (modifiable) char*
. Just make it return a (non-modifiable) char const*
and this problem is solved. Also, the function itself can be made const
, giving the signature:
char const* Name() const
On to the core dump. Running the program on my Linux system with the environment variable MALLOC_CHECK
set to 3 gives some information about what is going wrong:
*** Error in `./cc119': free(): invalid pointer: 0x00000000014e5c20 *** Aborted (core dumped)
Apparently, we try to free
some memory that can’t be free
d. For further investigation, let’s build the program with address sanitizer (-fsanitizer=address
in gcc/clang) and run it. The first thing address sanitizer complains about is a mismatch between operator new[]
and operator delete
. Indeed, in Widget
’s constructor, we new[]
an array, and in the destructor we delete
a single object. Changing delete
to delete[]
in the destructor solves this issue.
Building the program again with address sanitizer and running it, shows that address sanitizer is not done with us. Now it complains that we used malloc
to allocate memory and delete
to deallocate it. The malloc
is hidden in strdup
in Entity::Setup
. We can fix this by changing delete pName;
in Entity
’s destructor to free pName;
. (Even better: change the type of pName
from char*
to std::string
. Also rename it to name
in that case.)
Still, we haven’t found the reason for the core dump. Build it a third time with address sanitizer and run it. Now address sanitizer complains that we are attempting a double-free. In other words, we try to free
something that we free
d before. To understand why, we have to look at what happens when we call w.Reset()
on a Widget w
. Before we call w.Reset()
and after we constructed w
, w.data
is pointing to some allocated memory. Now, when we call w.Reset()
, we first destruct w
(via a call to the virtual destructor of Entity
). The memory pointed to by w.data
is deallocated, but the pointer itself is not changed. Next in Reset()
, a new Entity
object is constructed by the placement new
. This placement new
knows nothing about inheritance and does not construct a Widget
. The memory previously occupied by w.data
is left untouched and when interpreting the new object as a Widget
, it contains the same w.data
that points at unallocated memory. When w
finally goes out of scope, we are going to destruct it as if it is a Widget
, resulting in trying to delete the unallocated memory that w.data
is pointing to.
We could solve this problem by making Reset
a virtual function in Entity
and implementing it in Widget
, but it will be hard to create something that is robust and easy to understand and maintain. Before we continue, let’s have a look at what the code really means.
In main()
we begin with declaring and default constructing a Widget w
. This w
does not have a name, and conveys the meaning of no widget. It is just a placeholder for a widget. Next we call w.Setup("Test")
, which converts w
into ‘Test’ widget. This is the moment that we actually construct a Widget
. Following this, we call w.Reset()
. This destructs the widget w
and changes it again into an empty placeholder. With the call to w.Setup("Another widget")
we construct ‘Another’ widget at w
. Finally, when w
goes out of scope, we destroy the widget for the last time.
This analysis of the code makes clear that Widget
has two functions:
- It is a container of size at most one for
Widget
s. - It is a widget, but not a specific one. It represents widgets of different types.
Note that I talk about widgets here, but that everything I talk about is not implemented by the class Widget
, but by Entity
.
We can improve the code by separating the two functions. The container function can be implemented easily with an std::unique_ptr<Entity>
. Now Entity
no longer needs the functions Setup
and Reset
, but it will need a constructor:
class Entity { private: std::string name; int idNumber{ 0 }; public: explicit Entity(std::string name_) : name{name_} { } virtual ~Entity() = default; std::string const& Name() const { return name; } };
Instead of a single class Widget
that has to represent both a test widget and another widget, I would create two classes TestWidget
and AnotherWidget
. In this test program they will have similar implementations. For TestWidget
this is (also changing its data member, to simplify its use):
class TestWidget : public Entity { int data[10]; public: TestWidget() : Entity("Test") { } };
Some attention must be given to copying and moving Entity
s and Widget
s. This is left as an exercise for the reader.
We can now rewrite main()
as:
int main() { std::unique_ptr<Entity> w = std::make_unique<TestWidget>(); std::cout << w->Name() << '\n'; w.reset(); //Error: std::cout << w->Name() << '\n'; w = std::make_unique<AnotherWidget>(); std::cout << w->Name() << '\n'; }
Finally, we have to clean up the includes. Widget.h has to include "Entity.h"
, Entity.h has to include <string>
, and example.cpp has to include "Entity.h"
, "Widget.h"
, <memory>
(which was not needed in the original code) and <iostream>
. The include for <string.h>
is not needed any more (originally, it was needed in Entity.h, not in example.cpp).
Marcel Marré and Jan Eisenhauer
While the reason for gcc’s complaint that -fpermissive
must be used is indeed not the cause for the core dump, any programmer is generally ill-advised to ignore warnings. In this case, Entity::Name()
returns a non-const char*
, but in case of a nullptr
in pname
, a pointer to the literal "none"
is returned, which is by definition const
. It is also good style to use const
for getters, so the user knows an object’s state is not changed by calling it. There is no need to change the body of the method, but the signature is best changed to char const *Name() const
.
Let us now look at the reason for the core dump. With Widget w;
in main()
, an object of class Widget
is created and the compiler knows that it needs to call its destructor ~Widget
when it goes out of scope. In Entity::Reset()
the widget destructor is called for this object, because we have a virtual destructor and invoke it through a pointer. However, only an Entity
object is created in the Widget
’s place. When w
goes out of scope at the end of main()
, the compiler does not use the virtual destructor, however, because (it thinks) it knows the precise type of w
, since it was created on the stack. (Polymorphism in C++ works only with references and pointers, not with concrete objects.) Hence, ~Widget
is used on an object whose type is Entity
, and in particular Widget::data
is deleted twice, likely causing the core dump. (Incidentally, the code did not core dump on our machine, showing how insidious bugs due to undefined behaviour can be.)
There is a multitude of ways to fix this problem. With the code given it is obviously difficult to see why the author felt the need to re-use objects. If the main fear is that heap allocations and deallocations are slow, from C++17 onwards other memory allocators exist that can be used. Another possibility that eschews heap allocations is virtual Reset()
and every class that introduces additional members overriding it with its own and calling their ancestor’s Reset()
. This can be error prone (both due to forgetting to implement Reset()
for a child class requiring it and due to forgetting to call the immediate base class’s Reset()
and the author should make sure the effort is worth the payoff. Depending on the use case, there are other approaches that can be used, such as an Entity-Component-System (see https://en.wikipedia.org/wiki/Entity_component_system), that do not use inheritance and thus do not present the same problem.
There are a few further points on the code.
Both Entity
and Widget
have compiler-generated copy constructor and copy assignment, but when one makes a copy, a shallow copy is created, meaning that only the pointers are copied, but not the content pointed to (which would be a deep copy). Someone expecting to be able to make a copy of an object and then be able to change one without affecting the other will find that this works for idNumber
, but not for pName
or data
. Additionally, the latter members’ destructors would be called twice. Note also that copying can be dangerous in the case of a hierarchy, because assigning, e.g., a Widget
object to an Entity
object will lead to object slicing (see https://en.wikipedia.org/wiki/Object_slicing).
It is good style to avoid raw pointers to benefit from automatically working copy- and move-members. Hence, use std::string
instead of char*
for pName
and std::array<int, 10>
for data
. See CppCoreGuideline R.1 (Manage resources automatically using resource handles and RAII), R.5 (Prefer scoped objects; don’t heap-allocate unnecessarily) and R.11 (Avoid calling new and delete explicitly).
It is a good idea for headers to include standard and other own headers they require; in the author’s code neither does Widget.h include the base class header, nor does Entity.h include the string header it needs, although the main code file does not itself require it directly.
As a matter of self-documenting code, change the signature of the subclass destructor to ~Widget() override
.
Another minor problem is that the string literal "none"
is not technically an invalid name for an Entity
. With std::string
, an empty string could indicate an unnamed Entity
. Another method is to make the possibility of a missing value explicit by using std::optional
(since C++17: boost::optional
was available before).
James Holland
My first impressions are that the code seems to be a mixture of C and C++. Perhaps the student comes from a C background and has not yet learnt the C++ way of doing things. I shall say more about this later but for now let’s persevere with the code presented.
When compiling the code on my system I get one error, namely that an object of type const char *
cannot be used to initialise a return object of type char *
in function Name()
. This is because a string literal is considered const
and allowing the conversion to non-const would enable the string literal to be modified which would not make sense; hence the error message. Perhaps the simplest way to overcome this problem is to make the return type of Name()
into const char *
. The code now compiles without error. There are still problems, however.
I notice the student is using a function named strdup()
. After a little research I discover this function has recently been added to the C language. According to cppreference, the function duplicates the string pointed to by its parameter and returns a pointer that must be free
d using the function free()
. An inspection of the student’s code reveals that delete
is being used. This will result in undefined behaviour.
There is a similar problem with the destructor of Widget
. Widget
’s constructor creates an array of 10 int
s on the heap using new[]
. The destructor, therefore, needs to call delete[]
, not delete
. It should also be noted that the variable idNumber
is not used in the sample code.
Running the revised code using Clang Address Sanitiser results in a message stating that data is being deleted twice. This would result in undefined behaviour. A quick and simple way of resolving this problem is to assign data the value nullptr
after deleting data
. The program now produces the required output. Unfortunately, the design of the program leaves a lot to be desired.
Some minor improvements could be made to the software. There is a superfluous set of braces placed on the end of the placement new
statement. It is better to assign pName
the value nullptr
rather than NULL
.
More seriously, the Reset()
function creates a new version of Entity
in the same memory location as the previous incarnation of Entity
but not before calling Entity
’s destructor. This will result in undefined behaviour as the new object resides in unallocated memory and may be overwritten at any moment by some other part of the system. Perhaps the direct call to the destructor of Entity
could be removed but this results in memory leaks. To cure the memory leak problem pName
needs to be freed. This may well result in code that works perfectly but things are getting very complicated and difficult to understand.
The student states that he or she is relatively new to C++ and this shows in the construction of the code. In many ways, C++ is quite a simple language once certain rules and idioms have been studied and understood. The best way to develop a C++ program is to take advantage of the facilities the programming language provides. Advantage can be taken of constructors, for example, to that objects can be created and initialised in a single statement. This will make functions such as Setup()
unnecessary. Text strings should employ std::string
instead of creating and using strings based on pointers to chars. The main feature of std::string
is that they look after themselves. They will allocate storage on the heap, perform any initialisation and relinquish heap memory when they go out of scope. They have many more features that make them simple and safe to use. Try to use a higher level of abstraction.
The student appears to be attempting to reuse memory when a new Widget
objects are required. In such cases a new object should be created. The old object should be left to be destroyed when it goes out of scope. There is rarely a need to destroy objects explicitly. It is important not to be too clever when writing software. The simplest approach is usually the best.
Finally, I provide a simple program that operates in a similar way to the student’s but uses higher level features that are provided by the C++ standard.
#include <array> #include <iostream> #include <string> class Entity { public: Entity(const char * new_name) : entity_name(new_name){} std::string entity_name; int ID_number{0}; }; class Widget : public Entity { public: Widget(const char * widget_name) : Entity(widget_name){} void reset() { entity_name.clear(); ID_number = 0; } void setup(const char * new_name) { entity_name = new_name; } std::array<int, 10> data; std::string name() const { return entity_name.empty() ? "none" : entity_name; } }; int main() { Widget w("Test"); std::cout << w.name() << '\n'; w.reset(); std::cout << w.name() << '\n'; w.setup("Another widget"); std::cout << w.name() << '\n'; }
Commentary
The fundamental issue with this code, as all the critiques explained, is the explicit invocation of the destructor in this->~Entity();
. The comment for the Reset()
method explains that the intent is to replace the current object with a blank one.
The trouble is that the static types of the object deleted and the new object created do not match. The destructor is virtual
and so the call this->~Entity();
actually invokes Widget::~Widget()
and so destroys the Widget
object. When the variable w
goes out of scope at the end of main
, the Widget
destructor is called but the actual type of the object being destroyed is Entity
and so the behaviour is undefined.
One ‘solution’ would be to just destroy the Entity
part of the object, but while this can be done (this->Entity::~Entity();
) it doesn’t help achieve the goal of the Reset()
method. The question is whether Reset()
is intended to be used to create another object or simply to re-initialise the existing one. At this point you’d probably need to talk to the original author to find out what their intended use cases are.
As James noted, this ‘two-phase’ construction via a constructor and a Setup()
method not idiomatic C++. It would usually be better to provide the name at construction and create a complete object.
As Hans pointed out, you can argue that we appear to have a single class (Entity
) with two different responsibilities – it is both a container for an optional object (hence the Name
is "none"
when the entity is not yet populated by a call to Setup()
) and a base class for a class hierarchy
The winner of CC 119
All the critiques identified the root cause of the presenting problem – the core dump – and also gave an explanation of the compiler warning.
There were various suggestions for how to improve the design (as well as more detailed suggestions on places where the code was more C than C++)
Hans and James both pointed out the memory allocation/deallocation mismatches (malloc
+ new[]
being incorrectly paired with delete
) and every critique included suggestions to more use std::string
and an array or std::array
to remove the explicit memory management completely.
Several submissions also recommended using an address sanitizer or a memory checker as ways to help the person to solve their own problem. There are a wide variety of tools now available for this sort of assistance, and I suspect many newer programmers will need some guidance on how to pick the correct one for a given problem. Martin’s recommendations under ‘Learning the craft’ would be a good place to start!
I found this a hard critique to judge since there were so many strong entries – thanks to all for your work in providing our readers with food for thought. However, in my opinion Martin provided the best entry for this critique by a short head.
Code Critique 120
(Submissions to scc@accu.org by December 1st)
I’m doing some simple text processing to extract sentences from a text document and produce each one on a separate line. I’m getting a space at the start of the lines and wonder what’s the best way to change the regexes to remove it?"
$ cat input.txt This is the first sentence. And the second. This is the last one. $ sentences input.txt This is the first sentence. And the second. This is the last one.
Listing 4 contains the code. As always, there are a number of other things you might want to draw to the author’s attention as well as the presenting problem.
#include <fstream> #include <iostream> #include <iterator> #include <sstream> #include <string> #include <regex> using namespace std; int main(int argc, char** argv) { std::ifstream ifs(argv[1]); std::stringstream ss; char ch; while (ifs >> std::noskipws >> ch) ss << ch; string s = ss.str(); smatch m; while (regex_search(s, m, regex("[\\s\\S]*?\\."))) { std::string sentence(m[0]); std::regex whitepace("[\\s]+"); std::regex_replace( std::ostream_iterator<char>(std::cout), sentence.begin(), sentence.end(), whitepace, " "); std::cout << std::endl; s = m.suffix().str(); } } |
Listing 4 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (https://accu.org/index.php/journal). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.
Notes:
More fields may be available via dynamicdata ..