Journal Articles
Browse in : |
All
> Journals
> CVu
> 311
(6)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Code Critique Competition 116
Author: Bob Schmidt
Date: 02 March 2019 15:47:00 +00:00 or Sat, 02 March 2019 15:47:00 +00:00
Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.
Body:
Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.
Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)
Last issue’s code
I am trying to write a generic ‘field’ that holds a typed value, optionally with a ‘dimension’ such as ‘kg’. One use case is to be able to hold attributes of various different types in a standard collection. I want to be able to compare two attributes to see if they’re the same (and maybe later to sort them?) – two simple fields are only comparable if they are the same underlying type and value and two ‘dimensioned’ fields need to be using the same dimension too. I’ve put something together, but I’m not quite sure why in my little test program I get true for comparing house number and height:
Does weight == height? false Does weight == house number? false Does house number == height? true Re-reading attributes Unchanged Reading new attributes some values were updated
Can you help with the problem reported, and point out some potential flaws in the direction being taken and/or alternative ways to approach this problem?
The code is in Listing 1 (field.h) and Listing 2 (field test.cpp).
#pragma once #include <memory> #include <string> // Base class of the 'field' hierarchy class field { protected: virtual bool equality(field const &rhs) const = 0; public: // convert any field value to a string virtual std::string to_string() const = 0; // try to compare any pair of fields friend bool operator==(field const &lhs, field const &rhs) { return lhs.equality(rhs); } }; // Holds a polymorphic field value struct field_holder : std::unique_ptr<field> { using std::unique_ptr<field>::unique_ptr; bool operator==(field_holder const &rhs) const { return **this == *rhs; } }; // Implementation in terms of an underlying // representation template <typename Rep> class field_t : public field { public: field_t(Rep const &t) : value_(t) {} std::string to_string() const override { return std::to_string(value_); } protected: bool equality(field const &f) const override { // compare iff of the same type auto p = dynamic_cast<field_t const *>(&f); if (p) return value_ == p->value_; else return false; } private: Rep value_; }; // Add a dimension #include <typeinfo> template <typename dimension, typename Rep = int> class typed_field : public field_t<Rep> { public: using field_t<Rep>::field_t; std::string to_string() const override { return field_t<Rep>::to_string() + dimension::name(); } protected: bool equality(field const &f) const override { // compare iff the _same_ dimension return typeid(*this) == typeid(f) && field_t<Rep>::equality(f); } }; |
Listing 1 |
#include "field.h" #include <iostream> #include <map> // Example use struct tag_kg{ static const char *name() { return "kg"; } }; struct tag_m{ static const char *name() { return "m"; } }; using kg = typed_field<tag_kg>; using metre = typed_field<tag_m>; using attribute_map = std::map<std::string, field_holder>; // Dummy method for this test program attribute_map read_attributes(int value) { attribute_map result; result["weight"] = std::make_unique<kg>(value); result["height"] = std::make_unique<metre>(value); result["house number"] = std::make_unique<field_t<int>>(value); return result; } int main() { auto old_attributes = read_attributes(130); // height and weight aren't comparable std::cout << "Does weight == height? " << std::boolalpha << (old_attributes["weight"] == old_attributes["height"]) << '\n'; // weight and house number aren't comparable std::cout << "Does weight == house number? " << std::boolalpha << (old_attributes["weight"] == old_attributes["house number"]) << '\n'; // house number and height aren't comparable std::cout << "Does house number == height? " << std::boolalpha << (old_attributes["house number"] == old_attributes["height"]) << '\n'; std::cout << "Re-reading attributes\n"; auto new_attributes = read_attributes(130); if (old_attributes == new_attributes) { std::cout << "Unchanged\n"; } else { // update ... std::cout << "Some values were updated\n"; } std::cout << "Reading new attributes\n"; new_attributes = read_attributes(140); if (old_attributes == new_attributes) { std::cout << "Unchanged\n"; } else { // update ... std::cout << "some values were updated\n"; } } |
Listing 2 |
Critique
Balog Pál
“I am trying to write a generic ‘field’...†Oh, my. A picture forms in my head, that OP shows this to his deskmate for some pre-review advice. And he gets something like:
Seriously? You want to run it through Pál? Well, bring a jetpack, as he’ll throw you outta window. Also I see you went ahead and wrote code, didn’t you hear enough times to ask for a design review on ideas before you coding, especially for a new class hierarchy? Please just wait a moment while I fetch some more audience and a bucket of popcorn.
And right he is: ‘properties’, ‘attributes’ and ‘generic fields’ are for me like muleta to a bull. Probably because in my working experience with writing applications I mostly saw it creating mess and more mess. While it stays a popular demand, and is ‘backed up’ by claims that ‘other popular languages’ have those with native support therefore it must be good. I wish the world worked that way.
Certainly exiling people and asking questions later is not my thing, I more like flood them with questions until they maybe decide to exile themselves... Unfortunately this format is not interactive, and building a strawman just to burn it down is not really fun. Or preaching. So instead I’ll try to create some conversation that will supposedly reveal my general concerns.
First let’s take a detour for that missed design review. OP came to me with the idea before writing the code, but already though up those classes and templates. Obviously the discussion starts with the motivation, the use cases, the real problem we want to solve (really too bad not a word on that is present in the entry), but let’s stash that for a moment, and jump to technicals. My opening suggestion would be like:
So you need a ‘tuple’ of { type, value, dimension }. Where type could be a string
or enum
, dimension a string
, empty covering the no-dimension case. value is more tricky but you can start with string
. Or int
. Later a variant of candidates. What else we need? standard collection: check. Equality? if it’s an actual std::tuple
, isn’t the factory version just do what you need? And maybe even just {value, dimension } is enough, you’re somewhat moot on the ‘generic’ part, IME just the type of storage unit is not enough, your dimension may cover that information .
I look up and notice the expression on OP’s face. It radiates:
WHAT? You can’t be serious! I imagined up a full framework, gadgets, templates, a hierarchy. Using a tuple or a simple struct instead of all those almost makes me obsolete. I definitely can’t brag about that to my girlfriend… A puny std::pair
and full stop? No, this just can’t be right. I need to shred this blasphemy^Widea to pieces right away. This just can’t possibly be a solution, not even a candidate for initial brainstorming.
Before going on, I suggest the reader take a step back and actually consider this. Yes, it has a plenty of shortcomings both for OO theory and some imagined use practice, but does it solve what was revealed as a problem? Would it pass the test cases? How much opportunity it leaves for bugs in that area? I honestly think it would work okay-ish, so it leaves us balancing its potential problems in other parts of the use case against the UB (sorry for the spoiler) and broken ==
in the presented alternative let alone its complexity. And it also open for improvements. Back to the conversation:
OP: That’s not exactly what I had in mind. How about making the field an interface and create subclasses.
Me: Sure, that can be done, but looks plenty of work off the bat just for the hierarchy alone. And you looked for a standard collection for it, I think we still didn’t include polymorphic ones. I have some in my support lib, but that does not qualify as solution. You need to rack up some benefits to offset the costs.
OP: I think we’re wasting a lot of memory. In my way the dimension is not stored, but part of the class behavior. So that’s an extra string per instance. And if we use variant that will be heavier than just one actual type. And while at it, the user could just assign a different dimension or switch the variant to a different type unlike with subclasses.
Me: The second part is granted for the most primitive implementation. Looks trivial to cover if we use an actual class with those members private and function to change the value, while keep the dimension immutable after construction. Still way less complexity than a subclass.
Me: If dimension is a COW string, const char *
or enum
, that is not much storage. The variant depends on the data types you will need to disclose at some point, till then let’s assume string, long long
and double
is enough for all practical uses. that is like a dozen extra bytes over a plain int
. While in the alternative we have a vtable pointer and a heap block. I’d call it a tie at best. But let’s pretend we have a few hundreds of bytes waste. Is that really a significant factor?
OP: Of course it is. Say my users have a million instances, that would amount to hundreds of MBs.
Me: Hmm, we really can’t postpone the intended use really. What I pictured is not compatible with this claim. So in what part of the program you want those generic fields and for what?
OP: Why, everywhere! Isn’t that cool and generic? A Field<int>
is more sophisticated than a plain int
, so why not replace our plain int
s? Also we read a database in the program, it looks natural to fetch the record into such fields. (Here OP notices a murmur from the audience and some looks at the window…)
Me: (Despite other rumors I just do not launch nukes before going through the 5 whys, so must let down the audience’s expectations... let’s dig a little deeper.) Was that really the original idea behind this?
OP: Well, not really, I just wanted an easy way to write user interface. Kinda like in the Visual Studio ‘Properties’, that is just a 2-column listview with editable value. I could use that instead of creating a custom dialog for many of our classes. Also much easier to maintain if we change the data. I can just map data members to those fields and the UI could edit the values independently of the client, that can then get it back and consolidate. But is we have this component created why not use it for many other things in our model? (A few people head for the door.)
Me: Okay, I appreciate the enthusiasm but let’s focus on the real task. Sleep on this other part and we can come back to it at a later time unless you happen to change your mind. So how many of those fields you intend to have in the interface at any one time?
OP: Maybe a dozen. But... suppose I also want to present a ‘browse’ for database tables like I saw in the museum there was in dBase clones? If the table has a million records, we’re back to the waste.
Me: The browse only shows a few dozen lines that fit the screen vertically. If for that you read up all the million-to-unlimited amount of records without consideration, field or native, I’m sure someone will call that idea FU^H^Hsuboptimal.
OP: I concede the waste. But how about extensibility? For my solution the user can create fields for new types. All independently of everyone else. Especially not touching a central component. Do you want everyone just add extra elements to the variant if we go with that?
Me: See, that is a good valid point. With no stock answer. Say we did not include float
. And 15 client classes would want to use that. Is it better to all of them creating their own or just one adding it to central. A really problematic case could be if a client has its specific type that is not available on the kernel level (or where the central facility lives.) If that is expected subclasses have an edge. However alternatives still exist, à la ‘strategy’ pattern. I think just adding a std::function
into the variant could solve that problem. But I’m not sure I would happy with that. For a sensible verdict we need real data, real information to work, generics and speculation will NOT result is good software. You need to lay out the actual aimed use cases, and we will focus on that, selecting the solution that fits that case.
OP: But what if we create a solution for the presented specific problem and next day someone adds a class that does not fit? I think we should think ahead and work a future-proof solution.
Me: Over my dead body. ☺ No, Sir, while I’m in charge here that is just not happening. But feel free to apply for my position. And put your own ass in the fire explaining that we missed several deadlines struggling to solve for some made-up problems from an imagined future instead of what is needed by real clients here and now. And carry the burden of all the excess complexity. My good record of deploying working software in the field is mostly tied to strictly following KISS. Also having seen a lot of cases of the speculative programming. Thrown away as soon as actual change requests arrived, that almost always had some nuance not expected. I admit it may be confirmation bias, but enough cases happened.
And after this detour that possibly could prevented the whole accident, let’s look at the written code and figure out the problems. I’m sure some other entries will use the compiler’s help, I keep just reading the code and hope to understand it. Some conclusions may be inaccurate, if so I expect Roger to insert a warning.
A fast look through the code makes me say it is interesting and looks much better than many first swings at making and collecting a polymorphic type. Especially old times before smart pointers. But there are some fishy points. Before anything else, the elephant in the room:
We have abstract class field and store its subclasses with unique_ptr
. A sound idea, that would be one of my first picks is working from scratch. But the class lacks a virtual dtor. It’s a major goof on OP’s part, almost on the level of accidental deletion of the related line after the last compilation and before submission. The guideline of starting a class hierarchy with virtual dtor is almost as old as C++ itself, and I believe most compilers warn about the deviation for maybe a decade. This should not be missed or dismissed. There are some special cases when you can get away without but those are better left for pure trivia. (And the only acceptable motivation to not have virtual dtor at the root is when it has no virtual functions either...) Worth mentioning, that if shared_ptr
was used here, it would actually work even with this situation, as it can capture the correct deleter at the creation if the proper incantation is used. But unique_ptr<field>
will always issue plain delete on field*
static type, that in our program always have a different dynamic type and no virtual dtor, that is UB. Until the fix we should not treat any output or other observation seriously. The fix is to either add it with =default
directly or follow my practice to inherit from an empty base class VirtualDtor
that has just that and the other spec functions defaulted – because that defaulted dtor still counts as user-declared and kills our move ctor and assignment. And if we declare those with default that kills the copy ctor and assignment. Not nice. As field is empty at the moment it may look redundant but I rather consistently replaced all my dtors with that workaround and sleep well.
After steering back to defined behavior, the tests still fail. But I suggest another small detour to improve that test first. We’re testing a function that returns bool
. So it’s got a 50% chance of looking right with a broken implementation. Even if we forgot the proxy implementation that just returned false
no matter what. As a matter of fact, having that the actually present tests would just pass with flying colors. Oh, Oh. I’d add a function that at least covers what it can, taking (a, b, expected) looks that a == a
and b == b
are true
and a == b
and b == a
result expected
. My estimate is that the last line would produce different results with the arguments switched. Then we could use some more cases with a different base type.
Now finally let’s look at those op ==
s. We see two of those, a nonmember-friend in field and a member in holder. I’m somewhat baffled on this inconsistency, and ask OP to explain. Beyond that the implementation looks correct. In field it defers to virtual function. Important thing that we have no op ==
defined for subclasses that could probably ruin the show. In holder defers to the held type. With C++20 we could add a precondition contract to be nonempty and expecting the user to comply. Some designers would object that and instead require check for null
and report ==
accordingly. With the latter splitting to two camps, those returning false
for null
on either side and those returning true
if both null
s. All three variants make sense for certain philosophy, that I leave at that till our examples have no empty holders. But OP must make a choice and document it.
Following up in equality()
. That is pure in field (good), and defined for field_t
and typed_field
. With more curious discrepancy: using dynamic_cast
and typeid
. Again OP to explain why the different approach while verbally seeking the same semantics.
Vanilla uses dynamic_cast
on the rhs to its own type. if it succeeds, then comparing the value otherwise going false
for incompatibility. I say that matches the aim.
For dimensioned ,I’d expect the same approach really. I see no good reason to drag in typeinfo
, and it is a thing I consider a code smell in its own right. I had to look up how it is supposed to work, turns out it can be invoked for the reference and fetches the record for the most derived type. and ==
is supposed to work too. Is this different from the dynamic_cast
version? As long as we have just the presented classes I think not. But typed_field
is not marked final
, and if we subclassed that we could observe a difference. What is probably a bad thing, say I subclass to have a more decorated version of to_string()
and nothing else, I’d expect the same values for ==
, but would get false
instead. But for the current state I say this is correct too.
So where we are? We fixed the UB, have correct implementations of ==, and yet the test fails. What did we leave unchecked? I guess the wimps just single-step in the debugger till reach some unexpected state. I just have to make predictions: for the 3rd line we eventually get to field_t::equality
with lhs being vanilla and rhs being a metre. dynamic_cast
gives us its base class field_t<int>
and finds the same value, returning the true
we see. (With lhs and rhs switched we’d be in derived
and looking at different typeid
s and a false
result...)
So, are we ready to declare the verdict that dynamic_cast
sucks and switch to typeid
here too? I hope not, before thinking again, especially as we passed this function as correct recently. Indeed, dynamic_cast
is only guilty in working by its specs instead of our wishes. Just as those pesky computers keep following the instructions (unless we drive to UB-land that is). For the case dynamic_cast
works fine for derived-to-public-base-class. And that is the relation we defined when we said
class typed_field : public field_t<Rep>
In other words, we said that typed_field
IS-A field
. That it stands fine in any place vanilla would be fine. That is a clear contradiction with the equality-test where the substitution (semantically, not technically) slices the dimension and puts us on the incorrect true path. typed_field
is IS-A vanilla field for value-holding purpose but NOT for equality.
This is very similar to the basic OO problem of ‘is Rectangle a Square or is Square a Rectangle or neither in OO design’ that surprised may novices (probably quite close to all!). OO and hierarchy design is riddled with these kind of traps. The simple-looking ancient guideline ‘make sure all your public inheritance means is-a’ is not trivial to follow.
How to fix the situation? Well, I rather leave it to the OP, after all he still has the jetpack unused, and next time he can add this lesson to the for-hierarchy arguments and how much easier are those to handle for the clients. They may be. Or may open one more subtle opportunity to naively making a subclass and breaking some part of is-a-ness. (Or the other way around, as mentioned at the typeid
-using version).
OP mentioned he will want sort too eventually. That will need op <
, may turn out even trickier. Before someone gets the wrong conclusion that I’m pushing for the flat version and imply it is a piece of cake just like that: it’s not the case. But it definitely dodges lots of bullets. And once we really know what we are chasing, and is still (!) on the table – in a contest on less bugs initially and over lifetime my bets would go there. And if your experience is different, consider to make the world a favor by teaching good and working OO and hierarchy-design practices. (Also please write an article or just PS reaction...)
I could nit-pick on the rest of the code, but will not do so, as I believe these are the really important lessons and better not water them down.
Commentary
This issue’s critique was somewhat long: while the questions raised by the code did seem to be interesting I think that my inability to shrink the code very much resulted in readers feeling a bit overwhelmed. (This issue’s critique is much shorter...)
Pál does a good job of covering some of the problems with the tricky question of equality and class hierarchies. The problem is that it is easy to break the (mathematical) definition of equality.
The first property is substitution: if a == b then f(a) == f(b). This maps nicely into Liskov Substituability Principle ("LSP") and the IS-A principle.
The next set of properties are the reflexive, symmetric and transitive properties. These are:
a == a
if a == b then b == a
if a == b and b == c then a == c
These are much harder to manage in a class hierarchy, as this example demonstrates.
Implementations usually adhere to the reflexive property. This critique fails with the symmetric property:
If a is a field_t
and b is a typed_field
then
a == b
calls the equality method in field_t
that checks that b IS-A field_t
(and then compares the values) whereas
b == a
checks that the types of a and b are the same.
The safest change is to make the implementations consistently verify their arguments have the same type: but this, as Pál noted, means that substitutability is broken since you now cannot substitute a typed_field
object for a field_t
one. This is a case where the question must be pushed back to the original writer of the code as they need to decide which is the design best matching their use cases. Alternatively, the use of operator==
and the method equality
should be changed to something without the same implicitly assumed set of properties.
The test code in the code example is quite minimal: the first set of tests checks various objects all of which have the same value (130) – but does not do an exhaustive set of tests – and subsequent tests simply test equality and inequality of a set of objects.
It’s hard to write good test cases; thought need to be given to what possible failure and success modes there might be and therefore what test data should be used. For example, in the case of an operator==
we should ensure some tests are written for each of the four main properties listed earlier in the commentary.
The winner of CC 115
There was only one critique this time, but I liked Pál’s approach in concentrating on the higher level issues so I am happy to award him the prize for this issue’s critique.
Code Critique 166
(Submissions to scc@accu.org by Apr 1st)
I’m trying to extract the count of letters in a file and while the program itself seems to work OK, I’m not getting the debugging output I expected. I was expecting to see a debug line of each letter being checked during the final loop, but I only get some of the letters shown.
For example, using sample.txt that consists of “This is a some sample letter input†I get:
letter sample.txt Seen T Seen h Seen n Seen o Seen r Seen u Commonest letter: e (4) Rarest letter: T (1)
Why are only some of the letters being shown as Seen
?
Can you solve the problem – and then give some further advice?
Listing 3 contains logging.h and Listing 4 is letter.cpp.
#include <iostream> namespace { class null_stream_buf : public std::streambuf { } nullbuf; std::ostream null(&nullbuf); } #ifdef NDEBUG #define WRITE_IF(X) if (false) null #else #define WRITE_IF(X) ((X == true) ? std::cout : null) #endif |
Listing 3 |
#include "logging.h" #include <fstream> #include <map> // get the most and least popular chars // in a file int main(int argc, char **argv) { WRITE_IF(argc < 2) << "No arguments provided"; if (argc > 1) { std::ifstream ifs(argv[1]); std::map<char, int> letters; char ch; while (ifs >> ch) ++letters[ch]; std::pair<char, int> most = *letters.begin(); std::pair<char, int> least = *letters.begin(); for (auto pair : letters) { WRITE_IF(pair.second) << "Seen " << pair.first << std::endl; if (pair.second > most.second) { most = pair; } else if (pair.second < least.second) { least = pair; } } std::cout << "Commonest letter: " << most.first << " (" << most.second << ")\n"; if (most != least) std::cout << "Rarest letter: " << least.first << " (" << least.second << ")\n"; } } |
Listing 4 |
You can also get the current problem from the accu-general mail list (next entry is posted around the last issue’s deadline) or from the ACCU website (http://accu.org/index.php/journal).
This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.
Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.
Notes:
More fields may be available via dynamicdata ..