Title: Code Critique Competition 115

Author: Bob Schmidt

Date: 03 January 2019 16:05:04 +00:00 or Thu, 03 January 2019 16:05:04 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body:

Please note that participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.

Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)

Last issueâ€™s code

I am trying to write a simple test for some C++17 code, but I am getting some unexpected output. I have simplified the code quite a bit, and in the resultant code below I was expecting to see the output "A,B" but on the latest version of both gcc and MSVC I get just "AB". Where has my comma gone?! Even more oddly, if I use an older compiler I get different output: "65,66", which gives me back the comma but loses the letters. Has C++17 broken something?

The code is in Listing 1.

#include <iostream>
#include <string>
#include <vector>
template <typename T>
using coll = std::vector<T>;
// Start the collection
template <typename T>
void start(coll<T> &up)
{
  up.clear();
}
// End the collection
template <typename T>
void end(coll<T> &up)
{
  up.push_back({});
}
// Extract the non-zero data as a string
template <typename T>
std::string data(coll<T> const &up)
{
  std::string result;
  for (auto v : up)
  {
    if (v)
    {
      result += std::to_string(v);
      result += ",";
    }
  }
  result.pop_back();
  return result;
}
void test_char()
{
  auto i2 = coll<char>();
  start(i2);
  i2.push_back('A');
  i2.push_back('B');
  end(i2);
  // actual test code elided
  std::cout << data(i2) << '\n';
}
int main()
{
  test_char();
}

Listing 1

Critiques

Balog Pal

The code looks simple enough so I try a pure reading, no compiler solution. Letâ€™s scan the code first:

coll is defined as an alias for vector<> for no visible reason, but that may be part of simplification
a function called â€˜endâ€™ is looking for trouble, clashing with begin() and end(). Also the name makes no sense for the semantics, might be finish()
inside end, a more optimal form of insertion would be emplace_back()
another nonsense function name: â€˜dataâ€™
implementation iterates by value instead of auto const&, that is okay with the char type, but suboptimal for a general T
T must support if(v) â€“ this works for char but might be bad, no information was provided on the intended T for the code
I have to look up std::to_string on cppreference: it appears we can use it for numeric types and will produce string-ized version of the value. Like "65" for input of 'A'.
if empty v is meant as a sentinel, Iâ€™d expect a break on else
pop_back will be bugged if up was empty (but not executed this way by the test)
the test starts with an auto X = ...; form that seems to be gaining popularity lately, Iâ€™m still undecided if it is good or bad. The traditional form is certainly a direct init: coll<char> i2;

And nothing else interesting, so I would expect "65,66" as result. And the description states it is "AB"? Hmm, that looks like we asked for i2.data(). Maybe we added another free function data() to the company of begin/end, size and friends? A look in cppreference shows that has been the case since C++17.

I guess telling that to the OP would result in the question: how come that is relevant to me, I never said std::data, not even used using namespace std. And we are in global scope. Shouldnâ€™t ::data be called?

Well, for some interpretations of â€˜shouldâ€™, it should be alright. It seems we have stumbled on another case of computers work on commands instead of wishes. At first I thought of just passing over the issue stating that name lookup and overload resolution rules are not for ordinary humans to explain, but this particular case is not THAT complicated, and even I could figure it out without extra reading (of course ONLY after the fact/accident ☺) The explanation is deliberately simplified for this case. Anyone who wants the full run-down can read it at https://en.cppreference.com/w/cpp/language/overload_resolution and the name lookup link at its beginning.

The first thing to explain is how std::data gets considered at all. That is a thing that ordinary programmers should be aware of really: we have names visible from the current scope plus any associated namespaces related by arguments. So the latter, argument-dependent lookup drags in std:: because our argument is in reality std::vector. That we used it through an alias does not change its origin. If coll was a class template that just reproduced vector or subclassed it, we would not fall into this trap.

While we are at it, letâ€™s mention that this only applies to unqualified lookup, if spelled ::data() in the call, again we would be calling our function.

The next step is more interesting: why is std::data better than ::data? The compiler could still pick ours or declare it ambiguous. The rules state an ordering of candidates to find a best fit based on fewer conversions or more specialization. And at first glance ours looks winning, as coll<T> is more specialized than just <T>. But again the language drops another rule on us that has proven unintuitive and misleading to programmers in many other cases. std::data has not one but two overloads matching our case, one taking T& and one taking T const&. And our ammo, i2, is â€˜lvalue of type coll<char>â€™. What I usually just think of in decltype terms as â€˜coll<char> &â€™. certainly can be bound to a const&, but that involves a qualification conversion. If all we had to select from were the std::data() functions, the non-const one is used. And with ::data, there is no such version, so that will lose the race. We would not have the problem if i2 was const.

So what we get is what we ordered. Did C++17 â€˜break somethingâ€™? That is up to interpretation. The language is not supposed to make subtle changes to well-written code without making some noise. And the presented code in not obviously badly written. Strictly speaking it was 1) open to ADL from the start and 2) the standard stated from the start that it can add names to std:: at any time. And it does. Though I didnâ€™t find a promise, the names used by the standard are all-lowercase. For practice the industry relies on that, and expects no clash from mixed-case names.

This code used all-lowercase names and really poor ones too. Iâ€™d rather go with the other wisdom of â€œBad names will come and bite you in the ass. If not today, then tomorrow.â€ Here it took all the way to C++17, but it happened. (Letâ€™s just kindly overlook the multitude of elements required to create the perfect storm, even be careful to assemble a fine zero-terminated string in i2 to avoid UB.... ☺

The one thing I could not figure out is why the OP expected to see letters "A" and "B" from this, but Iâ€™m happy to leave it as mystery.

Alastair Harrison

Given the unexpected behaviour of the program, a good approach is to try stepping through the data function with a debugger. Perhaps surprisingly, this shows that the data function turns out to never be executed. Neither does the end function. In fact, you can totally remove them from the program and it will still compile and run without complaint on C++17.

The reason is Argument Dependent Lookup (ADL) [1]. ADL is a mechanism which allows the compiler to find free functions that live in other namespaces, without the calling code needing to specify that namespace. The example below shows why thatâ€™s (usually) helpful. The accu::print function is effectively treated as part of the public interface of the Widget class. This can be very useful when writing generic template functions when we donâ€™t know up-front which namespace the arguments live in. But sometimes ADL can unexpectedly call the wrong function, as is the case in the original program.

  namespace accu {
    class widget {};
    void print(widget const&);
  }
  void foo(accu::widget const& w) {
    ...
    // ADL allows compiler to find the
    // accu::print function, even though it's in
    // a different namespace.
    print(w);
    ...
  }

In C++11, the std::end template function was introduced. You pass in a reference to an STL container, and it will return an iterator to the end of the element range. Itâ€™s an unconstrained template (ie. accepts any argument type) and ADL is choosing it instead of the template <typename T> void end(coll<T> &up) template function that weâ€™ve defined in the global namespace.

In the C++ Core Guidelines, rule T.47 [2] tells us to â€œAvoid highly visible unconstrained templates with common namesâ€ because often ADL will select them in preference to the function you might expect. The text points out that the standard library violates the rule widely. And thatâ€™s exactly what is causing problems for us now.

Since std::end doesnâ€™t mutate the container, the effect of the end(i2) call in the test_char function is to do absolutely nothing. Thus the container will be left holding only the letters A and B, without a terminating character.

But it gets better, because the C++17 standard introduced another unconstrained template function called std::data. It accepts a reference to a container and returns a pointer to the block of memory where the containerâ€™s elements are stored. So when the test_char function calls data(i2), ADL is actually selecting the std::data template, which returns a char* pointer to the contents of the i2 container. Thatâ€™s then being passed to the char const* overload of operator<<, which treats the input as if itâ€™s a null-terminated character array.

As we discussed above, our container doesnâ€™t actually contain a null terminator. But luckily for us, the vector... probably allocated more than two bytes of storage, and itâ€™s... probably filled with zeros that look like null terminators. So the program prints the string "AB" and quickly finishes before anyone notices that itâ€™s just caused some Undefined Behaviour.

If youâ€™re curious, you can call i2.reserve(2); immediately after instantiating the container, to limit the memory allocated. Then compile the code using Clangâ€™s address sanitizer [3]. The program will now blow up impressively when it tries to print to std::cout, whizzes right off the end of the memory allocated by the vector and plunges into the nether regions of the free store.

The obvious solution for the ADL problems is to rename the free functions to give them less common names. Another option is to create a strong type to implement the custom container, rather than simply making it a typedef for std::vector.

So whatâ€™s happening on older compilers? Under C++14, the std::data template doesnâ€™t exist, so the data template in the program gets used as expected, and prints some comma-separated values. Unfortunately std::to_string doesnâ€™t have an overload for char, so it ends up performing an implicit conversion to some other integral type, then returns a textual representation of the integer value as a std::string.

Sadly weâ€™re not done quite yet, as thereâ€™s another bug lurking in the data function. When the input container is empty or contains only null elements the result string wonâ€™t have any characters appended to it. The result.pop_back() call will then fail (with more of the dreaded Undefined Behaviour) as weâ€™ve violated its pre-condition that the string is non-empty. This is the sort of thing that unit-tests should be exercising.

Other improvements

Because of the code authorâ€™s simplification efforts, itâ€™s not clear how the original code is intended to be used (especially given the alarmingly terse variable names). If you delimit multiple sequences by calling end after each one then the data function wonâ€™t format them properly; itâ€™ll just remove the comma between subsets. If instead the aim is to have only a single sequence then the start and end calls are superfluous. Worse, thereâ€™s nothing to stop someone calling end multiple times and no mechanism to prevent elements from being inserted after a call to end.

We also need to talk about the use of a default-constructed element as the end-delimiter. Though this seems to work tolerably well when the element type is char, it could be a total disaster for int, because the caller could legitimately want to place a zero value into their container. A more robust design would ensure that the only way to insert an end delimiter is via an explicit API call. It shouldnâ€™t be possible to do it accidentally.

For the multiple-sequence case a better design could involve using another container as the element type, as shown below. It requires no delimiters, so the question of delimiter representations is completely avoided, as is the need for calls to start and end.

  template <typename T>
  using subset = std::vector<T>;
  
  template <typename T>
  using coll = std::vector<subset<T>>;
  
  // Used like this
  auto i2 = coll<char>();
  i2.push_back({'A', 'B'});
  i2.push_back({'C', 'D', 'E'});

Finally, the data function in the original code uses a fairly awkward loop to print a comma-delimited list of elements. It could also be quite inefficient for non-trivial element types, as the range-for loop copies every element. An alternative would be to use an ostream_iterator, as shown below. If the thought of a trailing comma horrifies you, then you can use a custom iterator such as Jerry Coffinâ€™s infix_iterator [4]. Or wait for std::ostream_joiner to be standardised.

  template <typename InputIter>
  std::string comma_delimit(InputIter first,
    InputIter last) {
    std::stringstream ss;
    std::copy(first, last,
      std::ostream_iterator<char>(ss, ","));
    return ss.str();
  }

References

[1] Argument-dependent lookup in cppreference.com:https://en.cppreference.com/w/cpp/language/adl

[2] C++ Core Guidelines: http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#t47-avoid-highly-visible-unconstrained-templates-with-common-names

[3] AddressSanitizer in the Clang 8 documentation:https://clang.llvm.org/docs/AddressSanitizer.html)

[4] c++ infix_iterator code on Code Review Stack Exchange:https://codereview.stackexchange.com/questions/13176/infix-iterator-code

James Holland

At first sight, the behaviour of the running code, as reported by the student, seems impossible. Fortunately, a little investigation reveals what is really going on. The solution to the puzzle centres on the call to data() in the output statement of test_char(). It would be reasonable to assume that the function data() defined in the studentâ€™s code is being called but this is not the case. In fact it is std::vector::data() that is being called.

One of the mechanisms that contribute to determine which function is invoked (when there is more than one candidate) is called Argument Dependent Look-up (ADL). The argument of data(), i2, is of type std::vector<char> and by means of a fairly complicated set of ADL rules, the compiler determines that std::vector::data() should be called in preference to version of data() defined in the studentâ€™s code sample.

The function std::vector::data() returns a pointer to the data contained within the vector i2 which, in our case, is an array of chars. The array is then displayed by std::cout. It is fortunate that end() is called before being printed to the screen. The function end() adds an element to the end of the vector, i2, that has a value of zero. Without calling end(), the array of characters printed would not be null-terminated and so random characters would be printed after "AB" until the character with a value of zero was chanced upon. This also explains why there is no comma in the output, std::vector::data() simply prints the characters in the vector without any intervening characters.

The student noted that with older compilers, the printed result is "65,66". This is because std::vector::data() was only introduced in C++11. When using a pre-C++11 compiler, ADL would not have found a function named std::vector::data() and so the studentâ€™s version of data() would have been called.

Unfortunately, there is a â€˜featureâ€™ of the studentâ€™s data() function that converts the characters to be printed to the ASCII representation. The call to std::to_string() performs this transformation. The ASCII representation of 'A' and 'B' is "65" and "66" respectively. The statement containing std::to_string() should be replaced by result += v, or something equivalent.

There are several ways to ensure the studentâ€™s version of data() is called. One way is to qualify the call of the function data() with the scope resolution operator, ::. Doing this will disable the ADL mechanism and force the compiler to look in the global namespace for the function where it will find the correct version of the function. Another way is to call the required function with an explicit template parameter, i.e. data<char>(). Perhaps the safest way is to rename the function to something that is not in the std::vector namespace and is more specific to the application at hand.

The problems described above illustrates that an addition to the C++ standard can break existing code. In this case, the addition of std::vector::data() in C++11 was the cause of the trouble. It is important that when code is compiled with a new or different compiler, the software is retested before release.

Marcel MarrÃ© and Jan Eisenhauer

There are two aspects to the code for Code Critique 114. One is the behaviour on compilers that do not support C++ 17, and the other is the behaviour under the current standard. Finally, there are some general comments on the code.

Behaviour pre-C++ 17

Type char is actually an integer type. Therefore, when result += std::to_string(v); is executed, the number in v, which is the ASCII code of the character, is turned into a string and added to the result. Hence, the result is 65,66. Leaving out std::to_string might work, i.e. result += v;. Alternatively, using std::string::append certainly should work with type char: result.append(v);. Of course, the comma works as expected between the output of the ASCII codes of the characters in the vector.

Behaviour under C++ 17

The problem here is a combination of factors â€˜conspiringâ€™ to hide the userâ€™s data() function for the call made at the end of test_char().

The new standard provides a function template std::data() with a very similar signature to that the user has defined. Due to argument dependent lookup, this is in scope for the call in question. Argument dependent lookup means that because the collection i2 is in fact a std::vector, functions in the containing namespace, std, are valid candidates.

Since std::data() also offers a template for non-const parameters â€“ template <class C> constexpr auto data(C& c) -> decltype(c.data());, see https://en.cppreference.com/w/cpp/iterator/data, overload (1) â€“ this is a better fit than the userâ€™s own definition of data().

The std::data() function invokes the containerâ€™s own data(), which in the case of a std::vector<char> returns "AB" as a single string.

This problem may also hold for end(); while there is no std::start() I am aware of, a non-member std::end() exists to get the iterator one past the end of a collection. In this example, the code works even if end() does not add a default-constructed item and the iterator returned by std::end() is ignored, but it does constitute a â€˜code smellâ€™.

To avoid such problems, it is a good idea to follow the advice of the (forthcoming) Standard Library Compatibility Guidelines (see https://isocpp.org/std/standing-documents/sd-8-standard-library-compatibility). At CppCon 2018, Titus Winters introduced these in his talk â€˜Standard Library Compatibility Guidelines (SD-8)â€™ (see https://www.youtube.com/watch?v=BWvSSsKCiAw). Pertinent to this code is the fact that the standard library reserves the right to add new functions, so if you have any functions that can potentially take data types from the std-namespace as parameters, these should be put into a namespace and the namespace specified explicitly in the call to avoid problems.

General comments on the code

In its current form, coll, start() and end() are not a proper abstraction, because to add an item, the user has to use the underlying containerâ€™s support for adding items. Furthermore, it is assumed that operator bool() const of type T is equivalent to a function that would best be called isPrintable or is_printable (although snake case is discouraged, see again the aforementioned talk by Titus Winters).

This could lead to bugs, such as if a type has an operator bool() const that does not yield false for a default constructed object, so the assumed end marker (which is not required anyway, because the range-based for-loop prevents overstepping the vector bounds anyway) would be printed, adding an item that was added by the userâ€™s end() function, rather than an explicitly added item.

The other way around, there might be types where some values yield true, but isPrintable should really be false for them. And, in fact, if chars were not treated as numbers, char is such a type with many of the ASCII-values below 32 representing non-printable characters like bell and end of text.

Jim Segrave

This one was interesting â€“ I tried it with g++ 4.85 and clang++ 3.4.2 (the default Centos 7 distributions) and got the 65,66 output as described in the problem statement. I then switched to c++17 compilers (g++ 7.31 and clang++ 5.01) and got the AB output.

The 65,66 output is a normal result, thereâ€™s no reason to use to_string() if you want the ASCII character. Changing the line

  result += std::to_string(v);

  result += v;

fixed the 65,66 problem when compiled with the option -std=c++11.

To identify the cause of the AB output when using the -std=c++17 option, I tried placing some output statements within the data() function, because I could not see why it wouldnâ€™t at least output values separated by commas. I discovered that, when using c++17 as a standard, the function was never called. Checking with the nm utility showed it was in the compiled executable with the expected return type and parameter list. Two quick experiments told me it was time to dig deeper into the changes from c++11 to c++17.

Changing the line:

  std::cout << data(i2) << '\n';

  std::cout << xdata(i2) << '\n';

caused the function to be invoked. It also gets invoked if that line is changed to:

  std::cout << data<char>(i2) << '\n';

This tells me that the compiler isnâ€™t seeing data(i2) as a function call, but something else entirely. I thought it was time to look deeper still into the c++17 changes and discovered this (described in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4280.pdf) which details changes to allow getting the size of a container, testing if the container is empty and getting a pointer of the appropriate type to the containerâ€™s data all as constexprâ€™s rather than requiring a member function call on the container.

From the above document, the standard now adds:

template <class C> constexpr auto

data(const C& c) -> decltype(c.data());

Returns: c.data().

In other words, the line in test_char:

  std::cout << data(i2) << '\n';

does not call the function data(), it simply replaces data(i2) with a pointer to the data in the vector i2 (equivalent to i2.data()). This is a pointer to char, pointing to i2[0], i2â€™s values are stored contiguously in memory and, luckily, the sample program pushes an empty value onto the vector (a NUL char), so cout outputs the contents of i2 in order as a C-string. Itâ€™s not clear to me that in the absence of the call to end() which appends a NUL to i2 that you might not get a core dump or a collection of output for locations not a part of i2â€™s data at all.

As a side note, perhaps itâ€™s just my own personal prejudice, but the method of building a string of values separated by commas is better done by always outputting a separator and the next value. At the start, the separator is set to a null string, after outputting a value, the separator is changed to the actual string to separate the outputs:

  const char *sep = "";
  std::string res;
  for (auto v: i2) {
    res += sep;
    res += v;
    sep = [<CodeComment>
whatever you want as a separator</CodeComment>
]
  }

The advantages of this method are that the string being built is always ready for use, it contains values separated as desired with no separator before the first value or after the last one. If you need to use the string while itâ€™s still being built, itâ€™s ready for that. It also means that if you decide to change the separator from a single character ',' for example, to include a space as well: ", ", you only change one line of code, you donâ€™t need to add a second pop_back() call. The separator could even be passed as a pointer to const char.

My submitted code makes this change, it will call test_char() once with the default separator, giving output A,B, then call it again with a separator of "-->", giving an output of A-->B.

  #include <iostream>
  #include <string>
  #include <vector>
  template <typename T>
  using coll = std::vector<T>;
  // Start the collection
  template <typename T>
  void start(coll<T> &up) {
    up.clear();
  }
  // End the collection
  template <typename T>
  void end(coll<T> &up) {
    up.push_back({});
  }
  // Extract the non-zero data as a string
  template <typename T>
  std::string data(coll<T> const &up,
    const char *separator) {
    std::string result;
    const char *sep = "";
    for (auto v : up) {
      if (v) {
        result += sep;
        result += v;
        sep = separator;
      }
    }
    return result;
  }
  void test_char(const char *separator = ",") {
    auto i2 = coll<char>();
    start(i2);
    i2.push_back('A');
    i2.push_back('B');
    end(i2);
    // actual test code elided
    std::cout << data<char>(i2, separator)
      << '\n';
  }
  int main() {
    test_char();
    test_char("-->");
  }

Commentary

This issueâ€™s critique was inspired by recent discussions and example code demonstrating how code could be broken by the addition of new functions into namespace std, including cases where many programmers would not be expecting trouble.

The second angle in this critique is the dual role of char in C++ to represent characters and as an arithmetic type. When a char argument is passed to std::to_string(), the best overload is selected using the integral promotion rules. This will typically select the int overload of std::to_string() (although it could select the unsigned overload on some platforms.)

Note, too, the final subtlety of the rules for overload resolution and ADL in that the call to end() picks the locally defined function template while the call to data() picks the one found through ADL â€“ this is because the of the different const qualification of the argument in the two functions.

The winner of Code Critique 114

The critiques between them seemed to give good coverage of the issues in the code. There were a variety of ways used to explain the name lookup issue that resulted in finding an â€˜unexpectedâ€™ overload of data() â€“ this is the sort of explanation that is much easier face-to-face as you can quickly adjust the explanation to the degree of understanding of the recipient!

There are several ways to avoid the unfortunate name lookup â€“ itâ€™s good to know more than one technique as the best one to use depends on the circumstances of the case and James provided three. Alastair pointed out that avoiding the use of a simple alias for std::vector side-steps the problem here, Marcel and Janâ€™s critique suggested putting the function into its own namespace, and multiple critiques mentioned renaming the function and using the scope resolution operator.

Alastair reported that end() was not called â€“ which puzzled me since it should be called in all versions of C++. However, he was correct that it can be removed as this then allows overload resolution to pick the less specialized end() function from namespace std. (Adding a suitable static_assert to the end() function template is one way of demonstrating whether or not the template is used.) This led him slightly astray â€“ but into an interesting discussion about whether or not the â€˜one past the endâ€™ character would be zero.

Pal pointed out a couple of details in the implementation where there might be optimisation opportunities: one of the challenges with writing templates in C++ is to try and consider the possible types that might be used in instantiations and ensure that good choices are made for the general case, such as avoiding unnecessary copies.

Jim gave a useful side-note about delimited string, or alternatively you might like one of the iterator solutions provided by Alastair.

Both Alastairâ€™s and Marcel and Janâ€™s critiques also gave some useful discussion about the design issues with the code as presented (this is always a bit of a challenge given the simplified nature of the critiques presented in this column).

Taking all the points into consideration, I decided to award the prize for this issue to Marcel and Janâ€™s critique.

Postscript to Code Critique 113

Balog Pal writes:

I almost wrote my entry at first sight â€“ it was so perfect a bait â€“ just one of my solutions didnâ€™t compile and I held backâ€¦ 2 months is plenty of time, so obviously I realized what I forgot 2 days after the deadline. I then thought still to post it as a late entry, but expected plenty of submissions with similar points, so thought it better to wait and only post if something was missed.

And there is such a point: there was much talk about undefined behavior and what to expect under specific ABIs, yet the first critique never mentioned it. Some may even believe that if an implementation specifies how arguments are passed and they are placed as we would like them, the example would be correct. It is not.

The expression &first + 1 + sizeof...(rest) has defined behavior only when the last part is 0. Let me quote the standard at [epr.add] on p4:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements⁸⁶, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 _ i + j _ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i - j] if 0 _ i - j _ n; otherwise, the behavior is undefined.

⁸⁶⁾ An object that is not an array element is considered to belong to a single-element array for this purpose; see 7.6.2.1. A pointer past the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n] for this purpose; see 6.7.2.

Now, first is a solo object, so the pointer &first is good for the addition of 0 and 1. Anything beyond that is undefined behavior.

IME, itâ€™s a point too often overlooked. Probably because it happens to â€˜workâ€™ on most current flat-memory platforms. Those who remember the old DOS or Windows 16-bit time where you had __huge pointers and such a pointer had a 16-bit paragraph or selector part and a 16-bit offset part. And using a wild + (or especially -) that looked outside an â€˜objectâ€™ could easily produce broken result or even a direct crash.

Code Critique 115

(Submissions to scc@accu.org by Feb 1st)

I am trying to write a generic â€˜fieldâ€™ that holds a typed value, optionally with a â€˜dimensionâ€™ such as â€˜kgâ€™. One use case is to be able to hold attributes of various different types in a standard collection. I want to be able to compare two attributes to see if theyâ€™re the same (and maybe later to sort them?) â€“ two simple fields are only comparable if they are the same underlying type and value and two â€˜dimensionedâ€™ fields need to be using the same dimension too. Iâ€™ve put something together, but Iâ€™m not quite sure why in my little test program I get true for comparing house number and height:

    Does weight == height? false
    Does weight == house number? false
    Does house number == height? true
    Re-reading attributes
    Unchanged
    Reading new attributes
    some values were updated

Can you help with the problem reported, and point out some potential flaws in the direction being taken and/or alternative ways to approach this problem?

The code is in Listing 2 (field.h) and Listing 3 (field test.cpp).

#pragma once 
#include <memory>
#include <string>
// Base class of the 'field' hierarchy
class field
{
protected:
  virtual bool
  equality(field const &rhs) const = 0;
public:
  // convert any field value to a string
  virtual std::string to_string() const = 0;

  // try to compare any pair of fields
  friend bool operator==(field const &lhs,
    field const &rhs)
  {
    return lhs.equality(rhs);
  }
};
// Holds a polymorphic field value
struct field_holder : std::unique_ptr<field>
{
  using std::unique_ptr<field>::unique_ptr;
  bool
  operator==(field_holder const &rhs) const
  {
    return **this == *rhs;
  }
};
// Implementation in terms of an underlying
// representation
template <typename Rep>
class field_t : public field
{
public:
  field_t(Rep const &t) : value_(t) {}
  std::string to_string() const override
  {
    return std::to_string(value_);
  }
protected:
  bool equality(field const &f) const override
  {
    // compare iff of the same type
    auto p =
      dynamic_cast<field_t const *>(&f);
    if (p)
      return value_ == p->value_;
    else
      return false;
  }
private:
  Rep value_;
};
// Add a dimension
#include <typeinfo>
template <typename dimension,
  typename Rep = int>
class typed_field : public field_t<Rep>
{
public:
  using field_t<Rep>::field_t;
  std::string to_string() const override
  {
    return field_t<Rep>::to_string()
      + dimension::name();
  }
protected:
  bool equality(field const &f) const override
  {
    // compare iff the _same_ dimension
    return typeid(*this) == typeid(f) &&
      field_t<Rep>::equality(f);
  }
};

Listing 2

#include "field.h"
#include <iostream>
#include <map>
// Example use
struct tag_kg{ static const char *name()
  { return "kg"; } };
struct tag_m{ static const char *name()
  { return "m"; } };
using kg = typed_field<tag_kg>;
using metre = typed_field<tag_m>;
using attribute_map = std::map<std::string,
  field_holder>;
// Dummy method for this test program
attribute_map read_attributes(int value)
{
  attribute_map result;
  result["weight"]
    = std::make_unique<kg>(value);
  result["height"]
    = std::make_unique<metre>(value);
  result["house number"]
    = std::make_unique<field_t<int>>(value);
  return result;
}
int main()
{
  auto old_attributes = read_attributes(130);
  // height and weight aren't comparable
  std::cout << "Does weight == height? "
    << std::boolalpha << 
     (old_attributes["weight"] ==
      old_attributes["height"])
    << '\n';
  // weight and house number aren't comparable
  std::cout << "Does weight == house number? "
    << std::boolalpha << 
     (old_attributes["weight"] ==
      old_attributes["house number"])
    << '\n';
  // house number and height aren't comparable
  std::cout << "Does house number == height? "
    << std::boolalpha << 
     (old_attributes["house number"] ==
      old_attributes["height"])
    << '\n';
  std::cout << "Re-reading attributes\n";     
  auto new_attributes = read_attributes(130);
  if (old_attributes == new_attributes)
  {
    std::cout << "Unchanged\n";
  }
  else
  {
    // update ...
    std::cout << "Some values were updated\n";
  }

  std::cout << "Reading new attributes\n";
  new_attributes = read_attributes(140);
  if (old_attributes == new_attributes)
  {
    std::cout << "Unchanged\n";
  }
  else
  {
    // update ...
    std::cout << "some values were updated\n";
  }
}

Listing 3

You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (http://accu.org/index.php/journal).

This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Roger Orr Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.

Notes:

More fields may be available via dynamicdata ..