Title: Code Critique Competition 114

Author: Bob Schmidt

Date: 02 November 2018 17:14:16 +00:00 or Fri, 02 November 2018 17:14:16 +00:00

Summary: Set and collated by Roger Orr. A book prize is awarded for the best entry.

Body:

Note: if you would rather not have your critique visible online please inform me. (Email addresses are not publicly visible.)

Last issueâ€™s code

This is a very short code sample but nonetheless an interesting problem.

The writer is using a variadic template to pass multiple arguments to the least function, which uses std::sort to find the smallest input number. They report the code used to work in 32-bit on a couple of compilers, but itâ€™s not reliable when used in more modern 64-bit projects.

Can you identify the problem(s), and suggest any fixes (or alternative approaches)?

The code is in Listing 1.

#include <algorithm>
#include <iostream>

// find the least value in the arguments
template <typename T, typename... Ts>
T least(T first, Ts... rest)
{
  std::sort(&first,
    &first + 1 + sizeof...(rest));
  return first;
}
int main()
{
  std::cout << "least(1): "
    << least(1) << std::endl;
  std::cout << "least(1,2,3,4): "
    << least(1,2,3,4) << std::endl;
  std::cout << "least(10,9,8,7,6,5,4,3,2,1): "
    << least(10,9,8,7,6,5,4,3,2,1)
    << std::endl;
}

Listing 1

Critiques

Gennaro Prota

Of course, the problem with the code is that it treats distinct variables (function parameters) as if it were accessing elements of the same array. It simply invokes undefined behaviour.

IMHO, the easiest thing to do is to leverage the min_element() algorithm and use std::initializer_list<>:

  #include <algorithm>
  #include <initializer_list>
  #include <iostream>
  
  // find the least value
  template <typename T>
  T least(std::initializer_list<T> list)
  {
    // assume list is not empty; it would be nice
    // to be able to do the following:
    // static_assert(list.size() != 0, "");
    return *std::min_element(list.begin(),
    list.end());
  }
  int main()
  {
    std::cout << "least({1}): "
      << least({1}) << std::endl;
    std::cout << "least({1,2,3,4}): "
      << least({1,2,3,4}) << std::endl;
    std::cout << "least({10,9,8,7,6,5,4,3,2,1}): "
      << least({10,9,8,7,6,5,4,3,2,1})
      << std::endl;
  }

Stewart Becker

So, first off â€“ kudos for wanting to use an <algorithm> rather than rolling your own. This is good practice. Unfortunately, thatâ€™s about all that can be said in praise of this code. There are three main issues with it:

Implementation-defined behaviour
Undefined behaviour
Choice of algorithm

The key approach used by the function is using the addresses of function arguments as iterators to a standard algorithm. While pointers can be used as iterators, we do need to make sure they form a valid iterator range. The assumption seems to be that the function arguments will be contiguous in memory so as to behave like an array, meaning that we can use pointer arithmetic to determine the â€˜endâ€™, and incrementing the address of the first argument will give the addresses the others in order. Indeed, it seems all the major compilers and calling conventions involve pushing arguments onto the stack in reverse order, so this effect is to be expected. However, this is a convention and implementation-defined. Furthermore, if a function is inlined then all bets are off regarding the relative memory locations of the arguments.

What I suspect is happening is that in the 32-bit world, an int is the same width as a stack element so the pointer arithmetic trick worked. However, in the 64-bit world ints are only half as wide, so alignment issues come into play. The pointer increment will move forward only in multiples of 32 bits rather than to the next value which is 64 bits away. Trying to force the pointer arithmetic to work in 64-bit might make the trick work again, but I wouldnâ€™t want to count on it. It would be better if we could program to the standard alone, rather than relying on implementation details.

As written, the function exhibits undefined behaviour when passed different types. As written, the Ts... pack can expand to any list of types. We could, for example, pass in an int, a double and an object type, e.g. std::string. The result would still compile, but at runtime it would perform pointer arithmetic based only on the first type, and dereference that memory location as type T, not the Ts... types. Even if we know the types are of the same size this is still undefined behaviour. If we are lucky, sizeof(first) will be no greater than the average of sizeof(rest)... . We then â€˜onlyâ€™ have to worry about the issues of pointers being dereferenced as the wrong type, and that some values passed lie beyond the end iterator. Weâ€™re still deep in UB territory, but we should just return a wrong result, at least if weâ€™re working with native types like int. With object type then we could very easily get an object in an invalid state, which will cause very hard to debug problems later on, but thatâ€™s a secondary effect. The worst case scenario is if sizeof(first) is large compared to the other arguments. We would then be asking std::sort to operate on memory locations we donâ€™t actually have the right to use. Given that std::sort is a modifying algorithm this means we are overwriting arbitrary memory â€“ a disaster waiting to happen!

We can make a start on both issues at once by changing the function signature. If we want a contiguous memory layout with all arguments of the same type, we can do so by using a collection type that does just that. This is the definition of an array, so letâ€™s start by using one â€“ in the guise of std::array because we want to be modern. Also, we can guarantee check at compile-time when it is empty, which other collections (in particular the usual first choice std::vector) cannot detect. One thing the old code did not have to worry about was finding the least of an empty collection. Weâ€™ll also use actual begin and end iterators instead of pointers:

  template<typename T, size_t N>
  T least(std::array<T, N> array)
  {
    static_assert(N > 0, "Empty array");
    std::sort(array.begin(), array.end());
    return *(array.begin());
  }

Oh dear! Now our calls donâ€™t compile. This is because they are passing many arguments as integers, instead of one array. Weâ€™ll need to modify them to pass as one argument. Instead of least(1,2,3,4) weâ€™ll add curly braces round the arguments and call least({1,2,3,4}).

Nope, that still doesnâ€™t compile. The problem is that {1,2,3,4} isnâ€™t a typed C++ value, therefore the compiler canâ€™t deduce its type. However, there C++ is one type that will accept such syntax: std::initializer_list. But now we need a runtime check for emptiness. Again weâ€™ll be modern and use ADL to lookup the begin and end iterators, just in case we change the type again (hint: we will):

  template<typename T>
  T least(std::initializer_list<T> list)
  {
    using std::begin;
    using std::end;
    auto first = begin(list);
    auto last = end(list);
    assert(first != last);
    std::sort(first, last);
    return *first;
  }

WHAT NOW, COMPILER?! We experience the joy of compile errors longer than our codebase that are the signature of programming with templates. It seems that std::sort wonâ€™t work with std::initializer_list. The key error message is error: assignment of read-only location '* __first' (gcc). In English, it means that std::sort is trying to manipulate the values inside the list, but std::initializer_list only provides read-only iterators, even when we passed it by (non-const) value to the least function. Surely it must be possible to get the smallest element from a list without needing to re-order it?

This brings me to the final issue â€“ choice of algorithm. The use of std::sort has two problems. As we have just seen, it is a modifying algorithm, so we cannot use it on ranges that are read-only. Furthermore, it is overkill for finding just the least element as it is takes O(n log n). We really should be able to achieve our goal in a single, read-only pass. Indeed we can, by using the algorithm std::min_element. Since weâ€™re now using a read-only algorithm, weâ€™ll take the list by const reference too:

  template<typename T>
  T least(const std::initializer_list<T>& list)
  {
    ...
    return *std::min_element(first, last);
  }

Hooray! It compiles, runs and whatâ€™s more actually works without relying on implementation details. However it only works on containers of type std::initializer_list, which arenâ€™t particularly common. It really would be better if we could pass any container, especially the ubiquitous std::vector and our first attempt, std::array. Also, what it we want to find the greatest element? One of the benefits of std::min_element is that we can pass a customised comparison object, including a lambda. We should template the whole range type:

  template<typename Range>
  auto least(Range range)
  /* code unchanged */

Oh no â€“ our examples donâ€™t compile again! The issue is that because brace expressions arenâ€™t actually C++ values, they canâ€™t be type deduced as std::initializer_list objects. Weâ€™ll need to add an overload with the original signature. Since the code in both will be identical. weâ€™ll do this by writing an implementation function and delegating to it in the overloads.

Also, weâ€™ll add a customisable but defaulted comparator:

  template<typename Range, typename Comp>
  auto least_impl(Range range, Comp comp)
  {
    /* ... only the last line changes */
    return *std::min_element(first, last, comp);
  }
  template<typename Range,
  typename Comp=std::less<>>
  auto least(Range range, Comp comp={})
  {
    return least_impl(range, comp);
  }
  template<typename T, typename Comp=std::less<>>
  T least(const std::initializer_list<T>& list,
  Comp comp={})
  {
    return least_impl(list, comp);
  }

Hmm, that visible least_impl function is a bit ugly. If this were a class, weâ€™d probably make it private. One trick used is to move it into a separate namespace, often named â€˜detailâ€™ or â€˜implâ€™. But the class approach appeals to me. As for why, when I started this, I forgot about std::min_element and tried to roll my own as std::accumulate(first, last, *first, std::min), which wouldnâ€™t compile either! The problem is that because std::min is a function overload, the compiler couldnâ€™t work out which one I meant. Our overloads of least mean that trying build algorithm up by passing it as a parameter will suffer the same problem. But if we make it a functor then we can just reference a single unambiguous object. All that changes is how and where we declare the functions, and one final object declaration:

  class Least {
  private:    
    template<typename Range,
    typename Comp = std::less<>>
    auto least_impl(Range range, Comp comp) const;
  public:
    template<typename Range,
    typename Comp=std::less<>>
    auto operator()(Range range,
    Comp comp={}) const;
    template<typename T,
    typename Comp=std::less<>>
    T operator()(
    const std::initializer_list<T>& list,
    Comp comp={}) const;
  };
  static constexpr Least least = {};

Joe Wood <joew60@yahoo.com>

By sheer chance I am currently using a 32bit Ubuntu 18.04 distribution. Sometimes it works as expected, sometimes it produces an impossible result, e.g. outside the parameter range, and sometimes it reports a stack smash, so there is obviously something wrong.

Letâ€™s start by trying GCCâ€™s sanitizers, e.g.

  g++ -g -O -fsanitize=address \
    -fsanitize=undefined code.cpp

Well, running the compiled code produced some interesting information. (The output is a little too long to reproduce here.) The source of the problem is in the call to std::sort, and the actual bug is in reading and comparing two iterators deep inside std::sort. The offending line is

  { return *__it1 < *__it2; }

where __it1 and __it2 are both iterators.

Why does this fail? We know that sorting an STL conforming container with random access iterators works. Hmm, sorting a valid STL container relies on the data been contiguous [1]. Is our data contiguous? Unfortunately not; to understand why, we have to turn our attention to how parameters are passed to functions (the calling convention) and the Application Binary Interface (ABI). Firstly, both C and C++ appear to put all parameters on the stack starting at the right hand side and moving leftwards. This is a must for functions like printf, so it knows where to find the format string and hence process all the arguments. Secondly, I said appears to put the arguments on the stack. As an optimisation, a small number of variables may be passed in the CPU registers. The exact details vary by processor, compiler and ABI.

But what exactly is being passed to std::sort? Well letâ€™s look at the original code

  std::sort(&first,
    &first + 1 + sizeof...(rest));

&first is clearly the address of the first parameter.
&first + 1 + sizeof...(rest) is the address of the end of the region to be sorted. This assumes that the region is contiguous starting at &first. However, due to C++ calling conventions, this may or may not be true.

Hence, the straight forward reason for differing behaviour of least, is that the parameters may not be in contiguous memory, hence std::sort may experience a memory access error.

Before, looking at possible solutions lets get another problem into the open. The student wants to find the smallest number of a supplied set by sorting the set and taking the first element of the sorted set. This at best is going to be of order O(NlogN), and we have absolutely no interest in the rest of the sorted set beyond the smallest. Using std::min to find the smallest element is of order O(N) and we do not need to carry any additional information around.

To recap, we need to find the smallest element of a set of numbers supplied as parameters to a variadic function, without any memory â€˜gapsâ€™ due to the calling convention. There are two possible approaches, viz.: recursion on the rest parameter or use std::initializer_list. initializer_list guarantees that its memory is contiguous.

Letâ€™s start with the recursive approach, we will call the procedure least_r. As with all recursive processes, we need a base case, which for this problem must be one item, as in the sample code. That is just

  template <typename T>
  inline T least_r(T first) { 
    // single input, so just return it
    return first;
  }

That is easy. Now for the recursive case

  template <typename T, typename... Ts>
  T least_r(T first, Ts... rest) {
    // to prevent non-contiguous memory access, 
    // we peel the arguments off in turn
    return std::min(first, least_r(rest...));
  }

The second approach, using std::initializer_list is just as simple. We will call this procedure least_il forinitializer_list.

  template <typename T, typename... Ts>
  T least_il(T first, Ts... rest) {
    // to prevent non-contiguous memory access, 
    // we must create an initializer list
    const std::initializer_list<T> il {first,
      rest...};
    return std::min(il);
  }

Both methods work, with the sample inputs and a set of 100 elements, and are acceptable to the -fsanitize test above. For added reassurance, both pass Clangâ€™s scan-build utility.

There is one more consideration, namely, efficiency. Whilst it is true that to use the initializer_list we have to make a complete copy of the input set, it runs quicker than the recursive version, because it does not have to keep calling a smaller version of itself.

Note

[1] Strictly speaking, this is not an explicit requirement, rather the exact requirement is that random access to the container happens in constant time. In practice all containers satisfying the random access criteria are also contiguous. [Ed: this is incorrect â€“ for instance std::deque is a counter-example] Being able to â€˜splitâ€™ memory on updating an iterator would be a challenge.

James Holland

I can see that the student is trying to use std::sort() to iterate through the parameters of least() in order to find the one with the lowest value. Unfortunately, the code relies on the parameters being stored contiguously starting with first. This may not be the case and is not guaranteed by the standard. One way around this problem is to copy the parameters into a container from which std::sort() can operate, as shown below.

  #include <algorithm>
  #include <array>
  #include <iostream>
  template <typename T, typename... Ts>
  T least(T first, Ts... rest)
  {
    std::array<T, sizeof...(rest) + 1> a{first,
      rest...};
    std::sort(a.begin(), a.end());
    return *a.begin();
  }

The container I have chosen is of type std::array and is named a. In order to declare a, the type of its elements needs to be specified. This is conveniently provided by the template parameter T. The declaration of a also needs to specify the length of the array. The array has to accommodate however many parameters there are in the parameter pack rest plus the parameter first, and is simply calculated by the expression sizeof...(rest) + 1.

Finding the smallest value of the array does not require std::sort(). The algorithm std::min_element() will be faster and, in this case, slightly easier to use.

  template <typename T, typename... Ts>
  T least(T first, Ts... rest)
  {
    std::array<T, sizeof...(rest) + 1> a{first,
      rest...};
    return *std::min_element(a.begin(),
      a.end());
  }

There is also a recursive approach that can be used to find the minimum value as shown below.

  #include <algorithm>
  #include <iostream>
  template <typename T>
  T least(T first)
  {
    return first;
  }
  template <typename T, typename... Ts>
  T least(T first, Ts... rest)
  {
    return std::min(first, least(rest...));
  }

When least() is called with more than one parameter, the version with the parameter pack is invoked. The returned value is the minimum of its first parameter and the minimum of all the other parameters. The minimum value of all the other parameters is obtained by calling least() again but this time not including the first parameter. This recursive arrangement keeps going until least() is called with just one parameter, at which point the version without the parameter pack is invoked. This version simply returns its parameter (the minimum of just one value is the value itself). Eventually, when all the invocations of least() have returned, the minimum value of the entire sequence is obtained.

It is interesting to note that although I have described this technique as being recursive, it is only recursive from a source code point of view. The compiler generates a series of least() functions each with a different number of parameters ranging from 1 to n where n is the number of elements in the sequence. Which one is called depends on the number of parameters provided. In this way, when least() â€˜recursesâ€™ it is a different function that is called each time.

It is possible to do away with the recursion and let std::min() operate on the whole sequence in one go as shown in my final version of least() below.

  template <typename... Ts> 
  auto least(Ts... all)
  {
    return std::min({all...});
  }

Here, the parameter pack is expanded and used to create an unnamed object of type std::initializer_list that is passed to std::min(). The return type of least() is automatically deduced.

Jason Spencer

The headline bug of the unreliability of this code is down to function calling conventions for different platforms/languages/architectures.

Specifically, the greater number of registers available on 64-bit CPUs may mean the compiler passes function arguments in registers rather than on the stack, so we canâ€™t use pointer magic to calculate the end iterator.

The issue can be seen by entering the following program in to Matt Godboltâ€™s Compiler Explorer [1]:

  A0> int fn (int a, int b, int c, int d, int e,
      int f, int g) {
  A1> 
  A2>     return (a+b+c+d+e+f+g);
  A3> }
  A4> int main() {
  A5>      return fn(1,2,3,4,5,6,7);
  A6> }

With the x86-64 gcc 8.1 compiler and no switches (therefore defaulting to amd64) the call looks like:

  B0> push    7
  B1> mov     r9d, 6
  B2> mov     r8d, 5
  B3> mov     ecx, 4
  B4> mov     edx, 3
  B5> mov     esi, 2
  B6> mov     edi, 1
  B7> call fn(int, int, int, int, int, int, int)
  B8> add     rsp, 8
  B9> nop

The first six arguments are copied to registers (lines B1 to B6), but the seventh, the literal 7, is pushed to the stack (line B0).

When compiled with the same compiler and the -m32 switch (telling the compiler to generate a binary for x86) the call looks like this:

  C0> sub     esp, 4
  C1> push    7
  C2> push    6
  C3> push    5
  C4> push    4
  C5> push    3
  C6> push    2
  C7> push    1
  C8> call fn(int, int, int, int, int, int, int)
  C9> add     esp, 32
  CA> nop

The subtraction in line C0 is to keep the stack pointer correctly aligned (on x86 the esp value has to be aligned to a 16 byte boundary at the function call).

The return value in both cases is returned in register EAX.

Lines B8 and C9 put the stack pointer to where it was before, effectively wiping that part of the stack used to transfer arguments.

C++ templates are instantiated at compile time, and the same template, but with a different number of arguments, is effectively a different function (which is why some people complain that templates cause "code bloat").

As such the instantiation, aka function, with four arguments may have all arguments reside in registers, on both x86 and amd64, but the function with 10 arguments will have some on the stack and some in registers.

The bit thatâ€™s missing from the above snippets is what happens to the passed arguments in the called function. They can be pushed from the registers on to the stack, which is what actually happened in the case above (but not shown here), but that is not always the case.

So why do we care about how itâ€™s passed? Well, in the studentâ€™s least templated function we take the address of the first argument and we use that as the start iterator passed to std::sort, and from that we do some arithmetic to find the end iterator (the one after the last element). But because of the calling conventions, and because we are dealing with raw pointers, the elements to be sorted may not be contiguous in memory and they may also be in registers. The arguments could also be pushed on the stack in reverse order (but thatâ€™s not what happens above since the stack grows â€˜downwardsâ€™ in memory, hence the add in C9 and B8 actually unwinds the stack). If you attempt to take the address of one of the arguments, ie &first, and it is in a register, then the compiler will copy it first to the stack â€“ but the other elements it may not.

Donâ€™t take this is as a hard rule, though â€“ the way arguments are passed and whether the calling or the called function does the cleanup depends on a number of things, not just the CPU architecture, and can change between OSes, the language, the compiler, and even compiler version â€“ see [2], [3], [4], [5], [6] and [7] for more details. You can also add attributes to functions to change the calling convention on a per-function basis, for example [8]. It is these differences that causes problems for the student.

Also, since weâ€™re dealing with literal values, when optimisation is enabled most compilers will calculate the least value at compile time, rather than at runtime anyway (and magically work!).

Put simply, you should never ever make such assumptions about the calling conventions.

A few other things stand out straight away in the studentâ€™s code:

The code unnecessarily uses sort to get the lowest value. std::sort is guaranteed to be O(N ln N) in time complexity, but we donâ€™t need all of the values sorted, we just need the lowest value. To do this we can simply scan through the values keeping a record of which is the lowest value. This can be done in O(N) since each value has to be checked only once.
The use of pass by copy â€“ this requires the type to be copyable â€“ fine if itâ€™s a POD (plain-old-datatype), not so much if itâ€™s a UDT (user-defined-type). Prefer to pass by const reference or const copy (the latter may lead to copy elision, in some circumstances).
There is a potential reinterpret_cast of arguments: std::sort deduces the type of the elements being sorted from the iterators being passed to it (they must both have the same type). With a variadic template, the type of each argument can be different. So if the first argument is a char (say, one byte in size) and the remaining three arguments are long long (say, 8 bytes), then the type to be sorted is deduced as char but the end iterator points to the char four (1+sizeof...(rest) => 1+3) bytes later, which is not even the whole way through the first long long argument. And the bytes extracted from the first long long argument are effectively nonsense anyway. The precise values are dependant on the endianess of the architecture. This wasnâ€™t seen here because the literal digits were assumed to be of type int, but adding a float in the middle will easily corrupt the result.
And the elephant in the room is that the least function is already part of the STL since C++14: template< class T > constexpr T min( std::initializer_list<T> ilist ); Bear in mind, though, that initializer_lists pass by copy, so T has to be again copyable.

As to an alternative implementationâ€¦ I donâ€™t knowâ€¦ it all depends on what you really want from it.

To go completely the other way from a pass-by-copy function, and if youâ€™re particularly weird, you could pass by universal reference everywhere and return such a reference, so you could write a function that will let you overwrite the left-most lowest value:

  template <typename T> T&& least(T&& first) {
    return (std::forward<T>(first));
  }
  template <typename T, typename... Ts> T&&
    least(T&& first, Ts&&... rest)
  {
    auto && l =
      least(std::forward<Ts>(rest)...);
    return std::forward<T>(((first <= l ) ?
      first : l));
  }

The template here starts from the last value and works forward, so make the ternary test (first < l ) if you want a reference to the right-most lowest value. And obviously >= if you want your least function to perform a left-most most search.

So now that weâ€™re dealing with references we can do weird things like this:

  int main() {
  int a = 3, b = 1, c = 2, d = 0;
  std::cout << "Before : " << a << ',' << b <<
    ',' << c << ',' << d << '\n';	
  least(a,b,c,-1,d) = 42;
  std::cout << "TP1 : " << a << ',' << b <<
    ',' << c << ',' << d <<'\n';	
  least(a,b,c,d) = 42;
  std::cout << "TP2 : " << a << ',' << b <<
    ',' << c << ',' << d <<'\n';	
  std::cout << "TP3 : " << least(2,3,4,5,6,78,9)
    << '\n';
  }

Where the output is:

  $ ./a.out 
  Before : 3,1,2,0
  TP1 : 3,1,2,0
  TP2 : 3,1,2,42
  TP3 : 2
  $

So now least returns a reference to the lowest value and we can overwrite it. I think this is iffy for a few reasons, the least of which is that it might be confusing as to what the function really does.

Another issue is that the type of each argument can be different and the values may undergo a narrowing conversion (by implicit cast) to different types with loss of precision in the less than comparison. For example, least ( 0.9, 2, 3, 4 ) will return 0.9, but least ( 1, 2, 3, 0.9 ) will return 0.

My preferred approach, however, is to decompose the problem â€“ weâ€™re reducing/folding according to some operator. So why not implement a fold/reduce template, pass the reduction operation as a parameter and let the compiler do the heavy lifting. Of course C++17 also has fold expressions.. but they only work for a handful of operators (+, -, * etc) and we can forsake the syntactic sugar of fold expressions and go further with a customisable reduction.

So letâ€™s create a template thatâ€™ll do our reduction and which takes the reducing functor as a parameter:

  template < typename OP> struct lfold {
  OP op;
  template <typename T, typename... Ts>
  constexpr const T operator()(const T & first,
    const T & second, const Ts&... rest)
  {
    if constexpr ( sizeof...(rest) == 0 )
      return op(first, second);
    else
      return operator()(op(first, second),
        rest...);
  }
  };

Weâ€™re returning by copy deliberately â€“ because op may not be returning a reference, and may be returning a temporary.

Weâ€™re also using an if constexpr so we only need one template. Note also weâ€™re taking two arguments at a time, and the adjacent arguments must have the same type, so they effectively all have to be the same type. Passing a series of ints and a double will cause a compile error, rather than a silent implicit conversion.

We could now write a functor that returns the lower of two values and be done with itâ€¦ instead letâ€™s write a functor that selects one argument according to a comparator that is provided as a parameter:

  template < typename CMP > struct selector {
  CMP cmp;
  template <typename T> constexpr T &
    operator()( T & first, T & second )
  {
    return (cmp(first,second)?first:second);
  }
  };

Ok, so now we can do something like this:

   using least =
    lfold < selector<std::less<void>> >;
  using most =
    lfold < selector<std::greater<void>> >;

Note the use of void as the template parameter â€“ since C++14, this means that the type deduction is deferred, so it means we donâ€™t have to provide an explicit type in the typedef.

Now that we have a general-purpose reduction template we can also do things like:

  using sumer = lfold < std::plus<void> >;
  using producter =
    lfold < std::multiplies<void> >;

And it gets crazier â€“ std::plus works on anything that has an operator+ overload, so we could even concatenate strings:

  using namespace std::string_literals;
  std::cout << sumer{}("a"s,"b"s,"c"s,"d"s)
    << '\n';

outputs abcd.

Note, though, that the order of summing is significant when the type is a stringâ€¦ which is why our reduction template is called lfold (left fold). A right fold would look like:

  template < typename OP> struct rfold {
  OP op;
  template <typename T, typename... Ts>
    constexpr const T operator()(
      const T & first, const T & second,
      const Ts&... rest)
  {
    if constexpr ( sizeof...(rest) == 0 )
      return op(second, first);
    else
      return op(operator()(second, rest...),
        first);
  }
  };

And with:

  using rsumer = rfold < std::plus<void> >;
  std::cout << rsumer{}("a"s,"b"s,"c"s,"d"s)
    << '\n';

would output dcba.

If the creation of a least, most, rsumer, sumer, etc. temporary (through the use of {} initialisation) is not acceptable then there may be mileage in calling the operator() from a variadic template constructor which takes the actual arguments, and the temporary object returns the result by a conversion operator. I prefer the approach above, however, because we can initialise the reduction operator and we can do things like this:

  struct ewma_op {
    double ewma;
    template <typename T> constexpr T
      operator()(const T & data1,
        const T & data2) {
      return (((1.0-ewma)*data1) +
       (ewma*data2));
      }
  };
  using ewma = lfold < ewma_op >;
  std::cout << ewma{0.1}(0.1,0.2,0.3,0.4)
    << '\n';

The last line calculates the exponential weighted moving average of the arguments with a smoothing co-efficient of 0.1.

The {0.1} is used to initialise the lfold::opâ€¦ which is also a struct that supports brace initialisationâ€¦ so that value is used to set ewma_op::ewmaâ€¦ and we can magically set the smoothing co-efficient.

And the coolest thing about this (if you supply literals, and depending on the types used) is that itâ€™s constexpr and the compiler will optimise out much of it, even finding the entire result at compile time and thereâ€™ll be no runtime cost.

You could also use lfold with a hash_combine_op that is initialised with a seed, then takes a list of arguments for reduction, calculates the hash (with a hash function specified as a template parameter), combines the hash value (see boost:hash_combine) with the seed and continues (thanks to lfold) through the list of arguments.

But for now, I think that solves the original problem.

References

[1] https://gcc.godbolt.org/

[2] https://en.wikipedia.org/wiki/Application_binary_interface

[3] https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64

[4] https://www.agner.org/optimize/calling_conventions.pdf

[5] https://software.intel.com/en-us/node/682402

[6] https://en.wikipedia.org/wiki/X86_calling_conventions

[7] https://en.wikibooks.org/wiki/X86_Disassembly/Calling_Conventions

[8] https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html

Commentary

The code in this critique relies on being able to sort a set of function arguments as if they were a range. As all our entries pointed out to greater or lesser extent, this assumption is flawed in a number of ways.

Firstly, the standard requires that a range refers to a single C++ object, so irrespective of the specifics of the implementation treating the separate objects as a range is undefined behaviour.

Secondly, the code is assuming that the arguments are contiguous in memory with increasing addresses. While this may be true in some cases â€“ for example on some platforms with un-optimized 32bit calls using the default calling convention â€“ it is not in general true. Some calling conventions push pass the arguments left-to-right, other push them right-to-left, and in many calling conventions the first few arguments are passed in registers and others are passed on the stack; if the address of one of the former arguments is taken then the register value will be copied into a memory location and its address will be used. This address may have no obvious relationship to the addresses of the arguments passed on the stack!

Thirdly, the code is assuming that the arguments are the same size as the amount of stack that they consume. So, even on a 32-bit platform, the original code was highly likely to fail if called with char arguments.

As everyone noticed, the code did too much work in that the whole range was sorted although only the single least value was required. Over the years I have noticed a number of places that used std::sort when only a small part of the sorted output was being used; there are other algorithms that can be used such as std::partial_sort.

One open question is what the code should do with arguments of different type. The code provided would accept them but do the wrong thing. As is common with these critiques, and is also the case for many of us in our working life, thereâ€™s no formal specification.

Postscripts to recent Code Critique competitions

Jason Spencer has sent us some reflections and updates to his entries.

Postscript to Code Critique 111

In an attack of involuntary stupidity, not only did I not comment on the sequence point issue in Code Critique 111 but â€“ even worse â€“ I made the mistake myself! I spotted it in the first pass when looking at the Code Critique, but then I context switched to other languages while doing the write-up. Itâ€™s down to an annoyance between many C-like languages â€“ they look similar, but they donâ€™t always act similar. Java, C# and Javascript would interpret:

  a = 42;
  a = ++a + a++;

as:

  a = 43 + a++;
  a = 43 + 43; // but a is 44 (because of the post-
               // increment) before it is assigned
               // the value of the summation
  a = 86;      // and a is now 86, overwriting the
               // 44 value in a

Thatâ€™s because they guarantee (to the best of my knowledge) to evaluate expressions from left to right once operator precedence has been taken into consideration. Because the assignment operator has a low precedence and right-to-left associativity the expression on the right of the operator will be computed first.

C and C++ do not guarantee the order in which expressions are evaluated.

The good news is that g++ flags this as undefined behaviour when invoked with the -Wall or -Wsequence-point switch, but clang++ and VS donâ€™t complain, even with -Wall /Wall.

In practice, it looks like this:

VC++	-	87
VC++	-O3	87
g++	-	87
g++	-O3	87
clang	-	86
clang	-O3	86
ICC	-	86
ICC	-O3	86

Alas, I donâ€™t have the compiler versions to hand, but you get the point â€“different compilers use different ordering of evaluation as itâ€™s not guaranteed by the standard. Check out godbolt.org to try the code snippet above.

Mistakes like this are a good reason to spread out code and program in a more defensive and explicit way, with the increment on a line of its own.

Postscript to Code Critique 109

While working towards a zipit::operator* implementation, I mimicked std::vector<bool>::reference by implementing a proxy object which acts like a reference.

Throwing caution to the wind, in a spasm of flippancy, I suggested an implementation of operator-> technically could (but itâ€™s really ugly and it shouldnâ€™t) return a (smart) pointer to a dynamically allocated proxy object.

vector<bool> does not do this, nor Boost.ZipIterator, and for good reason â€“ itâ€™s not only hideous, itâ€™s also against the specs.

It has come to my attention through a discussion between Niels Dekker and Anthony Williams on the accu-general mailing list that in fact the C++ specification requires operator-> to return a raw pointer, which would forbid the use of a proxy object. So the idea of returning a smart pointer to a proxy object is even more wrong.

Right, now I need to go and boil my head.

The winner of Code Critique 113

The critiques all identified the problem and proposed alternative approaches without the troublesome behaviour. I found Stewartâ€™s phrasing of the difference between the size of an int and the size of a stack word to explain why the code doesnâ€™t work on 64bits particularly clear,

I like Joe Woodâ€™s mention of using sanitizers to help debug the problem: there are a lot of good tools out there to help with identifying problematic code in an automated fashion.

Jason Spencer provided some very nice examples of where you might take this technique further (although the universal reference version of least may behave unexpectedly if the returned reference is to a temporary.) The suggestion of using godbolt is a good idea to help see the generated output in a readable format.

I liked James Hollandâ€™s final version of his solution to the problem, which simply provides a forward to the standard min function taking an initializer_list. This solution makes great use of pre-existing standard facilities.

Stewart provides a good dialogue as he works towards his solution, and I think many programmers would find this approach would not only help them to understand the solution, but also how one might reach such a solution.

After some deliberation on the strengths of the various entries, I have awarded this issueâ€™s prize to Stewart.

Code Critique 114

(Submissions to scc@accu.org by Dec 1st)

I am trying to write a simple test for some C++17 code, but I am getting some unexpected output. I have simplified the code quite a bit, and in the resultant code below I was expecting to see the output "A,B" but on the latest version of both gcc and MSVC I get just "AB". Where has my comma gone?! Even more oddly, if I use an older compiler I get different output: "65,66", which gives me back the comma but loses the letters. Has C++17 broken something?

The code is in Listing 2.

#include <iostream>
#include <string>
#include <vector>
template <typename T>
using coll = std::vector<T>;
// Start the collection
template <typename T>
void start(coll<T> &up)
{
  up.clear();
}
// End the collection
template <typename T>
void end(coll<T> &up)
{
  up.push_back({});
}
// Extract the non-zero data as a string
template <typename T>
std::string data(coll<T> const &up)
{
  std::string result;
  for (auto v : up)
  {
    if (v)
    {
      result += std::to_string(v);
      result += ",";
    }
  }
  result.pop_back();
  return result;
}
void test_char()
{
  auto i2 = coll<char>();
  start(i2);
  i2.push_back('A');
  i2.push_back('B');
  end(i2);
  // actual test code elided
  std::cout << data(i2) << '\n';
}
int main()
{
  test_char();
}

Listing 2

You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (http://accu.org/index.php/journal).

This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.

Roger Orr Roger has been programming for over 20 years, most recently in C++ and Java for various investment banks in Canary Wharf and the City. He joined ACCU in 1999 and the BSI C++ panel in 2002.

Notes:

More fields may be available via dynamicdata ..