Title: The Universality and Expressiveness of std::accumulate

Author: Martin Moene

Date: 04 December 2015 09:43:44 +00:00 or Fri, 04 December 2015 09:43:44 +00:00

Summary: Folding is a highly generic operation available through std::accumulate. Paul Keir goes beyond reduction, with the help of C++14â€™s polymorphic lambdas.

Body:

Graham Huttonâ€™s 1999 monograph: A tutorial on the universality and expressiveness of fold [Hutton99] provides a fascinating and eminently readable analysis of an elementary reduction operator from functional programming: fold. Huttonâ€™s tutorial is punctuated by increasingly compelling examples, written in Haskell, of the use of fold to implement common list operations, including summation; concatenation; reversal; function composition; left folds; and concluding with the Ackermann function. The fold combinator is directly comparable to the C++ standard library function: std::accumulate. In this article, I focus on the fold examples from Hutton, and present equivalent generic C++ incarnations of each; making thorough use of appropriate C++14 library and language features, including polymorphic lambda functions¹.

The four-parameter accumulate function constructs a result from the repeated pairwise application of a given binary function, to elements supplied from both an iterator range; and a single â€œzeroâ€ value, having the type of the result. A canonical example is summation; the value returned from the call to accumulate in listing 1 is 10.

int xs[]{1,2,3,4};
accumulate(cbegin(xs), cend(xs), 0, plus<>{});

Listing 1

Huttonâ€™s first four examples use fold to implement arithmetic and logic operations using built-in functions to specify the relevant binary operations. Similar function objects are provided by the C++ standard library within the <functional> header. The example above demonstrates the use of plus<> for operator+; while operator*, operator&&, and operator|| correspond to multiplies<>, logical_and<>, and logical_or<> respectively. Note that C++14 conveniently defines a default template argument for such function objects; std::plus<> invokes a specialisation which infers the relevant types from its arguments.

Accommodating std::accumulate

A binary operator is said to be associative when an expression involving a sequence of its applications produces the same result, regardless of the order of evaluation. The four C++ function objects from the previous section all denote associative operations. Consider addition: both (1 + 2) + 3 and 1 + ( 2 + 3 ) produce the same result; 6. Operator precedence is irrelevant when faced with a ⊕ b ⊕ c; the question is whether to evaluate from the left; or from the right. In particular, which order does accumulate use? It uses a left ordering.

An elementary non-associative binary operation is subtraction. The call to accumulate in listing 2 would then produce a result equal to (( 0 - 1) - 2 ) - 3, i.e. - 6; evaluating from the left. In contrast, an evaluation ordered from the right, say 1- ( 2 - ( 3 - 0 )), produces a result of 2. Alas, the remaining examples from Hutton [Hutton99] firmly assume the fold operation evaluates from the right.

vector<double> xs{1,2,3};
accumulate(cbegin(xs), cend(xs), 0, minus<>{});

Listing 2

Producing a result from accumulate equal to a right fold requires two interventions: we need the order of traversal across the container elements reversed; and for the order of the arguments given to the binary operation to be switched². Listing 3 defines such a wrapper for accumulate, called foldr. The C++14 functions crbegin and crend return const iterators to the reverse beginning and reverse end of the container argument c. Meanwhile, the flip function, uses std::bind to reverse the argument order for the binary function object; e.g. flip(minus<>{})(0,1) evaluates to 1; not - 1.

template <typename F>
auto flip(const F &f) { return bind(f,_2,_1); }

template <typename F, typename T, typename C>
T foldr(const F &f, const T &z, const C &c) {
  return accumulate(crbegin(c), crend(c), z,
    flip(f));
}

Listing 3

The definition of foldr in listing 3 removes the need to call crbegin, crend and flip. It also allows a single container argument to drive the C++ fold; much as with C++ ranges [Niebler15]. This allows listings here to remain concise; while also facilitating a comparison of the syntactical differences between Haskell and C++. We can now invoke a right fold. Assuming `C` creates an arbitrary standard sequence container, with inferred value type³, the call to foldr in listing 4 returns the integer 2.

foldr(minus<>{}, 0, C(1,2,3))

Listing 4

Non-reducing folds

Using a fold to concatenate two containers first requires a small helper function, which should return a new container, by adding a single element to the front of an existing one. Haskell provides the (:) operator for this job. Listing 5 defines this using its traditional Lisp name: â€œconsâ€.

auto cons = [=](auto x, auto xs) {
  decltype(xs) ys{x};
  copy(begin(xs), end(xs), back_inserter(ys));
  return ys;
};

Listing 5

Like subtraction, the cons function is non-associative; and non-commutative. cons though, expects two different argument types. Listing 6 provides foldr with cons as the binary function, and an empty container as the â€œzeroâ€ or starting value; to define an identity fold. That is, id(C(1,2,3)) will return a container object of the same type; with the same 3 elements. Meanwhile, the genericity of C++ allows a similar invocation which only changes the container type: foldr(cons, list<int>{}, C(1,2,3)).

auto id = [=](auto xs) {
  return foldr(cons, decltype(xs){}, xs);
};

Listing 6

To append one container to another, listing 7 again uses cons for foldrâ€™s first argument; while providing the second, a container, as its â€œzeroâ€ value. Note that while the elements of, say, append(C('a'),C('b')) and C('a','b') are equal, so too are they equal to append(C<list>('a'),C<vector>('b')); as the definition is sufficiently generic.

auto append = [=](auto xs, auto ys) {
  return foldr(cons, ys, xs);
};

Listing 7

Folding with lambda arguments

The three functions⁴ of listing 8 provide each corresponding foldr invocation with a binary lambda function, as, like Haskell, no equivalents exist within the standard library. The length function returns the size of its container argument, using a lambda function with unnamed first argument. Both reverse and map return a container⁵; with map utilising the closure aspect of lambda expressions to capture f.

auto length = [=](auto xs){
  return foldr(
    [=](auto, auto n){ return 1+n; },
    0,
    xs);
};
auto reverse = [=](auto xs){
  return foldr(
    [=](auto y, auto ys)
      { return append(ys,decltype(xs){y}); },
    decltype(xs){},
    xs);
};
auto map = [=](auto f, auto xs){
  return foldr(
    [=](auto y, auto ys){ return cons(f(y),ys); },
    vector<decltype(f(*xs.begin()))>{},
    xs);
};

Listing 8

Tuples allow a single fold to perform more than one calculation. For example, listing 9 defines a function which returns both the size of a container, and the sum of its elements⁶.

auto sumlength = [=](auto xs){
  return foldr(
    [=](auto n, auto p){
      return make_pair(n + p.first, 1 + p.second);
    },
    make_pair(0,0),
    xs);
};

Listing 9

Functions from folds

The result of applying the composition of two functions f and g to an argument x can be achieved in C++ with the expression: f(g(x)). In Haskell an elementary binary operator, (.), can also be used; accepting two functions as arguments, and returning a new function representing their composition. In listing 10, the foldâ€™s binary function argument is a comparable lambda expression for composition. The result of invoking compose with a container of callable elements is a function object representing the chained composition of each function in sequence. The â€œzeroâ€ argument of foldr uses a simple lambda identity function; though notice it is wrapped by the type of the container element: an instantiation of std::function. Why? While the type of each lambda expression is unique, the type of each container element is the same. std::function provides exactly the required homogeneity; each lambda accepting and returning say an int, becomes simply std::function<int(int)>. The â€œzeroâ€, meanwhile, needs the same wrapper, as it provides the type of the foldâ€™s result.

auto compose = [=](auto fs){
  using fn_t = typename decltype(fs)::value_type;
  return foldr(
    [=](auto f, auto g)
      { return [=](auto x){ return f(g(x)); }; },
    fn_t([=](auto x){ return x; }),
    fs);
};

Listing 10

One of the most intriguing functions capable of definition by a right fold, such as our foldr, is a left fold. Listing 11 provides such a definition. As before, an identity function is required for the foldâ€™s starting value, and again this wild lambda needs the guiding hand of std::function; though the type in question is calculated in a different manner. Unlike compose, the function object returned by foldr is invoked within foldl; upon z. Our journey has brought us full circle to a left fold, akin to std::accumulate; an invocation such as foldl(minus<>{}, 0, C(1,2,3)) will produce -6; much as listing 2.

auto foldl = [=](auto f, auto z, auto xs){
  using z_t  = decltype(z);
  using fn_t = std::function<z_t(z_t)>;
  return foldr(
    [=](auto x, auto g){
      return [=](auto a){ return g(f(a,x)); };
    },
    fn_t([=](auto x){ return x; }),xs)(z);
};

Listing 11

One last comment regarding left and right folds: should you ever be in the embarrassing situation of being uncertain of the handedness of your fold definition, the expression in listing 12 could be useful. Simply replace fold with either foldr or foldl; for a true or false evaluation respectively.

fold([](auto x, auto){ return x; },
  false, C(true))

Listing 12

Our final fold example, and so too in [Hutton99], is Ackermannâ€™s function [Ackermann28]. Ackermannâ€™s function is commonly cited as a recursive function which is not primitive recursive in a first-order programming language. John Reynolds, however, demonstrated [Reynolds85] that the function is primitive recursive in a higher-order programming language. The C++ implementation in listing 13 includes similar idioms to previous examples, but is given additional treatment to avoid the use of currying seen in Huttonâ€™s definition. While the binary lambda function provided to the innermost fold in the curried original appears unary, Î» y → g, the C++ version must be uncurried: Î» y as → g(as). Note too, that these Ackermann folds encode the traditional natural number arguments within the size of the input and output container values.

auto ack = [=](auto xs, auto ys){
  using ys_t = decltype(ys);
  using fn_t = std::function<ys_t(ys_t)>;
  return [=](auto zs){
    return foldr(
      [=](auto, auto g){
        return [=](auto ws){
          return foldr(
            [=](auto, auto as){ return g(as); },
            g(ys_t{1}),
            ws
          );
        };
      },
      fn_t([=](auto bs){ return cons(1,bs); }),
      zs
    );
  }(xs)(ys);
};

Listing 13

Summary

C++ lambda functions, including the polymorphic variety now available in C++14, allow the generic fold operation supported by std::accumulate to extend well beyond simple reductions. While a complex fold can be less readable or idiomatic than the traditional form, the approach can be refined to construct and transform programs, as well as prove specific program properties; while building on established software engineering principles of reuse and abstraction.

The article places less emphasis on performance considerations, instead focusing on pedagogy and algorithmic aspects; while maintaining parity with the equivalent Haskell, with consistent use of auto for type inference.

C++17 will introduce fold expressions [Sutton14]. Here, a finite set of operators, will share the brevity of the comma in pack expansion; consider * versus std::multiplies<>{}. One restriction is the variadic template pack length must be known at compile-time.

Appendix

All examples, along with code for dropWhile, filter and other folds are available at https://bitbucket.org/pgk/accumulate.

References

[Ackermann28] W. Ackermann. Zum Hilbertschen Aufbau der reellen Zahlen Mathematische Annalen, 1928.

[Bird88] R. Bird and P. Wadler. Introduction to Functional Programming Prentice Hall, 1988.

[Gill93] A. Gill, J. Launchbury and S.P. Jones. A Short Cut to Deforestation. ACM Press, 1993.

[Hutton99] G. Hutton. A tutorial on the universality and expressiveness of fold Cambridge University Press, 1999.

[Niebler15] E. Niebler. Working Draft, C++ extensions for Ranges. The C++ Standards Committee, 2015.

[Reynolds85] J.C. Reynolds. Three approaches to type structure Springer-Verlag, 1985.

[Sutton14] A. Sutton and R. Smith. Folding expressions. The C++ Standards Committee, 2014.

Note that most of these fold examples can also be found in [Bird88] or [Gill93].
This is described as the third duality theorem in [Bird88, p68].
Assume a vector default; i.e. C(1,2) becomes: C<vector,int>(1,2).
A definition of filter from [Hutton99] appears in the appendix.
Note that map returns a vector object here solely for brevity.
A similar tuple-based definition of dropWhile appears in the appendix.

Notes:

More fields may be available via dynamicdata ..