Title: Letters to the Editor(s)

Author:

Date: 02 February 2004 21:55:35 +00:00 or Mon, 02 February 2004 21:55:35 +00:00

Summary:

Body:

Regarding "A more flexible container" by Rich Sposato, Overload 58 (December 2003)

Although it is certainly interesting to consider alternative implementations of standard containers, I think Rich has chosen the wrong solution to the problem.

What he want to do is to extract all members of a collection satisfying a given condition. In his driving example, he has a set of Employee records, and he wants to extract those elements of the container that matches a given surname, say. If one only ever wants to deal with sets of such records, Rich's solution is probably the best one can do. But if one later wants, as Rich does in the article, to perform a similar operation on another type of container, the limitations of his approach become clear.

Instead of modifying each container by defining a new one built upon the standard containers, it is much better to have a generic algorithm working with any container (standard or not).

So it seems to me, a better, or at least more generic, solution would be to define an algorithm working roughly like this:

result = filter(collection,comparator);
  // filter elements of a container

where result and collection are containers of the same kind and comparator is a function, or functor, taking an element and returning a Boolean value.

A template version would look something like this (in pseudo- C++ to show the structure more clearly)

template<class Coll, class Comp>
Coll filter(Coll coll, Comp cmp) {
  Coll res(0); // create empty result set
  for (iterator it=coll.begin();
       it != coll.end();
       ++it)
    if (cmp(it))
      res.insert(it);
  return res;
}

This will then work for any container having a standard iterator interface and an insert() member function as well as for any cmp having an operator() defined taking a container element of the right type as argument and returning something which can be interpreted as a Boolean.

All the extra work now goes into defining the comparator function or functor, which is probably precisely where the effort should be concentrated anyway.

While this solution is simpler and more generic than Rich's, it is not purely object oriented. In fact, it wouldn't be possible to carry it over directly to Java or C#. Instead, it is an example of what one might call a "functional programming design pattern". The filter algorithm above is a C++ analogue of a so-called "higher order function" in functional programming (FP), i.e. a function taking another function as argument. Actually, very often OOP design patterns can be translated into FP higher order functions.

The particular higher order function used here, filter, exists in all major FP languages (Lisp, Haskell, ML) and in some other multi-paradigm ones (Perl, Python).

What this illustrates is the idea that, however powerful a concept, OOP just isn't the right solution in all circumstances. Luckily, C++ is a multi-paradigm language, supporting not just OOP and C-style procedural programming, but also FP-style programming as above. That being said, C++ has a very, very strong support for efficient OOP code, allowing most compilers to make very efficient assembler code. In contrast, the FP-aspects, although introduced with the STL, have received less attention, and not all compilers will generate efficient code in these cases. This, however, may be compensated by the optimisations made for the standard containers.

On the other hand, such an FP-like solution is easier to maintain, the code is concentrated in one place instead of being duplicated in all containers. Moreover, the FP-like solution above is more generic. Suppose, for instance, that you defined an iterator interface for files, with the "elements" being the individual lines. The filter algorithm can then be turned into a utility like UNIX's grep, extracting lines meeting a specific criterion.

Whether to put something into a member function or not is a difficult question to answer in general. All sorts of issues may be involved: design principles, maintainability, and efficiency etc. My own pragmatic rule of thumb is, if the same code is copied with very little change in many classes, it is probably worth considering putting it outside, either in a special class or as an algorithm.

Rich Sposato's Reply

My article had two purposes. One was to create associative containers that allow searching using a type other than the key type. The other goal was to remove the unnecessary objects used to search through the standard associative containers. Creating a generic algorithm to find all elements of any container type that satisfy a predicate was not my intention. Indeed, such an algorithm is missing from the STL, but I will get to that later. First, let us look at the associative containers and the generic search algorithms.

The STL's stand-alone search (binary_search, lower_bound, upper_bound, and equal_range) algorithms provide logarithmic complexity only for random-access iterators, and linear complexity for all other iterator types. The standard associative containers are typically implemented using red-black binary trees, and with their bi-directional iterators, they are not served well by these functions. I find it ironic that associative containers were designed for logarithmic complexity but their iterators only provide linear complexity with the STL algorithm functions, while three of the sequential containers (vector, string, and deque) have a linear internal arrangement, but their iterators provide logarithmic complexity with the same STL algorithms. The STL's stand-alone count and find functions provide linear complexity for all iterator types. So that irony does not apply to these functions.

These associative containers have their own find, count, lower_bound, upper_bound, and equal_range member functions, because as members, they can use the internal structure of the containers to provide greater efficiency than linear complexity. The set's operations require only logarithmic complexity. These functions should be used instead of the generic stand-alone functions by the same names. (See item #44 in Scott Meyer's book: "Effective STL".)

Frank Antonsen's example for filtering elements from a container is not the most generic solution for C++. His filter function requires the collection to have a constructor that accepts a single integer parameter; a constraint not met by five STL containers. This example is not truly container-independent code. (See item #2 in Scott Meyer's book: "Effective STL".) By acting on a container rather than an iterator, the filter will not work with arrays - a requirement of the STL algorithms. This points out another irony: algorithms that act only on iterators are more likely to provide "container-independent" code than algorithms that act on containers. As Raoul Gough eloquently explained in his article ("Choosing Template Parameters" also in Overload issue #58), picking the correct template parameter types can make all the difference between a flexible and inflexible class or function.

Using the std::copy function as an example, we can make a generic filter named copy_if that uses a predicate. The filtering function will accept an iterator type as a template parameter, and use the typename keyword instead of the class keyword within the template specification. Since copy_if will always have linear complexity no matter which type of iterator or container uses it, developers might be better off using std::set::equal_range and then calling std::copy. The copy_if algorithm fulfills all the abilities of Frank's filter function with fewer constraints.

The copy function's signature is:

template< typename InputIter,
          typename OutputIter >
OutputIter copy(InputIter first,
                const InputIter& last,
                OutputIter output );

The copy_if function would have a signature of:

template< typename InputIter,
          typename OutputIter,
          typename Predicate >
OutputIter copy_if(InputIter first,
                   const InputIter& last,
                   OutputIter output,
                   Predicate pred );

Alas, the STL has no copy_if function, although it has copy and copy_backward. This omission becomes more obvious when considering that other functions such as find, count, remove, and remove_copy have predicated versions named find_if, count_if, remove_if, and remove_copy_if. The implementation for copy_if is trivial, and in a time honoured tradition, I shall leave that as an exercise to the reader.

Other languages may have filter functions that act as Frank described them, but my article was not about making generic filters for C++ or any other language.

Notes:

More fields may be available via dynamicdata ..