Journal Articles

Overload Journal #117 - October 2013 + Programming Topics + Design of applications and programs
Browse in : All > Journals > Overload > o117 (7)
All > Topics > Programming (877)
All > Topics > Design (236)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: C++ Range and Elevation

Author: Martin Moene

Date: 02 October 2013 22:38:59 +01:00 or Wed, 02 October 2013 22:38:59 +01:00

Summary: C++ provides many features for higher-level programming, but lacks some common ones present in other languages.

Body: 

Back in 2009, Andrei Alexandrescu gave a presentation at the ACCU Conference about Ranges. The short version is that although you can represent a range in C++98 using a pair of iterators, the usage is cumbersome for most simple tasks. Even for a simple loop, using iterators can be a trial.

  for( vector< int >::iterator pos = data.begin();
    pos != data.end(); ++pos )
  {
    .... do interesting things with ints
  }

The C++ Standard Library provides a simple enough algorithm which improves on this a bit, but you need to provide the action to be performed as a functor argument, thus losing the locality of reference for that action.

  for_each( data.begin(), data.end(), action );

And all that’s just using the iterators the C++ Standard Library gives you; defining your own iterator is notoriously complex. So, Andrei introduced a much simpler – and more powerful – abstraction: the Range [Alexandrescu09].

There have been a few attempts to implement Andrei’s ideas in C++ (he implemented them for D, and they form the basis of the D Standard Library) but for one reason or another, it just hasn’t caught on. Part of the reason for that is that in order to take advantage of this new abstraction as Andrei envisioned it, you need a rewrite of the Standard Library that uses Ranges instead of Iterators. For some reason, there seems to be little appetite for this. Some implementations have arisen that are interoperable with C++ Standard Library algorithms [Boost], [Github1], but even they appear to not have had as much traction in the C++ community at large as might have been hoped. Similarly, there have been proposals to the C++ Standards effort [Standards], but still, not much apparent interest in something that is little more than a thin wrapper around the existing C++ container types which are, after all, really just iterator factories.

Part of the reason Andrei himself implemented his idea of Ranges in D, rather than C++, was that C++ at the time didn’t provide good enough language support to make it straightforward. In particular, there was a C++ proposal at the time for auto variable declarations, which would have been crucial, but had experimental support in only one widely used compiler. Now, C++ officially has auto, and it’s very widely supported. But not a widely-used – let alone standard – range type.

All of this is highly relevant, of course, but misses an important point: what do these range types actually achieve? What problem are they attempting to solve?

Time for a quick segue into a different world....

The cross-pollination conundrum

It’s common knowledge among experienced programmers that an intimate understanding of a few different languages (and a possibly less intimate knowledge of many) is a good thing. Techniques from one language can inform and inspire neat solutions to problems in other languages. It’s also common knowledge among experienced programmers that idiomatic features of one language are not necessarily transferable to other languages, and that doing so can result in code that is truly incomprehensible to its readers. With both of those things in mind, I want to explore a little modern C# idiom: IEnumerable, the C# Iterator.

This interface is what permits the foreach loop in C#:

  foreach( var item in container )
  {
      .... do interesting things with items
  }

C# has had IEnumerable from the very beginning, although it’s undergone a few revisions over the years.

IEnumerable forms the basis of a much higher-level abstraction than merely accessing the contents of containers, however. It underpins all the functionality of LINQ1, introduced in Visual Studio 2005 with .Net 3.5, and builds on one key feature of .Net 2.0 – the yield keyword, which creates an IEnumerable on demand (actually, it creates an implementation of IEnumerator<T>, which is the real iterator type). This facility means that iterating over an IEnumerable is lazy – access to an element isn’t performed until it’s asked for. In C#, this is referred to as Deferred Execution.

 
 var results =008512"
   container.Where( item => item.Id == expected )
   .Select( filtered => filtered.Count.ToString() )
   .Take( 2 );

The reason lazy access is important is that no matter how many elements container has, the clauses for Where and Select will be called a maximum of 2 times. Obviously, this is significant if container has 20 million items in it.

So what has all this to do with C++? In the first case, the reason that IEnumerable works as an interface in C# is that it is implemented by all the standard containers, and is in fact very simple to implement for your own container types. C++ has no such interface, and in fact, there is no actual relationship – inheritance or otherwise – between the standard containers, or their iterator types. In the second case, does this entirely idiomatic C# translate at all into C++? Or does it make for an incomprehensible mess? That is the nature of the cross-pollination conundrum.

The missing lin[qk]?

It should be clear that IEnumerable in C# has much more in common with Andrei Alexandrescu’s vision of a Range than it does with C++ iterators – or even iterator pairs. C++11 introduces many language features which allow a neat syntax for it, such as lambda and type deduction facilities. Suppose there were a range type in C++; would it on its own enable the implementation of something like Select or Where from C#? What might that look like?

  auto result =
     container.where( [&]( const thing & t )
     { return t.name == expected; } )
     .select( []( const thing & t )
     { return to_string( t.count ); } )
           .take( 2 );

It’s not inconceivable.

But.

What type is container? We could implement a single type that has where, select, take and all the other things needed as member functions. What really makes C#’s IEnumerable work is the existence of extension methods. The power of that mechanism really shines through when you need to write your own function that fits in with other LINQ facilities, and C++ has no analogue for that.

A much neater idea would be closer to the Alexandrescu range concept whereby select returns a specific kind of range, and where returns a different kind of range. The chaining of those operations together as shown above would still require some common base interface, and would still be hard to extend.

Lastly there is also the question of lazy evaluation (or Deferred Execution) and efficiency. It’s hard to see how to make lazy evaluation safe in the above scenario, without resorting to passing functions around under the hood and capturing state. As for efficiency, this idea of a single base class is raising the spectre of allocating stuff on the heap (gasp!), and too many C++ programmers have got used to the flexibility and efficiency of templates and type deduction of iterators to give it up that easily.

So, there are questions.

Perhaps we should wind back our expectations a little, start with something very, very simple, and see if it can be implemented. Then we can look to see if we can build on small beginnings to do something more elaborate.

So. Where do we start?

Write a test

For the purposes of testing examples, assumptions and results, I’m using Catch [Github2] because it’s easy to read, clear to write and not too verbose.

It should be obvious by now that the first step is to define a very simple and lightweight range type, upon which we can somehow ‘hang’ all of the operations we need. This range should be trivially initialisable from some standard container. I’m going to make a conceptual leap here, because it’s clear the range type needs a way to access the ‘current’ element, and to move forwards one position. With these operations, it’s possible to make a simple check that the ‘contents’ of the range match the original data. Andrei Alexandrescu asserted that the pointer-like interface for iterators is a Bad ThingTM, but I think it makes for a neat syntax, so I will stick with it. (See Listing 1.)

TEST_CASE( "Range is constructable from standard
  collection and is iterable",
  "[range][init][iterable][stl]" )
{
  std::vector< int > data { 1, 2, 3 };
  auto range = make_range( data );
  REQUIRE( *range++ == data[ 0 ] );
  REQUIRE( *range++ == data[ 1 ] );
  REQUIRE( *range++ == data[ 2 ] );
}
			
Listing 1

What is required to make this test compile? The most obvious thing is make_range – is that a function or a class? In order to make it as general as possible, it should (obviously) be a template, and to take full advantage of type-deduction it should be a function returning....what? Some type which exhibits the right interface. With just the information in the test, we can sketch it out. Note that, with the use of auto, the actual type is never named. This isn’t a crucial observation, but does give us a lot of leeway on the choice of name. I have been unimaginitive, however. (See Listing 2.)

template< typename iterator_type >
class iterable_range
{
  public:
    iterable_range( iterator_type begin,
                    iterator_type end )
      : pos{ begin }, end_pos{ end } { }
    auto operator*() const -> 
      typename iterator_type::value_type
    {
      return *pos;
    }
    auto operator++( int ) -> iterable_range
    {
      iterable_range tmp{ *this };
      ++pos;
      return tmp;
    }
  private:
    iterator_type pos, end_pos;
};
template< typename container_type >
auto make_range( const container_type & ctr ) ->
   iterable_range
   < typename container_type::const_iterator >
{
  return iterable_range
    < typename container_type::const_iterator >
  { begin( ctr ), end( ctr ) };
}
			
Listing 2

There is nothing particularly startling about this. The iterable_range class just squirrels away a pair of iterators (the observant will already have noticed that end isn’t used, but its purpose should be obvious!), and the make_range function is really just a convenience for creating an instance of the class. The iterable_range is a kind of ‘proto-range’; it’s not terribly useful on its own, but it does form the basis for other things.

True to type

It’s already time to do something a bit harder. Listing 3 is another test.

TEST_CASE( "Transformation of elements results in
   new range leaving originals intact",
   "[range][transform]" )
{
  std::vector< int > data { 1, 2, 3 };
  auto range = make_range( data );
  auto result = select( range, []( int i ) {
    return std::to_string( i ); } );
  std::string expected[] = { "1", "2", "3" };
  REQUIRE( *result++ == expected[ 0 ] );
  REQUIRE( *result++ == expected[ 1 ] );
  REQUIRE( *result++ == expected[ 2 ] );
  REQUIRE( *range++ == data[ 0 ] );
  REQUIRE( *range++ == data[ 1 ] );
  REQUIRE( *range++ == data[ 2 ] );
}
			
Listing 3

I already mentioned the idea of having different range types, e.g. select provides a ‘selecting range’, and where provides a ‘filtering range’. The line auto result = select( ... now needs some function select, and something to return – a transforming range type. It’s going to need the same basic operations as the iterable_range, and it needs to operate on an underlying range. The operations on the new range defer to that underlying range type – which in this case is an iterable_range.

template< typename range_type,
  typename transformation >class transforming_range
{
  public:
    transforming_range( range_type r,
      transformation fn ) : r{ r }, fn{ fn } { }
  private:
    range_type r;
    transformation fn;
};

The incrementing operator should be straightforward enough – it needs to increment the underlying range object and return a copy of its previous self.

  auto operator++( int ) -> transforming_range
  {
    transforming_range tmp{ *this };
    r++;
    return tmp;
  }

What about the dereference operator? It needs to call the transformation function fn with the current element, and return its result.

  auto operator*() const -> ???
  {
    return fn( *r );
  }

C++11 provides an army of tools for determining the types of things at compile time for situations like this. The one with the most visibility, decltype, provides the declared type of an expression – including the result of calling a function.

  auto fn( int ) -> bool { return true; }
  auto fn( double ) -> int { return 10; }
  auto type = decltype( fn( 10 ) );
  // type is bool - type returned if fn were
  // called with an int

This makes determining the result of operator*() very simple. The only restriction on this use is that both fn and r need to have already been ‘seen’, which leads to the need to declare them before they are used in the decltype expression.

  private:
    range_type r;
    transformation fn;
	
  public:
    auto operator*() const -> decltype( fn( *r ) )
    {
      return fn( *r );
    }

I normally much prefer to declare classes with the public section at the top, where it’s most obvious and visible. However, it seems a fair trade in this case to allow the use of decltype in such a simple fashion. The alternative is much worse!2

With this addition, the transforming_range class should pass all the tests, after we add the now-trivial select function. (See Listing 4.)

template< typename range_type,
          typename transformation >
auto select( range_type r, transformation fn ) -> transforming_range< range_type, transformation >
{
  return transforming_range< range_type,
         transformation >{ r, fn };
}
			
Listing 4

Just the good ones

Time for another test (Listing 5).

TEST_CASE( "Filtering elements contains just the matches", "[range][filter]" )
{
  std::vector< int > data { 1, 2, 3 };
  auto range = make_range( data );

  auto result = where( range, []( int i ) {
    return i % 2 != 0; } );

  REQUIRE( *result++ == data[ 0 ] );
  REQUIRE( *result++ == data[ 2 ] );
}
			
Listing 5

With all the funky stuff learned implementing the transforming_range, implementing a filtering_range to be returned by a where function should be pretty straightforward (Listing 6).

template< typename range_type, 
          typename unary_predicate >
class filtering_range
{
  private:
    range_type r;
    unary_predicate fn;
  public:
    filtering_range( range_type r,
       unary_predicate fn ) : r{ r },
       fn{ fn } { }
    auto operator*() const -> decltype( *r )
    {
      return *r;
    }
    auto operator++( int ) -> filtering_range
    {
      filtering_range tmp{ *this };
      r++;
      while( !fn( *r ) ) r++;
      return tmp;
    }
};
			
Listing 6

The cleverness is all in operator++(), which keeps incrementing until the predicate is false. The where function is simplicity itself – almost identical to select, just returning a different range type.

  template< typename range_type, typename filter >
  auto where( range_type r, filter fn ) ->
    filtering_range< range_type, filter >
  {
    return filtering_range< range_type, filter >{
      r, fn };
  }

Run the tests....and they all pass. Time for the next one....

Wait a moment!

Once again, the observant among you will have already spotted the completely fatal flaws in that code. Just to labour the point, Listing 7 is a test that exposes one of them.

TEST_CASE( "Filtering range returns correct matches when first element is a mismatch", "[range][filter][empty]" )
  {
    std::vector< int > data { 2, 3, 4 };
    auto range = make_range( data );

    auto result = where( range, []( int i ) { return i % 2 != 0; } );

    REQUIRE( *result++ == data[ 1 ] );
  }
			
Listing 7

The problem here is that the filtering range is fine if the first element in the range matches the predicate. In any other case, it fails the test. Hopefully, this starts a chain of thought leading to questions like ‘what if no elements match?’ and then ‘what if the initial range is empty?’. Now we know why the iterable_range has an end iterator! We need some sort of check that the range object is valid – that it hasn’t run out of elements. Fortunately, it is trivial to implement a safe conversion to bool on all three of our range types. For iterable_range, it returns false if pos==end_pos and true otherwise. For transforming_range and filtering_range, it simply returns the value for the underlying range. C++11 provides an explicit conversion operator – which means the new function won’t take part in any arithmetic or otherwise unsafe operations.

  explicit iterable_range::operator bool() const 
    { return pos != end_pos; }
  explicit transforming_range::operator bool()
    const { return !!r; }

The slightly odd !!r says what it does: ‘not not’, invoking the underlying range’s bool conversion. Because the conversion is explicit, just return r; won’t work, of course!3

With the addition of this rather important facility, we can add a helper function to filtering_range called find_next_match.

  void find_next_match()
  {
    while( r && !fn( *r ) )
      ++r;
  }

This function is invoked by operator*() (thus making filtering_range lazy-evaluated), so finds the first matching element. The increment operator also needs to invoke it to ensure a range with no further matches becomes invalidated.

This implies that client code must first check that the range is valid before invoking operator*()4, and so operator bool() must also invoke it in order to detect a range with no matching elements.

Time for some more tests! (Listing 8)

TEST_CASE( "Filtering range is immediately
  invalid when no elements match",
  "[range][filter][nomatch]" )
{
  std::vector< int > data { 2, 3, 4 };
  auto range = make_range( data );
  auto result = where( range, []( int ) 
  { return false; } );

  REQUIRE( ! result );
}

TEST_CASE( "Filtering range returns correct
   matches when first element is a mismatch",
   "[range][filter][empty]" )
{
  std::vector< int > data { 2, 3, 4 };
  auto range = make_range( data );
  auto result = where( range, []( int i ) 
  { return i % 2 != 0; } );

  REQUIRE( !!result );
  REQUIRE( *result++ == data[ 1 ] );
}
			
Listing 8

Whilst we’re at it, we’ll add a prefix operator++() to all the range types, too (did I manage to slip that one past you in the implementation of find_next_match?), since efficiency is one of our design principles.

For filtering_range, that operator is used by the postfix version, and looks like the following:

  auto operator++() -> filtering_range &
  {
    ++r;
    find_next_match();
    return *this;
  }

Now we’re really cooking. It must be time to turn the world upside down yet again.

Joined up

It should be fairly clear how to implement take using the techniques already discussed here. What’s missing is the ability to chain expressions together. In fact, our existing API allows a limited form of composing expressions.

  auto result = select( where( []( int i ) {
    return i % 2 != 0; } ),
  []( int i ) { return i * 2; } );

This is somewhat unwieldy, however, especially when composing more than a small number of expressions.

As already noted, we could add a common base class declaring all the high-level functionality like select, where and so on. The trouble with this approach is that it’s inflexible; it would be difficult to extend with new operations.

It’s not possible to overload operator.() in C++, so directly mimicking the syntax of extension methods is out, but there are other operators we could use. There is something appealing about hijacking the ‘pipe’ operation common in filesystem operations. Time for another test. (Listing 9.)

TEST_CASE( "Range results can be composed using
   simple syntax", "[range][composition]" )
{
  std::vector< int > data { 1, 2, 3 };
  auto range = make_range( data );

  auto result = range 
  | where( []( int i ) 
    { return i % 2 != 0; } )
  | select( []( int i ) 
    { return i * 10; } );

  REQUIRE( *result++ == data[ 0 ] * 10 );
  REQUIRE( *result++ == data[ 2 ] * 10 );
  REQUIRE( ! result );
}
			
Listing 9

The main thing to note here is that the signatures of the range functions select and where have changed – they no longer take a range object – almost as if the range is being passed in through some ‘standard input’. This suggests some global operator, perhaps like this:

  template< typename left_range_type,
            typename right_range_type >
  auto operator|( left_range_type left,
                  right_range_type right ) -> ???
  {
    ???
  }

The question is – what should it return? Perhaps it would be simpler to see the solution to this using actual named funtions instead of overloaded operators.

  auto result = 
  range.apply( where( []( int i )
             { return i % 2 != 0; } ) )
       .apply( select( []( int i ) 
             { return i * 10; } ) );

This is similar enough to actual chaining, there appears to be some merit to following it to see where it leads.

The first obstacle is this redefinition of select and where. To take where as an example, it cannot construct an instance of filtering_range, because that type requires an underlying range object on which to operate. The range object isn’t supplied until apply is called – whatever that is. It looks from here very much like a member function common to all range types. Given that where can’t return a filtering_range, but must capture the filtering predicate somehow, another intermediate type is indicated. This intermediate is what gets passed to apply, and is a factory for the real range type. The apply function then invokes that factory to create a real range type.

One approach to this might be to have a separate intermediate factory for every range type; this would certainly make the implementation simple: the filtering_range would have a filtering_range_factory, and the implementation of that would know how to construct a filtering_range given an underlying range object to construct it with.

However, for the cases in this example at least, there is a more general solution. Once again we look to Andrei Alexandrescu, and make use of a simple policy template. The where function ‘knows’ it requires a filtering_range, but has insufficient data or template parameters to actually create one. If the filtering_range_factory makes use of a template-template parameter, it can create a filtering_range when all the parameters required are available.

This can be generalised out to a factory that can create any range type that takes two template parameters. (See Listing 10.)

template< template< typename, typename > 
  class range_type, typename expression_type >
  class range_factory
{
  public:
    range_factory( expression_type action ) :
      action{ action } { }
    template< typename range_of >
    auto operator()( range_of r ) const 
      -> range_type< range_of, expression_type >
    {
      return range_type< range_of,
        expression_type >{ r, action };
    }
  private:
    expression_type action;
};
			
Listing 10

Now, the where function instantiates the factory with the required range type (filtering_range), and the predicate function to be captured.

  template< typename unary_predicate >
  auto where( unary_predicate fn )
    -> range_factory< filtering_range,
       unary_predicate >
  {ID="pgfId-1007831">
    return range_factory< filtering_range,
           unary_predicate >{ fn };
  }

The select function can do the analogous operation using transforming_range.

With this information, we can now implement the apply function. Our original vision was to replace apply with an overload of operator|(). Whilst apply needed to be a member function to allow chaining calls together, the overloaded operator does not need to be a member. We can sidestep apply altogether, and jump straight to the operator|() implementation, to make our original test for this pass. All that’s needed is to call the function-call operator on the provided factory with the provided underlying range object. (Listing 11.)

template< typename range_type,
   typename range_factory_type >
  auto operator|( range_type range,
    range_factory_type factory ) -> 
    decltype( factory( range ) )
{
  return factory( range );
}

TEST_CASE( "Range results can be composed using
   simple syntax", "[range][composition]" )
{
  std::vector< int > data { 1, 2, 3 };
  auto range = make_range( data );

  auto result = range | where( []( int i ) 
    { return i % 2 != 0; } )
    | select( []( int i ) { return i * 10; } );

  REQUIRE( *result++ == data[ 0 ] * 10 );
  REQUIRE( *result++ == data[ 2 ] * 10 );
  REQUIRE( ! result );
}
			
Listing 11

Extensions

One of the motivations for the ease of chaining operations was to make the API easy to extend. Let’s see how well we’ve achieved that, and write take. The idea is that take produces a range of up to a given number of elements.

If the original range has fewer than the requested number, then all are returned. (See Listing 12.)

TEST_CASE( "Range can be limited to a number of
   elements with take", "[range][take]" )
{
  std::vector< int > data { 1, 2, 3, 4, 5 };
  auto range = make_range( data );

  auto result = range | take( 2 );

  REQUIRE( *result++ == data[ 0 ] );
  REQUIRE( *result++ == data[ 1 ] );
  REQUIRE( !result );
}
			
Listing 12

Implementing the range used – let’s call it limited_range – looks straightforward; operator bool() just needs to return false after a certain number of iterations of the underlying range. The problem with this one is the implementation of the helper function: take itself. Up to this point, the range_factory has had a simple task of creating any one of a number of different range types that all had in common their template parameters and construction; each one with two template parameters, and a constructor that took a range-type and functor-type.

Let’s take a quick look at the basic construction of a limited_range:

  template< typename range_type >
  class limited_range
  {
    public:
      limited_range( range_type r, int count )

Only one template parameter, and the count, which can’t be made into a template parameter because it might not be a constant. The range_factory now needs to do something different, i.e. be able to create objects having either one or two template parameters.

The existing range_factory exists because the type of the underlying range isn’t known until the range operation (e.g. select) is invoked via the operator|() mechanism. Introducing a new range type with only one template parameter means the original ‘action expression’, which for take is just a number, still needs to be captured before the range object is created, but the range type is still not known.

This means the basic mechanism is the same, but the implementation of operator()() needs to vary according to the number of template arguments for the target range type. The ideal solution to this would be to create a partial specialization of range_factory using variadic templates to represent the differences. Something like Listing 13.

template< template< typename, typename... > 
   class range_type, typename expression_type >
  struct range_factory { };

  template< template< typename, typename,
     typename... > class range_type,
     typename expression_type >
  struct range_factory< range_factory
    < range_type, expression_type > > { };
			
Listing 13

This would rely on factory types with two or more template parameters matching the specialisation, and those with only one template parameter matching the primary. However, the rules of partial specialization don’t allow this; the primary template’s implicit types (range_type and expression_type) aren’t distinguishable from the specialization, so the second struct is ambiguous.

It doesn’t prevent a new factory type which understands how to create types having only one template parameter, and having the take function use it directly.

  template< template< typename > class range_type,
     typename expression_type >
  class range_factory_1
  {

It is otherwise idential to range_factory, except for operator()()

  template< typename range_of >
  auto operator()( range_of r ) const 
     -> range_type< range_of >
  {
    return range_type< range_of >{ r, action };
  }

The only difference here is the number of template arguments provided to the range_type in the factory function.

This solution works, but it does require any other extensions to know which ...factory class to invoke. It’s not a catastrophe, but it could be made easier. Instead of variadic templates and specialization, we turn to ordinary function overloading to come to the rescue. Instead of creating the correct factory type directly, take can use a call to a function which is overloaded based on the same template-template upon which we wished we could specialize range_factory. (Listing 14.)

template< template< typename, typename > 
  class range_type, typename expression_type >
auto make_range_factory( expression_type expr ) 
  -> range_factory< range_type, expression_type >
{
  return range_factory< range_type,
               expression_type >{ expr };
}

template< template< typename > class range_type,
   typename expression_type >
  auto make_range_factory( expression_type expr )
     -> range_factory_1< range_type,
                         expression_type >
{
  return range_factory_1< range_type,
                        expression_type >{ expr };
}
			
Listing 14

This pair of function overloads selects the correct factory class to construct based on the number of template parameters required by the range_type template-template. Any function that now needs a range_factory or range_factory_1 can just use the overloaded function and provide the correct type for that factory to create, and overloading will do the rest (Listing 15).

auto take( size_t count ) -> 
  decltype( make_range_factory< limited_range >
  ( count ) )
{
  return make_range_factory< limited_range >
    ( count );
}

template< typename unary_predicate >
auto where( unary_predicate fn ) -> 
  decltype( make_range_factory
    < filtering_range >( fn ) )
{
  return make_range_factory< filtering_range >
  { fn };
}
			
Listing 15

Here is the take function implemented to use the new factory factory function (!), and using the same function return type deduction as used previously, with decltype. I’ve also re-implemented where to show the usage is identical; these two functions make use of different range factory types, but that is merely ‘implementation detail’.

It’s not unreasonable to imagine range implementation classes needing more template arguments. In such cases, the corresponding range_factory_N and make_range_factory overload pair would be needed, but in practice, one and two template parameter range types cover most of the most useful things.

All done

This article set out with the stated aim of implementing something not dissimilar to the simple C# LINQ expression which does simple filtering, transformation and range-limiting. With the implementation so far, it’s very similar (but not exactly the same, for some good reasons). See Listing 16.

var result = container.Where
   ( item => item.Id == expected )
 .Select( filtered => filtered.Count.ToString() )
 .Take( 2 );

auto result = range 
  | where ( [&]( const thing & t )
    { return t.name == expected; } )
  | select( []( const thing & t )
    { return to_string( t.count ); } )
  | take( 2 );
			
Listing 16

There was a mention in passing, however, of being able to interoperate with the existing C++ Standard Library implementations, without losing efficiency. The high-level API achieved here is certainly not just a thin wrapper around C++ iterators; it provides a very rich and type-safe platform which is extended fairly easily (at least, no more difficult than extending C#’s IEnumerable facilities). How hard is it to go the extra step, and allow this new range type to play nicely with C++ algorithms?

With some restrictions, it’s very simple. Those restrictions again are inspired by C#’s LINQ: IEnumerable is an immutable interface. You cannot modify elements of it, nor modify the range represented by it, without going via some concrete container type, which in C# means calling ToList() or ToArray() on the range.

I believe immutability isn’t an unreasonable restriction on this kind of programming. With increased focus on concurrency and multi-processing to achieve better performance, and with both of those techniques benefiting greatly from the use of immutable data, making the ranges that this API operates on immutable isn’t a restriction, it’s a design principle. Turning an immutable range into mutable data which works with mutating algorithms requires the C++ equivalent of ToList().

Let’s see if we can express what is required for that in another test (Listing 17).

TEST_CASE( "Range can be used to populate a
   standard lib container",
   "[range][export][stl]" )
{
  std::vector< int > data { 1, 2, 3 };
  auto range = make_range( data ) |
               select( []( int x )
  { return std::to_string( x ); } );

  std::vector< std::string > result 
  { begin( range ), end( range ) };

  REQUIRE( result.size() == data.size() );
  REQUIRE( std::to_string( data[ 0 ] ) ==
     result[ 0 ] );
  REQUIRE( std::to_string( data[ 1 ] ) ==
     result[ 1 ] );
  REQUIRE( std::to_string( data[ 2 ] ) ==
     result[ 2 ] );
}
			
Listing 17

This highlights the fact that in C++, containers and algorithms work with pairs of iterators – the [begin, end) ‘range’, whereas the range types described in this article encapsulate the pair of iterators into a single item.

Implementing begin for the range type looks straightforward – perhaps the necessary operations (only an InputIterator’s operations are required) could be added to the range types, so equivalence operators != and ==.

The possible difficulty might be in implementing end.

There is also precedence for writing end when an end position isn’t known: std::ostream_iterator uses a sentry type (effectively an invalid position created by the default constructor), so a similar technique could be employed here. The only way to find the end of a Range is to consume it, which is obviously undesirable.

There’s more to a C++ Iterator than ++, *, != and == however, in practice. There are some embedded typedefs to consider as well, which is usually captured by inheriting from std::iterator. So, instead of making changes to all of the range types, or some common base class, it seems to be a better separation of concerns to implement the necessary iterator type separately. As already noted, only the operations and types associated with InputIterators are needed if range types are immutable. Most of the required operations are easy enough to write (see Listing 18).

template< typename range_type,
  typename value_type >
  class range_output_iterator : public
     std::iterator< std::input_iterator_tag,
                    value_type >
{
  public:
    range_output_iterator( const range_type & r )
      : r{ r }
    {
    }
    auto operator==
       ( const range_output_iterator & )
       const -> bool
    {
      ???
    }
    auto operator!=
       ( const range_output_iterator & r )
       const -> bool
    {
      return !operator==( r );
    }
    auto operator++() -> range_output_iterator &
    {
      ++r;
      return *this;
    }
    auto operator*() const -> value_type
    {
      return *r;
      }
  private:
    range_type r;
};
			
Listing 18

The sticking point is to decide how to implement operator==(). How to compare two ranges to see if their current positions are the same, in the same range of iterators? Time to step back and consider: how would the std::vector constructor taking two iterators be implemented? Something very like:

  vector( iterator b, iterator e )
  {
    while( b != e )
      push_back( *b++ );
  }

The point here is that the most important consideration is the comparison to the ‘end’ position, which our ranges already do to determine operator bool(), needed for other things internally. If we make the assumption that such comparisons are only ever between a valid range and the end of the same range, then operator==() can use operator bool() – two ranges always compare equal if the first is valid.

A side effect of this is that the right-hand-side of an expression range_l == range_r is never used, so it doesn’t matter what end returns – it doesn’t even need to be a default-constructed sentry value (which would be difficult, given the lack of default constructors of the range types).

  auto operator==
    ( const range_output_iterator & ) const -> bool
  {
    return !r;
  }

The implementations of begin and end therefore become as shown in Listing 19

template< typename range_type >
auto begin( range_type r ) -> range_output_iterator< range_type,
                       decltype( *r ) >
{
  return range_output_iterator< range_type,
                       decltype( *r ) >{ r };
}
template< typename range_type >
auto end( range_type r ) ->
  range_output_iterator< range_type,
                         decltype( *r ) >
{
  return range_output_iterator< range_type,
                         decltype( *r ) >{ r };
}
			
Listing 19

That’s not a printing error: they are identical in implementation. end merely has to return something of the correct type – it is never used, not even to compare with anything.

A side effect of this is that interoperability with non-mutating C++ algorithms is achieved for free, for example

  {std::copy( begin( r ), end( r ),
              std::back_inserter( my_list ) ); }.

To conclude

There have been a few range libraries developed for C++ over recent years, but none seem to have had the same take-up as ranges have in D, for example. I think it’s partly because C++ iterator pairs have become such a central part of writing C++ programs that use the Standard Library, partly because such range libraries that there are are either clunky to use, or have disappointing behaviour regarding the C++ Standard Library, and partly because there are actually no really compelling use cases for them that can’t be achieved using other techniques.

In this article, I set out to demonstrate some uses for a very lightweight range type that’s very easy to use, and provides facilities that are very widely used in a different language – C# – which in turn took the ideas of functional languages and coupled them with what some people describe as the ultimate declarative language, SQL.

C++ has many functional facilities, and many of the standard algorithms mirror functional constructs. std::transform is essentially a list comprehension, but lacks the brevity and simplicity of transformations in truly functional languages, at least in part because it uses two iterator-pairs to represent separate ranges.

The C# Select expression to transform one IEnumerable into another captures the simple case that is most common – turn every element into a new range of some new type – and C++’s transform doesn’t really even compete. A simple range type for C++ overcomes the problems with having that simplicity, whilst still allowing simple and efficient interoperability with existing C++ Standard Library facilities.

All the code in this article compiles with GCC 4.6.3 (with -std=c++0x) on Ubuntu, and GCC 4.8.1 (with -std=c++11) and Microsoft Visual Studio 2013 CP on Windows. It’s available from https://github.com/essennell/narl.

Acknowledgements

Many thanks to Andy Sawyer and Roger Orr for fixing some of my misunderstandings of decltype and trailing return types, Jon Wakely for helping me understand why partial specialisation of types based just on the number of template-template parameters doesn’t work, and to Frances Buontempo, Roger Orr and Chris Oldwood for reading early revisions of the article and spotting my inevitable errors.

References

[Alexandrescu09] http://accu.org/content/conf2009/AndreiAlexandrescu_iterators-must-go.pdf

[Boost] http://www.boost.org/doc/libs/1_54_0/libs/range/doc/html/range/reference.html

[Github1] https://github.com/dietmarkuehl/kuhllib/wiki/STL-2.0#andrei-alexandrescus-ranges

[Github2] https://github.com/philsquared/Catch

[Standards] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3350.html

  1. Specifically, LINQ for Objects, which operate on containers rather than DataSources.
  2. Alright, you’d need something like auto operator*() const -> typename std::result_of< transformation( decltype( *std::declval< range_type >() ) ) >::type I hope you agree that declaring private at the top is a small price to pay!
  3. More verbose but perhaps more descriptive might bereturn static_cast< bool >( r );
  4. The ‘functional’ approach to this problem is any, which returns false if a range contains no elements.

Notes: 

More fields may be available via dynamicdata ..