ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinA Fistful Of Idioms - Giving STL Iterators a Base Class

Overload Journal #38 - Jul 2000 + Programming Topics   Author: Steve Love

Some little while ago, I posted a question on the accu-prog-questions list regarding virtual template functions. The short answer I already knew - "it cannot be done." The slightly longer answer was provided by Kevlin Henney - "it can be done if you know how."

A template function cannot be made virtual because, as Kevlin put it, "The dynamic binding you want ... is more significant than the polymorphism required to handle any type." The virtual function table for the class would contain the address of the function normally, but a template function may never be instantiated, or may be instantiated multiple times. What is a poor compiler to do? Spit it out with a diagnostic, probably (which is exactly what mine did).

I thought that this could be overcome in a simple and idiomatic manner. I was not far off the mark; the solution provided by Kevlin is, in his words, a "collaboration of idioms." I will try to identify them as we go along.

The Good, the Bad and the Ugly

Consider a class to write to a file. The format of a given file is probably fixed (few files contain both fixed width and comma-separated values), but a more general solution will provide for several formats. Even better, a good solution will also provide for extensions; today you may want only CSV and fixed width, but tomorrow you may also require an ODBC data source. An obvious way to do this is provide an interface class, with pure virtual functions defining the interface to be implemented in derived classes. Each derived class has knowledge of the layout and delimiter requirements of the target.

class file_writer {
public:
  // pure virtual functions define the interface
};
class csv_writer : public file_writer {
  // implement virtual functions according to CSV format
};
class fixed_w_writer : public file_writer {
  // implement virtual functions according to fixed width format
};

The original requirement was for something along these lines. A record writer function would take a range of values to be written as a single record:

class file_writer {
public:
  // ...
  template<class Iterator>   virtual void write (Iterator begin, Iterator end) = 0;
};

As already mentioned, this can not be done directly. Three possibilities immediately present themselves:

  virtual void write (std::vector<std::string>::const_iterator begin, 
                     std::vector<std::string>::const_iterator end);
  virtual void write (std::vector<std::string> container);

Neither of these is very pleasing; both tie you, as the client, into a specific container and contained type. The first has advantages over the second, in that you, the client, can pass only a partial range if you wish. The write() function itself does not have to change. Both suffer from having to specify the particular iterator to be used. For example, the function would need to be overloaded for a reverse iterator.

The third option is to pass each value in the container in turn to the write() function. This has the improvement over the previous solutions in that the container type is no longer an issue, but has other limitations, a burden on the client not least among them.

virtual void file_writer::write (const std::string & value) = 0;
// ...
std::vector<std::vector<std::string> > values;
file_writer * csv_file = new csv_file_writer (file_name);
// ...
for (std::vector<std::vector<std::string> >::iterator record = values.begin();
    record != values.end(); ++record){
  std::vector<std::string>::iterator i = record->begin();
  while (i != record->end() {
    csv_file->write (*i);
    ++i;
    if (i != record->end()) {
      csv_file->write (",");
    }
  }
  csv_file->write ("\n");
}

This code would probably do the job, but is prone to error, slightly obfuscated (which could be helped with judicious use of using declarations), and the client is required to know the format of the output file - defeating the original object.

The clue to the solution is in the requirement. Ideally, we want to be able to write

  virtual void write (any_iterator begin, any_iterator end);

So, a good starting point is a class called any_iterator.

The Type With No Name

The reason for writing a function which operates on an iterator range as a template function, is so that it will work with any iterator (provided the function itself uses a conformant interface). Given that fact, we need to be able to create an any_iterator from, well, any iterator. In this case, it is sufficient to be able to create one from any Input Iterator.

The concept of Input Iterator limits the scope of what we need to achieve in any_iterator. We need the following operations:

  • default construction

  • assignment/copy

  • equality compare

  • dereference

  • member access

  • pre increment

  • post increment

  • post increment and dereference

On this basis, then, our any_iterator class will provide the following member functions:

  • any_iterator()

  • any_iterator (const any_iterator &)

  • operator= (const any_iterator &)

  • operator== (const any_iterator &)

  • operator!= (const any_iterator &)

  • operator*()

  • operator->()

  • operator++()

  • operator++(int)

This provides our interface, but we have to decide, for example, what operator*() should return. We can specialise the entire class on its contained type (in our original requirement, std::string), or we can parameterise the class on the contained type:

template<typename Contained> class any_iterator {
// ...

This latter method means that the signature for our write() function becomes

  void write (any_iterator<std::string> begin, any_iterator<std::string> end)

which seems a reasonable compromise.

With this in mind, we can provide an implicit conversion from any Input Iterator (actually, from anything, but the restrictions will come later) using a member template function for a constructor.

template<typename Iterator>
any_iterator (const Iterator & rhs);

Painting Your Wagon

The real magic of the solution is expressed in a Pattern called "External Polymorphism" which is published as having the following intent:

Allow C++ classes unrelated by inheritance and/or having no virtual methods to be treated polymorphically. These unrelated classes can be treated in a common manner by software that uses them. (Cleeland 1996)

Standard Library Iterators are expressly not related by inheritance; they are related by concept (Gamma+, 1995). They do not have virtual functions, and cannot be treated polymorphically, in the late-binding sense. Our requirements for any_iterator are to treat Iterators as if they had a common ancestor. They could then be treated in the common manner described above.

  class interface   {
  public:
    // pure virtual interface declarations
  };
  template<typename ConcreteType> class adaptor : public interface {
  public:
// implementations of pure virtuals in base class
  private:
    ConcreteType adaptee;
  };

The adaptor class has member functions which forward the call to the member function of ConcreteType. In this sense, this structure is similar in some respects to the Adaptor pattern (Gamma+, 1995), but it has differing motivations; it does not set out to convert the interface of ConcreteType in any way (athough it could), only to provide ConcreteType with polymorphic behaviour. The client class, using a pointer to interface, can use it as if it were a ConcreteType.

In our any_iterator class, ConcreteType is an iterator. We need to provide in the interface those functions which will allow us to provide an iterator-like interface in any_iterator. A start is the following:

class wrapper {
public:
  virtual void next () = 0;
  virtual std::string & current () const = 0;
  virtual bool equal (const wrapper * rhs) const = 0;
  virtual void assign (const wrapper * rhs) = 0;
};
template<typename Iterator> class adaptor : public wrapper {
public:
  adaptor (const Iterator & rhs);
  virtual void next () 
    { ++adaptee; }
  virtual std::string & current () const 
    { return *adaptee; }
  virtual bool equal (const wrapper * rhs) 
    { return adaptee == static_cast<adaptor<Iterator> *>(rhs)->adaptee; }
  virtual void assign (const wrapper * rhs)
    { adaptee = static_cast<adaptor<Iterator> *>(rhs)->adaptee; }
private:
  Iterator adaptee;
};

Note the addition of a conversion constructor in adaptor's interface; that will become clear shortly. There are a couple of gotcha's here. The first one to go is the return from current(). This is the same contained type we already fixed in the any_iterator class by using a template parameter to the class. The easiest way to parameterise this one is to nest interface and adaptor<> inside any_iterator.

template<typename Contained> class any_iterator {
public:
  // ...
private:
  class wrapper   {
  public:
    virtual void next () = 0;
    virtual Contained & current () const = 0;
    virtual bool equal (const wrapper * rhs) const = 0;
    virtual void assign (const wrapper * rhs) const = 0;
  };
  template<class Iterator>   class adaptor : public wrapper {
    // ...
  };
};

The next problem is with equal(). The rhs pointer could have a different adaptee type to this - in our case, e.g., a list<>::iterator when this one is a vector<>::iterator.

It is the polymorphic type which interests us, so we can make use of RTTI to determine consistency of type:

  adaptor<Iterator> * tmp = dynamic_cast<adaptor<Iterator>*>(rhs);
  return tmp && adaptee == tmp->adaptee;

The nature of using dynamic_cast on pointers means that tmp will be null if the conversion fails, i.e. the runtime type of rhs is different to this. To make this fail more noisily, we could have adaptor::equal() take a reference, and make tmp a reference, thus forcing dynamic_cast to throw std::bad_cast on failure.

Note that assign() also suffers from this problem, but we will address that one differently.

We now need access to this framework in the any_iterator class. The declarations are already nested; all that remains is to define such a thing.

template<typename Contained> class any_iterator {
public:
// ...
private:
  class wrapper   {
// ...
  };  
  template<typename Iterator> class adaptor : public wrapper {
// ...
  };
  wrapper * body;
};

Now the member functions of any_iterator can operate on body which will forward all requests to adaptee.

  const any_iterator<Contained> any_iterator<Contained>::operator++ (int) {
    body->next();
    return *this;
  }
  bool any_iterator<Contained>::operator== (const any_iterator & rhs) {
    return body->equal (rhs.body);
  }
  // ...

A Few Idioms More...

So far, so good. However, the functions we have created only work if we have an actual instance of body. Recall the converting constructor being a member template function. The template type of the Iterator is known at this point, and we can construct a new adaptor object accordingly:

template<typename Contained>
template<typename Iterator>
any_iterator<Contained>::any_iterator (const Iterator & rhs)
  : body (new adaptor<Iterator>(rhs)) { }

The copy constructor is more of a problem; we have no way of identifying the Iterator type for rhs->body

  any_iterator<Contained>::any_iterator (const any_iterator<Contained> & rhs) {
    // ???
  }

nd so have no way of creating a new instance of it. This identifies a need for a new idiom in the adaptor class: a Virtual Copy Constructor (Henney, 1999, Cline, 1991, Koenig, 1997).

A constructor for a class cannot be virtual. Normally, you know the concrete type of the object being created, and so this does not cause a problem. In some circumstances (here for example), we have only access to the base class type of the object we want to construct. It is meaningless to create an empty one, but perfectly reasonable to require a copy, or a clone. It forms part of the interface to the wrapper class

  wrapper * wrapper::clone () const = 0;

and is implemented in adaptor:

  template<typename Iterator>
  wrapper * adaptor<Iterator>::clone () const {
    return new adaptor<Iterator>(adaptee);
  }

This uses the already given conversion from Iterator (the type of adaptee), which has been used before in the conversion constructor of any_iterator. The return from clone() could be covariant, i.e. return a pointer to the actual class rather than a pointer to its base class, but there is no benefit in this, since the target for the returned pointer is a pointer to the base class; it will not be used as part of an expression such as

  Contained string_value = body->clone()->current();

(Aside to users of Borland C++Builder 4/5 - I originally declared a copy constructor for adaptor thus:

  adaptor (const adaptor<Iterator> & rhs);

but the entire class refused to compile. Removing the template type from the rhs parameter solved the problem legally, a language construct I had not previously encountered - but the diagnostics were not helpful in reaching this conclusion!)

This allows us to write a copy constructor for any_iterator as

template<typename Contained>
any_iterator<Contained>::any_iterator (const any_iterator & rhs)
  : body (rhs.body ? rhs.body->clone () : 0) { }

Having defined a copy constructor, we now need a copy assignment operator, which brings us to another idiom - Copy Before Release (CBR). Self assignment is not a problem in a class with only intrinsic data types (or those that provide the same copy semantics of intrinsics); but then, if only intrinsic data types are present, we can do without copy construction and copy assignment altogether, leaving the compiler to provide the necessary logic.

This actually leaves us at the Rule of Three - if you need any one of copy constructor, copy assignment operator, destructor, you generally need all three. (Cline, 1991, Coplien, 1992).

Since the data member of any_iterator is a pointer, we require a proper deep copy to be performed on assignment. We also need to ensure that assignment to self does not occur. Finally, assignment should be exception safe, meaning it will operate correctly in the presence of an exception.

The idiom which expresses all these things is Copy Before Release (Henney, 1998). At its most basic level, using the current example, it is implemented as

any_iterator<Contained> & any_iterator<Contained>::operator= (
                            const any_iterator<Contained> & rhs){
  wrapper * tmp = body->clone();
  delete body;
  body = tmp.body;
  return *this;
}

As you can see, the name given this idiom is apt. It turns out that this can be simplified using a suitable copy constructor. We have already defined the copy constructor for any_iterator, and so can use that. Further more, exception safety can be guaranteed by using std::swap().This leads to a remarkably simple implementation (Sutter,1999):

any_iterator<Contained> & any_iterator<Contained>::swap(any_iterator<Contained> & rhs){
  std::swap (body, rhs.body);
  return *this;
}
any_iterator<Contained> & any_iterator<Contained>::operator= (
                            const any_iterator<Contained> & rhs){
  swap (any_iterator<Contained> (rhs));  
  return *this;
}

The swap() member function merely swapping two pointers may look a little suspect, but the call to it provides the insight. We create a new copy of the rhs and swap it with this. When the call to swap() completes, the temporary object constructed in its arguments goes out of scope, taking its pointer with it. At that point, its body pointer is the memory which used to be attached to this, whereas this->body points at newly allocated memory containing a copy of rhs.

This safely handles null assignment, self assignment and exceptions (such as std::bad_alloc), as well as a successful copy. If an exception does occur, via the any_iterator copy constructor, then this will remain in its previous state, because the copy will not occur.

Conclusion

The whole premise of this technique revolves around being able to silently convert from a STL iterator to an any_iterator. In the wider sense (see Cleeland, 1996), it can be used to superimpose polymorphic behaviour on other concrete types. This property of encouraging silent conversions is, rightly, regarded as dangerous (when C++ is described as a language which enables you to shoot yourself in the foot, this is one of the big shooters), but under specific conditions, such as those described here, it may be inescapable.

The circumstances under which it is considered dangerous are generally where the converted type appears as a parameter to a function; however, this is exactly the circumstance under which this idiom is intended to be used. Here is an example.

  void csv_writer::write (const any_iterator<std::string> &begin,
                          const any_iterator<std::string> & end) { /* ...*/ }
  // ...
  csv_writer cw;
  std::vector<std::string> records;
  // initialisation of vector with strings here

  // using ordinary STL iterators from vector. It's also perfectly
  // valid to use reverse_iterators or const_iterators
  cw.write (records.begin(), records.end());
  // similarly, we can use istream_iterators because the basis for 
  // any_iterator is the Input Iterator, which is modelled by 
  // istream_iterator
  std::ifstream  strm (file_name);  
  cw.write (std::istream_iterator<std::string>(strm), 
            std::istream_iterator<std::string> ());

This extract demonstrates the use of any_iterator. The intention is that client code never uses - indeed, probably does not even need to know about - the any_iterator type directly, but is able to use it transparently because it exhibits properties to allow this.

See bottom of next page for acknowledgements and bibliography (sorry, but I seem to have mislaid themduring the process of preparing this for publication FG)

Overload Journal #38 - Jul 2000 + Programming Topics