Title: Standard containers for polymorphic types

Author:

Date: 31 January 1997 08:54:00 +00:00 or Fri, 31 January 1997 08:54:00 +00:00

Summary:

Body:

I think that the terminology that is adopted by programmers is often the cause of many of the problems even relatively experienced programmers have. What is a container? Well common sense says that it something into which you can put things. In an object-oriented world surely that means a container is an object into which you can put other objects. Well if you think that you are totally wrong in so far as C++'s standard containers are concerned.

What about a collection? Ordinary experience might lead us to suppose that this too was some sort of assemblage of objects. We have stamp collections, art collections and so on. Once again in an OO world we have every right to think that it is somehow related to objects.

Now in C++ we have two mechanisms for han dling objects, references and pointers. Both these mechanisms support polymorphism. We also have the concept of copying which is somehow related to the idea of a value.

The pointer mechanism is a low level technique with pointers being a kind of object whose 'value' is the 'address' of another object. There are problems when the later object moves or disappears because the pointer will provide a kind of ghostly existence for the original object. If we are being sensible we should try to avoid long distance pointers. By that I mean we should avoid using a pointer as more than a very temporary handle for an object that is not immediately visible. We also need some mechanism for marking a pointer as being out of use. C provides that via its null pointer, but for efficiency purposes all use of this facility is left to the user. C++ allows skilled programmers to produce their own pointer like objects which can have many kinds of enhancement including more robust mechanisms for handling non-addresses. Of course there is a price for using these which you have to consider but until you can develop suitable 'smart pointers' you are no more than an apprentice C++ programmer.

The C++ reference mechanism is designed to allow alternative (often local scope) identifiers to be attached to existing objects. The problem is that these identifiers are bound to a specific object either statically or dynamically. The dynamic version is via a reference parameter or return type. To understand this consider how you can bind an auto identifier to a dynamically selected instance. Given:

#include <iostream>
struct Base {
  int i;
  virtual void fn()
  { cout<< "I'm a Base" << endl;}  
}; 
struct Derived: public Base {
  void fn()
  { cout << "I am Derived" << endl; }
};

I cannot write something like:

main() { 
  for (int i=0; i<3 ;i++) {
    switch(i) {
    case 0:
      Base & item = *new Base; break;
    case 1:
      Base &item = *new Derived; break;
    default:
      Base &item= *new Base;
    }
    item.fn();
  }
}

Instead I must write:

Base & fx(int i) {
  switch(i) {
    case 0: return *new Base; break;  
    case 1: return *new Derived; break; 
    default: return *new Base;
  }
  return item;
}

main(){
  for (int i=0; i<3; i++){
    Base & item = fx(i);  
    item.fn();
  }
  return 0;
}

Of course I could use a pointer, but the point I wish to make is that dynamic binding of a ref erence is not simple. This results in a further problem, you cannot have standard containers of references. Without these there is no way to implement the concept of putting an object into a standard container. The standard C++ containers work by copying. When I mentally put an object into a vector, the actual process is to copy its value into an object already in, or created for the purpose in the container. As I can only have a container of Bases I cannot have a mixture of Base and Derived objects. Think hard about the implications of that. C++ standard containers are collections of homogeneous objects that can be manipulated by copying. If you sort a vector you do not move the objects you move the values they contain. This is not a design error, it is a result of wanting to design efficient code in the ab sence of any form of rebinding for references.

Now clearly we are going to be constantly irri tated by the lack of containers for mixed types so we must have some way to handle these. That is the subject of the rest of this article.

Use a pointer

The obvious step is to use a container of pointers to hold the objects that I create, or that al ready exist elsewhere. While this lacks a certain elegance it might be a solution were it not for two dangerous aspects of using such containers.

The first is that I must worry about what is go ing to be responsible for destroying a dynamic object held by a pointer in a container. If the object is 'owned' by the container I must destroy it before the container is destroyed (or transfer its ownership elsewhere). If it is not owned by the container then I must not destroy it via the pointer in the container. That means that either my container must only contain pointers to 'owned' objects, or it must contain only pointers to non-owned ones, or it must somehow track which are which. Simple pointers cannot do the latter, applying constraints on programmers that the compiler cannot enforce is a recipe for disaster.

We could use a simple class to track pointer and ownership. If we do this we might as well generalise it for all types. That indicates that it should be a template class. Possibly something along the lines:

template <typename T> class Owned_ptr {
  T * handle;
  bool owned;
public:
  // constructors, destructor, access
  // functions etc.  
};

It would probably be worth adding functional ity to make this a smart pointer, but not yet as we have first to consider if it can do the job.

The second problem is that the STL algorithms for containers are also value based. It may not be obvious but the result of this is that any algorithm that 'moves' objects actually only copies values. Where the algorithm is one that essentially preserves all items in a container this is not a problem in this context.

(It is a problem in other contexts, you have to be especially careful of pointers (and their more generalised cousins, iterators) into containers. Applying algorithms to containers provide a whole new potential for invalid pointers for the unwary. Indeed even adding an item to a vector can invalidate all current pointers to items in it. Why? An insertion could require the vector to expand, that would require more memory, that almost certainly results in mov ing the whole vector to a new location with sufficient contiguous memory to hold the resized vector. Break the habit of using raw addresses. Funny how often that theme comes up in C++.)

But some STL algorithms do not preserve all the items. In such cases values may be copied from one to another. If the values happen to be addresses then the copying can overwrite the only pointer to a dynamic object and hence cause a resource leak (probably memory, but it could be other things. C programmers get very blas6 about closing files because exit() will close them anyway, as far as I know that is not necessarily the case in C++. It is the job of the destructor of an fstream object to close the file, if that destructor is never called ...)

The result of this is that you will need to be careful with your use of a container of pointers because some algorithms will break them. Once again we have a constraint placed on the programmer. That should always be a court of last appeal.

The simplest protective measure we can take is to inhibit copy assignment in our 'smart pointer.' Then if we try to do something silly we will get objections from the compiler. Note that this does not prevent us providing a copying facility for our own use, it just won't be called operator=(). Stick the latter declaration in the private interface of our class and declare a T & copy(T &) const in the public interface. Of course that may inhibit more of the STL algorithms than you would ideally wish for but sometimes convenience must give way to safety.

Surrogates

This is not the time for me to explain how you write a surrogate class, I'll do that another time. However you should know what one of these is and then you will understand how they can assist with our problem of a container of sub-types.

Basically a surrogate is a way of encapsulating the behaviour of a polymorphic hierarchy so that the many low level details are hidden from the client programmer. Consider all the behaviour you have to provide when handling instances of a polymorphic hierarchy via pointers. You have to remember to construct instances, destroy them, manage copying and so forth. You also have to handle the interface provided by the (probably abstract) base class of the hierarchy. In an ideal world you probably wish that you could have instances of your abstract base class that behaved temporarily like a specific concrete derived class. The mechanism of writing a surrogate almost meets that wish.

Using a surrogate allows you to have a con tainer of polymorphic objects. The copying and cloning problems will be handled by the surrogates member functions. As far as the user is concerned they will be handling a single type that happens to have variable behaviour determined initially by the constructor, and then by its subsequent history. If you do not know about this idiom, I strongly advise you to go and master it, every journeyman C++ programmer should have it in her toolkit.

Summary

If you want a container with a mixture of owned and non-owned objects you will need to use a container of a handle type that can track ownership. You may well enhance this into a smart pointer. You certainly will need to consider how to manage the STL containers value based behaviour by somehow restricting the algorithms that can be applied to your container.

If you want a container of objects from a po lymorphic hierarchy and you need to recover the type information (because you need to use sub-type specific interface extensions for example) via dynamic_cast<> then you will need to work rather harder. In such cases you should seriously consider a sufficiently powerful smart pointer. Indeed you need positive reasons for not using one. And as in the previous case you must handle the consequences of containers being managed by value copying.

If you have the case of objects with a uniform interface but polymorphic behaviour they should be handled via a surrogate class. If you are not already writing surrogates for such type hierarchies you should examine your reasons because this is the natural idiom for such types.

Note that in all cases raw pointers are wrapped up and hidden from the high level programmer. On second thoughts, perhaps we might even reconsider smart pointers. Are they really that valuable? What do you think? Anyone for the prosecution? What about the defence? How and when should we use smart pointers? To be honest, I do not know. I think it is time that some of the rest of you added your thoughts. If I can stand our esteemed editor's criticisms, I am sure you can.

Finally, some of you may wonder why I have not suggested using auto_ptr<>. It is part of the Draft Standard C++ Library and you know how keen I am on using libraries rather than rolling your own (probably inferior) product. The Standards Committees broke it in one of their recent revisions. I have no idea why, and as far as I can find out, all they did in Hawaii was to mend it with sticking plaster. The result is too dangerous for use by non-experts, and experts can do better anyway. I think this is C++'s equivalent of C's gets(), good idea but dangerously flawed.

Notes:

More fields may be available via dynamicdata ..

Journal Articles

Title: Standard containers for polymorphic types

Use a pointer

Surrogates

Summary