Journal Articles

CVu Journal Vol 10, #6 - Sep 1998 + Programming Topics

Browse in :

All > Journals > CVu > 106 (12)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Life Stories

Author: Administrator

Date: 03 September 1998 13:15:27 +01:00 or Thu, 03 September 1998 13:15:27 +01:00

Summary:

Body:

This article is aimed at everyone except OO and C++ experts. If you're a novice C programmer and the following does nothing for you, then I've failed. If you're an expert, then go ahead and tear it to pieces.

A program is essentially code acting on data. What we're concerned with here is the lifetime of any part of the data in relation to the flow of the code. Some data has a lifetime that is independent of that of a program instance. A database stored on a disk drive is a good example. An e-mail message bouncing around the Internet is another. This kind of data isn't the subject of this article. I am only concerned with what goes on 'inside' a program, i.e. that which is in memory and accessible to your code!

Global data

In the bad old days both in C programming and elsewhere, 'global' data was commonplace. In C, this is data that is declared and defined outside of a function. The lifetime of global data is essentially that of the program. Global data is generally considered to be a 'bad thing'. In particular, using global data as a means of communication between different parts of the program is bad. Occasionally, the fact that just one storage location is accessible from anywhere in the program is useful. For example number or currency formats may change if the regional settings are changed. Even so, providing functions for this purpose rather than making the data visible globally is a better style.

Strictly speaking we should distinguish between data with external and internal linkage. The latter is really global only to the translation unit and is still basically a bad thing.

Eliminate global data.

Local data

Variables declared inside function bodies have a scope that is limited to that function. In fact, the scope is the enclosing block in C, so that a variable may be restricted to only part of a function. The scope limits your use of a variable but doesn't in itself define its lifetime. If a local variable is marked as static, it holds its value even after execution has left the enclosing block. However, for all variables that aren't marked as static, their lifetime is from the declaration to the end of the block. In C this is the whole block, in C++ the declaration may be anywhere in the block.

As well as the above-mentioned declarations, parameters passed by value have similar scope and lifetimes.

In the simplest cases, when working with simple values, the lifetime of a variable is of academic interest only; its scope is the dominant consideration when coding:

int GreatestCommonDivisor(int m, int n) {
  int remainder;
  while(n) {
    remainder = m % n;
    m = n;
    n = remainder;
  }
  return m;
}

The fact that the loop is working with two parameters and a local variable that exist only during the call is no great concern. The fact that these variables may be modified without fear of side effects on the rest of the program is reassuring however.

Now for some recursion:

int Factorial( int m ) {
  int result;
  if( m > 1 )
    result = m * Factorial(m - 1);
  else
    result = 1;
  return result;
}

Here the scope and the lifetime of the parameter 'm' and the local 'result' are within the function call. So calling Factorial(GreatestCommonDivisor(30, 12)) leads to a sequence of calls, with storage being set aside at each call.

So first GreatestCommonDivisor(30, 12) is called with storage being used for 'm' and 'n' initially containing copies of the literal values 30 and 12. This storage along with that used for the remainder variable is modified by the loop until the first parameter has been reduced to 6. This value is returned from the function and the storage used by the function is discarded.

This value (6) is then passed to the Factorial function, inside of which, storage is set aside for the result variable. The value in 'result' is eventually passed back to the calling code to be dealt with as it may.

With a parameter of 6, the function passes a value of 5 to Factorial. Inside this invocation, the outer invocation's variables remain untouched, fresh storage is used for its copy of the local data. Each successive call is made with values of 4, 3, 2 and 1 when finally, the conditional allows the calls to progressively unwind, and building the result at each level until the first invocation finally returns 720.

Although it is more efficient to write a non-recursive Factorial with a simple loop, the allocation of storage for these variables is still very efficient. Each function invocation has a 'stack frame' which is simply an area on the call stack, accessed by offsets from a frame pointer. That is, adjusting the stack pointer to leave room for them allocates space for local variables. That is why local variables have arbitrary values until you initialise them.

The Heap

C provides heap allocation library functions (malloc etc), C++ provides new and delete operators. Both do much the same job, in that you can allocate space away from the call stack and access the allocated space by using pointers. Although local pointers have a lifetime limited by their scope, the data pointed at has a lifetime entirely under your control. Without even thinking about classes and the like, life has suddenly become much more complicated. The main thing that makes life difficult is that simple pointers are merely values and may be copied freely. Consider a simple use of the heap:

  char * pBuffer = new char[buff_size];
  //... use pBuffer
  delete [] pBuffer;

For C just substitute calls to malloc and free. This is pretty innocuous stuff. Using C++ as it should be, if the new operator fails, an exception will be raised, so the subsequent use of the buffer would be skipped.

For every successful call of new/malloc, there should be a corresponding call to delete/free. If memory is allocated repeatedly and not freed, the result is a memory leak. That is to say that the heap will slowly fill up until there is no memory left for your or anyone else's program. This is a bad thing. Even more catastrophic (at least for your program) is what happens if you access data after it's been freed. Actually, not crashing is probably the most dangerous behaviour! Even worse, freeing the same heap object twice almost always results in a nasty crash.

There are two ways that the above fragment could give you grief. Firstly, if an exception is thrown and it is not caught between the allocation and freeing of the buffer, a memory leak will occur. The other case is if a copy of the pointer is taken and then used after the last line above. That's aside from direct use of pBuffer after the delete while it's still in scope or overflowing the buffer of course!

For now, note that the heap must be used with care.

Objects

C++ brings the concept of constructors and destructors into the fray. Objects in C++ (structs and classes) have lifetimes that start with a constructor and end with a destructor. Lifetimes of such objects are like those of the plain data types above. If a class is instantiated as a simple variable in a block, it will live to the end of the block. If it is instantiated using the new' operator, it will live till it is deleted. The former case has two benefits.

Firstly, you don't have to destroy the instance explicitly, that will happen automatically. This eliminates one source of errors.

Secondly, its destructor will be called even if an exception is thrown that isn't caught in that function. It has to be mentioned that this is only true for 'modern' compilers, i.e. those that support exceptions properly rather than tacky macros.

So what am I trying to say? Obviously, you will want to create objects on the heap and yet we know that doing so can lead to memory leaks if you fail to delete them at the proper time. So how can this be done safely?

Objects can contain other objects. It is simple to compose a class with instances of other classes. It is nearly as simple to contain pointers to classes and instantiate them on the heap. The latter case is likely to be forced on you when you need to create a number of objects that is determined at run time. A typical example would be to create a linked list of objects populated from a file, or a varying number of open documents in an editor.

Ownership

It is a common style to compose a class of many different objects. Thus you might declare:

class MyObject {
  AnObject oneObject;
  AnObject anotherObject;
public
  MyObject(int a, int b);
};

The AnObject instances must be initialised on construction. Supposing their constructor requires an integer parameter, the MyObject constructor will look something like this:

MyObject::MyObject(int a, b)
        : oneObject(a),
          anotherObject(b)
{}

Alternatively, you might want to create the objects independently on the heap. Suppose that we were writing an old-fashioned linked list class.

class ListOfAnObjects {
  struct AnObjectNode {
    AnObject *pItem;
    AnObjectNode *pNext;
  }
  AnObjectNode *pHeadNode;
public
  ListOfAnObjects() : pHeadNode(0) {}
  ~ListOfAnObjects();
  // insertion and deletion functions...
  // iterator class...
};

In this case the constructor simply initialises to an empty list - a null terminated single linked list.

The insertion functions will create new AnObjectNode instances with the new operator. The destructor will have the responsibility of deleting the nodes at the very least:

ListOfAnObjects::~ListOfAnObjects() {
  while( pHeadNode ) {
    AnObjectNode *pThisNode = pHeadNode;
    pHeadNode = pHeadNode->pNext;
    delete pThisNode;
  }
}

Please note that the above loop maintains a valid list while it is gobbling it up. This can be quite important in more complex situations. But anyway…

Assuming that the other methods of this class are also correct, this class isn't going to cause memory leaks by leaving orphaned list nodes littering the heap. On the other hand, what about the AnObject instances?

As presented, this class can maintain a list of pointers to these objects but their lifetimes are independent of the list. That is, users of the list class must manage the lifetimes of these objects.

Before rushing ahead, let's look at some of the benefits of what we've got so far.

Firstly, if a list class like the above is instantiated as part of another class its destructor is sure to be called, so long as the enclosing class' destructor is called.

Secondly, if it is declared locally in a block of code, its destructor is sure to be called, even if an exception is raised in the intervening code.

Note

For a useful example of this benefit, look at using auto_ptr as a means of looking after local pointers to heap objects.

Assuming that the other member functions are correctly written, the list class is able to take full charge of the lifetimes of the node instances. This power of life and death over objects is usually termed 'ownership'. The list class owns all its list nodes. This is what you would expect, as the sole purpose of these nodes is to support the collection of objects in a list.

Can we take the concept of ownership further and have the list class own the AnObject instances that it is listing?

Extending the class to do this is simple, wherever we delete a node, we could delete the AnObject at the same time e.g.:

delete pThisNode;
delete pThisNode->pItem; // *@~%?

Just kidding, obviously the order of these lines should be reversed!

The good thing about this approach is that you don't have to worry about deleting your AnObject instances, the list class will do it for you. The bad thing is that only one instance of the list class can reference any one AnObject instance.

One refinement that I have found useful is to give the list class an ownership attribute. That is, it may or may not be an owner. This allows you to create a master list of objects that owns them and outlives all other lists and have other lists that simply reference the objects without owning them. This approach does avoid unnecessary copying of objects^[1].

Don't use this idea unless you are confident that it reflects your requirements accurately. If you need finer grained control of such object's lifetimes, this becomes unmanageable.

Another extension to the ownership concept is to allow ownership to be passed from one owner to another (see auto_ptr again). In the case of a list class, this implies that a method must be provided that can remove an object from the list without destroying it, even when the destructor would do so. Such a method might be called 'relinquish'. I think that passing ownership around is something to consider only in unusual cases^[2]. For example, it is a useful concept when an object wants to 'commit suicide' and pass itself on to something that will destroy it later at its own convenience.

The list class in non-owner mode is an example of what is generally known as a 'collection class'. Ownership is a property of container classes, although they also are usually responsible for the creation of the objects as well as their destruction. Container semantics are basically value based rather than reference based and are exemplified by the STL. So go and read a good book on STL if you want to know more.

Reference counting

A more sophisticated approach to managing object lifetimes is to track references to each object, so that destruction occurs only when there are no references to an object. Imagine something like the following:

void SomeFunction( AWordList & outerList ) {
  AWordList list;                 // ref. count:
  PtrAWord pWord = new AWord("Hello");// hello: 1
  list.Add(pWord);                    // hello: 2 
  pWord = new AWord("World");         // world: 1, hello:1 
  list.Add(pWord);                    // world: 2
  outerList.Add(pWord);               // world: 3
}

Where AWord is a class that holds a single word and is reference counted by some means.

The rules are that when you assign to a pointer to AWord, the reference count is incremented and when you destroy the pointer or assign something else to it, the count is decremented.

The two AWord instances that are created in the above fragment have different lives.

First, the 'hello' instance is created and assigned to 'pWord', this gives it an initial reference count of one. Then it is also added to a local list for reasons that will never become clear. This operation increases its reference count to two, i.e. both 'pWord' and 'list' refer to it.

Now a new AWord instance is created initialised to 'World' and its reference count is set to one as the assignment is made to 'pWord', then because 'pWord' no longer references the 'hello' instance, its reference count is decreased to one (just the list owns it). Note the order of operations - it matters. Why?

Similarly, the 'World' instance is also added to the local list and a referenced list that exists outside the scope of this function. These assignments bring its reference count up to three.

The function now finishes, causing the destruction of 'list' and 'pWord'. The destruction of list decrements the reference counts of both AWord instances, causing the 'hello' instance to drop to zero. With the pWord destruction also decrementing it, the 'World' object still has a reference count of one, corresponding to the outerList referencing it.

When a reference count drops to zero, it causes the destruction of the object.

Oh Yeah! You might say, wondering what language that is in. As is well known, this model of object lifetimes is directly supported as an intrinsic part of many modern languages. It is also supported in C++ using what are commonly referred to as smart pointers, i.e. what looks like a simple C style pointer is really a class. The implementation of such classes is beyond the scope of this article. There are several ways that this can be achieved, depending to a certain extent on the classes being collected. To take one common example, COM objects must support this reference counting as part of their most fundamental interface, hence the use of smart pointers when using COM from C++ is more or less obligatory. It is informative to watch as your program fails to release one little part of - say - Excel and thus fail to leave the machine in the state that it found it - i.e. with Excel still loaded!

Well, that's about it. Please don't treat this article as definitive. Most C++ publications are awash with much more detailed articles on these subjects.

May your objects lead long and fulfilling lives (and not live happily ever after)!

^[1] even better is to provide this facility as a property of a node object so that the constructor and destructor for AnObjectNode track ownership and delete any owned AnObject. Francis

^[2] Anyone who was involved in the saga of designing auto_ptr can attest to just how difficult it is to get any where near viable ownership semantics when ownership can be transferred. You finish up with such horrors as assignments that change their rhs and copy constructors that change the object being copied. Francis

Notes:

More fields may be available via dynamicdata ..