Title: Multithreading (2)

Author:

Date: 03 February 1999 13:15:29 +00:00 or Wed, 03 February 1999 13:15:29 +00:00

Summary:

Body:

To summarise the previous article: Don't do it if you don't have to!

More specifically, we looked at the risk of conflicting use of data objects and resources. We touched on external and internal locking. It should also have been apparent that established 'good things' such as encapsulation, weak coupling and avoiding static data all help to reduce these locking issues.

Next on my agenda of doom is yet another way that threads can fail.

Deadlock

This tragic condition can suddenly strike when you least expect it. Imagine two threads, each needing to share access to two objects. These objects need to be locked for overlapping periods with respect to each thread. Suppose the first thread locks the first object, does some work and then locks the second object. Also suppose that the second thread locks the second object and then locks the first object. I'm sure that you will have spotted that if the second thread locked the second object after the first thread locked the first, both threads will now be waiting for their next lock forever. That is, each thread is waiting for the other thread to release one of the two objects but neither will.

In some cases, this problem can be avoided by waiting for both locks in one blocking call. In Win32, the function is WaitForMultipleObjects, passing a 'wait for all objects' parameter. Not all systems have this facility though. There may also be other reasons why it is difficult to acquire all locks together. As with many multithreading problems, this one is best avoided by cunning design.

What makes deadlock so insidious is that often it isn't easy to spot. This is particularly true when some of the locking is hidden inside methods of objects where the lock applies to more than just the current instance. That may not be very clear, so let me try to expand on that thought.

Remember the choice between internal and external locking? As an example, internal locking can work very well for operations on a queue; the queue will only need to be locked when putting a message in, removing it or perhaps clearing messages. Each of these operations will need locking only on the queue instance and only require protection against each other. Each queue instance will have its own mutex instance.

You may make a case for external locking purely for the reason of performance, where you may want to add or remove several messages in a burst of activity. Nevertheless, internal locking would be the best and easiest for most cases.

Suppose instead that an operation required a lock on an external object that is shared by many instances of the class. The worst case is a static instance that is shared by all such instances. Now, we already know that static objects are dangerous when accessed by more than one thread, so naturally they must be protected.

Now consider the case where, during an operation on an instance of this class, the shared object is locked and during that lock a second thread performs an operation that happens to call an operation on another instance of the class in question. There will be no problem, except if the second thread happens to have obtained a lock on another static object and the first thread needs to obtain the lock on the same object. If you allow static objects to be accessed inside library or other widely used classes, the chances of this kind of deadlock arising are high if you aren't very careful.

There are ways to avoid such problems. The best is to eliminate the need to lock class wide resources as opposed to instance resources by not having class resources. An alternative is to ensure that your internal locks truly are internal. If they are held while calling into external objects' code, where you have no control over that code, there is a risk that that code could lead to a deadlock. Circumstances where you have no control over the external code include obvious ones such as virtual functions, callbacks and templated classes. Don't forget that overloaded operators and constructors may hold surprises as well as the more obvious method calls.

A last resort, if the class wide resource is necessary, is to use external locking on the resource. By that, I mean that the client of such objects must obtain a lock on the class wrapping all operations fully. This puts a burden on the user of your class and makes the locking quite coarse-grained but at least it will be thread safe.

Sergey Ignatchenko gave a good worked through example of a problem of this kind in the C++ Report article 'STL Implementations and Thread Safety'. There was a reprinted copy in the ACCU conference pack, which is how I came to read it.

Priorities

For most purposes, you can safely work with threads all running across an even playing field. What happens at the edge of the field is beyond the scope of this article. If you really need to assign different priorities to your threads, there are a couple of things to bear in mind.

The first is simply that if a high priority thread is always busy, no lower priority threads will get a look in (except by special dispensation from the operating system). This is a statement of the obvious really, just a reminder that threads should always be suspended or waiting when there's no work to do.

The term given to busy high priority threads hogging the machine is 'thread starvation'. This is a symptom of either a badly written program or an overloaded (or dying) machine.

The second is that if a high priority thread shares a resource with a low priority thread, then the low priority thread will stop the high priority thread running until it has released that resource. That is to say that low priority threads should lock such shared resources for the minimum time.

This problem is known as 'priority inversion'. A more complicated scenario is known as 'indirect priority inversion'. This involves one or more threads with an intermediate priority holding off the low priority thread whilst it has the above-mentioned resource locked. This is easily avoided by not allowing very low priority threads to access resources shared with very high priority threads, so I will say no more about it.

In practice, there are usually only three levels of priority that you are likely to consider using:

High priority threads will handle external events that need to be serviced promptly. The worst case of this is some kind of real-time system where buffers might overflow if left too long.
Medium priority threads to do the main work of the application.
Low priority (or idle) threads that to do a low priority task, such as updating the progress bar!

Stopping your threads

In effect, there are two kinds of threads, those that iterate repeatedly and those that simply perform some operation and finish naturally. The latter aren't very interesting.

In one scenario, you might want to allow the loop body to wait for data, then process the data. If you want the thread to process all the available data, then finish gracefully, then it must be able to determine that the data is finished before exiting the loop. If the thread is reading messages from a queue, the message could tell the thread to stop. That is, the producer thread puts a stop message into the queue and the consumer thread eventually gets it, possibly putting a stop message into the next queue if it's a producer itself.

This mechanism only works for the simple case of one producer, one consumer.

Another, more popular mechanism is to use a 'stop' event associated with the thread. This is set when you want to terminate the thread. The thread checks for this event by waiting for either the stop event or the object that signals that data is ready. After the wait has returned, it can check if the stop event was set and break the loop accordingly. Managing the closedown process can be done by first signalling all running threads to stop and then waiting on each thread. This is possible because threads become signalled once they are terminated.

Oh, I suppose I should mention that what you don't do is to call the TerminateThread API.

More threads

My favourite design concept with threads is to de-couple them using the idea of passing messages between threads using queues. I can get quite boring on the subject, it is my one solution to all threading problems. However, there are many ways that this basic concept can be used.

Many producers and consumers

There are many circumstances where you will need an indefinite number of threads that produce messages. One that commonly occurs is in a server where each client connection is serviced by one thread that receives data on that connection. Many applications actually handle the whole of the processing of that data within that thread. However, they will still have to fight amongst themselves for limited resources.

Often you may want to limit the number of threads that do the bulk of the processing. The main reason for this is that the resources available may be limited. For example, a database cache that has many threads making conflicting demands on it may well be run ragged trying to service them all. At the very least, it will require more memory to run smoothly.

On the other hand, if there are multiple processors available, you will want to take full advantage of them. A single thread handling all database requests will only be able to use one processor.

In either case, a dynamically changing number of threads will be running, servicing a varying number of client connections. These threads can be responsible for putting each message into one or more queues. There may be more than one type of consumer thread; each type will need its own queue.

There are two ways that I know of that a producer may signal that data is available. One is to set an event that is waited for by the consumer. The consumer may then eat all the queued messages before waiting again.

The more common alternative is to use a semaphore. This is incremented for each message that put into the queue and decremented for each wait completed by the consumer. Note that a plain event can be used in place of the semaphore if you prefer. The important feature is that after each wait, the thread only consumes just one message. This allows the workload to be shared amongst many consumer threads.

Because this architecture is so common, it is worth thinking about how to complete the circle and return data to the client.

If data is returned to the client in response to client requests only, the producer thread described above could in fact wait for the response from the consumer that handles the message. Thus, its loop would consist of receiving a message, putting it in a queue for processing, waiting for a response and then sending the response.

If on the other hand, the server sends data at random intervals to the client, there will need to be a separate sending thread created for each connection.

Thread pools

In the example above, the consumer threads are acting as a pool of threads. How many are active at one time depends on the load the machine is under. In this case, the number of these threads is determined by you, either a constant number or a number derived at run-time from the number of processors.

The client connection handling threads could also be created as a pool. This is advantageous if the connections are likely to be brief but frequent. There is an overhead involved in creating and destroying threads that may dominate the actual data processing.

In this case, two strategies are available. One is to create a fixed number of threads, setting a limit on the number of simultaneous connections that may be supported. This moves the thread creation overhead to the server startup. Another strategy is to dynamically manage the thread pool by allowing it to grow as incoming connections are made but not necessarily exiting the threads as the connections are closed. This is an interesting exercise in statistics!

You might also refine your strategy to refuse connections when you detect that the server is overloading. This may sound difficult but the nature of the message queuing gives you a clear indicator in the form of the number of queued messages waiting for the main consuming threads. I'm assuming that these are the threads that are doing most of the work.

In some systems, you may wish to make a single thread pool available for threads performing a multitude of tasks. This is easily achieved but I have to say that I've never felt the need.

Under Win32 on NT, an interesting alternative to the thread per connection scenario is to use 'I/O completion ports'. These allow you to handle the incoming data with a thread pool with far fewer threads than you would otherwise need. Essentially, your threads handle data as it arrives for any connection. This mechanism can also be used for outgoing data.

Threads are more robust?

I've seen this argument put forward on occasion. The idea is that by using many threads, if one fails, it won't take the whole application down with it. Interestingly, the same case is often made to justify the use of multiple processes. There may be truth in this. However, there are many ways that a thread stopping may interfere with the operation of the remainder, particularly if they do so while in possession of a shared resource. On the other hand, taking the connection handling thread as an example, if a connection fails and the thread waits for a long timeout, the other connections will be unaffected. Good use of exception handling may allow a thread to fail in such a way that the rest can stagger on.

Well, that's it!

Looking back, these articles were neither a tutorial nor a complete overview of multithreaded programming! I hope that I've highlighted most of the pitfalls and that the design ideas at least help.

One final checklist then:

Write reentrant functions where possible, at least drop all static data!
General-purpose library code should either be intrinsically thread-safe, or should be conditionally built in a thread-safe version. If it needs internal locking, make sure you get it right!
Beware of performance degradation caused by fine-grained internal locking.
Document the external locking requirements of your objects.
Design the threading aspects your applications at the highest level; never try to bolt threads onto existing code.
De-couple threads and limit the reach of threads by limiting the binding between objects used by different threads.

Notes:

More fields may be available via dynamicdata ..