Title: Multithreading

Author:

Date: 03 November 1998 13:15:28 +00:00 or Tue, 03 November 1998 13:15:28 +00:00

Summary:

Body:

Writing multithreaded applications has always had a reputation for being fraught with difficulties. Typically, multithreaded projects have proved to be hard to design and debug and require much more care than other programming tasks.

Not many people realise that if you understand a few simple principles and tackle the problems in the right way, all will be plain sailing.

That's because in order to believe that, you have to both be on medication and have an exceptionally optimistic outlook.

The safest approach is to maintain a state of constant fear and paranoia. Please bear this in mind while reading on.

Although the principles are fairly general, threading models vary from platform to platform. As I use Win32, I will stick with its terminology. On the other hand, this isn't intended as an API reference, so don't expect too much detail!

Some questions

What is multithreading?

In the context of this article, multithreading is pre-emptively switching amongst several flows of control within an application. Threads run side by side, nominally independent of each other. In practice, each thread is a procedure or function. For example, the main thread in a conventional application is your main or WinMain function. In the latter case, it probably consists of a message loop and very little else.

Should I use it?

This is a question expecting the answer 'yes', hoping for the answer 'no' and likely to get the answer 'why'. If you don't need it, don't do it. One of the more obvious uses of multithreading is to reduce the impact of background processing on applications with live user interfaces. In other articles I have described ways of utilising 'idle time' for this purpose. In many cases, coding the same functionality using additional threads may be significantly simpler in structure than that approach. In the context of server applications, you may be forced to use multiple threads to efficiently handle client connections for example.

My advice is to tread cautiously and not in the manner of a lemming on holiday. What's the quickest way to the beach guys?

What are the pros and cons?

As above, smooth operation of your user inter-face is one big advantage. Users appreciate a slick application that allows you to go on working whilst it effortlessly does all the hard work in the background. The downside is that such applications are inevitably more complex at the design level.

Also as above, the use of additional threads is mandated for high performance server applications managing multiple connections and for many real-time applications.

The primary difficulty is simply that multithreaded code is hard to write and hard to debug. It is an unforgiving environment for your code to run in. Testing and debugging are hampered by the fact that many faults occur rarely. You may find that a line of code fails once a week despite being executed many times a second for the rest of the time. For desktop applications this may be annoying and believe me, I use some that have just these kinds of bugs. For server applications that are required to run continuously, these bugs are showstoppers.

Threads will use data and resources that are also used by other threads. The potential for conflicts is high. For this reason, you have to provide protection for these shared resources so that these conflicts don't occur.

Your design will also be concerned with efficient use of each thread. No thread should be 'busy' unless it is really doing work, otherwise it will simply be wasting processor time.

How does it work?

An instance of an application is generally referred to as a process. It has its own address space and at least one running thread. Each thread exists in the same address space but runs on its own stack.

The operating system manages the switching of contexts between processes and between threads within each process. It is important to realise that context switching between 'heavyweight' processes (i.e. applications) is much more expensive than between 'lightweight' processes (i.e. threads within your application). Essentially, it is a bigger task to swap address spaces than merely to move from one stack to another.

The illusion of completely asynchronous operation may be no illusion at all on a multi-processor machine. However, although you can fine-tune an application to take into account the number of processors, there is no logical difference between single and multiprocessor systems.

Imagine that your application consists of a number of threads that work along these lines:

void MyDataProcessingThread() {
  while( !Finished() )
    if( SomeData() ) ProcessData();
}

This represents a naïve approach to threading. The thread goes flat out in a loop, constantly checking for data to process. When it finds data it processes the data and then keeps looking for more.

If this is a bad approach, you may well ask why the main thread in your application is all right, as it looks similar in structure. It is after all running a message loop, repeatedly calling the 'GetMessage' function. There is a conceptual difference however.

GetMessage allows a context switch if there are no messages waiting in the queue. This is just as it's always been. The difference in Win32 is that a context switch will also occur at any other time during the processing of each message. This is pretty transparent to the application, though you must bear it in mind in terms of sharing external resources such as files.

So we need to achieve a similar effect in our own threads. There are a number of ways of doing this. They all use a mechanism that blocks a thread in a call that waits for some kind of event to occur, e.g. more data available to process.

Without using any such mechanism, your thread will get time slices that last for periods in the region of a few milliseconds. These are long intervals in terms of running code but short enough to allow the illusion of smooth concurrency.

Conflicts

Although the operating system does a lot to protect one thread context from another, it can't protect resources that are accessed by more than one thread. Re-entrant code is essentially thread safe. More than one thread may access an object simultaneously, so long as it doesn't modify the object in any way.

On the other hand, modifications must be protected, by locking the object in some way. Such protective locks are normally referred to as 'mutexes' or mutual exclusion locks. The level at which this protection is performed depends on whether an object may be accessed within the process or between processes. The latter case is more expensive and requires a kernel object called a 'mutex' in Win32, the corresponding intra-process lock is termed a 'critical section' in Win32.

Both types of object are used in much the same way.

On lower level systems, such as in interrupt programming, we would disable the interrupt mechanism whilst performing a critical task that cannot be interrupted. Of course at the lowest level, the operating system is doing much the same thing. Such a crude mechanism isn't directly available to the applications programmer.

An example

Consider one of the simplest objects that we can have - an integer. A common situation where an integer may be written to and read from by different threads is where it is maintaining a reference count:

ReleaseMyHoldOnTheThing(){
  if(--theReferenceCount == 0)
    FreeTheThing();
}
AcquireAHoldOnTheThing(){
  ++theReferenceCount;
}

Crassly oversimplified of course. Nevertheless, this is a common scenario. Machine code generated by the compiler will vary according to whim and optimisation settings as well as the context of the above code, e.g. whether 'theReferenceCount' is a member of a class.

Although the act of incrementing the reference count in the 'acquire' case may be a single instruction (having obtained the address first), the 'release' case must be more complex.

For example, in very pretend assembler:

Get the address of the count in a register
Decrement the integer at that address
Compare the integer at that address to zero
Jump past instruction 5 if it wasn't zero
Call FreeTheThing

Now suppose that the reference count is two and a thread executes the above code. Suppose that it executes instruction 2, then the scheduler interrupts this thread and executes other threads that are waiting. One of these threads also executes the ReleaseMyHoldOnTheThing function and decrements the count to zero and consequently calls FreeTheThing.

Eventually, the first thread resumes at instruction 3, and reads the zero (or possibly crashes if FreeTheThing was in fact 'delete this'). Then of course, it calls FreeTheThing and inevitably crashes then or very shortly, having corrupted the heap or whatever.

A term that is often applied to this situation is that the thread enters a 'race condition' when it executes instruction 2. That is, it is now in a race to get to the next instruction before the other thread gets a look in. Normally, it will win that race but if it loses then disaster will strike.

I chose this simple example to highlight the fact that even the simplest line of code may not be atomic in terms of context switching. For a single processor machine, the granularity is simply that of processor instructions. The processor cannot be interrupted during the execution of an instruction. I don't know what the precise rules are in a multiprocessor environment with regard to bus locking etc. However, the effect is much the same!

The cheapest available mechanism for this code would be InterlockedDecrement in Win32. It is worthwhile familiarising oneself with the full set of interlocked functions as they can be used to effi-ciently protect access to integers in several ways.

Two Strategies

There are two strategies that can be applied when you design your application to ensure thread safety.

The first and most exciting, is to ignore thread safety during the design phase and then bolt on some protection after you've written the bulk of the code. It's best to look for another job at this point so as not to be associated with a failed project, at least until after you're safely established in your new role.

Assuming that you want your project to succeed, make threading issues central to your design.

Think of your design as a set of objects that will be acted upon by a number of threads. The threads will each reach out and touch a subset of those objects. Some objects will be accessed by only one thread. Others will be accessed by more than one thread. These shared objects are obviously the ones that will need to be protected.

Clearly, if the number of these shared objects is minimised, it will be easier to protect them. Similarly, if their interfaces are simple, your job will be easier.

Incidentally, code that is re-entrant is generally thread-safe. Equally, static data and objects that are accessed by many different parts (or threads) of your program are inherently unsafe.

So, surprise, surprise, established good coding practice is also good for avoiding threading conflicts.

Object instances that are each accessed by more than two threads are likely to be particularly difficult to deal with.

De-coupling threads

Think about the nature of objects that are touched by more than one thread. If these are of classes that reference other objects due to associations in your basic class design, then the potential reach of each thread must be extended to include those objects.

Often, you can avoid a lot of difficulties by restricting the interactions between threads to one or more queues of messages. These messages may be of any type you like. Thread synchronisation can be limited to each thread's dealings with the queue. A thread either produces or consumes each message, with a clear separation between the two phases. That is, a consuming thread will not get a chance to access a message until the producer has finished with it.

You may be thinking that this sounds rather like pipes of data and you would be right! Indeed, using pipes is a legitimate and versatile means of achieving the above design.

Internal and external locking

Objects that need to be protected against multiple access will typically contain their own locking objects. For example, a list must be locked while an insert or remove operation is happening. Likewise, it must be locked while being iterated from end to end. The former operations may be protected inside those methods. The iteration situation is easier to deal with externally.

A queue on the other hand will be easy to protect within its own methods. It is list like but wouldn't normally need to be iterated. One advantage of internal locking is that it can be very fine grained, so won't lock objects for any longer than they need to be. This is also a significant disadvantage of internal locking. There may be much more overhead with the more frequent locking and unlocking, compared to a longer lock that encloses a series of operations.

The biggest advantage of internal locking is that it makes use of that object much simpler. This is a desirable feature in a class library but may be less useful otherwise.

There are some dangers with internal locking that may lock objects beyond the current instance. I'll have to explain why next time.

That's all for now.

Even at this very general level, there's still a lot more ground to cover, so I'll try and fill in some of the gaps next time.

If you haven't already read them, look up the excellent Win32 series in C Vu 8.5 to 10.2 by Simon Wood that cover multithreaded aspects of Win32 in much more detail than is intended here.

Notes:

More fields may be available via dynamicdata ..

Journal Articles