Journal Articles

CVu Journal Vol 26, #6 - January 2015 + Programming Topics
Browse in : All > Journals > CVu > 266 (9)
All > Topics > Programming (877)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Delayed Copy Pattern

Author: Martin Moene

Date: 06 January 2015 21:36:39 +00:00 or Tue, 06 January 2015 21:36:39 +00:00

Summary: Vassili Kaplan presents some techniques for making efficient use of the STL containers in C++.

Body: 

In time-critical financial applications it is very important to process market data messages as fast as possible. In this article we are going to take a look at different ways of adding big messages to a message queue implemented as an STL container. We will discuss advantages and drawbacks of each method and measure an average time it takes using each method. Finally, we will discuss why the new STL emplace_back method is so important for the time-critical applications.

Background and introduction

Consider an application that subscribes to the market data for a few financial instruments and then processes market data updates. These applications often implement the Publish-Subscribe design pattern [1].

The application subscribes to the market data events by registering a callback with a market data publisher (e.g. Bloomberg, Reuters, etc.). On each market data event this callback is called by the publisher. In real life applications there can be thousands of subscriptions to different instruments and some liquid instruments can trigger up to a hundred of data events per second (new bids, asks, trades on different markets, etc.).

One such application is a ‘Market maker’ [2] application: based on some underlying instrument price, interest rates, time to expiry, and some other parameters, it calculates the price of a derivative instrument and quotes (i.e. buys and sells) this derivative instrument on an exchange. An example instrument might be Apple stock and its derivative an option, e.g. the right to buy Apple stock for $100 in 6 months. Obviously, the price of the option depends heavily on the current Apple stock price (among other things); this price can change a many times per second.

It is very important to react to the changes in market as quickly as possible, recalculating the option price when the underlying price changes (otherwise there may be an ‘arbitrageur’ who can buy the derivative too cheaply from the Market maker one millisecond before the price is updated and then sell it back to him once the Market maker finally updates the price – note that Market makers are obliged to buy as well as to sell – they usually make money from the spread between a bid and an ask price).

Very often the Black-Scholes equation [3] is used to calculate option prices. Since recalculation of the option price might take some time, usually market data events are only collected on the callback thread and added to a message queue to be processed from another thread. It is done in order to release the market data event thread so the next event can be added to the queue (the market data publishers also expect you to release the callback thread as soon as possible).

As a message queue an STL container is often used, e.g. a vector [4].

In this article we are going to look at different ways to efficiently add a market data object to the message queue, implemented as a vector. The use of locks and synchronization is beyond the scope of this article.

The classical way of adding an object to an STL container

Consider the following callback method where we receive a huge object with a lot of fields. Suppose that we do not own this object and want to add it to a container (see Listing 1).

int currentId = 0;
vector<Normal> normal;

void normalTest(const Huge& huge)
{
  normal.push_back(Normal(++currentId, huge));
}
			
Listing 1

Suppose that the Normal structure is a wrapper over the Huge object and is defined as in Listing 2.

struct Normal
{
  Normal(int id, const Huge& huge) : m_id(id),
                                     m_huge(huge)
  {
    cout << "<-- Normal Constructor,
            id: " << m_id << endl;
  }
  ~Normal()
  {
    cout << "--> Normal Destructor,
            id: " << m_id << endl;
  }
  private:
    int         m_id;
    Huge        m_huge;
};
			
Listing 2

When running the normalTest above the output will be the following:

  <-- Normal Constructor, id: 1
  --> Normal Destructor, id: 1
  --> Normal Destructor, id: 1

Why was the constructor called only once, whereas the destructor was called twice?

The explanation is that the first time the Normal structure constructor was called was when passing the object to the vector’s push_back() method, and second time the default copy constructor of the Normal structure was called when actually adding the Normal object to the vector’s container. So all of the Huge object fields were copied in that copy constructor.

How can we avoid copying the fields twice?

Emplace_back method

Starting from the C++ 11 release a new emplace_back() [5] method is available. Instead of creating/copying an object twice, it is created just once:

  void normalTest(const Huge& huge)
  {
    normal.emplace_back(++currentId, huge);
  }

When comparing the speed of adding a really big object with emplace_back() it is considerably faster than the classical push_back we discussed earlier (see time measurements below).

The only drawback is that emplace_back is available only starting from C++ 11 on. If you have an older compiler you have to use something else.

Using a vector of pointers

Another optimization is to use a vector of pointers instead of the vector of objects on the stack. Then even though there is still a double copying when using the push_back() method, one of the extra copies is copying pointers, which is much cheaper than copying our Huge objects. Instead of using pointers and then explicitly deleting them, we can use a std::shared_ptr instead:

  vector<shared_ptr<Normal>> normalPtr;
  void normalPtrTest(const Huge& huge)
  {
    normalPtr.push_back(shared_ptr<Normal>
      (new Normal(++m_currentId, huge)));
  }

When measuring the performance of creating objects on the heap, it is already much better than having a vector of the objects on the stack that we saw first. But it is still not as good as the emplace_back().

How can we improve it for the cases when a C++ 11 compiler is not available so we cannot use the emplace_back()?

Adding a temporary field to the wrapper structure

Since the copy constructor is always called when the object is added to an STL container (from the push_back() method), we should do the actual copy of the big and expensive objects in the copy constructor, making the normal constructor very light-weight. But for this we need to temporarily keep a reference or a pointer to the object to copy. Let’s call this temporal reference m_hugePtr, which will point to the object to copy and will be later used in the copy constructor (see Listing 3).

struct Tricky
{
  Tricky(int id, const Huge& huge) : m_id(id),
     m_hugePtr(&huge)
  {
    cout << "<-- Tricky Constructor,
       id: " << m_id << endl;
  }
  Tricky(const Tricky& t) : m_id(t.m_id),
     m_huge(*t.m_hugePtr), m_hugePtr(&m_huge)
  {
    cout << "<-- Tricky COPY Constructor,
       id: " << m_id << endl;
  }
  ~Tricky()
  {
    cout << "--> Tricky Destructor,
       id: " << m_id << endl ;
  }
  private:
    Tricky& operator = (const Normal& other) {}
    void *operator new(size_t s) {}
    int         m_id;
    Huge        m_huge;
    const Huge* m_hugePtr;
};
			
Listing 3

Now, when running the same code with the Tricky objects:

  vector<Tricky> tricky;
  void trickyTest(const Huge& huge)
  {
    tricky.push_back(Tricky(++currentId, huge));
  }

As output we get:

  <-- Tricky Constructor, id: 2
  <-- Tricky COPY Constructor, id: 2
  --> Tricky Destructor, id: 2
  --> Tricky Destructor, id: 2

Note that it is easy to get into a real problem when this pattern is not used as shown above (e.g. copying a Tricky object when the underlying Huge object does not exist anymore, i.e. has been destructed). This is why we disabled creating the Tricky object on the heap by making the operator newprivate [6]. We also made the assignment operator private since it is not supposed to be used with this object.

Testing different ways of adding objects to the vector

For simplicity, suppose that the Huge object is implemented as in Listing 4 (in the real financial world it would have a lot of different double, integer, and string fields).

class Huge
{
public:
  Huge(const string& data = "",
     size_t numberElements = 0)
  {
    if (numberElements > 0)
    {
      m_data.reserve(numberElements);
      for (size_t i = 0; i < numberElements; i++)
      {
        m_data.push_back(data);
      }
    }
  }
  virtual ~Huge()	{}
private:
  vector<string> m_data;
};
			
Listing 4

Table 1 contains the results of running four tests discussed earlier on a 4-core Windows 8.1 machine with an Intel i5 1.8 GHz and 4 GB in RAM. Each test was run 100 times and the data has the CPU times in milliseconds when adding 250 Huge objects to the message queue, implemented as a vector.

Test name Mean Min Max Standard Deviation
Normal Copy 58.59 46 79 8.95
Emplace C++ 11 45.94 31 63 8.71
Shared pointers 69.65 46 94 10.99
Delayed Copy 46.36 31 71 8.71
Table 1

We can see that the new emplace_back() functionality should be used whenever a C++ 11 or a later compiler is available. When it is not available, for time critical applications the Delayed Copy pattern could be used with care.

Conclusions

For time-critical financial applications every millisecond counts so it is important to develop very fast running software, otherwise there can be some faster guys who might punish you for being too slow.

Adding a big structure containing a big object to an STL container can be more expensive than it might appear: the objects belonging to the structure are copied twice – first time when we create and initialize the structure and a second time when it is added to an STL container.

C++ 11 has an excellent solution with the new emplace_back method which creates an object in-place, so no double copy is made.

But what can we do if we have an earlier compiler?

One solution is to have a container of pointers to the objects on the heap instead of a container of objects on the stack. But it turns out that on Windows this approach is suboptimal for really big objects.

Another solution is to delay the copy of the big object until the copy constructor is called by from the vector’s push_back() method. But we need to keep a pointer to the big object. This pattern should be used with care – it is easy to get into a real problem when this pattern is not used as shown above. Also note that this pattern makes sense only for objects having a lot of underlying data, where an extra copy does take some considerable time.

Finally, the new emplace_back() feature available starting from C++ 11, permits not to use any potentially dangerous code and deliver a very efficient functionality that can be used in time-critical applications, so it should be used whenever possible.

References

[1] Publish-Subscribe Patternhttp://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern

[2] Market makershttp://en.wikipedia.org/wiki/Market_maker

[3] Black-Scholes Equationhttp://en.wikipedia.org/wiki/Black%E2%80%93Scholes_equation

[4] Standard Template Library Programmer’s Guidehttp://www.sgi.com/tech/stl/

[5] STL Vector’s emplace_backhttp://en.cppreference.com/w/cpp/container/vector/emplace_back

[6] Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs. Addison-Wesley Professional; 1996

Notes: 

More fields may be available via dynamicdata ..