ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pin‘No Bugs’ Top Five C++ Cooking Recipes

Overload Journal #113 - February 2013 + Programming Topics   Author: Sergey Ignatchenko
Developers often have a few favourite tricks for solving problems. Sergey Ignatchenko shares his five top recipes.

Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Bunny, and do not necessarily coincide with the opinions of the translator or the Overload editor. Please also keep in mind that translation difficulties from Lapine (like those described in [Loganberry04]) might have prevented providing an exact translation. In addition, both the translator and Overload expressly disclaim all responsibility from any action or inaction resulting from reading this article.

Today, I’ve decided to recall that I’m a developer (at least at heart), and pause my usual philosophical blabber about the place of developers in the Universe to talk about actual programming a little bit.

Most (if not all) of the stuff in this article is rather well-known, however, such things are frequently overlooked, so I feel that they are worth mentioning once again.

Recipe #5 – set<>, sort(), and ‘strict weak ordering’

With sets/maps/sorts there is a well-known headache: how to write a compliant operator <() (or a functor). It is known that for classes involved in map<>/set<>, operator <() must comply to a so-called ‘strict weak ordering’; if this is violated, all kinds of weird things can happen (from identical multiple entries in a supposedly-unique collection, to memory corruption). Unfortunately, looking at an arbitrary operator <(), it is usually rather (or very) difficult to tell if it complies with ‘strict weak ordering’. Recipe #5 shows how to cook an operator <() which does comply with ‘strict weak ordering’ and therefore can be safely used with sets/maps (and also with sorts etc.) – see Listing 1.

class X {
  int i;
  int j;
  string s;
  Y y;
  ZZ zz;

  bool operator <( const X& other ) const {
    if( i != other.i )
      return i < other.i;
    if( s != other.s )
      return s < other.s;
    //it is ok to ignore some fields
    return false;
  }
};
			
Listing 1

operator <() in Listing 1 is guaranteed to comply with ‘strict weak ordering’ (assuming that nobody has redefined comparison for strings).

If necessary, class members can also be included into the comparison (provided that their respective operators <() are also compliant with ‘strict weak ordering’):

  bool yl = y < other.y;
  bool yg = other.y < y;
  if( yl || yg ) // this is just a way to write
                 // "y != other.y"
    return yl;

In addition, functions can also be used (as long as their arguments are constant):

int fthis = f( i );
int fother = f( other.i );
if( fthis != fother )
  return fthis < fother;

Multi-parameter functions are also possible (as long as all arguments are constant, and as long as they’re used exactly as shown below):

  int fthis = f( i, s );
  int fother = f( other.i, other.s );
  if( fthis != fother )
    return fthis < fother;

It is very important to note that while in any of the above examples, ‘strict weak ordering’ is guaranteed, any deviations from the forms described above, can be deadly. For example, if you replace return false; with innocent-looking return true;, it would break the code (as for certain a and b our operator <() would return that a < b is true, and simultaneously that a > b is true, which is obviously wrong). In another example, comparing f(i, j) with f(other.j, other.i) would be risky (while comparing f(i, j) with f(other.i, other.j) is guaranteed to be ok under the conditions stated above).

Recipe #4 – Measuring run-time performance

In production code, it is often desirable to gather statistics to measure code performance under real conditions. The problem is how to gather statistics without hurting performance too much. Here are a few tricks to help deal with this issue.

First, for statistical purposes it is usually not necessary to use any kind of thread synchronization. Yes, if you’re writing plain stats.nCalls++ without using mutexes or atomics, you might get an error if two threads try to modify the same stats.nCalls simultaneously; however, if all the statements modifying stats.nCalls, are incremental ones (like the one above) the error from a single race condition cannot exceed 1. In addition, the chance of race conditions is very slim (even if you try really hard, getting more than 1/100 of assignments to have race conditions would be difficult, and in practice it is usually more like 1e-4 to 1e-5 or so). We can usually say (given enough calls) that statistical errors due to race conditions are negligible.

Another issue which often arises in relation to measuring run-time performance is time measurement. The problem here is the following: usually the time frames to be measured are very small (often within a microsecond), and high-precision methods available to measure time are not always readily available. (For example, an x86 RDTSC instruction was observed to provide incorrect results on certain multi-socket hardware due to incorrect implementation of the motherboard.) So, what should you do if you need to measure how long certain function takes (in microseconds), but all you have is a very low-precision timer (like a 15-milliseconds timer)? Apparently, if you have lots and lots of calls to your function, it is often ok to take time measurements using a low-precision timer, and simply add them up (using something like stats.callTime += deltaTime;. It might be even better to reduce performance impact, if( __builtin_expect( deltaTime, 0 ) ) stats.callTime += deltaTime; – potentially without synchronization, as described above). If you’re measuring a time interval of 1.5 microseconds, and have a timer which has precision of 15 milliseconds, you will get deltaTime == 0 in 99.99% of the cases, but apparently, after averaging a million such calls, results are usually surprisingly precise (the precision I’ve observed in practice was about ±20%, which is much better than one would expect intuitively given the numbers above). While precision is not guaranteed and your mileage may vary, if you don’t have any other options on the table – for example, because RDTSC doesn’t work properly on your multi-socket hardware – it is certainly worth a try.

Recipe #3 – _set_se_translator

There is only one thing which I think is fundamentally better with the Microsoft Visual C++ compiler than with GCC, and it is the _set_se_translator. In many heavily loaded server-side cases, it really saved me from becoming a rabbit stew. The idea behind _set_se_translator is to convert SEH exceptions (like access violations, divisions by zero, etc.) into C++ exceptions, which then are handled in the usual C++ way (with destructors called etc.); details can be found on the MSDN page on _set_se_translator [MSDN].

In particular, such a thing is extremely useful if you have a state-machine-based server processing incoming messages. If an access violation (or division by zero) happens during the processing of one message, and it doesn’t cause any memory damage (which is very common with access violations and always happens with division by zero), it is usually perfectly ok to ignore such a message and continue to serve the others without crashing.

One should note that using _set_set_translator and catching resulting C++ exceptions is very different from trying to catch SEH exceptions with catch( ... ). When trying to catch SEH with catch( ... ), no C++ destructors are called between the point where SEH is raised and where it is caught; this can cause all kinds of weird effects (and if you’re using destructors to remove mutex locks, forget catching SEHs with catch( ... )). On the contrary, when using _set_se_translator, SEH is converted into a C++ exception right where SEH occurs, so it is a C++ exception which is thrown. All destructors from the point where SEH was raised, to the point where the C++ exception is caught, are properly called, with much better results (unless memory has already been corrupted by the point where SEH has occurred).

Unfortunately, the last time I checked (which was admittedly a few years ago) similar functionality was not available for *nix C++ compilers, including GCC. Theoretically, it might be possible to throw a C++ exception from the signal handler, but this functionality is platform-dependent and in practice, I wasn’t able to find a platform where it works :-(.

Recipe #2 – Containers with ‘move’ semantics

Recipe #2 is admittedly a rather weird one, but every good cookbook should contain at least one weird recipe, so here goes. In high-performance code, there are scenarios, where you need to have a container which stores complex objects (such objects including allocated storage etc.), but those objects are only moved around and are never copied. Providing a copy constructor for such objects can be either difficult (for example, if such an object includes a file handle) or undesirable for performance reasons (if such an object contains allocated memory, copying which would be expensive). One common way to deal with this is to use some kind of reference-counted pointer (or something similar to auto_ptr<>); this is a viable option, but it has the associated cost of extra allocation/deallocation, and in really high-performance code, this might be an issue. In such cases, an approach similar to the following could help (rephrasing a proverb, you could say that weird times require weird measures) – see Listing 2.

//class X is our complicated class

class StackOfX {
  // stack is just one example; any other type of
  // container (including maps, sets, lists, etc.)
  // can be written in a similar manner
  struct ShadowX { char data[ sizeof(X) ]; };
  typedef vector< ShadowX > ShadowV;
  ShadowV v;

  void push( /* move-in */ X& x ) {
    ShadowX& sx = (ShadowX&)x;
    v.insert( v.end(), sx );
  }
  const X& operator[]( int i ) const {
    return (const X&)v[ i ];
  }

  void pop( /* move-out */ X& x ) {
    ShadowV::iterator it = v.end() - 1;
    ShadowX& sx = (ShadowX&)x;
    sx = *it;
    v.erase( it );
  }

  ~StackOfX() {
    for( ShadowV::iterator it = v.begin();
         it != v.end(); ++it ) {
      X& x = (X&)(*it);
      x.X::~X();
    }
  }
};
			
Listing 2

While other operations on such a container may be added in a similar way, it is extremely important to keep all such operations within this class, and not to expose any operations of an underlying ‘shadow’ container to outside world without ensuring the integrity of our StackOfX container.

Recipe #1 – asserts

And my top place goes to a very simple, but extremely useful, recipe, related to asserts. In the C/C++ world, everybody knows about asserts; the problem with a standard assert() is that it calls abort() if assertion fails, which is often a bit too much.

The idea of this recipe is very simple – to create a MYASSERT macro which, if violated, throws an exception.

  #define MYASSERT( cond ) \ 
    (void)( ( cond ) || ( throw MyAssertException \
    (   #cond, __FILE__, __LINE__ ), 0 ) )

What exactly your MyAssertException class should be, where to place it in the exception hierarchy, and how to use file name, line number, and the assertion failure is up to you. For example, if you’re concerned about revealing information about your code to the customer, you can easily omit #cond in the production compile, and track what has happened in production based only on file/line. (You do have tags in your source control for all the versions released to production, don’t you?)

Often, several such MYASSERTs are introduced (MYASSERT, MYASSERT2, MYASSERT3, MYASSERT4, etc.) to indicate the level at which the check is performed. For example, you can easily create a header which would behave as follows: if you define MYASSERTLEVEL=2, then all MYASSERTs with level <= 2 are checked, and all the others are ignored:

  #if MYASSERTLEVEL >= 2
    #define MYASSERT2( cond )  \
    (void)( ( cond ) || ( throw MyAssertException \
    ( #cond, __FILE__, __LINE__ ), 0 ) )
  #else
   #define MYASSERT2() ((void) 0)
  #endif

Apparently, it is often a good idea to leave at least some of the MYASSERTs in the production code (especially if you run it on your own server or have a mechanism to send logs back to you). If you find that occasionally certain MYASSERTs in production mode fail, this clearly warrants an investigation.

In addition, if you’re using 3rd-party libraries which use the standard assert() internally, it is often a good idea to replace the standard assert() with MYASSERT() to avoid situations when the whole server calls abort() because some assertion within a 3rd-party library has failed (while recovery was still perfectly possible). How you do it depends on the specifics of your project, but it usually can be done either via redefining standard assert() and recompiling the 3rd-party library, or if assert() calls some helper function on your platform, this helper function can often be replaced to achieve the desired effect.

References

[Loganberry04] David ‘Loganberry’, Frithaes! – an Introduction to Colloquial Lapine!, http://bitsnbobstones.watershipdown.org/lapine/overview.html

[MSDN] ‘_set_se_translator’ http://msdn.microsoft.com/en-us/library/5z4bw5h5%28v=vs.110%29.aspx

Acknowledgement

Cartoon by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Overload Journal #113 - February 2013 + Programming Topics