    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Ultra-fast Serialization of C++ Objects</title>
        <link>https://members.accu.org/index.php/articles/2317</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Overload Journal #136 - December 2016</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Ultra-fast Serialization of C++ Objects</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> Tue, 06 December 2016 20:40:21 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Serialising and de-serialising is a common problem. Sergey Ignatchenko and Dmytro Ivanchykhin demonstrate one way to do this quickly.</p>
<p><strong>Body:</strong>&nbsp;<p class="EditorIntro">Disclaimer: as usual, the opinions within this article are those of ‘No Bugs’ Hare, and do not necessarily coincide with the opinions of the translators and <em>Overload</em> editors; also, please keep in mind that translation difficulties from Lapine (like those described in [<a href="#[Loganberry04]">Loganberry04</a>]) might have prevented an exact translation. In addition, the translator and <em>Overload</em> expressly disclaim all responsibility from any action or inaction resulting from reading this article.</p>

<h2>Task definition</h2>

<p>Recently, we were working on a system which required an extremely fast (ideally, the fastest possible) serialization of the state of the Reactor/Finite State Machine (FSM). In addition, we knew for sure that deserialization would happen with exactly the same executable; in other words, we didn’t care at all about either (a) cross-platform issues or (b) extensibility.</p>

<h3>Where it came from</h3>

<p>The whole task comes from exploiting deterministic Reactors/FSMs. As discussed in [<a href="#[NoBugs15]">NoBugs15</a>] and [<a href="#[NoBugs16]">NoBugs16</a>], as soon as we have a deterministic Reactor/FSM, it is possible to use this determinism to achieve such things as production post-mortem analysis, and low-latency fault tolerance. For example, for post-mortem analysis, it is sufficient to write all the inputs of the deterministic Reactor/FSM, and in the case of a crash to replay it from the very beginning.</p>

<p>On the other hand, keeping the whole history of the Reactor inputs is usually impractical, so we need to resort to some kind of ‘circular buffer’ [<a href="#[NoBugs15]">NoBugs15</a>]. To be able to observe the last <em>N</em> seconds of the life of the Reactor/FSM <em>before</em> the crash, the ‘circular buffer’ needs to contain (a) a snapshot of the current state of the Reactor/FSM, and (b) all the inputs received after this snapshot was taken. To achieve low-latency determinism-based fault tolerance, the logic is more complicated, but the snapshot of the current state is still required.</p>
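<p>The ‘snapshot plus inputs since the snapshot’ idea can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the referenced articles: <code>CounterReactor</code> stands in for a real deterministic Reactor, <code>ReplayLog</code> for the ‘circular buffer’, and the snapshot is a plain copy rather than a serialized image.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical deterministic reactor: the next state depends only on
// the current state and the input event.
struct CounterReactor {
  std::uint64_t total = 0;
  void react( std::int32_t input ) {
    total = total * 31u + static_cast<std::uint64_t>( input );
  }
};

// Hypothetical log: keeps the latest snapshot plus every input
// received after that snapshot was taken.
class ReplayLog {
  CounterReactor snapshot_;   // starts as the reactor's initial state
  std::vector<std::int32_t> inputsSinceSnapshot_;
  std::size_t maxInputs_;

public:
  explicit ReplayLog( std::size_t maxInputs ) : maxInputs_( maxInputs ) {}

  // Must be called BEFORE the reactor processes 'input'.
  void record( const CounterReactor& current, std::int32_t input ) {
    if( inputsSinceSnapshot_.size() >= maxInputs_ ) {
      snapshot_ = current;           // take a fresh snapshot...
      inputsSinceSnapshot_.clear();  // ...and drop the older inputs
    }
    inputsSinceSnapshot_.push_back( input );
  }

  // Rebuilds the state by replaying the logged inputs over the snapshot.
  CounterReactor replay() const {
    CounterReactor r = snapshot_;
    for( std::int32_t in : inputsSinceSnapshot_ )
      r.react( in );
    return r;
  }
};

// Usage: feed ten inputs, then check that the replay matches the live state.
inline bool replayMatchesLiveState() {
  CounterReactor live;
  ReplayLog log( 4 );
  for( std::int32_t i = 1; i <= 10; ++i ) {
    log.record( live, i );
    live.react( i );
  }
  return log.replay().total == live.total;
}
```

<p>In the setting of this article, the line <code>snapshot_ = current</code> is exactly the place where a very fast serialization is needed, as it runs on every snapshot rather than only after a crash.</p>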

<p>And as soon as we’ve said ‘we need to make a snapshot’, we need to serialize our state one way or another. Moreover, we need to do it Damn Fast – otherwise this debugging/fault tolerance feature would become too expensive. On the positive side, in practice serialization will happen to memory (and in the case of a production post-mortem, it won’t even be used in any way until the program crashes), so we’ll be dealing with pure serialization code, with very little overhead to mask any of our performance blunders.</p>

<p>Note that in both these cases we can be 100% sure that we’ll be deserializing this state on the executable which is <em>identical</em> to the executable which serialized the state. In other words, all the usual serialization/marshalling problems such as different alignments, endianness, etc. do NOT apply here. :-)</p>

<p>One more case when we know for sure that it is <em>exactly</em> the same executable is when we’re serializing data for inter-thread transfers within the same process; as a result, the techniques discussed below will work in this case too. However, whether our serialization is optimal in such scenarios is not that obvious. In some cases (specifically, if you do NOT need to reconstruct a modifiable state on the receiving side and are just passing messages around) flattening techniques such as those used by FlatBuffers MAY still happen to be faster (on the deserialization side, that is).</p>

<h2>The fastest way to serialize – C</h2>

<p>Now, as we have our task defined as ‘the fastest possible serialization for an in-memory structure, assuming that it will be deserialized by exactly the same executable’, we can start thinking about implementing it.</p>

<p>First, let’s consider serializing a state in a C program.</p>

<p>Usually, FSM/Reactor state can be described as a kind of generalized tree, with each of the nodes being a C struct, and containing ‘owning’ pointers to other allocated C structs. As a simple example, see Listing 1.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
struct Y {
  int yy;
};
struct X {
  int xx;
  struct Y* y; // allocated via 
               // malloc(); 'owning' pointer
  int z;
};
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<p>And the fastest way to serialize struct X will be something along the lines of Listing 2. Unless we’re resorting to some trickery with allocators or ‘flattening’ of our original structure, it is extremely difficult to beat this code performance-wise.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

struct OutMemStream {
  uint8_t* pp;
  uint8_t* ppEnd;
};
static inline void writeToStream(
    struct OutMemStream* dst, const void* p,
    size_t sz ) {
  assert( dst-&gt;pp + sz &lt;= dst-&gt;ppEnd );
    //in the real-world, think what to do here
  memcpy( dst-&gt;pp, p, sz );
  dst-&gt;pp += sz;
}
void serializeX( struct OutMemStream* dst,
    const struct X* x ) {
  writeToStream( dst, x, sizeof(struct X) );
  writeToStream( dst, x-&gt;y, sizeof(struct Y) );
  //that's it!
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>Deserialization would work along the lines of Listing 3. It is inevitably slower than serialization (there is an expensive <code>malloc()</code> within, ouch), but for the kind of data structure we’re working with there is not much we can do about it. Also, for our use cases described above, deserialization will happen MUCH more rarely than serialization (on a program crash or a catastrophic hardware failure), so we don’t really care too much about the performance of deserialization – we just need it to work.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

struct InMemStream {
  uint8_t* pp;
  uint8_t* ppEnd;
};
static inline void readFromStream(
    struct InMemStream* src, void* p, size_t sz ) {
  assert( src-&gt;pp + sz &lt;= src-&gt;ppEnd );
  memcpy( p, src-&gt;pp, sz );
  src-&gt;pp += sz;
}
void deserializeX( struct InMemStream* src,
    struct X* x ) {
  readFromStream( src, x, sizeof(struct X) );
    // x-&gt;y contains garbage at this point(!)
    // ok, not exactly garbage - but a pointer
    // which is utterly invalid in our current space
  x-&gt;y = malloc( sizeof(struct Y) );
    //phew, no garbage anymore
  assert( x-&gt;y );
  readFromStream( src, x-&gt;y, sizeof(struct Y) );
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>
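<p>To see Listings 2 and 3 working together, here is a minimal round-trip sketch, written in C++ so that it can sit alongside the later listings. The single <code>MemStream</code> type, the helper names and the <code>roundTripPreservesX()</code> check are assumptions made for this illustration only; the structs repeat Listing 1.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

struct Y { int yy; };
struct X { int xx; Y* y; int z; };   // 'y' is an owning pointer

// One stream type for both directions (an assumption of this sketch).
struct MemStream { std::uint8_t* pp; std::uint8_t* ppEnd; };

inline void writeBytes( MemStream& dst, const void* p, std::size_t sz ) {
  assert( sz <= static_cast<std::size_t>( dst.ppEnd - dst.pp ) );
  std::memcpy( dst.pp, p, sz );
  dst.pp += sz;
}
inline void readBytes( MemStream& src, void* p, std::size_t sz ) {
  assert( sz <= static_cast<std::size_t>( src.ppEnd - src.pp ) );
  std::memcpy( p, src.pp, sz );
  src.pp += sz;
}

inline void serializeX( MemStream& dst, const X& x ) {
  writeBytes( dst, &x, sizeof(X) );    // raw dump, stale pointer included
  writeBytes( dst, x.y, sizeof(Y) );   // followed by the pointee
}
inline void deserializeX( MemStream& src, X& x ) {
  readBytes( src, &x, sizeof(X) );     // x.y is invalid at this point
  x.y = static_cast<Y*>( std::malloc( sizeof(Y) ) );
  assert( x.y );
  readBytes( src, x.y, sizeof(Y) );    // x.y is valid again
}

// Serializes an X into a local buffer and restores it into a fresh X.
inline bool roundTripPreservesX() {
  Y* y = static_cast<Y*>( std::malloc( sizeof(Y) ) );
  if( !y ) return false;
  y->yy = 7;
  X original{ 1, y, 3 };

  std::uint8_t buf[ sizeof(X) + sizeof(Y) ];
  MemStream out{ buf, buf + sizeof(buf) };
  serializeX( out, original );

  MemStream in{ buf, out.pp };
  X restored{};
  deserializeX( in, restored );

  const bool ok = restored.xx == 1 && restored.z == 3 &&
                  restored.y != original.y && restored.y->yy == 7;
  std::free( restored.y );
  std::free( original.y );
  return ok;
}
```

<p>Note that the restored object owns a different allocation than the original one: the pointer value stored in the stream is never dereferenced, only overwritten.</p>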

<h2>From C to C++</h2>

<h3>C++ serialization</h3>

<p>Ok, now let’s try to rewrite it into C++ (where we’re no longer restricted to Plain Old Data a.k.a. POD). To make things closer to reality, let’s serialize the class X in Listing 4, which contains (directly or indirectly) two <code>std::string</code>s, a <code>std::vector</code>, and a <code>std::unique_ptr</code>.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class OutMemStream {
  public:
  inline void write( const void* p, size_t sz );
    // implemented along the lines of the
    // writeToStream() above
  inline void writeString( const std::string&amp; s )
  {
    size_t l = s.length();
    write( &amp;l, sizeof(size_t) );
    write( s.c_str(), l );
  }

  template&lt;class T&gt;
  inline
  void writeVector( const std::vector&lt;T&gt;&amp; v ) {
    // NB: can be further optimized by writing the
    // whole v.data() at once.
    size_t sz = v.size();
    write( &amp;sz, sizeof(size_t) );
    for( const auto&amp; it : v ) // by reference: no copies
      it.serialize( this );
  }
};
class Y {
  public:
  int yy;
  std::string zz;
  std::string zz2;
  void serialize( OutMemStream* dst ) const;
  Y( const PreDeserializer&amp; );
     //pre-deserializing constructor, see below
  Y( InMemStream* src );
     //deserializing constructor
};
class X {
  public:
  int xx;
  std::unique_ptr&lt;Y&gt; y;
  std::vector&lt;Y&gt; vy;
  void serialize( OutMemStream* dst ) const;
  X( const PreDeserializer&amp; );
     //pre-deserializing constructor
  X( InMemStream* src );
     //deserializing constructor
};
void Y::serialize( OutMemStream* dst ) const {
  dst-&gt;write( this, sizeof(Y) );
  dst-&gt;writeString( zz );
  dst-&gt;writeString( zz2 );
  // NB: we do NOT serialize POD members 
  // such as 'yy' separately
}
void X::serialize( OutMemStream* dst ) const {
  dst-&gt;write( this, sizeof(X) );
  y-&gt;serialize( dst );
  dst-&gt;writeVector( vy );
  // NB: we do NOT serialize POD members 
  // such as 'xx' separately
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>Once again, it is very difficult to beat this serialization (that is, unless playing some dirty tricks with flattening or allocators). Nonetheless, it contains all the necessary information (in fact, a little bit more than that) to deserialize our object when/if we need it.</p>
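<p>The comment inside <code>writeVector()</code> mentions writing the whole <code>v.data()</code> at once. A minimal sketch of that optimization follows, assuming the element type is trivially copyable (so it does not apply to <code>std::vector&lt;Y&gt;</code> from Listing 4, whose elements contain strings). The names <code>writePodVector()</code>, <code>position()</code> and <code>Point</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

class OutMemStream {
  std::uint8_t* pp;
  std::uint8_t* ppEnd;

public:
  OutMemStream( std::uint8_t* begin, std::uint8_t* end )
    : pp( begin ), ppEnd( end ) {}

  void write( const void* p, std::size_t sz ) {
    assert( sz <= static_cast<std::size_t>( ppEnd - pp ) );
    if( sz ) std::memcpy( pp, p, sz );
    pp += sz;
  }
  const std::uint8_t* position() const { return pp; }

  // Writes the element count followed by ONE memcpy of all elements.
  template<class T>
  void writePodVector( const std::vector<T>& v ) {
    static_assert( std::is_trivially_copyable<T>::value,
                   "bulk copy is only valid for trivially copyable T" );
    std::size_t sz = v.size();
    write( &sz, sizeof(sz) );
    write( v.data(), sz * sizeof(T) );
  }
};

struct Point { int x; int y; };

// Usage: serialize three points and report how many bytes were written.
inline std::size_t bytesWrittenForThreePoints() {
  std::vector<Point> pts{ {1, 2}, {3, 4}, {5, 6} };
  std::uint8_t buf[256];
  OutMemStream out( buf, buf + sizeof(buf) );
  out.writePodVector( pts );
  return static_cast<std::size_t>( out.position() - buf );
}
```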

<h3>C++ deserialization – Take 1</h3>

<p>However, deserialization in C++ is not going to be that simple. The problem is that we didn’t store data on a per-field basis, so unless we do something about it, on deserialization we’ll be overwriting ‘owning’ pointers with their values from the old program (and replacing this garbage with a pointer to allocated data later). While this was ok for POD types in C, in C++ it can cause all kinds of trouble (such as an attempt to free a non-allocated pointer) unless we’re careful. The approach in Listing 5, however, is very clean in this regard.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class PreDeserializer {
};  // just an empty class, to be used as a flag
    // to constructor

class InMemStream {
  uint8_t* pp;
  uint8_t* ppEnd;
  public:
  inline void read( void* p, size_t sz );
    // implemented along the lines 
    // of the readFromStream() above
  inline void constructString( std::string* s ) {
    size_t l;
    read( &amp;l, sizeof(size_t) );
    assert( pp+l &lt;= ppEnd );
    new( s ) std::string(
      reinterpret_cast&lt;const char*&gt;(pp), l );
    pp += l;
  }

  template&lt;class T&gt;
  inline void constructVector( std::vector&lt;T&gt;* v )
  {
    size_t sz;
    read( &amp;sz, sizeof(size_t) );
    new( v ) std::vector&lt;T&gt;;
    for( size_t i=0; i &lt; sz ; ++i ) {
      v-&gt;push_back( T( this ) );
    }
  }
};
Y::Y( const PreDeserializer&amp; ) {
  // here we need to construct a valid object
  // just ANY valid object, preferably the 
  // cheapest one to be constructed-destructed,
  // as it will be destructed right away :-)
}

Y::Y( InMemStream* src ) {
  // at this point 'zz' and 'zz2' are already
  // constructed; we cannot call src-&gt;read(this) as
  // it would overwrite valid 'zz'/'zz2', causing all
  // kinds of trouble.
  zz.~basic_string&lt;char&gt;(); 
  //no idea why zz.~string() doesn't work
  zz2.~basic_string&lt;char&gt;();
  // now 'zz'/'zz2' are no longer constructed,
  // and we can overwrite them safely. On the 
  // other hand, starting from this point, we're
  // NOT exception-safe
  src-&gt;read( this, sizeof(Y) );
  //at this point 'zz'/'zz2' contain garbage
  src-&gt;constructString( &amp;zz );
  src-&gt;constructString( &amp;zz2 );
  // phew, no garbage anymore,
  // 'this' is once again a valid object
  // and we're again exception-safe
}

X::X( const PreDeserializer&amp; ){
  // nothing here; we do NOT really need 
  // anything from here
}

X::X( InMemStream* src ) {
  // at this point 'y' is already constructed
  // we cannot call src-&gt;read(this) as it will
  // overwrite valid 'y' and 'vy' causing all
  // kinds of trouble.
  vy.~vector&lt;Y&gt;();
  y.~unique_ptr&lt;Y&gt;(); 
  // now 'y' and 'vy' are no longer constructed,
  // and we can overwrite them safely. On the other
  // hand, starting from this point, we're NOT
  // exception-safe
  src-&gt;read( this, sizeof(X) );
  //at this point 'y' and 'vy' contain garbage
  new(&amp;y) std::unique_ptr&lt;Y&gt;( new Y(src) );
  src-&gt;constructVector( &amp;vy );
  // phew, no garbage anymore, 
  // 'this' is once again a valid object
  // and we're again exception-safe
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>Overall, deserialization of a class T goes as follows:</p>

<ul>
	<li>We construct an object of our class T, constructing all its non-POD members using Pre-Deserialization constructors (strictly speaking, we don’t need these members constructed at all, but C++ gives us no way to skip it)</li>
	
	<li>Within the object deserializing constructor, we have the following ‘sandwich’:
		<ul>
			<li>We destruct all non-POD members by explicitly calling their respective destructors. It gives us the right to overwrite them.</li>
			
			<li>We overwrite the whole object T via <code>memcpy()</code>. At this point, non-POD members will contain garbage (more precisely, pointers which are invalid in our current space).</li>
			
			<li>We re-construct all the non-POD members via their deserializing constructors. No garbage anymore, and we’re ready to go. :-)</li>
		</ul>
	</li>
</ul>

<p>Our Take 1 approach will work well – that is, until we need to deal with base classes, and especially polymorphic classes. :-( Polymorphic objects, among other things, contain a so-called ‘Virtual Table Pointer’, and overwriting it almost universally qualifies as a ‘pretty bad idea’. :-( Which leads us to the following...</p>
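<p>One way to keep the whole-object overwrite of Take 1 away from polymorphic classes is a compile-time guard. The sketch below is only an illustration under stated assumptions: <code>rawOverwrite()</code>, <code>PlainState</code> and <code>PolyState</code> are hypothetical names, and <code>std::is_polymorphic</code> only detects classes with virtual functions, so a class that has a virtual base but no virtual functions would slip through.</p>

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct PlainState { int a; int b; };

struct PolyState {
  int a;
  virtual ~PolyState() {}   // makes the class polymorphic
};

// Overwrites *dst with a raw image, but refuses (at compile time)
// to do so for classes that carry a virtual table pointer.
template<class T>
void rawOverwrite( T* dst, const void* image ) {
  static_assert( !std::is_polymorphic<T>::value,
                 "whole-object overwrite would clobber the vptr" );
  std::memcpy( static_cast<void*>( dst ), image, sizeof(T) );
}

// Usage: fine for PlainState; the same call with a PolyState*
// would be rejected by the static_assert above.
inline PlainState overwritePlainState() {
  const PlainState image{ 10, 20 };
  PlainState target{ 0, 0 };
  rawOverwrite( &target, &image );
  return target;
}
```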

<h3>C++ deserialization – Take 2, inheritance-friendly</h3>

<p>Let’s consider the same classes X and Y, with class X having a <code>unique_ptr&lt;Y&gt;</code>, but let’s say that Y is a polymorphic base class, so <code>unique_ptr&lt;Y&gt;</code> can point to either an instance of Y or an instance of YY.</p>

<p>Strictly speaking, our original serialization already has all the information we need; however, extracting it can be quite cumbersome without knowing the exact class layout (and this is compiler-specific). So, we’ll modify our serialization a bit (see Listing 6).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class Y { //polymorphic base
  public:
  int yy;
  std::string zz;
  std::string zz2;

  void 
  polymorphicSerialize( OutMemStream* dst ) const;
  virtual void serialize( OutMemStream* dst ) const {
    // virtual, so that polymorphicSerialize()
    // reaches YY::serialize() for a YY object
    dst-&gt;write( this, sizeof(Y) );
    serializeAsBase( dst );
  }
  void serializeAsBase( OutMemStream* dst ) const
  {
  // non-POD ONLY for serializeAsBase()
    dst-&gt;writeString( zz );
    dst-&gt;writeString( zz2 );
  }
  explicit Y( InMemStream* src );
  explicit Y( const Y* that );
    // constructor from struct serialized by
    // child class
  void deserializeAsBase( InMemStream* src );
  static std::unique_ptr&lt;Y&gt; 
    polymorphicCreateNew( InMemStream* src );

  virtual size_t serializationID() const 
  { return 0; }
  virtual ~Y() {}
};
class YY : public Y {
  public:
  int yy2;

  void serialize( OutMemStream* dst ) const {
    dst-&gt;write( this, sizeof(YY) );
    Y::serializeAsBase(dst);
  }
  explicit YY( InMemStream* src );
  
  size_t serializationID() const override
  { return 1; }
};

void Y::polymorphicSerialize( OutMemStream* dst ) const
{
  size_t id = serializationID();
  dst-&gt;write( &amp;id, sizeof(size_t) );
  serialize( dst );
}

class X {
  public:
  int xx;
  std::unique_ptr&lt;Y&gt; y;
  std::vector&lt;Y&gt; vy;

  void serialize( OutMemStream* dst ) const;
  X( InMemStream* src );
    //deserializing constructor
};

void X::serialize( OutMemStream* dst ) const {
  dst-&gt;write( this, sizeof(X) );
  y-&gt;polymorphicSerialize( dst );
  dst-&gt;writeVector( vy );
  // we still do NOT serialize POD members
  // explicitly in Take 2; we will deserialize
  // them explicitly (per field), though
}

			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<p>Here, we’re sacrificing a tiny bit of performance on serialization (sigh) to keep the code portable across compilers and not to depend on the exact class layout; on the other hand, the penalty here is pretty small (we’re speaking at most about 1–2 CPU clocks plus a pipeline stall per <code>polymorphicSerialize()</code>, though in practice usually it will be much less than that due to branch prediction).</p>

<p>Now to deserialization. When deserializing inherited/polymorphic objects (and let’s not forget about multiple inheritance and virtual bases) we cannot really overwrite the whole object without the risk of overwriting virtual table pointer(s)<a href="#FN01"><sup>1</sup></a>. As a result, the best way we can see for deserializing such objects is on a per-field basis (see Listing 7).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
class InMemStream {
  uint8_t* pp;
  uint8_t* ppEnd;

  public:
  inline void read( void* p, size_t sz );
    //same as before
  inline void* readInPlace( size_t sz ) {
    assert( pp + sz &lt;= ppEnd );
    void* ret = pp;
    pp += sz;
    return ret;
  }
  inline void* fetchInPlace( size_t sz ) const {
    assert( pp + sz &lt;= ppEnd );
    return pp;
  }
  inline std::string readString() {
    size_t l;
    read( &amp;l, sizeof(size_t) );
    assert( pp+l &lt;= ppEnd );
    pp += l;
    return std::string( 
      reinterpret_cast&lt;const char*&gt;(pp - l), l );
  }
  template&lt; class T &gt;
  inline void readVector( std::vector&lt;T&gt;&amp; v ) {
    size_t sz;
    read( &amp;sz, sizeof(size_t) );
    v.clear();//just in case
    for( size_t i=0; i &lt; sz ; ++i ) {
      v.push_back( T( this ) );
    }
  }
};

Y::Y( InMemStream* src ) {
  Y* that = reinterpret_cast&lt;Y*&gt;( 
              src-&gt;readInPlace(sizeof(Y)) );
  yy = that-&gt;yy;

  deserializeAsBase( src );
}

std::unique_ptr&lt;Y&gt; Y::polymorphicCreateNew( InMemStream* src ) {
  size_t id;
  src-&gt;read( &amp;id, sizeof(size_t) );
  switch( id ) {
    case 0:
        return std::unique_ptr&lt;Y&gt;( new Y(src) );
    case 1:
        return std::unique_ptr&lt;Y&gt;( new YY(src) );
    default:
        assert( false );
        return nullptr; // keeps every path returning
  }
}
Y::Y( const Y* that ) {
  // NB: on non-x86/x64 CPUs, there may be a need
  // to memcpy 'that' into a temporary aligned
  // variable, along the lines of:
  //    alignas(Y) uint8_t tmp[sizeof(Y)];
  //    memcpy(tmp,that,sizeof(Y));
  //    and then use 'tmp' instead of 'that'
  //    this applies to ALL the cases where
  //    readInPlace()/fetchInPlace() are involved
  yy = that-&gt;yy;
}
void Y::deserializeAsBase( InMemStream* src ) {
  //non-POD ONLY for deserializeAsBase()
  zz = src-&gt;readString();
  zz2 = src-&gt;readString();
}
YY::YY( InMemStream* src ) : Y( reinterpret_cast&lt;YY*&gt;( 
             src-&gt;fetchInPlace(sizeof(YY)) )) {
  YY* that = reinterpret_cast&lt;YY*&gt;(
             src-&gt;readInPlace(sizeof(YY)) );
  yy2 = that-&gt;yy2;
  Y::deserializeAsBase( src );
}
X::X( InMemStream* src ) {
  X* that = reinterpret_cast&lt;X*&gt;( 
            src-&gt;readInPlace(sizeof(X)) );
  xx = that-&gt;xx;
  y = Y::polymorphicCreateNew( src );
  src-&gt;readVector( vy );
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>
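<p>The comment in <code>Y::Y( Y* that )</code> warns that in-place access may be misaligned on some CPUs. For plain (trivially copyable) pieces of data, one way around this is to copy the bytes into a properly aligned local value before using them. The sketch below assumes a simplified <code>InMemStream</code>; <code>readPodCopy()</code> and <code>Header</code> are hypothetical names. It does not cover polymorphic classes such as Y, for which the aligned byte buffer described in the listing comment remains the route suggested by the article.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

class InMemStream {
  const std::uint8_t* pp;
  const std::uint8_t* ppEnd;

public:
  InMemStream( const std::uint8_t* begin, const std::uint8_t* end )
    : pp( begin ), ppEnd( end ) {}

  // Copies sizeof(T) bytes into an aligned local and advances the stream,
  // so the caller never dereferences a possibly misaligned pointer.
  template<class T>
  T readPodCopy() {
    static_assert( std::is_trivially_copyable<T>::value,
                   "only trivially copyable types can be copied bytewise" );
    assert( sizeof(T) <= static_cast<std::size_t>( ppEnd - pp ) );
    T tmp;
    std::memcpy( &tmp, pp, sizeof(T) );
    pp += sizeof(T);
    return tmp;
  }
};

struct Header { std::uint32_t id; std::uint64_t stamp; };

// Usage: place a Header at an odd offset on purpose, then read it back.
inline Header readMisalignedHeader() {
  std::uint8_t buf[ 1 + sizeof(Header) ] = {};
  const Header h{ 42u, 0x1122334455667788ull };
  std::memcpy( buf + 1, &h, sizeof(Header) );   // deliberately misaligned
  InMemStream in( buf + 1, buf + sizeof(buf) );
  return in.readPodCopy<Header>();
}
```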

<p>Phew. This kind of code should be able to handle pretty much any kind of inheritance – and at extremely high serialization speeds too. Still, a virtual call to <code>serializationID()</code> is a slowdown (however minor it is), and apparently it can be avoided.</p>

<h3>C++ deserialization – Take 2.1, deducing object type from VMT pointers</h3>

<p>Strictly speaking, when we’re writing the whole object it already contains everything we need to deserialize. In particular, it already contains a Virtual Method Table (VMT) pointer which is equivalent to <code>serializationID()</code>; in other words, it is not really necessary to invoke a rather expensive virtual <code>serializationID()</code> on serialization. The only problem is how to deduce object type from the VMT pointers (and that’s without making too many assumptions about object layout, which is very platform- and compiler-dependent).</p>

<p>One thing which seems to work (still to be double-checked) to deduce object type from VMT pointers is as follows:</p>

<ul>
	<li>At some point (say, when our program starts), we’re creating an instance of each polymorphic class we’re interested in</li>
	
	<li>To avoid dealing with garbage, we’re creating these instances over zeroed memory (for example, using placement new over a pre-zeroed buffer)</li>
	
	<li>When creating a child object, we’re initializing the parent object within the child in exactly the same manner as we’re initializing a standalone parent object</li>
	
	<li>We <code>memcpy()</code> each of these objects, creating an ‘object dump’ of each of the polymorphic objects</li>
	
	<li>Then, as soon as we have a child object and a parent object and their respective dumps, we can:
		<ul>
			<li>Cast child pointer to parent pointer to determine offset at which parent sits within the child</li>
			
			<li>Now we can compare byte-by-byte dumps of the parent-within-child (using the offset mentioned above) and standalone-parent.
				<p>Normally, the only different bytes within the parent-within-child, and standalone-parent (given that they were created as described above), are VMT pointers; moreover, the dumps should differ in at least one byte. Therefore, we can distinguish between a polymorphic child and polymorphic parent (by one of them having certain byte(s) at certain offset(s) as certain pre-defined value(s)).</p>
				
				<p>Bingo! These bytes are equivalent to the <code>serializationID()</code>.</p></li>
		</ul>
	</li>
</ul>
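<p>The steps above can be sketched as follows. This is an experiment under stated assumptions rather than a verified implementation: <code>Parent</code>, <code>Child</code> and <code>countDifferingBytes()</code> are hypothetical names, the result depends on the compiler's object layout, and an optimizer is allowed to treat the zeroing that precedes construction as dead, so padding bytes are not guaranteed to stay zero.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct Parent {
  int yy;
  Parent() : yy( 0 ) {}
  virtual ~Parent() {}
};
struct Child : Parent {
  int yy2;
  Child() : yy2( 0 ) {}
};

// Builds both objects over zeroed buffers, dumps them, and counts the
// bytes in which parent-within-child differs from standalone-parent.
inline std::size_t countDifferingBytes() {
  alignas(Parent) unsigned char parentBuf[ sizeof(Parent) ] = {};
  alignas(Child)  unsigned char childBuf[ sizeof(Child) ] = {};

  Parent* p = new( parentBuf ) Parent();
  Child*  c = new( childBuf ) Child();

  // Offset at which the Parent subobject sits within Child.
  const std::size_t offset = static_cast<std::size_t>(
      reinterpret_cast<unsigned char*>( static_cast<Parent*>( c ) ) -
      reinterpret_cast<unsigned char*>( c ) );
  assert( offset + sizeof(Parent) <= sizeof(Child) );

  // 'Object dumps' of both objects.
  unsigned char parentDump[ sizeof(Parent) ];
  unsigned char childDump[ sizeof(Child) ];
  std::memcpy( parentDump, parentBuf, sizeof(Parent) );
  std::memcpy( childDump, childBuf, sizeof(Child) );

  std::size_t differing = 0;
  for( std::size_t i = 0; i < sizeof(Parent); ++i )
    if( parentDump[i] != childDump[offset + i] )
      ++differing;

  c->~Child();
  p->~Parent();
  return differing;
}
```

<p>On mainstream compilers the count is expected to be non-zero, because the two objects point to different virtual tables; a real implementation would record the offsets and values of the differing bytes and use them in place of <code>serializationID()</code>.</p>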

<p>Therefore, we can avoid writing the <code>serializationID()</code> during serialization, saving a few more CPU cycles (and bringing performance back to the C structure performance level) – all of that without any prior knowledge about class layout(s).</p>

<p>It should be noted that we didn’t try this approach ourselves, but it still looks perfectly plausible ;-).</p>

<h3>Other stuff</h3>

<p>Of course, this is not really an exhaustive list of problems you can encounter during ultra-high-speed serialization. However, most of the other problems you’ll run into are typical for any kind of C++ serialization. In particular, non-owning pointers (and abstract graphs) need to be handled in pretty much the same manner as for any other C++ serialization (see, for example, [<a href="#[ISOCPP]">ISOCPP</a>] for a relevant discussion).</p>

<h3>Performance</h3>

<p>From what we’ve seen, this kind of Ultra-Fast Serialization is extremely fast; it is pretty much on par with C raw-structure-dump serialization, and is around 5–10 times faster than FlatBuffers (this is also consistent with the numbers provided by FlatBuffers themselves here: [<a href="#[FlatBuffers]">FlatBuffers</a>]). Even when compared with home-grown code with per-field serialization, our Ultra-Fast Serialization still wins (by up to 1.5x–2x) due to <code>memcpy()</code> over the whole struct having a significant advantage over per-field copying.</p>

<p>However, comparing the performance of our Ultra-Fast Deserialization with FlatBuffers is neither very interesting nor really relevant. It is not very interesting because, for the use cases described above, we’ll be doing serialization orders of magnitude more frequently than deserialization (as deserialization occurs only when something goes wrong). It is not really relevant because (unlike FlatBuffers) we need to restore a data structure to exactly its original state (which is usually built in the manner described above, and is not easily flattenable); as a result, we’re bound to make all those expensive allocations (and they will eat most of the CPU clocks on deserialization).</p>

<h3>On code generation</h3>

<p>It is always a good idea to move all this mundane serialization code into some kind of code generator; as described in [<a href="#[NoBugs16a]">NoBugs16a</a>], writing a code generator which will generate code along the lines above is not rocket science.</p>

<p>Still, even in a manually-written form, this technique is actually usable in practice (that is, unless your data structures are very elaborate).</p>

<h3>Limitations</h3>

<p>One all-important caveat of our Ultra-Fast Serialization technique is the following:</p>

<p class="blockquote"><em>DON’T even think of using it unless you can GUARANTEE that the serialized data will be deserialized by EXACTLY the same executable as the one which serialized it.</em></p>

<p>This explicitly prohibits ALL of the following:</p>

<ul>
	<li>Deserializing using the same code compiled by a different compiler/for a different platform</li>
	
	<li>Deserializing using the same library within different executables (well, this MIGHT fly, but we’d rather not risk it). Exactly the same .so/.dll library is ok, however.</li>
	
	<li>Deserializing by a different version of the same executable/shared library</li>
</ul>

<p>If you do one of these things, most likely, Ultra-Fast Serialization will work for some time – but using it under these conditions is akin to sitting on a powder keg with a fuse already lit. Still, if you know for 100% sure that all the serialization/deserialization will happen in EXACTLY the same executable, it will be very difficult to beat this serialization technique performance-wise.</p>

<p>If the ‘same executable’ prerequisite doesn’t apply to your case, use FlatBuffers (or any of their competitors) instead. As usual, there is no such thing as ‘The Best Technique for Everything in Existence’, so you DO need different tools for different types of job. And ‘serialization for transfer over the network for client code compliant with the protocol’ and ‘serialization for exactly the same executable’ are two rather different beasts.</p>
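<p>If snapshots may outlive the process (for example, when written to disk for post-mortem analysis), a cheap sanity check can catch some accidental violations of the ‘same executable’ rule. The sketch below is an assumption of ours, not part of the technique above: <code>SnapshotHeader</code>, <code>makeHeader()</code> and <code>headerMatches()</code> are hypothetical names, and the check cannot PROVE that the executables are identical; it only rejects the most obvious mismatches. Both functions should live in a single .cpp file, since <code>__TIME__</code> differs between translation units.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Written in front of a snapshot; compared before deserializing it.
struct SnapshotHeader {
  char buildStamp[32];     // compile date/time of this translation unit
  std::size_t rootSize;    // sizeof() of the root object being dumped
};

SnapshotHeader makeHeader( std::size_t rootSize ) {
  SnapshotHeader h;
  std::memset( &h, 0, sizeof(h) );   // zero everything, padding included
  std::strncpy( h.buildStamp, __DATE__ " " __TIME__,
                sizeof(h.buildStamp) - 1 );
  h.rootSize = rootSize;
  return h;
}

// True only if the stored header was produced by the same build stamp
// and for a root object of the same size.
bool headerMatches( const SnapshotHeader& stored, std::size_t rootSize ) {
  const SnapshotHeader mine = makeHeader( rootSize );
  return std::memcmp( stored.buildStamp, mine.buildStamp,
                      sizeof(mine.buildStamp) ) == 0 &&
         stored.rootSize == mine.rootSize;
}
```

<p>A snapshot whose header does not match should simply be refused, rather than deserialized ‘just to see what happens’.</p>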

<p><img src="http://accu.org/content/images/journals/ol136/Ignatchenko/Ignatchenko-01.png" /></p>

<h2>References</h2>

<p class="bibliomixed"><a id="[Loganberry04]"></a>[Loganberry04] David ‘Loganberry’, Frithaes! – An Introduction to Colloquial Lapine!, <a href="http://bitsnbobstones.watershipdown.org/lapine/overview.html">http://bitsnbobstones.watershipdown.org/lapine/overview.html</a></p>

<p class="bibliomixed"><a id="[NoBugs15]"></a>[NoBugs15] Modular Architecture: Client-Side. On Debugging Distributed Systems, Deterministic Logic, and Finite State Machines, ‘No Bugs’ Hare, <a href="http://ithare.com/chapter-vc-modular-architecture-client-side-on-debugging-distributed-systems-deterministic-logic-and-finite-state-machines/">http://ithare.com/chapter-vc-modular-architecture-client-side-on-debugging-distributed-systems-deterministic-logic-and-finite-state-machines/</a></p>

<p class="bibliomixed"><a id="[NoBugs16]"></a>[NoBugs16] Deterministic Components for Distributed Systems, ‘No Bugs’ Hare, <em>Overload</em> #133</p>

<p class="bibliomixed"><a id="[ISOCPP]"></a>[ISOCPP] Serialization and Unserialization, <a href="https://isocpp.org/wiki/faq/serialization">https://isocpp.org/wiki/faq/serialization</a></p>

<p class="bibliomixed"><a id="[FlatBuffers]"></a>[FlatBuffers] Flatbuffers Benchmarks, <a href="https://google.github.io/flatbuffers/flatbuffers_benchmarks.html">https://google.github.io/flatbuffers/flatbuffers_benchmarks.html</a></p>

<p class="bibliomixed"><a id="[NoBugs16a]"></a>[NoBugs16a] IDL: Encodings, Mappings, and Backward Compatibility, ‘No Bugs’ Hare, <a href="http://ithare.com/idl-encodings-mappings-and-backward-compatibility/">http://ithare.com/idl-encodings-mappings-and-backward-compatibility/</a></p>

<h2>Acknowledgement</h2>

<p>Cartoon by Sergey Gordeev from Gordeev Animation Graphics, Prague.</p>

<p class="footnotes"></p>

<ul>
	<li><a id="FN01"></a>Yes, there can be more than one virtual pointer per object – at least, in the case of virtual base classes.</li>
</ul>
</p>
</div>
</channel>
</rss>
