    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: A framework for object serialization in C++</title>
        <link>https://members.accu.org/index.php/articles/486</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Overload Journal #37 - May 2000</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;A framework for object serialization in C++</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> Fri, 26 May 2000 17:50:56 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a></h2>
</div>
<p>Object serialization provides a program the ability to read and
write a whole object to/from a raw byte stream. Serialization
differs subtly from persistence in that it does not handle unique
object naming and location, nor does it handle concurrent access to
persistent objects.</p>
<p>This article describes a C++ framework for object serialization.
It assumes a reasonable level of proficiency with C++ streams and
STL. If you are looking for a serialization method you might also
read Richard Blundell's article in Overload 35 [<a href=
"#Blundell">Blundell</a>] which provoked me to write this. His
approach is somewhat different from mine.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e27" id="d0e27"></a>The
requirements</h2>
</div>
<p>Good software systems start with a good requirements
specification. The requirements for this serialization framework
were reasonably clear and have shaped the solution greatly.</p>
<p>The following list shows the design criteria I was working
to:</p>
<div class="variablelist">
<dl>
<dt><span class="term">The file format has to be easily
extensible:</span></dt>
<dd>
<p>When new objects are added to the class hierarchy they must be
easily integrated into the serialization system. As classes evolve
and new states are added, they should be easily added to the file
format.</p>
</dd>
<dt><span class="term">The system must be forwards and backwards
compatible:</span></dt>
<dd>
<p>Earlier versions of the file format should be loadable by a
later system. Obviously, all non-specified elements should be set
to some 'harmless' default.</p>
<p>In the same way, later versions of the file format should be
loadable by an earlier system. Serialized information for classes
that do not exist in an older system, or for extended states of
classes, should be ignored - perhaps with a suitable warning
raised.</p>
</dd>
<dt><span class="term">The file format should be human readable,
and preferably human editable:</span></dt>
<dd>
<p>There are disadvantages to this approach, most notably that
files will be larger than an equivalent non-human readable (binary)
format. Human editing may also introduce errors in the file (so the
file parsing code must be robust). The advantages include it being
useful for debugging and error recovery.</p>
<p>The design I describe here can also be applied to non-human
readable files with appropriate tweaks. However, the file format is
(at least initially) ASCII based. This is discussed in a little
more depth in the Extensions section at the end.</p>
</dd>
<dt><span class="term">The system must be easy to use:</span></dt>
<dd>
<p>The method used to add serialization to a class should not be
prohibitive to implement or difficult to use.</p>
</dd>
<dt><span class="term">Communication is via the C++
iostream:</span></dt>
<dd>
<p>This will ordinarily be to a file, but could just as equally be
a network or serial connection, for example. By using streams the
serialization mechanism will be independent of where it gets input
from and puts output to.</p>
</dd>
<dt><span class="term">The output stream must be written
sequentially:</span></dt>
<dd>
<p>Moving the stream position pointer is not permitted. If the
stream accessed a serial connection, this kind of behaviour would
simply not be possible.</p>
<p>Similarly, the file must be readable with a single parsing
pass.</p>
</dd>
<dt><span class="term">The mechanism is to be placed in a
library:</span></dt>
<dd>
<p>Therefore, it should not rely on any specific hard coded
information, like the application name, version number, etc.</p>
</dd>
<dt><span class="term">The code should be portable:</span></dt>
<dd>
<p>It should work on more than one platform, because the library is
going to.</p>
</dd>
<dt><span class="term">The framework should be generic:</span></dt>
<dd>
<p>It should not just be tailored to a single class hierarchy -
after all, that would be a waste of design effort. Reuse is good.
Reuse is our friend.</p>
</dd>
</dl>
</div>
<p>I am not fashionable enough to use XML, and so concocted the
following system.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e97" id="d0e97"></a>File
format</h2>
</div>
<p>I started the design work by defining the file
format<sup>[<a name="d0e102" href="#ftn.d0e102" id=
"d0e102">1</a>]</sup>. In order to serialize a class hierarchy some
kind of hierarchical file format would be needed. Something easy to
parse is always helpful, too. The following is a simple example of
the file format.</p>
<pre class="programlisting">
FILE_TYPE_IDENTIFIER
Header
{
  Version-Major:100
  Version-Minor:0
  Originator:MyProg 3.00
  OtherData:Blah
}
A
{
  DataTag1:Data1
  DataTag2:Data2
  B
  {
    DataTag1:Data1
    DataTag2:Data2
  }
  C
  {
    DataTag1:Data1
  }
}
</pre>
<p>That looks human readable enough, but what are the rules?</p>
<div class="variablelist">
<dl>
<dt><span class="term">Reading the file:</span></dt>
<dd>
<p>The file is parsed a line at a time. The indentation is optional
but very useful. Leading whitespace on any line is ignored.</p>
</dd>
<dt><span class="term">The first line is special:</span></dt>
<dd>
<p>The first line is a string that identifies the file type. This
makes it easy to identify if a file is of the appropriate format
before continuing.</p>
<p>The first line is the only 'special' line in the whole file.
After that, the file conforms to a strict chunked format.</p>
</dd>
<dt><span class="term">Each line contains either:</span></dt>
<dd>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>A chunk identifier (string containing no spaces or colons),</p>
</li>
<li>
<p>An opening brace,</p>
</li>
<li>
<p>A closing brace,</p>
</li>
<li>
<p>A data line, &quot;Tag:Data&quot; (with no whitespace around the
colon).</p>
</li>
</ul>
</div>
</dd>
<dt><span class="term">Each object's data is enclosed in a
'chunk':</span></dt>
<dd>
<p>A chunk begins with an identifier line describing the chunk
contents. The identifier name is only unique within the scope of
its parent chunk - different chunks may have sub-chunks with the
same identifiers. Following this is a line containing an opening
brace. The chunk's data contents appear on each subsequent line,
prior to a line containing a closing brace.</p>
</dd>
<dt><span class="term">Chunk contents:</span></dt>
<dd>
<p>The chunk's contents could be other chunks (in the same format
as described above, i.e. identifier-brace-data-brace) or data
presented in the format 'Tag:data'. These data lines comprise tags
(which contain no spaces) whose name is local to their chunk and a
data string which can be in any format, but should be human
readable.</p>
</dd>
<dt><span class="term">Chunk identifiers:</span></dt>
<dd>
<p>The single string chunk identifier that precedes each chunk
describes either the object or class of the chunk, depending on
context.</p>
<p>In the example file above, the chunks B and C could be
describing two distinct objects of the same class. In this kind of
situation the chunk identifier describes which object's data is
contained in the chunk.</p>
<p>The identifier may instead refer to the class of object.
Consider a class which acts as a 'list' of other objects. There
could be any number of these objects in the list. The example below
shows how this would work.</p>
<pre class="programlisting">
List
{
  SomeListData:data
  Item
  {
  }
  Item
  {
  }
  ...
}
...
</pre>
<p>The 'List' chunk is parsed, and for every chunk identifier
'Item' encountered, a new object of an item class is inserted into
the list.</p>
</dd>
<dt><span class="term">There is a predefined chunk,
Header:</span></dt>
<dd>
<p>This should appear at the top of the file, just underneath the
file type identifier. However, if it is absent then the file is not
necessarily in error - how the data is interpreted will be up to
the system.</p>
<p>The Header chunk contains a number of data lines:</p>
<div class="variablelist">
<dl>
<dt><span class="term">Version-Major:</span></dt>
<dd>
<p>The major version number of the file format.</p>
</dd>
<dt><span class="term">Version-Minor:</span></dt>
<dd>
<p>The minor version of the file format.</p>
</dd>
<dt><span class="term">Originator:</span></dt>
<dd>
<p>This field describes the program that created the file. It is
possible to completely ignore this when reading data, but it is
interesting and sometimes useful for debugging purposes. (Why can't
I parse this file? Because program X wrote it and got the format
wrong.)</p>
</dd>
<dt><span class="term">Other data:</span></dt>
<dd>
<p>Other data that may be pertinent to the whole file goes here.
For example, the data I was serializing contained timestamped
information. In the Header chunk I included a field that specifies
the format of the timestamps. This means that if the system is
later given better time resolution the older files will still be
useable.</p>
</dd>
</dl>
</div>
<p>The minor and major version numbers are used together to
identify the exact file format version. The minor number refers to
changes that do not compromise file compatibility. Major number
changes describe format changes that may break a future parser's
ability to interpret the file correctly. If the major number is
greater than the parser recognises, it should refuse to parse the
file.</p>
<p>In the future it may be necessary for parsers to use this
information to interpret the data in the rest of the file. This
should be avoided if at all possible - it will introduce
maintenance problems and decrease code clarity.</p>
</dd>
</dl>
</div>
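<p>The line rules above can be sketched as a small classifier. This is only
an illustration, not part of the framework described here: the names
<tt class="literal">LineKind</tt>, <tt class="literal">trim</tt> and
<tt class="literal">classifyLine</tt> are hypothetical, and trimming trailing
whitespace is an extra leniency I have assumed (the format itself only says
that leading whitespace is ignored).</p>

```cpp
#include <string>

// Hypothetical names: none of these appear in the article's framework.
enum class LineKind { Blank, OpenBrace, CloseBrace, Data, ChunkId, Invalid };

// Remove leading and trailing spaces, tabs and carriage returns.
// (Trailing trimming is an assumption, added for robustness.)
std::string trim(const std::string &line) {
  const char *ws = " \t\r";
  const std::string::size_type first = line.find_first_not_of(ws);
  if (first == std::string::npos) return std::string();
  const std::string::size_type last = line.find_last_not_of(ws);
  return line.substr(first, last - first + 1);
}

// Decide which of the four permitted line types a line is.
LineKind classifyLine(const std::string &raw) {
  const std::string line = trim(raw);
  if (line.empty()) return LineKind::Blank;
  if (line == "{") return LineKind::OpenBrace;
  if (line == "}") return LineKind::CloseBrace;

  const std::string::size_type colon = line.find(':');
  if (colon != std::string::npos) {
    // Data line: the tag before the first colon must be
    // non-empty and must contain no spaces.
    const std::string tag = line.substr(0, colon);
    if (tag.empty() || tag.find(' ') != std::string::npos)
      return LineKind::Invalid;
    return LineKind::Data;
  }

  // No colon: a chunk identifier, which must contain no spaces.
  if (line.find(' ') != std::string::npos) return LineKind::Invalid;
  return LineKind::ChunkId;
}
```

<p>Splitting on the first colon means the data part may itself contain
colons or spaces, as in <tt class="literal">Originator:MyProg 3.00</tt>.</p>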
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e205" id="d0e205"></a>Unrecognised
data</h2>
</div>
<p>If the parser encounters a chunk identifier that it doesn't
recognise, then it simply skips the chunk, perhaps raising a
suitable warning along the way. Similarly, if it encounters a data
tag in a chunk that is not recognised, it is ignored.</p>
<p>This simple rule makes the file format open, and forwards and
backwards compatible.</p>
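<p>Skipping an unrecognised chunk only requires counting braces. The
following is a minimal sketch under my own assumptions, not the framework's
actual code: <tt class="literal">skipUnknownChunk</tt> is a hypothetical
free function, and it assumes the chunk identifier line has already been
read so that the next line is the chunk's opening brace.</p>

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper. Precondition (assumed): the chunk identifier line
// has been consumed and the stream sits at the chunk's opening brace line.
// Returns true if a complete, balanced chunk was skipped.
bool skipUnknownChunk(std::istream &in) {
  std::string line;
  int depth = 0;
  // "in >> std::ws" discards leading whitespace before each line.
  while (std::getline(in >> std::ws, line)) {
    if (line == "{") {
      ++depth;                        // entering this chunk or a sub-chunk
    } else if (line == "}") {
      if (--depth == 0) return true;  // closed the chunk we were asked to skip
      if (depth < 0) return false;    // closing brace with no opening brace
    } else if (depth == 0) {
      return false;                   // the first line must be an opening brace
    }
    // Anything else inside the chunk (data lines, nested chunk
    // identifiers) is simply ignored.
  }
  return false;                       // input ended before the chunk closed
}
```

<p>Because the stream is only ever read forwards, this respects the
single-pass requirement, and it leaves the stream positioned on the line
after the skipped chunk's closing brace.</p>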
<p>It also allows us to make some data lines optional in their
chunks. They need not always be saved: their value may assume a
default, or they may simply not be applicable in all
situations.</p>
<p>These eight rules define a very simple file format. However,
they are sufficient to serialise an entire class hierarchy's state
and recover it easily.</p>
<p>The format's extensibility and openness also lead to other
interesting possibilities for an application. For example, it is
possible to save application choices along with data by defining a
new 'Choices' chunk containing the relevant information.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e218" id="d0e218"></a>How to
implement the serialization system</h2>
</div>
<p>Now that we have defined the file format it is easy to write code
to implement serialization using it ;-) The framework consists of an
interface mixin class for objects that can be serialized and a set
of classes that help read/write data.</p>
<p>If a class contains data that needs to be serialized, then the
class inherits from the <tt class="classname">Serializable</tt>
base class. The class is shown below.</p>
<pre class="programlisting">
class Serializable {
  public:
    Serializable() {}
    virtual ~Serializable() {}
    virtual void save(std::ostream &amp;out,
                int indentLevel) const;
    virtual void load(std::istream &amp;in,
          SerializableLoadInfo &amp;info);
  protected:
    static std::ostream &amp;indent 
        (std::ostream &amp;s, int level){
      for(int n=0; n&lt;level; n++) s &lt;&lt; &quot; &quot;;
      return s;
    }
    static std::omanip&lt;int&gt; indent(
            int level) {
      return std::omanip&lt;int&gt; 
           (Serializable::indent, level);
    }
  private:
    Serializable &amp;operator=(const
                       Serializable &amp;);
    Serializable(const Serializable &amp;);
};
</pre>
<p>What is all that about? We will take it a step at a time. First
we will see how the <tt class="methodname">save</tt> method is
used, and then <tt class="methodname">load</tt>. We will meet
<tt class="methodname">indent</tt> along the way.</p>
<p>After defining a <tt class="classname">Serializable</tt> class
interface, we need to define the top-level API to the save/load
mechanism. This bit will differ depending on the class hierarchy
you are saving. Let us say we are trying to save a <tt class=
"classname">Widget</tt> which has the class hierarchy in figure
1.</p>
<p>We will call the serialization API class <tt class=
"classname">WidgetSerializer</tt>. It has a simple public API, but
there is some more devious stuff going on in the <tt class=
"literal">private</tt> part of the class.</p>
<div class="figure"><a name="d0e257" id="d0e257"></a>
<div class="mediaobject c2"><img src=
"/var/uploads/journals/resources/A%20framework%20for%20object%20serialization%201.png"
align="middle"></div>
<p class="title c3">Figure 1. </p>
</div>
<pre class="programlisting">
class WidgetSerializer {
public:
  WidgetSerializer(const std::string
                          &amp;appname);
  virtual ~WidgetSerializer();
  void save(std::ostream &amp;out, 
                         Widget *) const;
  Widget *load(std::istream &amp;in);
private:
  class Header : public Serializable {
  public:
    Header(const std::string &amp;originator);
    virtual ~Header();
    virtual void save(std::ostream &amp;out,
                  int indentLevel) const;
    virtual void load(std::istream &amp;in,
          SerializableLoadInfo &amp;info);
  private:
    std::string originator;
  } header;
  WidgetSerializer &amp;operator=(const
                   WidgetSerializer &amp;);
  WidgetSerializer(const WidgetSerializer &amp;);
};
</pre>
<p>This is the second class definition I've presented without a
full explanation. By the end of the article we will have covered
all aspects of both.</p>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e266" id="d0e266"></a>Saving</h3>
</div>
<p>The first thing we will consider is how to save the <tt class=
"classname">Widget</tt> class hierarchy. After all, we cannot load
anything without having first saved it.</p>
<p>The framework implementation will be easier to understand if we
first see how to use it. The user, say an application called
<tt class="classname">WidgetEditor</tt>, will create a <tt class=
"classname">WidgetSerializer</tt>, and use it to save a <tt class=
"classname">Widget</tt> to file in some function <tt class=
"methodname">saveWidget</tt> thus:</p>
<pre class="programlisting">
static const std::string MY_NAME 
                      = &quot;WidgetEditor&quot;;
void saveWidget(const std::string &amp;filename, Widget *widget){
  WidgetSerializer serializer(MY_NAME);
  std::ofstream out(filename.c_str());
  if (!out) ... // error
  serializer.save(out, widget);
}
</pre>
<p>In this function, and until the very end of the document, I am
going to conveniently ignore error handling. At this point you are
free to imagine how the system will go bang.</p>
<p><tt class="classname">WidgetSerializer</tt> is pretty easy to
implement. Its responsibility is to save the file prologue and
epilogue, and serialize the appropriate object in between. What
exactly that means is best shown by implementation:</p>
<pre class="programlisting">
static const std::string FILE_IDENTIFIER_STRING = &quot;WidgetFile&quot;;
WidgetSerializer::WidgetSerializer(
            const std::string &amp;appname)
    : header(appname){ }
void WidgetSerializer::save(std::ostream &amp;out,
             Widget *widget) const {
  out &lt;&lt; FILE_IDENTIFIER_STRING &lt;&lt; std::endl
      &lt;&lt; &quot;Header\n&quot;;   // (*) see below
  header.save(out, 0);   
// the 0 is explained below
  out &lt;&lt; &quot;Widget\n&quot;;   // (*) see below
  widget-&gt;save(out, 0);  
// the 0 is explained below
// Perhaps save some application choices
// in an epilogue here
}
</pre>
<p>So all the <tt class="classname">WidgetSerializer</tt>'s
<tt class="methodname">save</tt> method does is ensure that the
file identifier line and the header are saved, and then pass
responsibility to the <tt class="classname">Widget</tt> class to
serialize itself.</p>
<p>The key to using <tt class="methodname">Serializable::save</tt>
is shown at (*) above. The <tt class=
"methodname">Serializable::save</tt> method causes an object to
serialize itself as a file chunk on the given <tt class=
"classname">ostream</tt>. The information that it saves will
consist of the opening brace line, data and the closing brace line.
Specifically, the chunk identifier will not be saved. Why not?</p>
<p>It is the caller's responsibility to place an identifying tag
line on the <tt class="classname">ostream</tt> before calling this
method so that it can uniquely identify the file chunk when it has
to read it back. For example, if we are saving two objects of the
same class and want them to have different chunk names, then we
have to write the identifier ourselves.</p>
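<p>To illustrate why the identifier belongs to the caller, here is a small
self-contained sketch. <tt class="literal">Point</tt>,
<tt class="literal">Line</tt> and <tt class="literal">saveLineToString</tt>
are hypothetical stand-ins, not classes from the framework, and they build
their indentation with a plain <tt class="literal">std::string</tt> rather
than the <tt class="methodname">indent</tt> helper.</p>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a serializable class.
struct Point {
  int x, y;
  // Writes only the braces and data lines, never a chunk identifier.
  void save(std::ostream &out, int level) const {
    const std::string pad(level, ' ');
    out << pad << "{\n"
        << pad << " X:" << x << "\n"
        << pad << " Y:" << y << "\n"
        << pad << "}\n";
  }
};

// The owning object decides what each sub-chunk is called, so two
// objects of the same class can be told apart when reading back.
struct Line {
  Point start, end;
  void save(std::ostream &out, int level) const {
    const std::string pad(level, ' ');
    out << pad << "{\n";
    out << pad << " Start\n";
    start.save(out, level + 1);
    out << pad << " End\n";
    end.save(out, level + 1);
    out << pad << "}\n";
  }
};

// Top level: write the identifier, then let the object write its chunk.
std::string saveLineToString(const Line &line) {
  std::ostringstream out;
  out << "Line\n";
  line.save(out, 0);
  return out.str();
}
```

<p>Both <tt class="literal">Point</tt> objects run exactly the same code,
yet they appear in the output as distinct 'Start' and 'End' chunks.</p>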
<p>The <tt class="methodname">save</tt> method of the base
<tt class="classname">Serializable</tt> class is defined to save an
empty chunk. This is a useful default if you have not yet decided
what information to save, or if that particular object has no
saveable state at this point, but may have in the future.</p>
<pre class="programlisting">
void Serializable::save(std::ostream &amp;out,
           int indentLevel) const {
  out &lt;&lt; indent(indentLevel) &lt;&lt; &quot;{\n&quot;
      &lt;&lt; indent(indentLevel) &lt;&lt; &quot;}\n&quot;;
}
</pre>
<p>This simple method illustrates two more details. First,
<tt class="methodname">Serializable::save</tt>'s second parameter,
<i class="parameter"><tt>indentLevel</tt></i> defines what level of
indentation the chunk is being saved at. Remember, at the top level
we passed 0. Second, the curious pair of <tt class=
"methodname">protected indent</tt> methods in the <tt class=
"classname">Serializable</tt> class are used to perform the
indentation. They are a simple means of inserting a whitespace
indent when we save data into a chunk using C++ stream
manipulation. It is outside the bounds of this article to describe
fully how it works, however. Bjarne Stroustrup describes the
mechanism in [<a href="#Stroustrup">Stroustrup</a>].</p>
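<p>The <tt class="literal">omanip</tt> template used above comes from
pre-standard iostream libraries and, as far as I am aware, is not part of
the ISO C++ standard library. A portable way to get the same effect is
sketched below, assuming a small helper type of my own invention
(<tt class="literal">Indent</tt>) and a free function
<tt class="literal">indent</tt> in place of the protected static
members.</p>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical replacement for the omanip-based helper: a tiny value
// type that remembers the level, plus an operator<< that applies it.
struct Indent {
  explicit Indent(int level) : level(level) {}
  int level;
};

// Writes one space per indentation level, as the article's version does.
inline std::ostream &operator<<(std::ostream &out, const Indent &in) {
  for (int n = 0; n < in.level; ++n) out << ' ';
  return out;
}

// Factory so call sites read the same way: out << indent(i) << "{\n";
inline Indent indent(int level) { return Indent(level); }
```

<p>Call sites such as
<tt class="literal">out &lt;&lt; indent(i) &lt;&lt; "{\n"</tt> remain
unchanged with this approach.</p>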
<p>The <tt class="classname">WidgetSerializer</tt> <tt class=
"literal">private</tt> <tt class="classname">Header</tt> class'
<tt class="methodname">save</tt> mechanism is implemented as
follows:</p>
<pre class="programlisting">
// These are defined in some suitable
// way, somewhere
static const int VERSION_MAJOR = 100;
static const int VERSION_MINOR = 0;
WidgetSerializer::Header::Header(
         const std::string &amp;originator):
            originator(originator) { }
WidgetSerializer::Header::~Header() { }
void WidgetSerializer::Header::save(
       std::ostream &amp;out, int i) const {
  out &lt;&lt; indent(i) &lt;&lt; &quot;{\n&quot;
     &lt;&lt; indent(i+1) &lt;&lt; &quot;Version-Major:&quot; 
     &lt;&lt; VERSION_MAJOR &lt;&lt; &quot;\n&quot;
     &lt;&lt; indent(i+1) &lt;&lt; &quot;Version-Minor:&quot;
     &lt;&lt; VERSION_MINOR &lt;&lt; &quot;\n&quot;
     &lt;&lt; indent(i+1) &lt;&lt; &quot;Originator:&quot;
     &lt;&lt; originator  &lt;&lt; &quot;\n&quot;
     &lt;&lt; indent(i)   &lt;&lt; &quot;}\n&quot;;
}
</pre>
<p>This explains how we get the Header chunk into the file.</p>
<p>The final part of saving is to make the <tt class=
"classname">Widget</tt> save itself, and all the objects it
contains. It overrides the <tt class=
"methodname">Serializable::save</tt> method with its own version,
just as <tt class="classname">Header</tt> has done above, and puts
the relevant saving magic in it. To save sub-chunks for contained
objects you simply write a chunk identifier line, and then call
<tt class="literal">save(out_stream, i+1)</tt> on the
sub-object.</p>
<p>For example, <tt class="methodname">Widget::save</tt> may look
like this:</p>
<pre class="programlisting">
void Widget::save(std::ostream &amp;out, int i) const {
  out &lt;&lt; indent(i) &lt;&lt; &quot;{\n&quot;
    &lt;&lt; indent(i+1)&lt;&lt; &quot;Title:&quot;&lt;&lt;title &lt;&lt; &quot;\n&quot;
    &lt;&lt; indent(i+1)&lt;&lt; &quot;Date:&quot; &lt;&lt;date &lt;&lt; &quot;\n&quot;;
  for (int n = 0; n &lt; no_sprockets; n++)  {
    // identifier first, then the sub-chunk
    out &lt;&lt; indent(i+1) &lt;&lt; &quot;Sprocket\n&quot;;
    sprocket[n]-&gt;save(out, i+1);
  }
  for (int n = 0; n &lt; no_cogs; n++) {
    out &lt;&lt; indent(i+1) &lt;&lt; &quot;Cog\n&quot;;
    cog[n]-&gt;save(out, i+1);
  }
  out &lt;&lt; indent(i) &lt;&lt; &quot;}\n&quot;;
}
</pre>
<p>The data you place after a 'Tag:' can be as adventurous as you
want, not just <tt class="type">int</tt>s and <tt class=
"type">string</tt>s. Of course, you have to be willing to write
some code to parse it back in. Which brings us neatly on to...</p>
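<p>As an example of slightly more adventurous data, a list of integers could
be stored on a single data line as comma-separated values. The helpers
below are a sketch of that idea only; <tt class="literal">formatIntList</tt>
and <tt class="literal">parseIntList</tt> are hypothetical names, and the
comma-separated layout is my assumption rather than anything the format
prescribes.</p>

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: format values as "3,-1,20" for use after "Tag:".
std::string formatIntList(const std::vector<int> &values) {
  std::ostringstream out;
  for (std::vector<int>::size_type n = 0; n < values.size(); ++n) {
    if (n != 0) out << ',';
    out << values[n];
  }
  return out.str();
}

// Hypothetical helper: parse the data part back into a vector.
// Returns false if any element is not a valid integer.
bool parseIntList(const std::string &data, std::vector<int> &values) {
  values.clear();
  if (data.empty()) return true;        // an empty list is valid
  std::istringstream in(data);
  std::string item;
  while (std::getline(in, item, ',')) { // split on commas
    std::istringstream field(item);
    int value;
    if (!(field >> value)) return false;
    field >> std::ws;                   // allow trailing whitespace
    if (!field.eof()) return false;     // reject trailing junk such as "12x"
    values.push_back(value);
  }
  return true;
}
```

<p>The parser is deliberately strict because the file is human editable: a
mistyped element is reported to the caller rather than silently read as
something else.</p>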
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e399" id="d0e399"></a>Loading</h3>
</div>
<p>To make you feel better, saving was the easy bit.</p>
<p>We have created a file containing the serialized version of our
<tt class="classname">Widget</tt>. <tt class=
"classname">WidgetEditor</tt> has been shut down; at some future
point it is fired back up and we want to be able to edit our widget
again. We now need to write a parser to read the data and
reconstruct the <tt class="classname">Widget</tt> class hierarchy.
That is what all those (as yet) unexplained load methods are for,
naturally.</p>
<p>First, we will investigate the <tt class=
"classname">SerializableLoadInfo</tt> structure that is passed into
the load methods. It looks like this:</p>
<pre class="programlisting">
struct SerializableLoadInfo {
  int   major;
  int   minor;
  Widget *widget;
// Any other data that may be read from 
// the header
};
</pre>
<p>This structure will collect information from the input file's
header and allow it to be passed to each <tt class=
"classname">Serializable</tt> object as it parses a data chunk. The
information in this structure may alter how some data in the file
is interpreted.</p>
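<p>One obvious use of the header information is the version rule from the
file format: refuse files whose major version is newer than the parser
understands, and tolerate a newer minor version. The sketch below assumes
hypothetical names throughout (<tt class="literal">VersionInfo</tt>,
<tt class="literal">Compatibility</tt>,
<tt class="literal">checkVersion</tt>) and hard-codes the supported version
purely for illustration.</p>

```cpp
// Hypothetical names throughout. VersionInfo mirrors only the two
// version fields that the load-info structure carries.
struct VersionInfo {
  int versionMajor;
  int versionMinor;
};

enum class Compatibility { Supported, NewerMinor, Unsupported };

// The newest format this (imaginary) parser understands.
const int SUPPORTED_MAJOR = 100;
const int SUPPORTED_MINOR = 0;

Compatibility checkVersion(const VersionInfo &file) {
  // A newer major number may break our ability to interpret the file.
  if (file.versionMajor > SUPPORTED_MAJOR) return Compatibility::Unsupported;
  // A newer minor number is compatible: unknown tags and chunks are
  // skipped, though a warning may be appropriate.
  if (file.versionMajor == SUPPORTED_MAJOR &&
      file.versionMinor > SUPPORTED_MINOR)
    return Compatibility::NewerMinor;
  return Compatibility::Supported;
}
```

<p>A loader could run this check as soon as the Header chunk has been read,
before handing the rest of the file to the other objects.</p>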
<p>The <tt class="classname">WidgetEditor</tt> uses the loading API
in some <tt class="methodname">loadFile</tt> function like
this:</p>
<pre class="programlisting">
Widget *loadFile(const std::string &amp;filename) {
  std::ifstream in(filename.c_str());
  if (!in) ... // error
  WidgetSerializer serializer(MY_NAME);
  Widget *w = serializer.load(in);
  return w;
}
</pre>
<p>The way <tt class="methodname">WidgetSerializer::load</tt> is
designed means that it hands back a newly allocated <tt class=
"classname">Widget</tt> object based upon the contents of the given
file. It is <tt class="classname">WidgetEditor</tt>'s
responsibility to delete the <tt class="classname">Widget</tt> when
it is finished.</p>
<p>Again, the top-level API is nice and easy to use. So how do we
implement it?</p>
<pre class="programlisting">
Widget *WidgetSerializer::load(std::istream &amp;in) {
  Widget *widget = new Widget();
  SerializableLoadInfo  info;
  info.widget = widget;
// Check first line matches file identifier
  std::string id;
  std::getline(in, id);
  if (id != FILE_IDENTIFIER_STRING) ...
// error
// Now scan each chunk left in the file
  std::string line;
  while (std::getline(in, line)) {
    if (line == &quot;Header&quot;) {
      header.load(in, info);
    }
    else if (line == &quot;Widget&quot;) {
      widget-&gt;load(in, info);
    }
    else if (line != &quot;&quot;) {
      // unrecognised chunk: skip it
      FileBlockParser parser;
      parser.parse(in, info);
    }
  }
  return widget;
}
</pre>
<p>What's that <tt class="classname">FileBlockParser</tt> thing
doing there? For the moment you can happily accept that it skips an
unrecognised chunk. We will see how and why it does that in a
bit.</p>
<p>The parsing code ensures that the file type is correct, then
reads the first chunk identifier, &quot;Header&quot;, and directs the rest of
the chunk to the parser in header's load method. So what does that
look like?</p>
<pre class="programlisting">
void WidgetSerializer::Header::load(
          std::istream &amp;in, 
          SerializableLoadInfo &amp;info){
  std::string open;
  std::getline(std::ws(in), open);
  if (open != &quot;{&quot;) ... // error
  std::string line;
  bool more = true;
  while (more &amp;&amp;
       std::getline(std::ws(in), line)){
    if (line.substr(0,14).compare( 
              &quot;Version-Major:&quot;) == 0){
      std::istrstream si(line.c_str()+14);
      si &gt;&gt; info.major;
    }
    else if (line.substr(0,14).compare(
               &quot;Version-Minor:&quot;) == 0){
      std::istrstream si(line.c_str()+14);
      si &gt;&gt; info.minor;
    }
    else if (line ==  &quot;}&quot;) more = false;
  }
}
</pre>
<p>There are a couple of useful C++ standard library facilities
being used here. First, <tt class="literal">std::ws(in)</tt>: it is
a stream manipulator that strips leading whitespace from an input
stream. Secondly, <tt class="classname">istrstream</tt> is an
'input string stream', which allows you to treat a character buffer
as a stream to read from. It is a type-safe way of doing <tt class=
"function">sscanf</tt>. (It is declared in
<tt class="literal">&lt;strstream&gt;</tt>, which the C++ standard
deprecates in favour of <tt class="classname">std::istringstream</tt>
from <tt class="literal">&lt;sstream&gt;</tt>.)</p>
<p>This is the only <tt class="methodname">load</tt> method in the
whole <tt class="classname">Serializable</tt> framework that ever
writes to the <tt class="classname">SerializableLoadInfo</tt>
structure; it gets the data from the file's Header chunk.</p>
<p>So far we have hand written two stream parsing loops. The chunk
parsing code for each <tt class="classname">Serializable</tt>
object could look very similar to this. It works fine. In each
<tt class="classname">Serializable</tt> class, the <tt class=
"methodname">load</tt> method loops around, pulling out chunk
identifiers and data lines and acts on them accordingly.</p>
<p>It is not exactly elegant: for every <tt class=
"classname">Serializable</tt> class we have to write another
parsing loop. That is a lot of duplicated effort, which tells us
that there is something wrong with the design. Add to this the
problem of manual string length counting in the compares above - it
is a pretty error-prone way of writing code: if you get a
comparison wrong the compiler will not tell you.</p>
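<p>The length counting can be avoided by splitting each data line at its
first colon and comparing whole tag strings. The helpers below are a sketch
of that idea and not the framework's own solution;
<tt class="literal">splitDataLine</tt> and
<tt class="literal">parseValue</tt> are hypothetical names, and they use
<tt class="classname">std::istringstream</tt> from
<tt class="literal">&lt;sstream&gt;</tt> for the conversion.</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: split "Tag:Data" at the first colon.
// Returns false if there is no colon or the tag is empty.
bool splitDataLine(const std::string &line,
                   std::string &tag, std::string &data) {
  const std::string::size_type colon = line.find(':');
  if (colon == std::string::npos || colon == 0) return false;
  tag = line.substr(0, colon);
  data = line.substr(colon + 1);   // may itself contain colons or spaces
  return true;
}

// Hypothetical helper: convert the data part with stream extraction.
// Returns false (leaving the outcome to the caller) if conversion fails.
template <typename T>
bool parseValue(const std::string &data, T &value) {
  std::istringstream in(data);
  return !(in >> value).fail();
}
```

<p>With these in place a comparison reads
<tt class="literal">tag == "Version-Major"</tt>, so there is no magic
number 14 to get wrong.</p>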
<p>The solution to this is the <tt class=
"classname">FileBlockParser</tt> class that we breezed past
earlier. The intention of <tt class=
"classname">FileBlockParser</tt> is to do that loop for us, to save
writing it thousands of times, and to safely recognise the tags and
chunk identifiers for us.</p>
<p>This is what it looks like:</p>
<pre class="programlisting">
class FileItemParser;
// a class that interprets data lines
class FileBlockParser {
public:
  FileBlockParser();
  void add(const std::string &amp;name,
             Serializable *block);
  void add(const std::string &amp;name,
             FileItemParser *item);
  void add(FileItemParser *item);
  void parse(std::istream &amp;in,
           SerializableLoadInfo &amp;info);
private:
  void skipChunk(std::istream &amp;i);
  std::map&lt;std::string,
               FileItemParser*&gt; items;
  std::map&lt;std::string,
               Serializable*&gt; blocks;
  FileItemParser *catchAll;
};
</pre>
<p>The <tt class="classname">FileBlockParser</tt>'s <tt class=
"methodname">parse</tt> method sends it scampering across the input
stream a line at a time. It will expect to find an opening brace
line first (just as our hand written load methods would do).</p>
<p>Then if it finds a chunk identifier line it looks in an internal
map associating chunk names to <tt class=
"classname">Serializable</tt> objects. This is set up before
calling <tt class="methodname">parse</tt> using the appropriate
<tt class="methodname">FileBlockParser::add</tt> method. If the
chunk identifier is not recognised, the <tt class=
"classname">FileBlockParser</tt> will skip the chunk (using its
private <tt class="methodname">skipChunk</tt> method).</p>
<p>If the <tt class="classname">FileBlockParser</tt> finds a data
line then it will call the appropriate <tt class=
"classname">FileItemParser</tt> to interpret that line. If it does
not recognise the tag name in its map then it will call a catch-all
<tt class="classname">FileItemParser</tt> if you have added one.
This may be useful in some chunks. If there is no catch-all object
then the data line will be ignored.</p>
<p>Finally, when the <tt class="classname">FileBlockParser</tt>
finds the closing brace it will stop parsing and return to its
caller.</p>
<p>This class has taken responsibility for writing that ubiquitous
loop for us. It saves us time and effort, and minimises the risk of
bugs in our file parsing.</p>
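<p>The decisions described above can be sketched as a small line
classifier. This is an illustration only: <tt class=
"type">LineKind</tt> and <tt class="methodname">classifyLine</tt>
are hypothetical names that are not part of the framework, and the
sketch assumes that a line containing a colon is a 'Tag:Data' pair
while any other non-brace line is a chunk identifier.</p>

```cpp
#include <string>

// Hypothetical names for illustration; not part of the framework.
enum LineKind { OpenBrace, CloseBrace, ChunkIdentifier, DataLine };

// Classify one line of a chunk after leading whitespace is stripped.
// Assumption: a line holding a colon is a 'Tag:Data' pair, and any
// other non-brace line names a sub-chunk.
LineKind classifyLine(const std::string &line) {
  if (line == "{") return OpenBrace;
  if (line == "}") return CloseBrace;
  if (line.find(':') == std::string::npos) return ChunkIdentifier;
  return DataLine;
}
```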
<p><tt class="classname">FileItemParser</tt> is a base class for
objects that handle data lines. The base class looks like this:</p>
<pre class="programlisting">
class FileItemParser {
public:
  FileItemParser() {}
  virtual ~FileItemParser() = 0;
  virtual void parse(
          const std::string &amp;data) = 0;
private:
  FileItemParser &amp;operator=(
              const FileItemParser &amp;);
  FileItemParser(const FileItemParser &amp;);
};
</pre>
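<p>One detail is easy to miss: a pure virtual destructor must still
be given a definition, because every derived class destructor calls
it. The following sketch repeats the base class with that
definition added and derives a simple parser from it. <tt class=
"classname">FileItemParser_Int</tt> is a hypothetical example
written for this illustration, not one of the parsers from the
framework.</p>

```cpp
#include <cstdlib>
#include <string>

class FileItemParser {
public:
  FileItemParser() {}
  virtual ~FileItemParser() = 0;
  virtual void parse(const std::string &data) = 0;
private:
  FileItemParser &operator=(const FileItemParser &);
  FileItemParser(const FileItemParser &);
};

// Even a pure virtual destructor needs a body: derived class
// destructors call it implicitly.
FileItemParser::~FileItemParser() {}

// Hypothetical derived class that stores an integer data field
// in a variable supplied by the caller.
class FileItemParser_Int : public FileItemParser {
public:
  explicit FileItemParser_Int(int &target) : target(target) {}
  void parse(const std::string &data) {
    target = std::atoi(data.c_str());
  }
private:
  int &target;
};
```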
<p>With this in mind we can implement the <tt class=
"classname">FileBlockParser</tt> as follows:</p>
<pre class="programlisting">
FileBlockParser::FileBlockParser()
                : catchAll(0) { }

void FileBlockParser::add(
          const std::string &amp;name,
           Serializable *block) {
  blocks[name] = block;
}
void FileBlockParser::add(
          const std::string &amp;name,
           FileItemParser *item){
  items[name] = item;
}
void FileBlockParser::add(
            FileItemParser *item){
  catchAll = item;
}
void FileBlockParser::parse(
          std::istream &amp;in, 
          SerializableLoadInfo &amp;info){
  std::string line;
  std::getline(std::ws(in), line);
  if (line != &quot;{&quot;) ... // error
  bool more = true;
  while (more &amp;&amp; 
      std::getline(std::ws(in), line)) {
    if (line == &quot;}&quot;) { more = false; }
    else if (line.find(&quot;:&quot;) 
      == std::string::npos) {
// 'line' is a chunk identifier
      std::map&lt;std::string, 
          Serializable *&gt;::iterator i
              = blocks.begin();
      bool done = false;
      while (!done &amp;&amp; i !=blocks.end()){
        if (i-&gt;first == line){
          i-&gt;second-&gt;load(in, info);
          done = true;
        }
        i++;
      }
      if (!done) skipChunk(in);
    }
    else {
// 'line' is a 'tag:data' pair
      const std::string name =
         line.substr(0, line.find(&quot;:&quot;));
      const std::string data = 
         line.substr(line.find(&quot;:&quot;)+1);
      std::map&lt;std::string, 
          FileItemParser *&gt;::iterator i
                    = items.begin();
      bool done = false;
      while (!done &amp;&amp; i != items.end()){
        if (i-&gt;first == name) {
          i-&gt;second-&gt;parse(data);
          done = true;
        }
        i++;
      }
      if (!done &amp;&amp; catchAll) {
        catchAll-&gt;parse(line);
      }
    }
  }
}
void FileBlockParser::skipChunk(
                std::istream &amp;in){
  std::string open;
  std::getline(std::ws(in), open);
  if (open != &quot;{&quot;) ... // bad format
  int depth = 1;
  std::string line;
  do {
    std::getline(std::ws(in), line);
    if (line == &quot;{&quot;) depth++;
    else if (line == &quot;}&quot;) depth--;
  } while (!in.eof() &amp;&amp; depth);
}
</pre>
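<p>Since both containers are keyed on the name, the two linear scans
in <tt class="methodname">parse</tt> could probably be replaced by
<tt class="methodname">std::map::find</tt>. A minimal sketch of that
alternative follows. <tt class="classname">Handler</tt> and
<tt class="methodname">dispatch</tt> are hypothetical names, with
<tt class="classname">Handler</tt> standing in for either
<tt class="classname">Serializable</tt> or <tt class=
"classname">FileItemParser</tt>.</p>

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for Serializable or FileItemParser.
struct Handler {
  int calls;
  Handler() : calls(0) {}
  void handle() { ++calls; }
};

// Look 'name' up directly and report whether a handler was found.
bool dispatch(std::map<std::string, Handler *> &handlers,
              const std::string &name) {
  std::map<std::string, Handler *>::iterator i = handlers.find(name);
  if (i == handlers.end()) return false;  // caller skips or uses catch-all
  i->second->handle();
  return true;
}
```

<p>The behaviour is the same as the hand-written loop, but the
lookup takes logarithmic rather than linear time in the number of
registered names.</p>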
<p>We can easily now define the base <tt class=
"methodname">Serializable::load</tt> method. It will ignore
everything in a chunk by simply creating a <tt class=
"classname">FileBlockParser</tt> with no <tt class=
"classname">FileItemParser</tt>s. This will scan the chunk, ignore
every data line and sub-chunk in it, and stop parsing after the
appropriate closing brace.</p>
<p>This is the logical behaviour for <tt class=
"methodname">Serializable::load</tt> since we defined <tt class=
"methodname">Serializable::save</tt> to save an empty block.
Perfectly complementary.</p>
<pre class="programlisting">
void Serializable::load(std::istream &amp;in,
           SerializableLoadInfo &amp;info){
  FileBlockParser parser;
  parser.parse(in, info);
}
</pre>
<p>The only thing that remains to be done is define a number of
<tt class="classname">FileItemParser</tt>s that implement
'Tag:Data' parsing functionality. I have implemented a number of
these, some for simple integer values, some for string values,
others for more elaborate data lines with several fields. The
following example shows the outline of their structure. It is a
<tt class="classname">FileItemParser</tt> that reads a boolean
value which has been written to the output as a string (either
On/Off or Yes/No).</p>
<p><tt class="classname">FileItemParser_OnOff</tt> is a template
class. The template type you supply is the class with the data item
to modify. Making it a template saves us from having to write a
version of <tt class="classname">FileItemParser_OnOff</tt> tailored
to each specific accessor method of each class that needs it.</p>
<p>As an example class that we will be accessing, here is the
<tt class="classname">Cog</tt> class:</p>
<pre class="programlisting">
class Cog : public Serializable {
public:
  Cog() : _data(false) {}
  virtual ~Cog() {}
  bool data() const { return _data; }
  void setData(bool d) { _data = d; }
// implement Serializable interface here
private:
  bool _data;
};
</pre>
<p>The member function to set a boolean value must have the
signature <tt class="methodname">void setXXX(bool)</tt>, like
<tt class="methodname">setData</tt> in <tt class=
"classname">Cog</tt> above. <tt class=
"classname">FileItemParser_OnOff</tt> contains a <tt class=
"literal">typedef</tt> for this member function signature,
<tt class="type">fn_t</tt>.</p>
<pre class="programlisting">
template &lt;class T&gt;
class FileItemParser_OnOff 
          : public FileItemParser {
public:
  typedef void (T::*fn_t)(bool);
  FileItemParser_OnOff(T *obj, fn_t mfun)
  : obj(obj), mfun(mfun) {}
  void parse(const std::string &amp;data) {
    (obj-&gt;*mfun)(data == &quot;On&quot; 
                || data == &quot;Yes&quot;);
  }
private:
  T  *obj;
  fn_t  mfun;
};
</pre>
<p>When you create a <tt class=
"classname">FileItemParser_OnOff</tt> you specify the object whose
value is to be set, and the member function that performs the
setting.</p>
<pre class="programlisting">
Cog cog;
FileItemParser_OnOff&lt;Cog&gt;
        cog_parser(&amp;cog, &amp;Cog::setData);
</pre>
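<p>The same pattern carries over to other field types. As a sketch
only, a sibling for string-valued setters might look like the
following. <tt class="classname">FileItemParser_String</tt>,
<tt class="classname">Label</tt> and <tt class=
"methodname">setName</tt> are hypothetical names, and the base
class is reduced to a minimal stand-in so that the fragment
compiles by itself.</p>

```cpp
#include <string>

// Minimal stand-in for the article's base class.
class FileItemParser {
public:
  FileItemParser() {}
  virtual ~FileItemParser() {}
  virtual void parse(const std::string &data) = 0;
};

// Hypothetical sibling of FileItemParser_OnOff for string fields.
template <class T>
class FileItemParser_String : public FileItemParser {
public:
  typedef void (T::*fn_t)(const std::string &);
  FileItemParser_String(T *obj, fn_t mfun) : obj(obj), mfun(mfun) {}
  void parse(const std::string &data) {
    (obj->*mfun)(data);  // forward the data field unchanged
  }
private:
  T *obj;
  fn_t mfun;
};

// Hypothetical class with a string-valued setter.
class Label {
public:
  const std::string &name() const { return _name; }
  void setName(const std::string &n) { _name = n; }
private:
  std::string _name;
};
```

<p>It is created in the same way as the boolean version, for
example <tt class="literal">FileItemParser_String&lt;Label&gt;
p(&amp;label, &amp;Label::setName);</tt>. Note that the
address-of operator is required when forming a pointer to a
member function.</p>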
<p>We then pass the <tt class="classname">FileItemParser_OnOff</tt>
to a <tt class="classname">FileBlockParser</tt> to read the
block:</p>
<pre class="programlisting">
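FileBlockParser fbp;
// "Data" is an illustrative tag name; 'info' is the
// SerializableLoadInfo that parse requires
fbp.add(&quot;Data&quot;, &amp;cog_parser);
fbp.parse(some_istream, info);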
</pre>
<p>Now we have a complete mechanism to load the files that have
been saved. Objects can be serialized to a byte stream (probably a
file) and reconstructed from the same byte stream at a later
time.</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e654" id=
"d0e654"></a>Extensions</h2>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e657" id="d0e657"></a>Forwards
compatibility</h3>
</div>
<p>If an earlier version of the parser sees a file saved by a later
version of the system, there will be some data that it does not
recognise and just skips over.</p>
<p>If you want to inform the user that this has happened without
halting the file parsing process, you can add a new boolean flag to the
<span class="structname">SerializableLoadInfo</span> structure. It
is initially false, but can be set to true by the <tt class=
"classname">FileBlockParser</tt> if it meets unexpected data. When
the load is complete the parser can return the flag to the caller
to handle appropriately.</p>
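<p>A minimal sketch of this idea follows. The real <tt class=
"classname">SerializableLoadInfo</tt> has other members that are
not repeated here, and the <tt class="literal">unknownData</tt>
flag, <tt class="methodname">noteUnknown</tt> and <tt class=
"methodname">loadSummary</tt> are hypothetical names chosen for
the illustration.</p>

```cpp
#include <string>

// Assumption: the real SerializableLoadInfo has other members; only
// the hypothetical new flag is shown here.
struct SerializableLoadInfo {
  bool unknownData;  // set when the parser skips something
  SerializableLoadInfo() : unknownData(false) {}
};

// Hypothetical helper the parser could call whenever it skips an
// unrecognised chunk or data line.
void noteUnknown(SerializableLoadInfo &info) {
  info.unknownData = true;
}

// After loading, the caller decides what to tell the user.
std::string loadSummary(const SerializableLoadInfo &info) {
  return info.unknownData
      ? "Loaded, but some data from a newer version was ignored"
      : "Loaded";
}
```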
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e670" id="d0e670"></a>Error
handling</h3>
</div>
<p>Up to this point, error handling has been conveniently ignored.
You can add the error handling mechanism of your choice to the
system. For it to be robust there must be some form of error
handling.</p>
<p>I would favour raising exceptions, but have not incorporated
that into the framework here for clarity, and to avoid complicating
matters with exception safety concerns.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e677" id="d0e677"></a>Adaptor
classes</h3>
</div>
<p>The classes to be serialized do not actually have to inherit
from <tt class="classname">Serializable</tt> themselves. It is
possible to create an adaptor class that acts as a <tt class=
"classname">Serializable</tt> for a particular type of class and
saves its state, provided the data class's <tt class=
"literal">public</tt> API allows you to capture and restore enough
of its state.</p>
<p>I have used both approaches with success.</p>
</div>
<div class="sect2" lang="en">
<div class="titlepage">
<h3><a name="d0e693" id="d0e693"></a>Binary
versions</h3>
</div>
<p>It is not difficult to see how to convert this system from an
ASCII text based approach to a binary file format. In doing so you
would remove the need for text searches, comparisons, and
whitespace stripping.</p>
<p>The advantages of doing so include reducing the file size,
taking less time to write, and more importantly less time to parse.
For some applications these may be big issues. Certainly, the
cost of the text parsing operations grows as the file
size increases. This may be of particular importance when
serializing an object over a network connection - the less data
that needs to be sent the better.</p>
<p>The binary version is no longer human editable - this may in
fact also be an advantage for some applications.</p>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e702" id=
"d0e702"></a>Conclusions</h2>
</div>
<p>We have seen a framework for object serialization in C++. It has
a simple API, and gives you a great deal of flexibility with very
little implementation effort.</p>
<p>The files that are produced by this system are forwards and
backwards compatible. The file format is human readable, and
editable. New <tt class="classname">Serializable</tt> classes can
be added easily to the system with very little effort since the
framework is already in place for them.</p>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e712" id="d0e712"></a>References</h2>
</div>
<div class="bibliomixed"><a name="Blundell" id="Blundell"></a>
<p class="bibliomixed">[Blundell] Richard Blundell. Automatic
Object Versioning for Forward and Backward File Format
Compatibility. <span class="citetitle"><i class=
"citetitle">Overload 35</i></span>, Jan. 2000.</p>
</div>
<div class="bibliomixed"><a name="Stroustrup" id="Stroustrup"></a>
<p class="bibliomixed">[Stroustrup] Bjarne Stroustrup. The C++
Programming Language. 3rd ed. Addison-Wesley, 1999. pp. 631-636. ISBN:
0-201-88954-4.</p>
</div>
</div>
</div>
<div class="footnotes"><br>
<hr class="c4" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.d0e102" href="#d0e102" id=
"ftn.d0e102">1</a>]</sup> Of course, the stream may not go to a
file, but more often than not will.</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
