    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: File Format Conversion Using Templates and Type
Collections</title>
        <link>https://members.accu.org/index.php/journals/370</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #52 - Dec 2002 + Programming Topics</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;File Format Conversion Using Templates and Type
Collections</h1>
<p>
<strong>Date:</strong> Mon, 02 December 2002 21:57:50 +00:00</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e16" id="d0e16"></a></h2>
</div>
<p>A recent project involved upgrading some files from an old
format to a new one (and possibly back again), as a result of
changes to the data types being stored. Several possible
implementations were considered. The final solution made use of
template methods and type-collection classes, and supported
forward and backward file format conversion with no code
duplication and minimal overhead.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e20" id=
"d0e20"></a>Requirements</h2>
</div>
<p>Many years ago, one of our projects was converted from a 16-bit
application running on Windows 3.11 to a 32-bit one running on
Win32. Most of the code was ported at the time, but some changes
were not made because they would have required a change to the file
formats. 16-bit identifier values were being stored in a file.
Changing the file format was seen as too much of an upheaval
(especially at a time when so many other changes were being made).
And besides, 16 bits should be enough for anyone...</p>
<p>Time passed. Suddenly, 16 bits were no longer enough for everyone.
The file format needed to be upgraded. Discussions were had, and
the following requirements emerged:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>The old version of the software would not be required to read
the new file format (i.e. no forwards compatibility - see [<a href=
"#Blundell00">Blundell00</a>]).</p>
</li>
<li>
<p>The new version of the software was required to use the new
format (obviously) but only had to recognise the old format, and
prompt the user to upgrade (i.e. limited backwards compatibility -
see [<a href="#Blundell00">Blundell00</a>]).</p>
</li>
<li>
<p>An upgrade utility would convert from the old format to the new
format. A 'downgrade' facility would be a 'nice-to-have' (just in
case users were running both software versions on site and upgraded
the wrong site by mistake) but was not a necessity.</p>
</li>
<li>
<p>The interfaces of the data classes should be changed as little
as possible.</p>
</li>
<li>
<p>Any solution should support future changes (we don't want to
have to re-implement everything when it comes to 64 bits).</p>
</li>
</ol>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e49" id="d0e49"></a>Initial
suggestions</h2>
</div>
<p>Support for the new format in the software, and for both formats
in the upgrade utility, required old and new versions of the
persistence code for the data types involved, as well as some form
of user-interface for the upgrade utility, and logic for converting
the files as a whole. Suggestions were put forward for tackling the
serialisation issues:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Copy the old serialisation source code to the upgrade tool
project, modify the original code to use the new format so the
application can read and write the new files, and include this
modified code in the upgrade utility as well. The upgrade utility
would therefore have code for both the old and new formats, and the
application would have only the new code.</p>
</li>
<li>
<p>Append methods supporting the new formats to all the affected
data classes. The application would use the new format only, and
the upgrade utility would use both.</p>
</li>
<li>
<p>Modify the serialisation methods to handle both formats,
determining which one to use with some form of flag or version
number.</p>
</li>
</ol>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e64" id="d0e64"></a>Drawbacks</h2>
</div>
<p>The first suggestion set warning bells ringing left, right and
centre. Every time I have ever copied code around it has come back
to haunt me. When the same, or similar, code is in two places you
have twice as much code to manage. Changes need to be made in two
places instead of one, which is highly error-prone. Furthermore,
people inevitably forget about one or other of the copies, and so
it gets out of date, it doesn't get built properly, documentation
stagnates, and it causes endless confusion to new team members when
they stumble across it. Re-use good; copy-and-paste bad!</p>
<p>However, criticisms were levelled at the second suggestion too.
The application would need to cart around both old and new
serialisation code, despite only ever using the new code. Small
classes would find the majority of their source code comprising
multiple persistence methods. Changes and fixes would still need to
be made to both versions. Even if they sit right next to each other
in the source file it is easy to miss one when editing the code
through a tiny keyhole source code window<sup>[<a name="footnote1"
href="#ftn.footnote1" id="footnote1">1</a>]</sup>.</p>
<p>Finally, the third suggestion leads to spaghetti serialisation
code, with huge conditional blocks based on ever-more complicated
version dependencies. In later versions you have a mess of if
blocks checking for file formats that have not been supported for
years [<a href="#Blundell00">Blundell00</a>]. As with the previous
suggestion, lean classes become fat with persistence methods.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e80" id="d0e80"></a>Types, typedefs
and templates</h2>
</div>
<p>In our project we were making no changes other than the types
and sizes of various data values. Instead of a version flag, why
not parameterise the persistence methods on the relevant types?
This way we can support a whole raft of file formats using
different types all with the same code. Simple wrapper methods can
then be written to forward to the parameterised method with the
correct types.</p>
<p>As a rather trivial example, consider the code for a class that
stores an array of id values (see [<a href=
"#Blundell99">Blundell99</a>]).</p>
<pre class="programlisting">
  // id_array.h
  class id_array {
    ...
    short m_size; // should be plenty...
    short *m_ids; // should be wide enough
  };

  // id_array.cpp
  void id_array::extract(out_file &amp;f) const {
    f &lt;&lt; m_size; // raw write, 16-bits
    for (short i = 0; i != m_size; ++i)
      f &lt;&lt; m_ids[i];
  }

  void id_array::build(in_file &amp;f) {
    short size;
    f &gt;&gt; size; // raw read of 16-bits
    resize(size);
    for (short i = 0; i != size; ++i)
      f &gt;&gt; m_ids[i];
  }
</pre>
<p>Parameterising these methods requires very little change to the
code. The two methods are prefixed with a template declaration
naming the required type, which is then used inside the methods in
place of <tt class="literal">short</tt>. One point worth noting
here is that values of that type, rather than the data members of
the class itself, must be passed to any overloaded function calls.
Writing <tt class="literal">f &lt;&lt; m_size;</tt> will output
<span class="property">m_size</span> as the type defined in the
class itself, rather than the required type <tt class=
"literal"><span class="type">T</span></tt>. Hence you must write
<tt class="literal">T size = m_size; f &lt;&lt; size;</tt> instead.
Easy to overlook, that one (he says from experience
:-)<sup>[<a name="footnote2" href="#ftn.footnote2" id=
"footnote2">2</a>]</sup>.</p>
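<p>A minimal sketch of the single-parameter form and of the pitfall just described. The <tt class="literal">out_file</tt> here is a hypothetical stand-in (its real definition is not shown in this article) that appends the raw bytes of each value, so the number of bytes written is the size of the argument's static type; <tt class="literal">extractT_wrong</tt> is likewise a hypothetical name for the mistaken version.</p>
<pre class="programlisting">
  #include &lt;cassert&gt;
  #include &lt;vector&gt;

  // Hypothetical stand-in for out_file: appends the raw bytes of each
  // value, so the width written is sizeof the argument's static type.
  class out_file {
  public:
    template &lt;typename V&gt;
    out_file &amp;operator&lt;&lt;(const V &amp;v) {
      const unsigned char *p =
        reinterpret_cast&lt;const unsigned char *&gt;(&amp;v);
      bytes.insert(bytes.end(), p, p + sizeof(V));
      return *this;
    }
    std::vector&lt;unsigned char&gt; bytes;
  };

  class id_array {
  public:
    // Test data only: fills the array with 0, 1, ..., n-1.
    explicit id_array(short n) : m_size(n), m_ids(n) {
      for (short i = 0; i != n; ++i)
        m_ids[i] = i;
    }

    // Correct: copy each member into a T first, so T's width is written.
    template &lt;typename T&gt;
    void extractT(out_file &amp;f) const {
      T size = m_size;
      f &lt;&lt; size;
      for (T i = 0; i != size; ++i) {
        T value = m_ids[i];
        f &lt;&lt; value;
      }
    }

    // Pitfall: the members are written as short, whatever T is.
    template &lt;typename T&gt;
    void extractT_wrong(out_file &amp;f) const {
      f &lt;&lt; m_size;
      for (T i = 0; i != m_size; ++i)
        f &lt;&lt; m_ids[i];
    }

  private:
    short m_size;
    std::vector&lt;short&gt; m_ids; // stands in for short *m_ids
  };
</pre>
<p>For a three-element array, <tt class="literal">extractT&lt;int&gt;</tt> writes four <tt class="literal">int</tt>s, while <tt class="literal">extractT_wrong&lt;int&gt;</tt> still writes four <tt class="literal">short</tt>s and compiles without complaint.</p>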
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e111" id="d0e111"></a>Explosion of
types</h2>
</div>
<p>It soon becomes clear that, strictly, we should have
parameterised the methods on both the count type and the contained
type, because these are not necessarily the same. Thus, our methods
are now parameterised on two types:</p>
<pre class="programlisting">
  template &lt;typename Count, typename T&gt;
  void id_array::extractT(out_file &amp;f) const {
    Count size = m_size;
    f &lt;&lt; size;
    for (Count i = 0; i != size; ++i) {
      T value = m_ids[i];
      f &lt;&lt; value;
    }
  }
  template &lt;typename Count, typename T&gt;
  void id_array::buildT(in_file &amp;f) {
    Count size;
    f &gt;&gt; size;
    resize(size);
    for (Count i = 0; i != size; ++i) {
      T value;
      f &gt;&gt; value;
      m_ids[i] = value;
    }
  }
</pre>
<p>More complicated data structures may have even more types, and
when you have many such low-level data types you can end up with a
huge number of types and a huge number of different parameters to
each method. It gets nasty very quickly.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e120" id="d0e120"></a>Classes of
types</h2>
</div>
<p>What we really want is to be able to say, &quot;My old file format
used types <tt class="literal">t1, t2, &hellip;, tn</tt>, whereas
in my new format I use types <tt class="literal">T1, T2, &hellip;,
Tn</tt>.&quot; It would be nice to be able to group these relevant types
together so you can just say &quot;new format&quot; or &quot;old format&quot; rather
than &quot;short, unsigned short, int and short&quot; to one method and
something else to another. Enter the class as a method of naming
things as a group:</p>
<pre class="programlisting">
  // format_types.h
  class old_types {
  public:
    typedef short count_t;
    typedef short my_id_t;
    ... // lots more follow, if nec.
  };
  class new_types {
  public:
    typedef size_t count_t;
    typedef int my_id_t;
    ... // lots more...
  };
</pre>
<p>Now, rather than passing in as many parameters as each class
requires, persistence methods can be parameterised solely on a
single format type. These methods then pull out whatever named
types they require from the file format 'types class':</p>
<pre class="programlisting">
  // typename is required: Format::count_t is a dependent name
  template &lt;typename Format&gt;
  void id_array::extractT(out_file &amp;f) const {
    typename Format::count_t size = m_size;
    f &lt;&lt; size;
    for (typename Format::count_t i = 0;
         i != size; ++i) {
      typename Format::my_id_t value = m_ids[i];
      f &lt;&lt; value;
    }
  }

  template &lt;typename Format&gt;
  void id_array::buildT(in_file &amp;f) {
    typename Format::count_t size;
    f &gt;&gt; size;
    resize(size);
    for (typename Format::count_t i = 0;
         i != size; ++i) {
      typename Format::my_id_t value;
      f &gt;&gt; value;
      m_ids[i] = value;
    }
  }
</pre></div>
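<p>A minimal, self-contained sketch of the types classes doing an actual conversion: an <tt class="literal">id_array</tt> is built from old-format bytes and extracted again in the new format. The byte-buffer <tt class="literal">out_file</tt> and <tt class="literal">in_file</tt>, and the <tt class="literal">upgrade</tt> helper, are hypothetical; the stand-ins copy raw bytes in host byte order, and the array is held in a <tt class="literal">std::vector</tt> rather than a raw pointer. The <tt class="literal">typename</tt> keyword in front of <tt class="literal">Format::count_t</tt> is required by standard C++ because the name depends on the template parameter.</p>
<pre class="programlisting">
  #include &lt;cassert&gt;
  #include &lt;cstddef&gt;
  #include &lt;cstring&gt;
  #include &lt;vector&gt;

  // Hypothetical byte-buffer stand-ins for out_file / in_file.
  struct out_file {
    std::vector&lt;unsigned char&gt; bytes;
    template &lt;typename V&gt; out_file &amp;operator&lt;&lt;(const V &amp;v) {
      const unsigned char *p =
        reinterpret_cast&lt;const unsigned char *&gt;(&amp;v);
      bytes.insert(bytes.end(), p, p + sizeof(V));
      return *this;
    }
  };
  struct in_file {
    explicit in_file(const std::vector&lt;unsigned char&gt; &amp;b)
      : bytes(b), pos(0) {}
    template &lt;typename V&gt; in_file &amp;operator&gt;&gt;(V &amp;v) {
      // No bounds check: this sketch assumes well-formed input.
      std::memcpy(&amp;v, &amp;bytes[pos], sizeof(V));
      pos += sizeof(V);
      return *this;
    }
    std::vector&lt;unsigned char&gt; bytes;
    std::size_t pos;
  };

  // The two types classes from format_types.h.
  struct old_types { typedef short count_t; typedef short my_id_t; };
  struct new_types { typedef std::size_t count_t; typedef int my_id_t; };

  struct id_array {
    std::vector&lt;int&gt; ids; // stands in for m_size and m_ids

    template &lt;typename Format&gt;
    void extractT(out_file &amp;f) const {
      typename Format::count_t size =
        static_cast&lt;typename Format::count_t&gt;(ids.size());
      f &lt;&lt; size;
      for (typename Format::count_t i = 0; i != size; ++i) {
        typename Format::my_id_t value =
          static_cast&lt;typename Format::my_id_t&gt;(ids[i]);
        f &lt;&lt; value;
      }
    }

    template &lt;typename Format&gt;
    void buildT(in_file &amp;f) {
      typename Format::count_t size;
      f &gt;&gt; size;
      ids.resize(size);
      for (typename Format::count_t i = 0; i != size; ++i) {
        typename Format::my_id_t value;
        f &gt;&gt; value;
        ids[i] = value;
      }
    }
  };

  // Hypothetical upgrade step: read the old format, write the new one.
  std::vector&lt;unsigned char&gt;
  upgrade(const std::vector&lt;unsigned char&gt; &amp;old_bytes) {
    in_file in(old_bytes);
    id_array a;
    a.buildT&lt;old_types&gt;(in);
    out_file out;
    a.extractT&lt;new_types&gt;(out);
    return out.bytes;
  }
</pre>
<p>A downgrade is the same pair of calls with the two types classes swapped, which is why the one parameterised algorithm covers both directions.</p>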
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e137" id="d0e137"></a>Forwarding
functions</h2>
</div>
<p>We did not want to alter the interfaces of the data classes more
than necessary. In particular, we wanted persistence from our main
application to work exactly as before. To achieve this we created
one more typedef for the types currently in use:</p>
<pre class="programlisting">
  // format_types.h
  // current_types points to new_types
  // now (not old_types)
  typedef new_types current_types;
  ...
</pre>
<p>and wrote forwarding functions to call the buildT() and
extractT() template methods with the correct types:</p>
<pre class="programlisting">
  // id_array.h
  class id_array {
  public:
    // these are the original method names
    void extract(out_file &amp;f) const; 
    void build(in_file &amp;f);
    // these are new forwarding methods
    void extract_old(out_file &amp;f) const;
    void extract_new(out_file &amp;f) const;
    void build_old(in_file &amp;f);
    void build_new(in_file &amp;f);

  private:
    // These are the implementations
    template&lt;typename Format&gt;
    void extractT(out_file &amp;f) const;
    template&lt;typename Format&gt;
    void buildT(in_file &amp;f);
  };
</pre>
<p>We then implemented these forwarding methods:</p>
<pre class="programlisting">
  void id_array::extract(out_file &amp;f) const {
    extractT&lt;current_types&gt;(f);
  }

  void id_array::build(in_file &amp;f) {
    buildT&lt;current_types&gt;(f);
  }

  void id_array::extract_old(out_file &amp;f) const {
    extractT&lt;old_types&gt;(f);
  }
  ... // etc.
</pre>
<p>These are all just one-liners, making them trivial to implement
and maintain.</p>
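<p>A compilable sketch of the forwarding arrangement, reduced to the extract side (the build side is symmetric). The raw-byte <tt class="literal">out_file</tt> is a hypothetical stand-in and the member data is held in a <tt class="literal">std::vector</tt>. Because <tt class="literal">current_types</tt> is a typedef for <tt class="literal">new_types</tt>, <tt class="literal">extract()</tt> and <tt class="literal">extract_new()</tt> should produce identical output.</p>
<pre class="programlisting">
  #include &lt;cassert&gt;
  #include &lt;cstddef&gt;
  #include &lt;vector&gt;

  // Hypothetical raw-byte stand-in for out_file.
  struct out_file {
    std::vector&lt;unsigned char&gt; bytes;
    template &lt;typename V&gt; out_file &amp;operator&lt;&lt;(const V &amp;v) {
      const unsigned char *p =
        reinterpret_cast&lt;const unsigned char *&gt;(&amp;v);
      bytes.insert(bytes.end(), p, p + sizeof(V));
      return *this;
    }
  };

  // format_types.h
  struct old_types { typedef short count_t; typedef short my_id_t; };
  struct new_types { typedef std::size_t count_t; typedef int my_id_t; };
  typedef new_types current_types;

  // id_array.h
  class id_array {
  public:
    explicit id_array(const std::vector&lt;int&gt; &amp;ids) : m_ids(ids) {}
    void extract(out_file &amp;f) const;     // original name, current format
    void extract_old(out_file &amp;f) const; // forwarding methods
    void extract_new(out_file &amp;f) const;
  private:
    template &lt;typename Format&gt;
    void extractT(out_file &amp;f) const;    // the single implementation
    std::vector&lt;int&gt; m_ids;
  };

  // id_array.cpp
  template &lt;typename Format&gt;
  void id_array::extractT(out_file &amp;f) const {
    typename Format::count_t size =
      static_cast&lt;typename Format::count_t&gt;(m_ids.size());
    f &lt;&lt; size;
    for (typename Format::count_t i = 0; i != size; ++i)
      f &lt;&lt; static_cast&lt;typename Format::my_id_t&gt;(m_ids[i]);
  }

  void id_array::extract(out_file &amp;f) const { extractT&lt;current_types&gt;(f); }
  void id_array::extract_old(out_file &amp;f) const { extractT&lt;old_types&gt;(f); }
  void id_array::extract_new(out_file &amp;f) const { extractT&lt;new_types&gt;(f); }
</pre>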
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e154" id="d0e154"></a>New
formats</h2>
</div>
<p>If a new format is required in the future (64 bits, etc.),
supporting it is simple:</p>
<div class="orderedlist">
<ol type="1">
<li>
<p>Add code to the unit test class to check that the new format
works OK.</p>
</li>
<li>
<p>Add a new types class, <tt class=
"classname">really_new_types</tt>, containing the relevant
typedefs.</p>
</li>
<li>
<p>Add one-line forwarding methods to each class to pass this types
class in.</p>
</li>
<li>
<p>Update <tt class="classname">current_types</tt> to point to the
new types class, <tt class="classname">really_new_types</tt>.</p>
</li>
<li>
<p>Build and check that your unit tests pass, to ensure the single
persistence methods are sufficiently generalised to support the new
types.</p>
</li>
</ol>
</div>
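<p>The steps above can be sketched as a self-contained example. The contents of <tt class="literal">really_new_types</tt> (<tt class="literal">std::uint64_t</tt> and <tt class="literal">std::int64_t</tt>) are an assumption about what a 64-bit format might use, <tt class="literal">round_trips</tt> is a hypothetical unit-test helper written once and instantiated for every types class (steps 1 and 5), and the byte-buffer <tt class="literal">out_file</tt>/<tt class="literal">in_file</tt> are hypothetical stand-ins. For brevity the template methods are public here, so the forwarding methods of step 3 are omitted.</p>
<pre class="programlisting">
  #include &lt;cassert&gt;
  #include &lt;cstddef&gt;
  #include &lt;cstdint&gt;
  #include &lt;cstring&gt;
  #include &lt;vector&gt;

  // Hypothetical raw-byte stand-ins for out_file / in_file.
  struct out_file {
    std::vector&lt;unsigned char&gt; bytes;
    template &lt;typename V&gt; out_file &amp;operator&lt;&lt;(const V &amp;v) {
      const unsigned char *p =
        reinterpret_cast&lt;const unsigned char *&gt;(&amp;v);
      bytes.insert(bytes.end(), p, p + sizeof(V));
      return *this;
    }
  };
  struct in_file {
    explicit in_file(const std::vector&lt;unsigned char&gt; &amp;b)
      : bytes(b), pos(0) {}
    template &lt;typename V&gt; in_file &amp;operator&gt;&gt;(V &amp;v) {
      // No bounds check: this sketch assumes well-formed input.
      std::memcpy(&amp;v, &amp;bytes[pos], sizeof(V));
      pos += sizeof(V);
      return *this;
    }
    std::vector&lt;unsigned char&gt; bytes;
    std::size_t pos;
  };

  struct old_types { typedef short count_t; typedef short my_id_t; };
  struct new_types { typedef std::size_t count_t; typedef int my_id_t; };
  // Step 2: the new types class (contents assumed).
  struct really_new_types {
    typedef std::uint64_t count_t;
    typedef std::int64_t my_id_t;
  };
  // Step 4: repoint the current format.
  typedef really_new_types current_types;

  struct id_array {
    std::vector&lt;std::int64_t&gt; ids;

    template &lt;typename Format&gt;
    void extractT(out_file &amp;f) const {
      typename Format::count_t size =
        static_cast&lt;typename Format::count_t&gt;(ids.size());
      f &lt;&lt; size;
      for (typename Format::count_t i = 0; i != size; ++i)
        f &lt;&lt; static_cast&lt;typename Format::my_id_t&gt;(ids[i]);
    }
    template &lt;typename Format&gt;
    void buildT(in_file &amp;f) {
      typename Format::count_t size;
      f &gt;&gt; size;
      ids.resize(size);
      for (typename Format::count_t i = 0; i != size; ++i) {
        typename Format::my_id_t value;
        f &gt;&gt; value;
        ids[i] = value;
      }
    }
  };

  // Steps 1 and 5: one round-trip check, instantiated per types class.
  // True if the ids survive and every written byte is read back.
  template &lt;typename Format&gt;
  bool round_trips(const std::vector&lt;std::int64_t&gt; &amp;ids) {
    id_array a;
    a.ids = ids;
    out_file out;
    a.extractT&lt;Format&gt;(out);
    in_file in(out.bytes);
    id_array b;
    b.buildT&lt;Format&gt;(in);
    return b.ids == ids &amp;&amp; in.pos == out.bytes.size();
  }
</pre>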
<p>If you want, you can omit step 3 and expose public templated
serialisation methods. That way, clients can use any file format
they choose by calling the method with the correct types class. We
did not do this for two reasons: (a) to control access to the
different formats more closely, and (b) because our compiler,
Visual C++ 7 (the latest .NET version at the time of writing),
requires template methods to be implemented inline, which we did
not want to do. Some of our persistence
methods were quite involved. Implementing them in the header files
could have introduced additional compilation dependencies from
extra #include directives being required.</p>
<p>Our workaround involved declaring a private friend helper class
at the top of each data class:</p>
<pre class="programlisting">
  // id_array.h
  class id_array {
    class persister;
    friend class id_array::persister;
  public:
  ...
  };
</pre>
<p>Class <tt class="literal">persister</tt> then simply had two
methods: the two template persistence methods moved from the main
class:</p>
<pre class="programlisting">
  // id_array.cpp
  class id_array::persister {
  public:
    template&lt;typename Format&gt;
    static void extractT(const id_array &amp;a,
                         out_file &amp;f) {
      ... // inline because of VC++7
    }
    template&lt;typename Format&gt;
    static void buildT(id_array &amp;a,
                       in_file &amp;f) {
      ... // inline because of VC++7
    }
  };
</pre>
<p>The use of this private helper class allowed us to move the
inline implementations of these template methods out of the header
file. Making it a nested class avoided name clashes because we were
not polluting the enclosing namespace with additional names
(and therefore each class could use the same nested class name,
<tt class="classname">persister</tt>). The forwarding methods
within each data class could now simply forward to the static
methods of class <tt class="classname">persister</tt>, passing in a
reference to themselves:</p>
<pre class="programlisting">
  // id_array.cpp
  void id_array::extract(out_file &amp;f) const {
    persister::extractT&lt;current_types&gt;(*this, f);
  }
  ... // etc.
</pre>
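<p>A single-file sketch of the nested helper arrangement, with comments marking what would live in the header and what in the <tt class="literal">.cpp</tt> file. Only the extract side is shown, and the raw-byte <tt class="literal">out_file</tt> is a hypothetical stand-in. The call uses explicit template arguments, which a standard-conforming compiler accepts. Since C++11 a nested class already has the same access rights as any other member, so the <tt class="literal">friend</tt> declaration is harmless but no longer strictly required.</p>
<pre class="programlisting">
  #include &lt;cassert&gt;
  #include &lt;cstddef&gt;
  #include &lt;vector&gt;

  // Hypothetical raw-byte stand-in for out_file.
  struct out_file {
    std::vector&lt;unsigned char&gt; bytes;
    template &lt;typename V&gt; out_file &amp;operator&lt;&lt;(const V &amp;v) {
      const unsigned char *p =
        reinterpret_cast&lt;const unsigned char *&gt;(&amp;v);
      bytes.insert(bytes.end(), p, p + sizeof(V));
      return *this;
    }
  };

  struct old_types { typedef short count_t; typedef short my_id_t; };
  struct new_types { typedef std::size_t count_t; typedef int my_id_t; };
  typedef new_types current_types;

  // ---- id_array.h: the helper is only declared; no template code here.
  class id_array {
    class persister;
    friend class persister; // redundant since C++11, harmless
  public:
    explicit id_array(const std::vector&lt;int&gt; &amp;ids) : m_ids(ids) {}
    void extract(out_file &amp;f) const;
    void extract_old(out_file &amp;f) const;
  private:
    std::vector&lt;int&gt; m_ids;
  };

  // ---- id_array.cpp: the template implementation is visible only here.
  class id_array::persister {
  public:
    template &lt;typename Format&gt;
    static void extractT(const id_array &amp;a, out_file &amp;f) {
      typename Format::count_t size =
        static_cast&lt;typename Format::count_t&gt;(a.m_ids.size());
      f &lt;&lt; size;
      for (typename Format::count_t i = 0; i != size; ++i)
        f &lt;&lt; static_cast&lt;typename Format::my_id_t&gt;(a.m_ids[i]);
    }
  };

  void id_array::extract(out_file &amp;f) const {
    persister::extractT&lt;current_types&gt;(*this, f);
  }
  void id_array::extract_old(out_file &amp;f) const {
    persister::extractT&lt;old_types&gt;(*this, f);
  }
</pre>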
<p>Alas, we were not quite in the clear yet. Another weakness of
VC++7 is that it does not support explicit specification of
template parameters for template methods/functions. We had to work
around this one as well by passing in a dummy object to each method
and letting the compiler sort out which function to call:</p>
<pre class="programlisting">
  // id_array.cpp
  class id_array::persister {
  public:
    template&lt;typename Format&gt;
    static void extractT(const Format &amp;,
                         const id_array &amp;a,
                         out_file &amp;f) {
      ...
    }
    ...
  };
  ...

  void id_array::extract(out_file &amp;f) const {
    id_array::persister::extractT(current_types(), *this, f);
  }
  ...
</pre></div>
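<p>The dummy-argument workaround in isolation, as a minimal sketch: the unnamed first parameter of <tt class="literal">extractT</tt> is never read; it exists only so that the compiler can deduce <tt class="literal">Format</tt> from the argument's type instead of needing an explicit template argument list. Passing a default-constructed <tt class="literal">current_types()</tt> is cheap because the types classes have no data members. The raw-byte <tt class="literal">out_file</tt> is a hypothetical stand-in.</p>
<pre class="programlisting">
  #include &lt;cassert&gt;
  #include &lt;cstddef&gt;
  #include &lt;vector&gt;

  // Hypothetical raw-byte stand-in for out_file.
  struct out_file {
    std::vector&lt;unsigned char&gt; bytes;
    template &lt;typename V&gt; out_file &amp;operator&lt;&lt;(const V &amp;v) {
      const unsigned char *p =
        reinterpret_cast&lt;const unsigned char *&gt;(&amp;v);
      bytes.insert(bytes.end(), p, p + sizeof(V));
      return *this;
    }
  };

  struct old_types { typedef short count_t; typedef short my_id_t; };
  struct new_types { typedef std::size_t count_t; typedef int my_id_t; };
  typedef new_types current_types;

  class id_array {
    class persister;
    friend class persister;
  public:
    explicit id_array(const std::vector&lt;int&gt; &amp;ids) : m_ids(ids) {}
    void extract(out_file &amp;f) const;
    void extract_old(out_file &amp;f) const;
  private:
    std::vector&lt;int&gt; m_ids;
  };

  class id_array::persister {
  public:
    // The unnamed parameter carries only its type; Format is deduced
    // from it at each call site, so no explicit &lt;...&gt; is needed.
    template &lt;typename Format&gt;
    static void extractT(const Format &amp;, const id_array &amp;a,
                         out_file &amp;f) {
      typename Format::count_t size =
        static_cast&lt;typename Format::count_t&gt;(a.m_ids.size());
      f &lt;&lt; size;
      for (typename Format::count_t i = 0; i != size; ++i)
        f &lt;&lt; static_cast&lt;typename Format::my_id_t&gt;(a.m_ids[i]);
    }
  };

  void id_array::extract(out_file &amp;f) const {
    persister::extractT(current_types(), *this, f); // Format = new_types
  }
  void id_array::extract_old(out_file &amp;f) const {
    persister::extractT(old_types(), *this, f);     // Format = old_types
  }
</pre>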
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e211" id=
"d0e211"></a>Conclusion</h2>
</div>
<p>Classes were used as a scope to package up the whole set of
types, used when serialising to a given file format, into a 'types
class'. A typedef was provided to allow <tt class=
"classname">current_types</tt> always to refer to the primary types
class, and hence the current file format. Template serialisation
methods were used to localise a single serialisation algorithm for
each class in a single place to aid implementation and maintenance.
One-line (non-template) forwarding methods were used to provide an
easy interface to the current, old, and new file formats. And
finally the use of a private nested friend class and dummy template
function parameters allowed us to work around various weaknesses in
the Microsoft C++ compiler and to move our templated persistence
methods out of the header files.</p>
<p>None of these choices were rocket science, but the end result
was a seamless implementation of multi-format persistence with very
little overhead, either overall (just the format classes were
needed) or in each of the persisted classes.</p>
<div class="bibliography">
<div class="titlepage">
<h2><a name="d0e221" id="d0e221"></a>References</h2>
</div>
<div class="bibliomixed"><a name="Blundell99" id="Blundell99"></a>
<p class="bibliomixed">[Blundell99] Blundell, R.P., &quot;A Simple Model
for Object Persistence Using the Standard Library,&quot; Overload 32,
June 1999</p>
</div>
<div class="bibliomixed"><a name="Blundell00" id="Blundell00"></a>
<p class="bibliomixed">[Blundell00] Blundell, R.P., &quot;Automatic
Object Versioning for Forward and Backward File Format
Compatibility,&quot; Overload 35, January 2000</p>
</div>
</div>
</div>
<div class="footnotes"><br>
<hr class="c2" width="100">
<div class="footnote">
<p><sup>[<a name="ftn.footnote1" href="#footnote1" id=
"ftn.footnote1">1</a>]</sup> which is all the space you seem to be
left with, these days, in between the project windows, watch
windows, output windows, toolbars, palette windows, etc., of the
modern IDE.</p>
</div>
<div class="footnote">
<p><sup>[<a name="ftn.footnote2" href="#footnote2" id=
"ftn.footnote2">2</a>]</sup> But fortunately one that is easy to
spot when the automated unit tests, which of course you wrote
first, fall over.</p>
</div>
</div>
</p>
</div>
</channel>
</rss>
