    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Shadow Data Types</title>
        <link>https://members.accu.org/index.php/journals/1593</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #94 - December 2009 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c259/">94</a>
                    (6)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c65/">Programming</a>
                    (877)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c259-65/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c259+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Shadow Data Types</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 29 December 2009 08:59:00 +00:00 or Tue, 29 December 2009 08:59:00 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Hiding implementation details is a good idea. Jon Jagger shows a technique in C that avoids some of the common problems.</p>
<p><strong>Body:</strong>&nbsp;  <p>Suppose we have a type called <tt class="code">wibble</tt> defined as a concrete data type (that is, a type whose representation is fully visible) as shown in Listing 1.
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    #include &quot;grommet.h&quot;  
    #include &quot;flange.h&quot;  
 
    typedef struct  
    {  
        grommet g;  
        flange f;  
    } wibble;  
 
    void wopen(wibble * w, int i);  
    ...  

    void wclose(wibble * w);
  </pre></td></tr><tr><td class="title">Listing 1</td></tr></table>
  </p>
  <p> 
    The definition of <tt class="code">wibble</tt> exposes the types involved in its representation (<tt class="code">grommet</tt> and <tt class="code">flange</tt> in this made up example), and hence requires a <tt class="code">#include</tt> for those types in its header file. This exposure has a price. One cost is that any change to the <tt class="code">grommet</tt> or <tt class="code">flange</tt> header files, or any header files they in turn <tt class="code">#include</tt>, at any depth, will require a recompilation of any source file that <tt class="code">#include</tt>s wibble.h (either directly or transitively). Another cost is that client code can easily become reliant on the exposed representation rather than relying solely on the functional api. Note that in C++ you can avoid this latter problem by declaring your data members private.
  </p><p> 
    These costs are sufficiently high that software developers have invented techniques to hide a type's representation: turn it into an Abstract Data Type. This is simply a type that does not reveal its representation; a type that abstracts away its representation. In this article I'll look at two abstract data type implementation techniques: Opaque Data Types, and Shadow Data Types.
  </p><h2>
    Opaque data types
  </h2><p> 
    The term Opaque Data Type is a well established term for the technique of exposing only the name of the type in its header file. This is done with a forward type declaration. This certainly has the effect of not exposing any representation! 
  </p>
<pre class="programlisting">
      typedef struct wibble wibble;  
 
      wibble * wopen(int);  
      ...  
      void wclose(wibble *);  
</pre><p> 
    A definite downside with this approach is that clients cannot create objects themselves. 
  </p>
<pre class="programlisting">
      #include &quot;wibble.h&quot;  
 
      void eg(void)  
      {  
        wibble * ptr; // ok
        wibble value; // constraint violation
        ptr = malloc(sizeof *ptr);  
                      // constraint violation  
      }  
</pre><p> 
    The <tt class="code">wibble</tt> type's representation is defined in its source file and so only code in the source file can create wibble objects. Furthermore, these wibble objects have to be returned to users as pointers. These two constraints mean the created objects cannot have auto storage class. This is a great loss since auto storage class alone of the three storage class options allows the clients to decide where the objects live which can greatly improve locality of reference. 
  </p><p> 
    So how can <tt class="code">wibble</tt>'s source file actually create them? The first possibility is to use an object with auto storage class:
  </p>
<pre class="programlisting">
      ...  
      wibble * wopen(int value)  
      {  
        wibble opened;  
        ...  
        return &amp;opened; // very very bad
      }  
      ...  
</pre><p> 
    A second possibility is to create the objects with static storage class: 
  </p>
<pre class="programlisting">
      ...  
      static wibble wstorage[42];  
      static size_t windex = 0;  
      ...  
      wibble * wopen(int value)  
      {  
        wibble * opened = wstorage[windex];  
        windex++;  
        ...  
        return opened;  
      }  
      ...  
</pre><p> 
    The final possibility is to create the objects with allocated storage class. That is, to create the objects dynamically on the heap: 
  </p>
<pre class="programlisting">
      ...  
      wibble * wopen(int value)  
      {  
        wibble * opened = malloc(sizeof *opened);  
        ...  
        return opened;  
      }  
      ...  
</pre><p> 
    The static and the allocated approaches have opposing advantages and disadvantages. Static storage is very fast and doesn't fragment the memory but the type has to decide the maximum number of objects the application will need. That might be a dependency going in the wrong direction. Allocated storage is much slower and can create memory fragmentation issues, but the application decides how many objects it needs. 
  </p><p> 
    In short, the classic ADT technique creates an abstraction that is very opaque and pays a hefty price for this 'over-abstraction'. Abstracting away the representation also abstracts away the size details of a type and it is the loss of the size information that creates the storage class restrictions. The Shadow Data Type implementation technique attempts to rebalance these forces of abstraction by separating size abstraction from representation abstraction. 
  </p><h2>
    Shadow Data Types
  </h2><p> 
    The term Shadow Data Type, in contrast to Opaque Data Type, is not a well established term. The technique has probably been around for a long time, it's just that it doesn't seem to have ever been documented anywhere and so a term for it has never become established. I've chosen the term Shadow Data Type to try and convey the idea that when you shine a light on an object it casts a shadow which reveals something of the size of the object but nothing of the details of the object. In other words, a Shadow Data Type has a 'full' type declaration (rather than a forward type declaration) but one revealing only the size of type. 
  </p>
<pre class="programlisting">
      typedef struct  
      {  
        unsigned char size_shadow[16];  
      } wibble;  
 
      void wopen(wibble *, int);  
      ...  
      void wclose(wibble *);  
</pre><p> 
    The 'true' definition of the type (together with its accompanying <tt class="code">#include</tt>s) moves into the source file (Listing 2).
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    #include &quot;wibble.h&quot;  
    #include &quot;grommet.h&quot;  
    #include &quot;flange.h&quot;  
    #include &lt;string.h&gt;   
    typedef struct  
    {  
        grommet g;  
        flange f;  
    } wibble_rep;  
    // sizeof(wibble) &gt;= sizeof(wibble_rep)  
    void wopen(wibble * w, int value)  
    {  
        wibble_rep rep;  
        ...  
        memcpy(w, &amp;rep, sizeof rep);  
    }  
    ...
</pre></td></tr><tr><td class="title">Listing 2</td></tr></table>
  </p><p> 
    However, there are two problems needing attention. 
  </p><h2>
    Synchronized alignment?
  </h2><p> 
    Firstly, there is no guarantee the two types (<tt class="code">wibble</tt> and <tt class="code">wibble_rep</tt>) are alignment compatible. We can solve this problem. The trick is to create a union containing all the basic types. We don't know which basic types have the strictest alignments but if the union contains them all the union must also have the strictest alignment.
  </p>
<pre class="programlisting">
      typedef union  
      {  
        // one of each of all the basic types go here  
        // including data pointers and function pointers  
      } alignment;  
</pre><p> 
    We redefine <tt class="code">wibble</tt> to be a union with two members; one member to take care of the memory footprint and one member to take care of the alignment: 
  </p>
<pre class="programlisting">
      #include &quot;alignment.h&quot;  
 
      typedef union  
      {  
        unsigned char size_shadow[16];  
        alignment universal;  
      } wibble;  
      ...  
</pre><p> 
    The main problem with <tt class="code">wibble</tt> being a union is that unions are rare. Suppose you want to forward declare the <tt class="code">wibble</tt> type in a header. You're quite likely to forget it's a union. 
  </p>
<pre class="programlisting">
      typedef struct wibble wibble; // Oooops
</pre><p> 
    We can fix this by simply putting the union inside a struct! 
  </p>
<pre class="programlisting">
      #include &quot;alignment.h&quot;  
 
      typedef struct  
      {  
        union  
        {  
          unsigned char size[16];  
          alignment universal;  
        } shadow;  
      } wibble;  
      ...  
</pre><p> 
    This is now sufficiently tricky to warrant an abstraction of its own (Listing 3).
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    #ifndef SHADOW_TYPE_INCLUDED  
    #define SHADOW_TYPE_INCLUDED  
    #include &quot;alignment.h&quot;  
    #define SHADOW_TYPE(name, size)  \  
      typedef struct                 \  
      {                              \  
        union                        \  
        {                            \  
          unsigned char bytes[size]; \  
          alignment universal;       \  
        } shadow;                    \  
      } name  
    #endif  
    #include &quot;shadow_type.h&quot;  
    SHADOW_TYPE(wibble, 16);  
</pre></td></tr><tr><td class="title">Listing 3</td></tr></table>
  </p><h2>
    Synchronized size?
  </h2><p> 
    The second problem is hinted at by the comment in wibble.c</p>
<pre class="programlisting">
      // sizeof(wibble) &gt;= sizeof(wibble_rep)  
</pre><p> 
    This comment, like all comments, has no teeth. Ideally we'd like an assurance that if the sizes lose synchronization we're told about it. This can be done by asserting the relationship inside a unit test of course. The problem with this is the possibility that the runtime check inside a unit-test won't get run. Or, more likely, that the unit-test simply won't get written at all. Fortunately in this case we can check the relationship using a compile time assertion. We start with the fact that you cannot declare an array of negative size: 
  </p>
<pre class="programlisting">
      extern char wont_compile[-1];   
      extern char will_compile[+1];  
</pre><p> 
    Now we have to select a size of either +1 or -1 if the asserted expression is true or false respectively. 
  </p>
<pre class="programlisting">
      // may or may not compile  
      extern char compile_time_assert[sizeof(wibble)   
         &gt;= sizeof(wibble_rep) ? +1 : -1];  
</pre><p> 
    Hiding this mechanism behind a macro inside a dedicated header helps to make the code more Intention Revealing (Listing 4).
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    #define COMPILE_TIME_ASSERT(description,        \  
       expression) extern char                      \  
       description[ (expression) ? 1 : -1 ]  
    ...  
    #include &quot;compile_time_assert.h&quot;  
    ...  
    COMPILE_TIME_ASSERT(  
    sizeof_wibble_is_not_less_than_sizeof_wibble_rep,  
    sizeof(wibble) &gt;= sizeof(wibble_rep));  
    ...
</pre></td></tr><tr><td class="title">Listing 4</td></tr></table>
  </p><p> 
    Note that the assertion uses <tt class="code">&gt;=</tt> rather than <tt class="code">==</tt>. This allows binary compatibility with any alternative smaller representation. 
  </p><p> 
    It's worth spending a few moments to think about alignment carefully. The <tt class="code">wibble</tt> type contains a union to give us the strictest alignment. This means a single <tt class="code">wibble_rep</tt> and a single <tt class="code">wibble</tt> can overlay each other in either direction.
  </p><p> 
    If we create an array of <tt class="code">wibble</tt>s the compiler will ensure the address of each <tt class="code">wibble</tt> is suitably aligned. To do this it may add trailing padding to the <tt class="code">wibble</tt> type but this padding will be reflected by <tt class="code">sizeof(wibble)</tt>. Similarly, any padding for the <tt class="code">wibble_rep</tt> type will also be reflected by <tt class="code">sizeof(wibble_rep)</tt>.
  </p><p> 
    Importantly, since <tt class="code">sizeof(wibble_rep)</tt> may be strictly less than <tt class="code">sizeof(wibble)</tt> we cannot overlay an array of either type directly onto an array of the other type.
  </p><p> 
    However, we are only concerned with creating an array of wibbles since that is the type the client uses. There should never be any need to create an array of <tt class="code">wibble_reps</tt>. Nevertheless, the .c file implementation must always do any array pointer arithmetic in terms of <tt class="code">wibble</tt>s and never in terms of <tt class="code">wibble_rep</tt>s.
  </p><p> 
    Note also that using <tt class="code">&gt;=</tt> rather than <tt class="code">==</tt> allows binary compatibility with any alternative smaller representation. 
  </p><h2>
    Casting the shadow
  </h2><p> 
    Inside the source file we can create a helper function to overlay the true representation onto the client's memory (the fragment in Listing 5 uses the dot designator syntax introduced in c99).
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    static inline void shadow(wibble * dst, wibble_rep * src)  
    {  
        memcpy(dst, src, sizeof *src);  
    }  
 
    bool wopen(wibble * w, const char * name)  
    {  
        wibble_rep rep =   
        {  
            .g = ...,  
            .f = ...,  
        };  
        shadow(w, &amp;rep);  
        ...  
    }
</pre></td></tr><tr><td class="title">Listing 5</td></tr></table>
  </p><p> 
    Careful use of <tt class="code">memcpy</tt> can help to make the wibble functions behave atomically from the users perspective. That is to say, the function can do the work off to the side in a local <tt class="code">wibble_rep</tt>, and copy back into the shadow only if everything is successful. 
  </p><p> 
    An alternative to <tt class="code">memcpy</tt> is to cast the pointer on each access (Listing 6).
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    static inline wibble_rep * light(wibble * w)  
    {  
      return (wibble_rep *)w;  
    }  
    void wclose(wibble * w)  
    {  
      wibble_rep * rep = light(w);  
      rep-&gt;g = ...;  
      rep-&gt;f = ...;  
      ...  
    }
</pre></td></tr><tr><td class="title">Listing 6</td></tr></table>
  </p><h3>
    Constness?
  </h3><p> 
    It makes no sense to declare a <tt class="code">wibble</tt> object with a <tt class="code">const</tt> modifier unless the object can be initialized. 
  </p>
<pre class="programlisting">
      void pointless(void)  
      {  
        const wibble w; // :-(
        // ... ?  
      }  
</pre><p> 
    However, this is not an issue since the <tt class="code">wibble</tt> type is opaque anyway. Nevertheless, a slight redesign can accommodate <tt class="code">const</tt><tt class="code">wibble</tt> objects if desired, at the cost of copying struct objects (Listing 7).
  </p>
<p> 
<table class="sidebartable"><tr><td>
<pre class="programlisting">
    ...  
    wibble wopen(int value)  
    {  
      wibble_rep rep = { ...value... };  
      wibble w;  
      memcpy(&amp;w, &amp;rep, sizeof rep);  
      return w;  
    }  
    void ok(void)  
    {  
      const wibble w = wopen(42);  
      ...  
    }
</pre></td></tr><tr><td class="title">Listing 7</td></tr></table>
  </p><h2>
    Summary
  </h2><p> 
    In C it is impossible to expose a type's size without also exposing its representation. It is possible to explicitly specify a concrete type's representation as being 'unpublished' but since C does not offer the C++ private keyword using the representation is always possible and remains a constant temptation.. Once one piece of client code succumbs more are sure to follow and like a dam bursting the client and implementation quickly become tightly coupled and any separation is washed away. 
  </p><p> 
    Completely hiding a type's representation behind an opaque pointer/handle removes the temptation and creates a powerful abstraction but at the price of hiding the size of the type and the consequent restriction on the storage class of client memory.
  </p><p> 
    A shadow data type offers a half-way house where a type is effectively split into two, with one part exposing the size and the other part holding the representation. The alignment and sizes of the two parts must correspond. Client code is then able to use all three storage class options. The implementation code takes the full load of the extra complexity mapping/overlaying between the split parts. One interesting observation is that the client code would be unaffected (other than needing recompilation) if the representation was moved back into the client side type (to try and improve performance perhaps). 
  </p><p> 
    No mechanism is universally applicable and the shadow data type is no exception! Experience and time alone will tell if and how useful it is. Caveat emptor.</p></p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
