    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Code Critique Competition 96</title>
        <link>https://members.accu.org/index.php/journals/2169</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">CVu Journal Vol 27, #5 - November 2015 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c77/">CVu</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c355/">275</a>
                    (10)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c65/">Programming</a>
                    (877)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c355-65/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c355+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Code Critique Competition 96</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> 04 November 2015 09:11:52 +00:00 or Wed, 04 November 2015 09:11:52 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Set and collated by Roger Orr. A book prize is awarded for the best entry.</p>
<p><strong>Body:</strong>&nbsp;<p class="EditorIntro">Participation in this competition is open to all members, whether novice or expert. Readers are also encouraged to comment on published entries, and to supply their own possible code samples for the competition (in any common programming language) to scc@accu.org.</p>

<p class="EditorIntro">Note: If you would rather not have your critique visible online, please inform me. (We will remove email addresses!)</p>

<h2>Last issueâ€™s code</h2>

<p class="blockquote">I have some C code that tries to build up a character array using <code>printf</code> calls but Iâ€™m not getting the output I expect. Iâ€™ve extracted a simpler program from my real program to show the problem.</p>

<p class="blockquote">With one compiler I get <code>&quot;Rog&quot;</code> and with another I get <code>&quot;lburp@&quot;</code>.</p>

<p class="blockquote">Iâ€™m expecting to see:</p>

<p class="blockquote"><code>  &quot;Roger:  10</code></p>

<p class="blockquote"><code>   Bill:   5</code></p>
<p class="blockquote"><code>   Wilbur: 12&quot;</code></p>
<p class="blockquote">What have I done wrong?</p>

<p>Can you give some advice to help this programmer?</p>

<p>The code is in Listing 1.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
#include &lt;stdio.h&gt;
#define ARRAY_SZ(x) sizeof(x)/sizeof(x[0])
typedef struct _Score
{
  char *name;
  int score;
} Score;

void to_string(Score *scores, size_t n,
  char *buffer, size_t len)
{
  for (size_t i = 0; i &lt; n; i++)
  {
    size_t printed = snprintf(buffer, len,
      &quot;%s:\t%u\n&quot;,
      scores[i].name, scores[i].score);
    buffer += printed;
    len -= printed;
  }
}
void process(char buffer[])
{
  Score sc[] = {
    { &quot;Roger&quot;, 10 },
    { &quot;Bill&quot;, 5 },
    { &quot;Wilbur&quot;, 12 },
  };
  to_string(sc, ARRAY_SZ(sc),
    buffer, ARRAY_SZ(buffer));
}
int main()
{
  char buffer[100];
  process(buffer);
  printf(buffer);
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<h2>Critiques</h2>

<h3>Mathias Gaunard</h3>

<p>The main problem of this snippet is the <code>ARRAY_SZ</code> macro, meant to compute the size of an array. This macro will accept pointers as input but provide the wrong answer, in this case <code>sizeof(char*)/sizeof(char)</code>, which is the word size, 4 bytes for 32-bit systems. This explains why the result is <code>&quot;Rog&quot;</code> on some systems; only 4 bytes were written, 3 characters plus the null byte.</p>

<p>With C++, it is possible to write a function that provides the same functionality but that will lead to errors whenever pointers are passed, by passing the array by reference and using templates to deduce its size:</p>

<pre class="programlisting">
  template&lt;class T, size_t N&gt;
  size_t array_sz(T(&amp;)[N]) { return N; }</pre>

<p>By using this function instead of <code>ARRAY_SZ</code>, we can easily isolate the errors. One way to fix this is to modify <code>process</code> to also take the array by reference.</p>

<pre class="programlisting">
  template&lt;size_t N&gt;
  void process(char (&amp;buffer)[N]);</pre>
  
<p>This will make the code work and display the expected result, however, it still has an issue with how it handles the case where the buffer is not large enough to hold all score listings. As we saw earlier, the code incorrectly claimed that the buffer was word-size-sized, but it should still have always given <code>&quot;Rog&quot;</code> as a result, and never <code>&quot;lburp@&quot;</code>. <code>snprintf</code> can return a negative value on failure, and returns the number of characters it would have written if the buffer is not big enough. It is easy to trigger erroneous output like <code>&quot;lbur&quot;</code> simply by hardcoding <code>printed</code> to <code>(size_t)-1</code>.</p>

<p>It is therefore necessary to do</p>

<pre class="programlisting">
  printed = min(printed, len)</pre>
  
<p>to avoid writing past the end or before the start of the buffer. Alternatively, one could use dynamically-sized buffers to not impose arbitrary limits, which can be done easily by using C++ iostreams.</p>
<p>Other miscellaneous issues:</p>

<ul>
	<li><code>_Score</code> is a name reserved for the implementation, and should not be used. All names starting with an underscore followed by an uppercase letters are reserved.  By using C++, the typedef/struct becomes redundant anyway, so one should just use <code>struct Score</code>.</li>
	
	<li>The type of the <code>name</code> member in <code>Score</code> should be <code>const char*</code> rather than <code>char*</code>, since it is initialized from string literals, which are <code>const</code> in C++, and shouldnâ€™t be modified regardless since it is undefined behaviour to do so.</li>
	
	<li>The first argument to <code>printf</code> should be a format, it is therefore potentially dangerous to call <code>printf(buffer)</code> directly in case we ever chose to put special characters in the score entries. Instead, one should write <code>printf(&quot;%s&quot;, buffer);</code>.</li>
</ul>

<h3>Peter Sommerlad</h3>

<p>First of all, if a C program misbehaves, I suggest compiling it with a C++ compiler.</p>

<p>Sidenote: Well, that is what I would do and also I would highly recommend to no longer use C at all. With the exception of very, very small targets in the embedded area (8 bit controllers), decent C++ implementations should be available and C should no longer be taught to students. It just lacks any means of abstraction and is too hard to use correctly and even if it is, it is too easy to break things by â€˜maintenanceâ€™.</p>

<p>Now to the problem at hand. Our IDE Cevelop and my C++ compiler tells me several problematic points, including the culprit of the underlying issue.</p>

<p>Let us start our journey in <code>main()</code>:</p>

<pre class="programlisting">
  int main()
  {
    char buffer[100];
</pre>

<p>The array line is something I wouldnâ€™t have done in C++ at all and I was recommending of not allocating arrays on the stack in C at all, if you can not control the size written. Code like that is behind most security vulnerabilities in the past decades. Cevelop (set to parse as C++14) will claim</p>

<ul>
	<li>Found C-Array: {0}</li>
	
	<li>Un- or ill-initialized variable found</li>
</ul>

<p>The first note is that we try to rewrite code using plain arrays to use the better and copyable alternative <code>std::array</code> instead (or <code>std::string</code>). The second comes from our C++11 checker that looks for variables not initialized with the initializer list syntax and transforms them. Even in C code this might be a good idea, since the memory of buffer can contain arbitrary values (only by luck some bytes are 0 if we use C-style string handling in the general case).</p>

<p>Before we apply the quick fixes let us go further to the code of <code>main()</code>.</p>

<pre class="programlisting">
  process(buffer);</pre>
  
<p>Here an array is passed as a function argument and our student should know that any array passed as such degenerates to a pointer to the first element and the dimension of the array is lost. This is still true in C++ and therefore I recommend to never use a plain array as a function parameter or as an argument. It just works without any safeguards and having a programmer deal with array bounds checking explicitly is very error prone and again leads to another big share of software vulnerabilities.</p>

<pre class="programlisting">
  printf(buffer);</pre>

<p>Even the innocent looking last line is problematic. The <code>printf</code> function takes at least one <code>char</code> pointer argument, but interprets its content. Since we will not know what content our process function is storing in the buffer array, we do not know if <code>printf</code> would be expecting more arguments, because buffer contains any special <code>printf</code> formatting symbols, such as <code>&quot;%s&quot;</code>, for example. This could lead to leaking unintended information or in the better case to program crashes, because of illegal memory addresses accessed. It is a similar problem as with the XKCD comic (<a href="https://xkcd.com/327/">https://xkcd.com/327/</a>) with <code>printf</code> syntax instead of SQL.</p>

<p>A better C function to output a plain C string would be</p>
<pre class="programlisting">
  puts(buffer);</pre>

<p>instead, which at least just expects buffer to be <code>NUL</code> terminated.</p>

<p>Wow, a full screen full of text with just 3 lines of code. Roger selected a good example this time!</p>

<p>Now let us look one step down in the call chain to process:</p>

<pre class="programlisting">
  void process(char buffer[])</pre>
  
<p>as said above, this array parameter is equivalent to writing it as a pointer parameter</p>

<pre class="programlisting">
  void process(char *buffer)</pre>
  
<p>and that will make the real error below (stay tuned) much more evident.</p>

<pre class="programlisting">
{
  Score sc[] = {
    { &quot;Roger&quot;, 10 },
    { &quot;Bill&quot;, 5 },
    { &quot;Wilbur&quot;, 12 },
  };</pre>
  
<p>The lines above again triggers two messages, one from our plug-in and one from the C++ compiler:</p>

<ul>
	<li>Found C-Array: {0}</li>
	
	<li>ISO C++ forbids converting a string constant to '<code>char*</code>'</li>
</ul>

<p>The first one tells us again that plain arrays should be verboten. We can automatically change that to use std::array&lt;Score,3&gt; instead, but that would be C++ already. The second warning tells us something about the differences between C and C++, where C++ makes string literal const char arrays, whereas classic C made those char pointers (even without const, which wasn't invented then). So in theory, one could overwrite the literal values for the names given here. This also tells something about the type Score that keeps plain char pointers and no size information. In production code, this is a no-no, because you can not make any safe memory management around such a string representation.</p>

<p>And now to the definitive culprit.</p>

<pre class="programlisting">
    to_string(sc, ARRAY_SZ(sc),
     buffer, ARRAY_SZ(buffer));
  }</pre>
  
<p><code>ARRAY_SZ(buffer)</code> expands to</p>

<pre class="programlisting">
  sizeof(buffer)/sizeof(buffer[0])</pre>
  
<p>and this generates the compiler warning:</p>

&lt;ArticleBodyIndent&gt;
'sizeof' on array function parameter 'buffer' will return size of 'char*' [-Wsizeof-array-argument]&lt;/ArticleBodyIndent&gt;
<p>which we can compute on a 64 bit computer to become 8 instead of the 100 our student expected and the equivalent of the array with the pointer makes it obvious. <code>to_string</code> is by the way a bad name for new code, because it is now a function overload in C++â€™s <code>std</code> namespace and could make problems, when this code is naively ported to C++ with a using namespace <code>std</code> in scope (BTW, Cevelop provides a refactoring away from such practice). Cevelop also suggest to change the macro for <code>ARRAY</code> to a C++ inline (template) function and can do so. In addition this will avoid the potential bug, since the macro is not having parenthesis around its expansion and not around the second use of the parameter. This can cause parsing havoc if used with non-literals as arguments.</p>

<p>So a more correct version of</p>

<pre class="programlisting">
  #define ARRAY_SZ(x) sizeof(x)/sizeof(x[0])
<p>would be</p>
  #define ARRAY_SZ(x) (sizeof(x)/sizeof((x)[0]))
<p>but even better in C++11:</p>
  template&lt;typename T1&gt;
  inline constexpr auto ARRAY_SZ(T1&amp;&amp; x) -&gt;
    decltype(sizeof (x) / sizeof (x[0]))
  {
    return (sizeof (x) / sizeof (x[0]));
  }</pre>

<p>But again that is to no avail in the correct working of process. There are several problems to be resolved with its design. First, passing the buffer from the outside, requires cautions use of its memory resource, so we need to pass the size of the available space. But there is still a problem, we do not know, if the size was sufficient or not. In such a case, it might be better to return an indication if the size was sufficient. This could be a <code>bool</code>, denoting success (not very helpful), the size actually used and an error code, if insufficient (i.e., -1) or best, the size required for the output, if it was too small, which could be a burden to implement and to use. In any case, passing in memory (buffer) to be used as function output is a hassle. In C++ I would recommend to use an <code>ostringstream</code> which will automatically expand an underlying string object and return that by value. This is one area where C is really a burden vs. C++.</p>

<p>It is also unrealistic, that the <code>Score</code> array <code>sc</code> is local to the function, so this must be passed as well, again explicitly passing its size, meaning we need define 4 parameters and pass 4 interdependent arguments. Using a <code>std::vector&lt;Score&gt;</code> in C++ would simplify that again.</p>

<p>Now to the last function in the code where the student actually shows, that she/he understands a bit about passing arrays as pointers with an explicit size. But again, such an API (inherent in C) is easy to use wrongly (as we have seen) by passing inconsistent sizes. Also a return type of <code>void</code> signals side effects, the code can have quite unintended ones.</p>

<pre class="programlisting">
  void to_string(Score *scores, size_t n,
    char *buffer, size_t len)
  {
    for (size_t i = 0; i &lt; n; i++)
    {
      size_t printed = snprintf(buffer, len,
        &quot;%s:\t%u\n&quot;,
        scores[i].name, scores[i].score);
      buffer += printed;
      len -= printed;
    }
  }</pre>
  
<p>Again, here C gets in the way of safe usage of memory. First <code>snprintf</code>â€™s return type is <code>int</code>, and there are obscure cases, where it might return a negative number. Second, if the space allowed is too small, it still returns the number of characters that would be needed, so the code will miscalculate the value of <code>len</code>, which might underflow. While underflow and overflow does not trigger undefined behaviour with unsigned types, such as <code>size_t</code>, the resulting value is still well beyond the capacity of the buffer and thus can easily lead to overwriting memory far away from the allotted memory, resulting in another severe security issue or potential crashes from overwriting return addresses.</p>

<p>A quick fix, would be to at least check if printed is smaller than <code>len</code> and only then proceed and otherwise return from the function.</p>

<pre class="programlisting">
  if(printed &gt;= len) return;</pre>
  
<p>I think I covered most of the problems by now. So how would a refactored C++ version look?</p>

<p>Here is one, that uses the recommendation I put above and getting rid of most of the bad C habits. Its overall design is still not what I consider splendid, but without more context it would be hard to judge if <code>Score</code> should have a constructor, be <code>const</code> or have member functions/related functions, such as an overloaded output <code>operator&lt;&lt;</code>.</p>

<pre class="programlisting">
  #include &lt;cstdio&gt;
  #include &lt;sstream&gt;
  #include &lt;vector&gt;
  struct Score
  {
    std::string name;
    int score;
  };
  using ScoreList=std::vector&lt;Score&gt;;
  std::string to_string(ScoreList const &amp;scores)
  {
    std::ostringstream out{};
    for (auto const &amp;score:scores)
    {
      out &lt;&lt; score.name &lt;&lt; '\t'
          &lt;&lt; score.score &lt;&lt; '\n';
    }
    return out.str();
  }
  std::string process()
  {
    ScoreList sc {
      { &quot;Roger&quot;, 10 },
      { &quot;Bill&quot;, 5 },
      { &quot;Wilbur&quot;, 12 },
    };
    return to_string(sc);
  }
  int main()
  {
    std::puts(process().c_str());
  }</pre>
  
<p>Hope that helps some programmers to become better programmers and write less code.</p>

<h3>Gareth Ansell &lt;gareth.ansell@sky.com&gt;</h3>

<p>While trying to solve this puzzle I was initially tempted to dive straight for the compiler and start hacking away at the code. However, I managed to resist this urge and engaged brain instead. After which the solution was quite easy to spot. I then modified the code to test my hypothesis, which was proved correct.</p>

<p>The initial problem is in the <code>process()</code> function, where the buffer array is passed as an argument. Since this is decomposed to a pointer, it is not possible for the subsequent call to the <code>to_string()</code> function to determine the size of <code>buffer[]</code> by using <code>sizeof</code> (in the <code>ARRAY_SZ</code> macro).</p>

<p>In C the solution to this is to add a <code>length</code> parameter to the <code>process()</code> function, and use this in the call to <code>to_string()</code>. In C++, a templated solution could be used.</p>

<p>Apart from this there are a few minor niggles:</p>

<ol>
	<li>In <code>_Score</code> <code>score</code> is an <code>int</code>, but in the call to <code>snprintf</code> it is referenced as an <code>unsigned</code>.</li>
	
	<li>The last element of the <code>sc[]</code> array in <code>process[]</code> has a trailing comma</li>
	
	<li>The <code>for</code> loop in <code>to_string()</code> compares a signed to unsigned <code>int</code>.</li>
</ol>

<p>My working solution is shown below:</p>

<pre class="programlisting">
  #include &lt;stdio.h&gt;
  #define ARRAY_SZ(x) sizeof(x)/sizeof(x[0])
  typedef struct _Score
  {
    char *name;
    unsigned int score;
  } Score;
  void to_string(Score *scores, size_t n,
    char *buffer, size_t len)
    {
      for(unsigned int i = 0; i &lt; n; i++)
      {
        size_t printed = snprintf(buffer, len,
          &quot;%s:\t%u\n&quot;,
          scores[i].name, scores[i].score);
        buffer += printed;
        len -= printed;
      }
    }
  void process(char buffer[], size_t len)
  {
    Score sc[] = {
      { &quot;Roger&quot;, 10 },
      { &quot;Bill&quot;, 5 },
      { &quot;Wilbur&quot;, 12 }
    };
    to_string(sc, ARRAY_SZ(sc), buffer, len);
  }</pre>

<h3>Matthew Wilson</h3>

<p>This one is rather simple, I think: itâ€™s an issue of array-&gt;pointer decay.</p>

<p>The <code>ARRAY_SZ()</code> macro follows a familiar pattern in many codebases, used to fill the obvious missing <code>dimensionof()</code> operator that should exist in C (and C++). It works by creating a compile-time constant representing the number of elements in an array by dividing the total size of the array by the size of an element. Works fine for arrays.</p>

<p>The problem is, it also works for pointers and, in C++, for types defining the subscript operator. And in such cases the sizes obtained is almost never correct (and sometimes not compile-time constant). This is all discussed at length in chapter 14 â€“ Arrays and Pointers â€“ of my book <em>Imperfect C++ </em><a href="CCC.xml#[1]">[1]</a>, so I wonâ€™t belabour the point here.</p>

<p>Also discussed in the same chapter of IC++ is the phenomenon of array-pointer decay. Briefly, it allows an array to be used in circumstances where a pointer is required. Also, a function declaration involving an array is interpreted, in a similar vein, as a pointer. Hence, the declaration of <code>process()</code> as</p>

<pre class="programlisting">
  void process(char buffer[])</pre>
  
<p>is exactly equivalent to the declaration</p>

<pre class="programlisting">
  void process(char* buffer)</pre>
  
<p>Expressed in this form, the problem is all too easy to see. The size of <code>ARRAY_SZ(buffer)</code> is going to be 4 (32-bit) or 8 (64-bit), and certainly not the 100 of the actual buffer <code>buffer</code> declared in <code>main()</code>.</p>

<p>The obvious fix is to change <code>process()</code> to have the signature</p>

<pre class="programlisting">
  void process(char* buffer, size_t len)</pre>
  
<p>which is hinted at strongly in the earlier defined <code>to_string()</code>.</p>

<p>When so amended â€“ along with some const-correction, VC++ compatibility, and use of STLSoftâ€™s <code>STLSOFT_NUM_ELEMENTS()</code> (which makes application of <code>ARRAY_SZ()</code> to a pointer a compile-time, rather than runtime, error) â€“ the code looks like:</p>

<pre class="programlisting">
  #include &lt;stlsoft/stlsoft.h&gt;
  #include &lt;stdio.h&gt;

  #if defined(_MSC_VER)
  # define snprintf _snprintf
  #endif
  #define ARRAY_SZ(x) STLSOFT_NUM_ELEMENTS(x)
  typedef struct _Score
  {
    char const* name;
    int         score;
  } Score;
  void to_string(Score const* scores, size_t n,
                 char *buffer, size_t len)
  {
    for (size_t i = 0; i &lt; n; i++)
    {
      size_t printed = snprintf(buffer, len,
        &quot;%s:\t%u\n&quot;,
        scores[i].name, scores[i].score);
      buffer += printed;
      len -= printed;
    }
  }
  void process(char* buffer, size_t len)
  {
    Score const sc[] = {
      { &quot;Roger&quot;, 10 },
      { &quot;Bill&quot;, 5 },
      { &quot;Wilbur&quot;, 12 },
    };
    to_string(sc, ARRAY_SZ(sc),
      buffer, len);
  }
  int main()
  {
    char buffer[100];
    process(buffer, ARRAY_SZ(buffer));
    printf(buffer);
  }</pre>
  
<h4>Reference</h4>
<p class="bibliomixed"><a id="[1]"></a>[1]	<em>Imperfect C++</em>, Matthew Wilson, Addison-Wesley, 2004.</p>

<h3>James Holland &lt;James.Holland@babcockinternational.com&gt;</h3>

<p>My compiler did not provide any helpful hints as the studentâ€™s program compiles without any errors or warnings. When I ran the program, &quot;Rog&quot; was displayed as the student observed. After a little investigation, I find that the problem is in the parameter of <code>process</code>. I suspect than the student is under the impression that an array is being passed to <code>process</code>, after all the type of the parameter, <code>char buffer[]</code>, looks like an array declaration. Unfortunately, despite its looks, the parameter type is equivalent to <code>char * buffer</code>. When the latter declaration is used, it becomes clear that a pointer to <code>buffer</code> is being passed to <code>process</code> and not <code>buffer</code> itself. From within <code>process</code>, it is the size of the pointer that is passed to <code>to_string</code> and not the size of <code>buffer</code>. On my machine, pointers are 4 bytes in length and so the value 4 is being passed to <code>to_string</code>. This explains why the program outputs &quot;Rog&quot;. The function <code>to_string</code>, and ultimately <code>snprintf</code>, thinks that <code>buffer</code> is only 4 characters long, enough for three characters and the null terminator.</p>

<p>Unfortunately, it is not possible to pass to a function an array by value. Only a pointer to an array can be passed. This presents no great difficulty, however. If in addition to passing a pointer to the array, the length of the array is passed, <code>process</code> will have all the information it needs about <code>buffer</code>. Within <code>process</code>, the length of the output buffer can be passed straight to <code>to_string</code> as shown below.</p>

<pre class="programlisting">
  void process(char * buffer, size_t length)
  {
    Score sc[] = {{&quot;Roger&quot;, 10}, {&quot;Bill&quot;, 5},
      {&quot;Wilbur&quot;, 12},};
    to_string(sc, ARRAY_SZ(sc), buffer, length);
  }</pre>
  
<p>When this modification is made, things look a lot better and the program produces the desired result. There are, however, some unresolved problems than could manifest themselves in the studentâ€™s real program.</p>

<p>It is assumed that the student is using <code>snprintf</code> (as opposed to <code>sprintf</code>) because he/she does not want data to be written beyond the limits of <code>buffer</code>. This is a laudable desire but unfortunately the code, as it stands, does not provide that protection in general. There is still a potential problem when the size of the data to be written exceeds the size of <code>buffer</code>. This is demonstrated more conveniently if the size of <code>buffer</code> is reduced instead of increasing the size of the data.</p>

<p>Suppose, for example, that instead of <code>buffer</code> being declared 100 characters long, it had been declared 17 characters long. The program would correctly write to <code>buffer</code> the string representing the data for Roger. <code>snprintf</code> returns the length of the string (not including the null terminator) that it attempts to write, in this case, 10 characters. This value is assigned to the variable <code>printed</code>. This value is then subtracted from <code>len</code> (that currently has a value of 17) leaving 7. The program then attempts to write the data for Bill. This string is 8 characters long. The program (or more specifically <code>snprintf</code>) writes as much of the data to <code>buffer</code> as it can without overflowing <code>buffer</code>. Although the output string has been truncated, nothing disastrous (as far as program execution is concerned) has occurred. Next, the length of string for Bill is subtracted from <code>len</code>. The value of <code>len</code> is currently 7 and the length of the string for Bill is 8 characters. The subtraction to be performed is, therefore, 7 minus 8. This is where problems start. The variables <code>printed</code> and <code>len</code> are both of type <code>size_t</code> (an unsigned type). The result of the subtraction is not -1, as expected, but a very large positive number. On my machine the value of <code>len</code> at this point is 4294967295. The program then goes around the loop once more to write the data for Wilbur and, thinking there is plenty of space in <code>buffer</code> (because <code>len</code> is a large number), writes Wilburâ€™s string beyond the end of <code>buffer</code>, probably with disastrous consequences.</p>

<p>This behaviour can easily be prevented by inserting the following code just after the <code>snprintf</code> statement. It will prevent the program going around the loop again when buffer is full and will also print a message explaining the problem.</p>

<pre class="programlisting">
  if (printed &gt;= len)
  {
    printf(&quot;Buffer exhausted. Results may be &quot;
      &quot;incorrect.\n&quot;);
    break;
  }</pre>
  
<p>While on the subject of buffer length, there is another problem that occurs if <code>buffer</code> is declared as having zero size. I admit this is unlikely to occur in an otherwise fully debugged program but there is, at least, a theoretical point to be made here. An array of zero bytes will have an address, so it can be referenced. Such an array will, in all probability and not unreasonably, occupy zero bytes of memory. Given this, <code>printf</code> (as used in the last statement of <code>main</code>) will start printing at a memory location that has nothing to do with buffer. It could print anything of any length, depending on what data it stumbles across. This behaviour must be prevented if there is any possibility of <code>buffer</code> being of zero length. A simple <code>if</code> statement around <code>printf</code> would suffice. I leave the details to the student.</p>

<p>There is another problem with the <code>printf</code> statement; it is being used in a potentially unsafe way. <code>printf</code> is declared (in <span class="filename">stdio.h</span>) as having one or more parameters. The first parameter is a format control string. The studentâ€™s <code>printf</code> statement has one parameter only and so the parameter must be the control string. It can be seen from an inspection of the <code>printf</code> statement that the control string is the contents of <code>buffer</code>. The contents of <code>buffer</code> came originally from the array <code>sc</code>. If one of the names in <code>sc</code> were to be changed, say, from &quot;Roger&quot; to &quot;%sRoger&quot;, the program will probably crash in spectacular fashion. This is because <code>printf</code>â€™s format control string now says there is a string parameter to be printed despite there not actually being one. The result is undefined behaviour. To prevent this, a literal string should be supplied as the format control string and the string to be printed (in this case <code>buffer</code>) supplied as the second parameter as shown below.</p>

<pre class="programlisting">
  printf(&quot;%s&quot;, buffer);</pre>

<p><code>printf</code> will now simply print the contents of <code>buffer</code> without interpreting any characters sequence as format control characters, as required.</p>

<p>Although it does not show itself in the studentâ€™s test program, there is another potential problem lurking in the wings. It is the definition of the macro <code>ARRAY_SZ</code>. Suppose the student decides to make use of this macro somewhere else in his or her code. The student may write something like the following statement, for whatever reason.</p>

<pre class="programlisting">
  size_t x = 25 % ARRAY_SZ(sc);</pre>
  
<p>Let us assume that <code>sc</code> is the same array as defined in the studentâ€™s test program. <code>ARRAY_SZ(sc)</code> should, therefore, produce the value 3. The expression becomes 25 % 3 which is equal to 1. The variable <code>x</code> should, therefore, be initialised with the value of 1. In fact <code>x</code> is assigned a value of 0, contrary to all expectation. How can this be? All becomes clear if the macro is expanded to produce the expression seen by the compiler, as illustrated below.</p>

<pre class="programlisting">
  size_t x = 25 % sizeof(sc) / sizeof(sc[0]);</pre>
  
<p>The expression will be evaluated from left to right. So the first term that is evaluated is <code>25 % sizeof(sc)</code> or 25 % 24 which equals 1. Next, the term <code>sizeof(sc[0])</code> is evaluated, this equals 8. Finally, 8 is divided into 1 giving zero. It can now be seen that brackets are required around <code>sizeof(sc) / sizeof(sc[0])</code> so that this part of the expression is evaluated first. This is achieved by enclosing the definition of <code>ARRAY_SZ(x)</code> in parentheses.</p>

<pre class="programlisting">
  #define ARRAY_SZ(x) (sizeof(x)/sizeof(x[0]))</pre>
  
<p>In fact it is good practice to enclose the definition of all but the simplest macros in parentheses to prevent this type of error.</p>

<p>There are a couple of inconsistencies in the program. <code>Score::score</code> is of type <code>int</code> and yet it is being printed by <code>snprintf</code> as if it were unsigned. Assuming <code>Score::score</code> is meant to be signed, then <code>snprintf</code>â€™s format control string should be <code>&quot;%s:\t%d\n&quot;</code>.</p>

<p>Also, quoted text strings are considered constant and yet <code>Score::name</code> is declared as a pointer that can modify what it is pointing to. The declaration <code>const char * name</code> provides the remedy. This will prevent accidentally attempting to write to a constant string as a compile-time error will result.</p>

<p>I noticed that the student provided a tag name when defining the <code>Scores</code> structure by use of a <code>typedef</code> statement. A structure need only have a tag name when the structure makes reference to pointers of the same type. This is not the case in the studentâ€™s program and so the tag name is not required. Incidentally, it looks as if the student has made an attempt to prevent the tag name from clashing with the type name by use of a leading underscore character. This is not necessary as the two names are in different â€˜name spacesâ€™. The tag and the type could both be named <code>Score</code>.</p>

<p>There is a redundant comma at the end of <code>SC</code>â€™s initialiser list. The compiler is perfectly happy with this and it does not affect the meaning of the program. It is allowed, at least in part, to make the job of automatic code generators a little easier. However, as it is not required, I prefer not to see a trailing comma in hand-written code.</p>

<p>Finally, it might be constructive to discuss the choice of variable names. To some degree this is a matter of personal style but there are some guidelines that should be observed. The names of variables (and other identifiers) should reflect their meaning and should not be excessively abbreviated. I would prefer to see <code>length_of_buffer</code> rather than <code>len</code> and <code>expected_print_length</code> rather than <code>printed</code>, for example. Selecting suitable names can be quite tricky and I am not suggesting my examples are ideal but I do think they help in understanding how the program works. </p>

<p>I have made quite a few corrections and suggestions in reviewing the studentâ€™s code. This should not be seen as a damning criticism designed to dishearten the student. On the contrary, it is designed to encourage the student to complete his or her project in particular and to continue to learn about the fascinating topic of computer programming and software development in general.</p>

<h2>Commentary</h2>

<p>With five pretty comprehensive entries Iâ€™m not sure thereâ€™s much for me to add. About the only thing no-one remarked on was the inconsistent brace positioning!</p>

<p>It seems a shame that <code>snprintf()</code> doesnâ€™t provide a foolproof safe replacement for <code>sprintf()</code> â€“ examples like this code show how easy it is, given the variety of return values and the dangers of implicit conversion between signed and unsigned integer values, to use <code>snprintf()</code> in an unsafe way.</p>

<p>One additional point that might be worth making is that Visual Studio 2015 does provide <code>snprintf()</code> in <span class="filename">&lt;stdio.h&gt;</span> (although I havenâ€™t yet found where this is documented on MSDN.) This function seems to follow the behaviour required by the C11 standard. However, the Microsoft specific function <code>_snprintf()</code> is subtly <em>different</em>; in particular it does not ensure the target buffer is null terminated. This is an additional source of potentially dangerous confusion around this function call. Even beyond that, the implementation of <code>snprintf()</code> in the mingw implementation of g++ 4.9.2 on Windows <em>also</em> fails to ensure the buffer is null terminated. </p>

<p>The moral of this critique is to be extremely careful if you use <code>snprintf()</code> in C code â€“ or in C++.</p>

<h2>The Winner of CC 95</h2>

<p>There were five good critiques and I think each one would have helped the person with the original problem to understand what was wrong and to fix it. However, I think Peterâ€™s critique covered the most ground and I have awarded him the prize for this issueâ€™s critique.</p>

<h2>Code Critique 96</h2>

<p><strong>(Submissions to scc@accu.org by Oct 1st)</strong></p>

<p><em>Thanks are due to Hubert Matthews for the idea that inspired this critique.</em></p>

<p class="blockquote">I have written some code to read in a CSV file and handle quoted strings but I seem to get an extra row read at the end, not sure why.</p>

<p class="blockquote">If I make a file consisting of one line:</p>
<p class="blockquote">--- Book1.csv ---</p>
<p class="blockquote">word,a simple phrase,&quot;we can, if we want, embed commas&quot;</p>
<p class="blockquote">--- ends ---</p>
<p class="blockquote">I get this output from processing the file:</p>

<pre class="programlisting">
    Rows: 2
    Cells: 3
      word
      a simple phrase
      &quot;we can, if we want, embed commas&quot;
    Cells: 1</pre>
	
<p class="blockquote">What have I done wrong?</p>

<p>The code is in Listing 2.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
// Reading CSV with quoted strings.
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

typedef std::string cell;
typedef std::vector&lt;cell&gt; row;
typedef std::vector&lt;row&gt; table;

table readTable()
{
  char ch;
  table table; // the table
  row *row = 0; // the row
  cell *cell = 0; // the cell
  char quoting = '\0';
  while (!std::cin.eof())
  {
    char ch = std::cin.get();
    switch (ch)
    {
    case '\n':
    case ',':
      if (!quoting) {
        cell = 0;
        if (ch == '\n') {
          row = 0;
        }
        break;
    case '\'':
    case '&quot;':
        if (quoting == ch) {
          quoting = '\0';
        }
        else if (!cell) {
          quoting = ch;
        }
      }
    default:
      if (!row) {
        table.push_back({});
        row = &amp;table.back();
      }
      if (!cell) {
        row-&gt;push_back({});
        cell = &amp;row-&gt;back();
      }
      cell-&gt;push_back(ch);
      break;
    }
  }
  return table;
}

int main()
{
  table t = readTable();

  std::cout &lt;&lt; &quot;Rows: &quot; &lt;&lt; t.size() &lt;&lt; &quot;\n&quot;;
  for (int r = 0; r != t.size(); ++r) {
    std::cout &lt;&lt; &quot;Cells: &quot;
      &lt;&lt; t[r].size() &lt;&lt; &quot;\n&quot;;
    for (int c = 0; c != t[r].size(); ++c)
    {
      std::cout &lt;&lt; &quot;  &quot; &lt;&lt; t[r][c] &lt;&lt; &quot;\n&quot;;
    }
  }
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>You can also get the current problem from the accu-general mail list (next entry is posted around the last issueâ€™s deadline) or from the ACCU website (<a href="http://www.accu.org/journals/">http://www.accu.org/journals/</a>). This particularly helps overseas members who typically get the magazine much later than members in the UK and Europe.</p>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
