    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: How To Write A Loop</title>
        <link>https://members.accu.org/index.php/articles/353</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + Overload Journal #55 - Jun 2003</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c65/">Programming</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/articles/c192/">55</a>
<br />

                                            <a href="https://members.accu.org/index.php/articles/c65-192/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/articles/c65+192/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;How To Write A Loop</h1>
<p><strong>Author:</strong>&nbsp;</p>
<p>
<strong>Date:</strong> 02 June 2003 22:57:04 +01:00 or Mon, 02 June 2003 22:57:04 +01:00</p>
<p><strong>Summary:</strong>&nbsp;</p>
<p><strong>Body:</strong>&nbsp;<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e18" id="d0e18"></a>Direct
Repetition</h2>
</div>
<pre class="programlisting">
cout &lt;&lt; 1 &lt;&lt; endl;
cout &lt;&lt; 2 &lt;&lt; endl;
cout &lt;&lt; 3 &lt;&lt; endl;
</pre>
<p>This small code fragment writes the values <tt class=
"literal">1</tt>, <tt class="literal">2</tt>, and 3 to <tt class=
"literal">cout</tt>. You'd be hard pressed to write this code more
directly. It writes the value <tt class="literal">1</tt>, then it
writes the value <tt class="literal">2</tt>, then it writes the
value <tt class="literal">3</tt>. Writing repeated code like this
is unrewarding, understanding it is tedious, and modifying it is
error-prone and painful. Of course, programmers almost never
express repetition like this; they express the repetition more
succinctly by adding a level of indirection. Hopefully, by adding a
little stylized software you can remove a lot of code elsewhere.
Less code, more software. (Note however that in the right context
loop unrolling is a useful optimization technique).</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e43" id="d0e43"></a>Indirect
Repetition</h2>
</div>
<pre class="programlisting">
for (int value = 1; value &lt;= 3; ++value) {
  cout &lt;&lt; value &lt;&lt; endl;
}
</pre>
<p>This is without question a more succinct way of expressing
repetition. If you want to write out the values <tt class=
"literal">1</tt> to <tt class="literal">42</tt> simply change the
<tt class="literal">3</tt> in the continuation condition into
<tt class="literal">42</tt>. However, there is a price to pay for
this brevity - the purpose of the code is now expressed less
directly. This is the price of indirection. It is totally clear
what the purpose of the first code fragment is, to write <tt class=
"literal">1</tt>, to write <tt class="literal">2</tt>, and to write
<tt class="literal">3</tt>. It is not so directly clear what the
purpose of the <tt class="literal">for</tt> statement is. To
understand the <tt class="literal">for</tt> statement you have to
master the extra complexity: understand <tt class=
"literal">for</tt>'s semantics, mentally examine the
initialization, continuation condition, and update parts, and make
the correct logical deductions. Experienced programmers know that
although the logical deductions required appear trivial and easy,
they are in fact fraught with traps and pitfalls. Experienced
programmers also know that avoiding mistakes is better than making
them, then finding them, and then removing them. Debugging is slow.
As a result, programmers tend to learn a few vital mental tools to
avoid loop traps and pitfalls. The two most important tools are
invariants (things that are always true) and intention (programming
on purpose).</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e80" id="d0e80"></a>Invariants</h2>
</div>
<p>Here is the unrolled sequence of statements that comprise the
previous <tt class="literal">for</tt> statement:</p>
<pre class="programlisting">
    initialization: int value = 1;
continuation-condition: (value &lt;= 3); // 1 &lt;= 3, true
                  body: cout &lt;&lt; value &lt;&lt; endl;
                update: ++value;
continuation-condition: (value &lt;= 3); // 2 &lt;= 3, true
                  body: cout &lt;&lt; value &lt;&lt; endl;
                update: ++value;
continuation-condition: (value &lt;= 3); // 3 &lt;= 3, true
                  body: cout &lt;&lt; value &lt;&lt; endl;
                update: ++value;
continuation-condition: (value &lt;= 3); // 4 &lt;= 3, false
</pre>
<p>During this sequence of statements one invariant is &quot;the next
value to be written is always inside <tt class=
"literal">value</tt>&quot;. That's not a useful invariant because it
expresses a truth about the future. A useful invariant expresses a
truth about the past, such as &quot;<tt class="literal">value-1</tt> is
the number of times something has been written&quot;. Let's try it.</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>Before the loop starts, the invariant says that (<tt class=
"literal">value-1</tt>) writes have already occurred. No writes
have occured yet so (<tt class="literal">value-1 == 0</tt>), hence
<tt class="literal">value == 1</tt>. So value is initialized to
<tt class="literal">1</tt>.</p>
</li>
<li>
<p>Then a continuation condition check, and then a write. Now,
because a write has occurred, the invariant is momentarily broken:
value is still <tt class="literal">1</tt>, <tt class=
"literal">1</tt> write has occurred, and <tt class=
"literal">1-1==1</tt> is not true.</p>
</li>
<li>
<p>To maintain the invariant you must change <tt class=
"literal">value</tt> to <tt class="literal">2</tt> since
(<tt class="literal">2-1==1</tt>). The simplest way to change
<tt class="literal">1</tt> to <tt class="literal">2</tt> is via an
increment, so that's what the update part does. The update
maintains the invariant. And, in general, if <tt class=
"literal">N</tt> writes have occurred we need to change value from
<tt class="literal">N</tt> to <tt class="literal">N+1</tt>, which
is the definition of 'increment'.</p>
</li>
<li>
<p>Now the invariant says that (<tt class="literal">2-1==1</tt>)
writes have occurred, which is true.</p>
</li>
</ul>
</div>
<p>Since the invariant is always true you can, in fact, completely
ignore the loop and instead consider the context after the loop.
When the loop has finished the continuation condition <tt class=
"literal">(value &lt;= 3)</tt> must be false, so <tt class=
"literal">!(value &lt;= 3)</tt> must be true, viz, <tt class=
"literal">(value &gt; 3)</tt>. Since value is initialized to
<tt class="literal">1</tt>, and is only ever incremented, it must
be true that <tt class="literal">(value == 4)</tt>. The invariant
says that <tt class="literal">(4-1==3)</tt> writes have occurred,
and three writes have indeed occurred.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e179" id=
"d0e179"></a>!Invariant</h2>
</div>
<p>So, in fact, often the most important thing about the
continuation condition is not the continuation condition itself,
but its negation because it's the negation that's true after the
loop. Looping is easy; it's knowing when to stop that causes the
problems. Consequently you want the negation of the continuation
condition to be as strong as possible. In the example, the
continuation condition was <tt class="literal">(value &lt;= 3)</tt>
and hence its negation was <tt class="literal">(value &gt; 3)</tt>.
In order to strengthen this negation into <tt class=
"literal">(value == 4)</tt> you had to do extra mental work.
However, if you weaken the continuation condition you automatically
strengthen the negation of the continuation condition. So if
instead of writing <tt class="literal">(value &lt;= 3)</tt> as your
continuation condition you write <tt class="literal">(value !=
4)</tt> then when the loop finishes you'll know <tt class=
"literal">(value == 4)</tt> with (almost) no mental effort at all.
Another benefit of weakening the continuation condition is that it
weakens the loop requirements (significant when you consider
function templates):</p>
<pre class="programlisting">
for (int value = 1; value != 4; ++value) {
  cout &lt;&lt; value &lt;&lt; endl;
}
</pre>
<p>At this point it's worth reiterating (sorry) that this code
fragment is less direct than the very first code fragment. In the
first code fragment the values <tt class="literal">1</tt>,
<tt class="literal">2</tt>, <tt class="literal">3</tt> were written
and the values <tt class="literal">1</tt>, <tt class=
"literal">2</tt>, <tt class="literal">3</tt> all appeared directly
in the code. In this latest code fragment the value <tt class=
"literal">3</tt> does not appear at all. Is this a problem? Does
this mean you should use <tt class="literal">(value &lt;= 3)</tt>
instead of <tt class="literal">(value != 4)</tt>? I don't think so.
I think it would be a mistake to base any decision on the first
code fragment. The reason is simple; programmers don't write code
like that. They just don't. Programmers write loops using dedicated
loop constructs. That is the context. The secret of programming is
not to over generalize or to over specialize; to be aware of, and
sensitive to, the immediate problem context.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e233" id=
"d0e233"></a>Dependency</h2>
</div>
<p>A loop has two parts. The loop control:</p>
<pre class="programlisting">
for (int value = 1; value != 4; ++value)
</pre>
<p>and the loop body being controlled:</p>
<pre class="programlisting">
{
  cout &lt;&lt; value &lt;&lt; endl;
}
</pre>
<p>These two parts play different roles. The former governs the
latter, making sure it executes neither too few nor too many times.
There is a clear one-way dependency; the body depends on the
control but not vice versa. Any micro-modification that disturbs
this dependency is ill advised. For example, suppose you rewrite
the for statement like this:</p>
<pre class="programlisting">
for (int value = 1; value != 4; ) {
  cout &lt;&lt; value++ &lt;&lt; endl;
}
</pre>
<p>The loop control now depends on the loop body. In other words
the loop control is dependent on the very thing it is supposed to
be controlling! Entirely wrong. In fact, the control and the body
are becoming intertwined so tightly it's hard to talk about the
control as a separate concept at all. The software is disappearing
and the loop control and the loop body are gelling into an
amorphous lump of code. A lump of code that is less transparent,
harder to reason about and harder to understand.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e250" id=
"d0e250"></a>Separation</h2>
</div>
<p>To avoid this amorphous lump simply don't modify the loop
variable inside the loop. That way the dependency remains a oneway
dependency from the controlled to the controller, the loop control
parts remain separate (and textually together), and the loop
invariant remains transparent. As useful as this well-known piece
of advice is it's not sufficient to protect your loops. It's not
really a generative piece of advice. The most important thing is to
keep the loop control separate from the loop body. Separation of
Concerns. Modifying your loop variable inside its loop body is one
way of breaking the separation and tangling the dependencies but
there are plenty of others. Using <tt class="literal">goto</tt>,
<tt class="literal">break</tt>, <tt class="literal">continue</tt>,
<tt class="literal">throw</tt>, or <tt class="literal">return</tt>
inside the loop body can all have the undesired effect as well.
Here's another example where the loop control and the loop body are
tightly interwoven. Does it write <tt class="literal">1</tt>,
<tt class="literal">2</tt>, and <tt class="literal">3</tt> as
before? Are you sure?</p>
<pre class="programlisting">
int value = 1;
for (;;++value) {
  cout &lt;&lt; value &lt;&lt; endl;
  if (value != 4)
    continue;
  else
    break;
}
</pre>
<p>You might be thinking that advising you not to use <tt class=
"literal">return</tt> statements inside loop bodies is over
zealous. Do I really mean that? Yes I do. Functions that return
something should do so via a single <tt class="literal">return</tt>
statement at the very end of the function. Here are some practical
reasons why:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p><span class="bold"><b>Change</b></span> The only thing
absolutely guaranteed in software is change (well, maybe corrupt
data too). A function sprinkled with <tt class=
"literal">return</tt> statements will almost certainly break when
changed. Such functions are just too opaque. They are not
transparent. It's too hard to see, let alone reason about, what
effects a change will have. A classic example from C is adding a
statement at the start of the function to acquire a resource (eg
calling <tt class="literal">malloc</tt>) and adding a statement at
the end of the function to release the resource (eg calling
<tt class="literal">free</tt>). Better make sure there's no return
statement in between.</p>
</li>
<li>
<p><span class="bold"><b>No Change</b></span> Software that
separates out its concerns and manages the dependencies between the
separate parts (in particularwhat's dependent on what) is
significantly easier to refactor than software that does not. If
your loop body contains a <tt class="literal">return</tt> statement
then you won't be able to refactor that loop out into another
method. You'll have to first refactor the loop so it doesn't
contain any <tt class="literal">return</tt> statements.</p>
</li>
</ul>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e315" id="d0e315"></a>Half-Open
Interval</h2>
</div>
<p>A quick recap. We started with direct repetition:</p>
<pre class="programlisting">
cout &lt;&lt; 1 &lt;&lt; endl;
cout &lt;&lt; 2 &lt;&lt; endl;
cout &lt;&lt; 3 &lt;&lt; endl;
</pre>
<p>and we've worked through to indirect repetition:</p>
<pre class="programlisting">
for (int value = 1; value != 4; ++value) {
  cout &lt;&lt; value &lt;&lt; endl;
}
</pre>
<p>The value <tt class="literal">3</tt> appears in the direct
version but not in the indirect version. As I've already said, I
don't think this is worth fretting about since programmers never
actually write the first version. However, there is a sense in
which <tt class="literal">3</tt> does appear, albeit indirectly, in
the indirect version. This is because <tt class=
"literal">(1+3==4)</tt> or, equivalently, <tt class=
"literal">(4-1==3)</tt>. Specifying an inclusive lower bound and an
exclusive upper bound is an extremely common, powerful, and
idiomatic way of expressing a loop. It even has a special notation
and name. It's written like this:</p>
<pre class="programlisting">
[1, 4)
</pre>
<p>and it's called a half-open interval. Here are two well-known
examples:</p>
<pre class="programlisting">
// [0, 42)
char array[42];
// ...
const size_t end = 42;
for (size_t at = 0; at != end; ++at) {
  stuff(array[at]);
}
// [array, array+42)
char array[42];
// ...
const char * const end = array + 42;
for (char * at = array; at != end; ++at) {
  stuff(*at);
}
</pre>
<p>Note that:</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>By definition a legal array index cannot be negative so the
first fragment uses a <tt class="literal">size_t</tt> to match this
constraint (<tt class="literal">size_t</tt> is an ISO C/C++
unsigned integer typedef).</p>
</li>
<li>
<p>By definition the size of an object fits into a <tt class=
"literal">size_t</tt>. In other words, the valid indexes of the
elements of an array of size <tt class="literal">N</tt> are
<tt class="literal">[0, N-1]</tt> but only a <tt class=
"literal">size_t</tt> is guaranteed to be able to hold the value
<tt class="literal">N</tt>.</p>
</li>
<li>
<p>The valid indexes of the elements of an array of size <tt class=
"literal">N</tt> are <tt class="literal">[0,N-1]</tt> and their
addresses are <tt class="literal">[array + 0, array + N -1]</tt>
respectively. However, ISO C/C++ explicitly says you can use the
just-past-the-end-address (array + N) in pointer comparisons.</p>
</li>
</ul>
</div>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<h2><a name="d0e388" id="d0e388"></a>Worked
Example</h2>
</div>
<p>The secret of mastering loops (and in fact, of most programming
tasks) is to work intentionally. That is, to program on purpose
(deliberately) and for a purpose (know what it is you're trying to
do).</p>
<p>Suppose your intention is to search through a range of elements
looking for a value. A loop is just a mechanism to realize this
intention. A loop is, quite literally, a means to an end. If you
concentrate on the loop you're solving the solution rather than
solving the problem. Concentrate on the problem. Always design a
thing by considering it in its next larger context.</p>
<p>If you're searching for an element then either you'll find the
element or you won't. You need to be able to distinguish between
these two possibilities. One of the strengths of a Half-Open
Interval is its exclusive upper bound. When searching:</p>
<pre class="programlisting">
[begin, end)
</pre>
<p>you can make any position (let's call it at) in <tt class=
"literal">[begin, end)</tt> correspond to the position of the
element if found and make <tt class="literal">at == end</tt>
correspond to not finding the element. Like this:</p>
<pre class="programlisting">
// ...
if (at == end) // not found
  ...
else // found
  ...
</pre>
<p>Furthermore, if <tt class="literal">(at == end)</tt> is not true
it means the element at at must equal value: <tt class=
"literal">(*at == value)</tt> in the pointer case and <tt class=
"literal">(array[at] == value)</tt> in the indexer case.</p>
<pre class="programlisting">
// ...
if (at == end) // not found
  ...
else { // found
  assert(*at == value);
  ...
}
</pre>
<p>Now we understand the problem context we can start to think
about a solution. If we use a loop then once the loop finishes the
following must be true:</p>
<pre class="programlisting">
(at == end) || (*at == value)
</pre>
<p>From the earlier !Invariant section we know that this expression
is the negation of the continuation condition. In other words the
continuation condition must be:</p>
<pre class="programlisting">
!((at == end) || (*at == value))
</pre>
<p>which, using De Morgans Law, is the same as this:</p>
<pre class="programlisting">
!(at == end) &amp;&amp; !(*at == value))
</pre>
<p>which is the same as this:</p>
<pre class="programlisting">
(at != end) &amp;&amp; (*at != value)
</pre>
<p>Which means &quot;(we're not at the end) and (we haven't found
<tt class="literal">value</tt>)&quot;. Note how you can't swap the left
and right arguments to &amp;&amp; because the left side acts a
validity check on the right side. It's now just a matter of
completing the loop by filling in the initialization part and the
update part. Note that you can't declare the loop variable in the
initialization part of a <tt class="literal">for</tt> statement
(since it would be out of scope at the if statement).</p>
<pre class="programlisting">
char array[42];
// ...
const char * const end = array + 42;
char * at = array;
for (; at != end &amp;&amp; *at != value; ++at) {
  ;
}
if (at == end) // not found
  ...
else // found
  ...
</pre>
<p>In the majority of cases finding the value is considered the
successful outcome. It's usually best to emphasize the positive
case rather than the negative case so a lot of programmers write
the if statement like this:</p>
<pre class="programlisting">
// ...
if (at != end) // found
  ...
else // not found
  ...
</pre>
<p>The empty initialization part and the empty loop body are
noticeable. You might be tempted to rewrite the fragment like
this:</p>
<pre class="programlisting">
char array[42];
// ...
const char * const end = array + 42;
char * at = array;
while (at != end &amp;&amp; *at != value) {
  ++at;
}
if (at != end) // found
  ...
else // not found
  ...
</pre>
<p>This is possibly a minor improvement. However, a much more
relevant point is that to search another array you'd have to write
another identically structured loop. Copy-and-paste duplication is
a bad thing but it hints at something very important; that you have
formed a common pattern of use to conquer similar problems. Instead
of copying and pasting you should be considering how to capture and
name the common pattern of use in a higher level abstraction. How
about a function:</p>
<pre class="programlisting">
char * find(char * begin, char * end, char
value) {
  while (begin != end &amp;&amp; *begin != value) {
    ++begin;
  }
  return begin;
}
</pre>
<p>Or a function template:</p>
<pre class="programlisting">
template&lt;typename iterator_type,
         typename value_type&gt;
iterator_type
find(iterator_type begin,
     iterator_type end,
     const value_type &amp; value) {
  while (begin != end &amp;&amp; *begin != value) {
    ++begin;
  }
  return begin;
}
</pre>
<p>It's a mistake to think that these abstractions are &quot;too small&quot;
to warrant existence. And so finally, we end up with a code
fragment that is clear, concise, transparent, and intention
revealing:</p>
<pre class="programlisting">
char array[42];
// ...
const char * const end = array + 42;
char * at = find(array, end, value);
if (at != end) { // found it
  ...
}
</pre>
<p>The form of this article as well as the content of the first two
sections were collectively written by a dozen or so people during a
Birds of a Feather session at the ACCU 2003 Spring Conference. Many
thanks to everyone who contributed.</p>
</div>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
