    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Concurrent Programming with Go</title>
        <link>https://members.accu.org/index.php/journals/1947</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #106 - December 2011 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c303/">o106</a>
                    (7)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c65/">Programming</a>
                    (877)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c303-65/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c303+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Concurrent Programming with Go</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> 01 December 2011 21:53:49 +00:00 or Thu, 01 December 2011 21:53:49 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Concurrency is becoming ever more important. Mark Summerfield looks at the approach of the new language Go.</p>
<p><strong>Body:</strong>&nbsp;<p>The Go programming language is in some respects a radical departure from existing compiled languages. Its syntax, although C-ish, is much cleaner and simpler than C or C++â€™s, and it supports object-orientation through embedding (delegation) and aggregation, rather than by using inheritance. Go has a built-in garbage collector so we never have to worry about deleting/freeing memory â€“ something that can be fiendishly complicated in a multithreaded context. In this article we will focus on another area where Go breaks new ground (at least, compared with other mainstream programming languages): concurrency.</p>

<p>Go has the usual concurrency primitives, such as mutexes, readâ€“write mutexes, and wait conditions, as well as low-level primitives such as atomic adds, loads, and compare and swaps. But Go programmers are encouraged to avoid using any of these and instead to use Goâ€™s high-level <em>goroutines</em> and <em>channels</em>.</p>

<p>A goroutine is a very lightweight thread of execution that shares the same address space as the rest of the program. The <em>gc</em> compiler multiplexes one or more goroutines per operating system thread and can realistically support hundreds, thousands, or more goroutines.</p>
 
<p>A channel is a two-way (or one-way, at our option) communications pipeline. Channels are type safe, and when they are used to pass immutable values (<code>bool</code>s, <code>int</code>s, <code>float64</code>s, <code>string</code>s, and <code>struct</code>s composed of immutable values), they can be used in multiple goroutines without formality. When it comes to passing pointers or references, we must, of course, ensure that our accesses are synchronized.</p>

<p>Incidentally, goroutines and channels are an implentation of a form of CSP (Communicating Sequential Processes), based on the ideas of computer scientist C. A. R. Hoare.</p>

<p>Goâ€™s mantra for concurrency is:</p>

<p class="blockquote">Do not communicate by sharing memory;instead, share memory by communicating.</p>

<p>In this article we will review a simple concurrent program called <code>headcheck</code>, that, given a list of URLs, performs an HTTP HEAD request on each one and reports its results. We will look at a few different ways the program can implement concurrency using Goâ€™s goroutines and channels, to give a flavour of the possibilities.</p>

<p>Listing 1 shows the <code>struct</code>s the program will operate on. We made <code>Job</code> a <code>struct</code> because this is syntactically more convenient when giving it methods.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
type Result struct {
  url          string
  status       int
  lastModified string
}
type Job struct {
  url string
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<p>Listing 2 shows the <code>main()</code> function. The built-in <code>make()</code> command is used to create channels (as well as values of the built in map and slice collection types).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func main() {
  jobs := make(chan Job, nWorkers * 2)
  results := make(chan Result, bufferSize)
  done := make(chan bool, nWorkers)

  go addJobs(jobs)
  for i := 0; i &lt; nWorkers; i++ {
    go doJobs(jobs, results, done)
  }
  go wait(results, done)
  process(results)
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>In Listing 2, both <code>nWorkers</code> and <code>bufferSize</code> are constants (6 and 24; not shown).</p>

<p>The <code>main()</code> function begins by creating three channels, one for passing jobs to worker goroutines, one for receiving all the results, and another to keep track of when each worker goroutine has finished.</p>

<p>By default channels are unbuffered (their size is 0) which means that a receive will block until there is a send and a send will block if thereâ€™s a sent item that hasnâ€™t been received. By buffering we allow a channel to accept as many sends as the size of the buffer, before sends are blocked. Similarly, we can do as many receives as there are items in the buffer, only blocking when the buffer is empty. The purpose of buffering is to improve throughput by minimizing the time goroutines spend being blocked.</p>

<p>In this example we have buffered all the channels by giving <code>make()</code> a second buffer-size argument. We have made the <code>jobs</code> channel large enough to accept (an average of) two jobs per worker goroutine and made the results channel big enough to accept plenty of results without blocking the workers. The <code>done</code> channelâ€™s bufferâ€™s size is the same as the number of workers since, as we will see, each worker sends to that channel exactly once.</p>

<p>To execute code in a separate goroutine we use the <code>go</code> keyword. This keyword must be followed by a function call (which could be a call on a function literal â€“ which is also a closure). The go statement completes â€˜immediatelyâ€™ and the called function is executed in a newly created goroutine. When the function finishes the Go runtime system automatically gets rid of its goroutine and reclaims the memory it used.</p>

<p>Here, the <code>main()</code> function executes the <code>addJobs()</code> function in its own separate goroutine, so execution continues immediately to the <code>for</code> loop. In the <code>for</code> loop six separate goroutines are created, each one executing an instance of the <code>doJobs()</code> function. All the newly created goroutines share the <em>same</em> <code>jobs</code> channel and the <em>same</em> <code>results</code> channel. The <code>for</code> loop completes as soon as the goroutines have been created and started and then another function is called, <code>wait()</code>, again in its own separate goroutine. And finally, we call the <code>process()</code> function in the current (<code>main</code>) goroutine.</p>

<p>Figure 1 schematically illustrates the relationships between the programâ€™s goroutines and channels.</p>

<table class="sidebartable">
	<tr>
		<td><img style="max-width:80%" src="http://accu.org/content/images/journals/ol106/Summerfield/Summerfield-01.png" /></td>
	</tr>
	<tr>
		<td class="title">Figure 1</td>
	</tr>
</table>

<p>Once the <code>main</code> goroutine has finished, the program will terminate â€“ even if there are other goroutines still executing. So, we must ensure that all the other goroutines finish their work before we leave <code>main()</code>.</p>

<p>The <code>addJobs()</code> function is used to populate the <code>jobs</code> channel and is shown in Listing 3, but with the code for reading in the URLs elided.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func addJobs(jobs chan Job) {
  reader := bufio.NewReader(os.Stdin)
  for {
    ... // Read in a URL
    url = strings.TrimSpace(url)
    if len(url) &gt; 0 {
      jobs &lt;- Job{url}
    }
  }
  close(jobs)
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>

<p>Each job simply consists of a URL to check. URLs are read from <code>os.Stdin</code> (e.g., by using redirection on the command line). At each iteration we read both a line and an <code>error</code> value; if the error is <code>io.EOF</code> we have finished and break out of the <code>for</code> loop. (All of this has been elided.)</p>

<p>Once all the jobs have been added the <code>jobs</code> channel is closed to signify that there are no more jobs to be added. Sending to a channel is done using the syntax <code>channel &lt;- item</code>. Items can be received from a channel that is non-empty, even if it is closed, so no jobs will be lost. When the <code>addJobs()</code> function has finished the Go runtime system will take care of removing the goroutine in which it ran and reclaiming its memory.</p>

<p>The <code>doJobs()</code> function is shown in Listing 4. It is simple because it passes all its work on to a method of the <code>Job</code> type (not shown). The <code>Job.Do()</code> method sends one result of type <code>Result</code> to the <code>results</code> channel using the statement <code>results &lt;- result</code>.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func doJobs(jobs chan Job,
  results chan Result, done chan bool) {
  for job := range jobs {
      job.Do(results)
  }
  done &lt;- true
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>Goâ€™s <code>for ... range</code> statement can iterate over maps (data dictionaries like C++11â€™s <code>unordered_map</code>), slices (in effect, variable length arrays), and channels. If the channel has an item it is received and assigned to the <code>for</code> loopâ€™s variable (here, <code>job</code>); if the channel has no item but isnâ€™t closed the loop <em>blocks</em>. Of course, this does not hold up the rest of the program, only the goroutine in which the loop is executing is blocked. The loop terminates when the channel is empty and closed.</p>

<p>Once all the jobs are done the function sends a <code>bool</code> to the <code>done</code> channel. Whether <code>true</code> or <code>false</code> is sent doesnâ€™t matter, since the <code>done</code> channel is used purely to keep the program alive until all the jobs are done.</p>

<p>The <code>headcheck</code> program has one goroutine adding jobs to the <code>jobs</code> channel and six goroutines reading and processing jobs from the same channel, all of them working concurrently. Yet, we donâ€™t have to worry about locking â€“ Go handles all the synchronization for us.</p>

<p>Listing 5 shows the <code>wait()</code> function which was executed by <code>main()</code> in its own goroutine. This function has a regular <code>for</code> loop that iterates for as many worker goroutines as were created, and at each iteration it does a <em>blocking</em> receive using the syntax <code>item &lt;- channel</code>. Notice that it doesnâ€™t matter whether <code>true</code> or <code>false</code> was sent â€“ we only care that <em>something</em> was sent â€“ since we discard the channelâ€™s items. Once all the workers have sent to the <code>done</code> channel we know that there can be no more results added to the <code>results</code> channel, so we close that channel.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func wait(results chan Result,
  done chan bool) {
  for i := 0; i &lt; nWorkers; i++ {
    &lt;-done
  }
  close(results)
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>Listing 6 shows the <code>process()</code> function which is executed in the <code>main</code> goroutine. This function iterates over the <code>results</code> channel and blocks if no result is available. The <code>for</code> loop terminates when the <code>results</code> channels is empty and closed, which will only happen when the <code>wait()</code> function finishes. This ensures that this function blocks the <code>main </code>goroutine until every result has been received and output.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func process(results chan result) {
  for result := range results {
    result.Report(os.Stdout)
  }
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<p>We could replace the <code>wait()</code> and <code>process()</code> functions with a single <code>waitAndProcess()</code> function executed in the main goroutine, as Listing 7 illustrates.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func waitAndProcess(results &lt;-chan Result,
  done &lt;-chan struct{}) {
  for working := nWorkers; working &gt; 0; {
    select { // Blocking
    case result := &lt;-results:
      result.Report(os.Stdout)
    case &lt;-done:
      working--
    }
  }
  for {
    select { // Non-blocking
    case result := &lt;-results:
      result.Report(os.Stdout)
    default:
      return
    }
  }
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>

<p>This function begins with a <code>while</code> loop that iterates so long as there is at least one worker still working. The <code>select</code> statement is structurally like a <code>switch</code> statement, only it works in terms of channel communications. A <code>select</code> with no <code>default</code> case is <em>blocking</em>. So, here, the first <code>select</code> blocks until it receives either a result or an empty <code>struct</code>.</p>

<p>Since we donâ€™t care whatâ€™s sent to the <code>done</code> channel, only whether somethingâ€™s sent, we have defined the channel to be of type <code>chan struct{}</code>. This channelâ€™s value type specifies a <code>struct</code> with no fields; there is only one possible value of such a type and this is specified using <code>struct{}{}</code> which means create a (zero) value of type <code>struct{}</code>. Since such values have no data they are more expressive of our semantics than sending <code>bool</code>s whose value we would then ignore.</p>

<p>After each receive the <code>select</code> is broken out of and the loop condition is checked. This causes the <code>main</code> goroutine to block until all the workers have finished sending their results (because they only send to the <code>done</code> channel after they have finished all their jobs).</p>

<p>It is quite possible that after all the worker goroutines have finished there are still unprocessed results in the <code>results</code> channel (after all, we buffered the channel when we created it). So we execute a second <code>for</code> loop (an infinite loop) that uses a non-blocking <code>select</code>. So long as there are results in the <code>results</code> channel the <code>select</code> will receive each one and finish, and the <code>for</code> loop will iterate again. But once the <code>results</code> channel is empty and closed the <code>default</code> case will be executed, and the function returned from. At this point all the results have been output and the program will terminate.</p>

<h2>A pipelining approach </h2>

<p>Goroutines and channels are very flexible and versatile, to the extent that we can take some quite different approaches to concurrency. Listing 8 illustrates an alternative <code>headcheck</code> implementationâ€™s <code>main()</code> function.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func main() {
  results := make(chan Result, bufferSize)
  go sink(processImages(processHTML(
    source(results))))
  for result := range results {
    result.Report(os.Stdout)
  }
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 8</td>
	</tr>
</table>

<p>Unlike the previous versions, this <code>headcheck</code> program only reports on URLs for HTML files and images, and ignores anything else. We could always add another pipeline component, say, <code>processRemainder()</code>, if we didnâ€™t want to ignore any URLs.</p>

<p>Figure 2 schematically illustrates the relationships between the programâ€™s goroutines and channels.</p>

<table class="sidebartable">
	<tr>
		<td><img style="max-width:80%" src="http://accu.org/content/images/journals/ol106/Summerfield/Summerfield-02.png" /></td>
	</tr>
	<tr>
		<td class="title">Figure 2</td>
	</tr>
</table>

<p>The function begins by creating a buffered <code>results</code> channel and then it executes a pipeline in a separate goroutine. (And as we will see, each component of the pipeline itself executes in a separate goroutine.)  Control immediately passes to the <code>for ... range</code> loop which blocks waiting for results and finishes when the <code>results</code> channel is empty and closed. (The <code>Result</code> type was shown in Listing 1.) The <code>source()</code> function shown in Listing 9 is where the pipeline begins.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func source(results chan Result) (
  &lt;-chan string, chan Result) {
  out := make(chan string, bufferSize)
  go func() {
    reader := bufio.NewReader(os.Stdin)
    for {
      ... // Read in a URL
      out &lt;- url
    }
    close(out)
  }()
  return out, results
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 9</td>
	</tr>
</table>

<p>A Go functionâ€™s return value is specified after the closing parenthesis that follows the functionâ€™s arguments. Multiple values can be returned simply by enclosing them in parentheses. Here we return a receive-only channel and a sendâ€“receive channel.</p>

<p>The <code>source()</code> function is passed the <code>results</code> channel, which it simply returns to its caller. All the pipeline functions share the same <code>results</code> channel. The function starts by creating a buffered output channel which is initially bi-directional. It then creates a goroutine to populate the <code>out</code> channel with jobs (in this case URL strings), after which the goroutine closes the channel. By closing the channel, future receivers (e.g., using a <code>for ... range</code> loop) will know when to finish. The <code>go func() { ... }()</code> creates a function literal (which is a closure) and executes it in a separate goroutine. So processing immediately continues to the <code>source()</code> functionâ€™s last statement which simply returns the <code>out</code> channel as a receive-only channel thanks to the way the functionâ€™s return value is declared, as well as the <code>results</code> channel.</p>

<p>The <code>processHTML()</code> function shown in Listing 10 has the same structure as all the pipeline component functions except for the <code>source()</code> function. It is passed two channels, a receive-only jobs channel (of type <code>chan string</code>) which it calls <code>in</code> (and which was the previous pipeline componentâ€™s <code>out</code> channel), and the shared <code>results</code> channel. The function creates a new buffered bi-directional <code>out</code> channel with the same buffer size as the capacity of the <code>in</code> channel the function has been passed. It then creates a goroutine which executes a new function literal. The goroutine reads jobs from the <code>in</code> channel. This channel is closed by the previous pipeline component when it has been fully populated, so this goroutineâ€™s <code>for ... range</code> loop is guaranteed to terminate. For each job (URL) received, for those that this function can process it performs its processing (in this case a call to a <code>Check()</code> function), and sends the result to the <code>results</code> channel. And those jobs the function isnâ€™t concerned with are simply sent to the (new) <code>out</code> channel. At the end the function returns its <code>out</code> channel and the shared <code>results</code> channel.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func processHTML(in &lt;-chan string,
  results chan Result) (&lt;-chan string,
  chan Result) {
  out := make(chan string, cap(in))
  go func() {
    for url := range in {
      suffix := strings.ToLower(
        filepath.Ext(url))
      if suffix == &quot;.htm&quot; ||
        suffix == &quot;.html&quot; {
        results &lt;- Check(url)
      } else {
        out &lt;- url
      }
    }
    close(out)
  }()
  return out, results
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 10</td>
	</tr>
</table>

<p>The <code>processImages()</code> function has the same signature and uses the same logic.</p>
<p>The <code>sink()</code> function takes a receive-only <code>jobs</code> channel and a receive-only <code>results</code> channel. It is shown in Listing 11.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
func sink(in &lt;-chan string,
  results chan Result) {
  for _ = range in {
    // Drain unprocessed URLs
  }
  close(results)
}
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 11</td>
	</tr>
</table>

<p>The <code>sink()</code> function is the last one in the pipeline. It iterates over the penultimate pipeline componentâ€™s jobs channel, draining it until it is empty. (It might be empty in the first place if every job has been done by one or other pipeline component.)</p>

<p>At the end the function closes the <code>results</code> channel. Closing the <code>results</code> channel is essential to ensure that the <code>for ... range</code> loop in <code>main() </code>terminates. But we must be careful not to close the channel when one or more goroutines might still want to send to it. Looking back at the implementations of the <code>source()</code> and <code>processHTML()</code> functions we can see that each creates its own <code>out</code> channel which it ultimately closes when it has finished processing. The last of these <code>out</code> channels is passed to the <code>sink()</code> function as its <code>in</code> channel â€“ and this channel isnâ€™t closed until all the previous pipeline components have finished reading their <code>in</code> channels and closed their <code>out</code> channels. In view of this we know that once the <code>for ... range</code> loop in the <code>sink()</code> function has finished, all of the pipelineâ€™s preceding components have finished processing and no more results could be sent to the <code>results</code> channel, and hence the channel is safe to close.</p>

<p>The pipeline-based <code>headcheck</code> program ends up with five goroutines: one executing <code>sink()</code>, one started by <code>source()</code>, one started by <code>processHTML()</code>, one started by <code>processImages()</code>, and the main goroutine that <code>main()</code> executes in. All of these goroutines operate concurrently, passing type-safe values via channels, only terminating when their work is done, and with no explicit locks.</p>

<p>Thinking in terms of goroutines and channels is very different from thinking in terms of threads and locks, and may take some getting used to!  Also, keep in mind that in this article we have seen just <em>some</em> of the possible ways of using goroutines and channels â€“ many other approaches are possible. Go is a fascinating language, not just for its innovative approach to concurrency, but also for its clean syntax and very different approach to object orientation, and is well worth getting to know.</p>


<h2>Further information </h2>

<p>The Go Programming Language: <a href="http://golang.org/">http://golang.org/</a></p>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
