    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: An Introduction to OpenMP</title>
        <link>https://members.accu.org/index.php/articles/2283</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Programming Topics + CVu Journal Vol 28, #4 - September 2016</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;An Introduction to OpenMP</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> Tue, 06 September 2016 16:16:39 +01:00</p>
<p><strong>Summary:</strong>&nbsp;Silas S. Brown dabbles in multiprocessing to speed up his calculations.</p>
<p><strong>Body:</strong>&nbsp;<p>If you use a CPU that was manufactured during the last few years, then the chances are it has more than one core, most likely two or four. Multi-core programming can be difficult (I would certainly recommend putting in a little effort to make sure you’re using a fast-enough algorithm on one core first), but it has been made easier since GCC adopted the OpenMP (Open Multi-Processing) standard in version 4.2 (2007). If you use a recent version of GCC, you might have OpenMP without knowing it. Try:</p>

<pre class="programlisting">
  gcc my-program.c -fopenmp</pre>
  
<p>and see whether or not it calls it an unknown option. (I do this in a script to decide what compilation options to use on a deployment machine.)</p>

<p>Adding OpenMP directives to a program can be surprisingly simple.  Consider a <code>for</code> loop:</p>

<pre class="programlisting">
  for (int i=0; i &lt; nItems; i++)
    process_item(i);</pre>
	
<p>If <code>process_item</code> looks only at item <code>i</code> and nothing else (no memory conflicts) then all you need to add before the <code>for</code> is:</p>

<pre class="programlisting">
  #pragma omp parallel for</pre>
  
<p>and, by default, the OpenMP library will find out at runtime how many cores are available on the CPU, split into that number of threads, divide <code>nItems</code> by the number of threads, let each thread process its ‘chunk’ of the items, and wait for them to finish. It will also add code to let the user override the number of threads at runtime by setting an environment variable (<code>OMP_NUM_THREADS</code>). This is all rather powerful just for one <code>#pragma</code>. Of course, if that code is compiled without OpenMP support, the pragma will be ignored and the code will run sequentially. But some compilers warn about unknown pragmas, so to suppress this warning you could wrap the pragma in an <code>ifdef</code>:</p>

<pre class="programlisting">
  #ifdef _OPENMP
  #pragma omp parallel for
  #endif</pre>
  
<p>which you can even extend to let you use macros to control exactly which parts of your program are parallelised:</p>

<pre class="programlisting">
  #define Parallelise_The_XYZ_Loop 1
  ...
  #if defined(_OPENMP) &amp;&amp; Parallelise_The_XYZ_Loop
  #pragma omp parallel for
  #endif</pre>
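<p>Put together, a guarded <code>parallel for</code> might look like the sketch below. The names <code>square_all</code>, <code>in</code> and <code>out</code> are invented for illustration, and <code>process_item</code> is given extra parameters here so that the example is self-contained; the only assumption that matters is that each call touches element <code>i</code> and nothing else.</p>

```c
#include <assert.h>

/* Hypothetical per-item work: each call touches only element i. */
static void process_item(const int *in, int *out, int i)
{
    out[i] = in[i] * in[i];
}

/* Squares every element of in[] into out[]. The pragma is guarded so that
   a compiler without OpenMP sees an ordinary sequential loop. */
void square_all(const int *in, int *out, int n_items)
{
    assert(n_items >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < n_items; i++)
        process_item(in, out, i);
}
```

<p>Built with <code>-fopenmp</code> the iterations are shared out among the threads; built without it, the same source should give the same results sequentially.</p>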
  
<p>Since the extra work of creating and managing the threads has an overhead, you should only use <code>parallel for</code> if you’re sure the benefits will be worth the overhead. For very short loops, you might actually slow the program down. Always measure to check you are actually getting a speed increase.</p>
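<p>One way to take such a measurement is sketched below. It assumes <code>omp_get_wtime()</code> from <span class="filename">omp.h</span> for wall-clock time, and falls back to the standard <code>clock()</code> when OpenMP is absent; <code>now_seconds</code> and <code>timed_fill</code> are hypothetical names, and the loop body is just a placeholder for real work.</p>

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds when OpenMP is available; otherwise CPU seconds,
   which is a reasonable stand-in for a single-threaded build. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Fills out[0..n-1] with i*0.5 and returns the elapsed time in seconds. */
double timed_fill(double *out, int n)
{
    assert(n >= 0);
    double start = now_seconds();
#pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = i * 0.5;
    return now_seconds() - start;
}
```

<p>Comparing the figure from a build with <code>-fopenmp</code> against one without it, on realistic input sizes, gives a first idea of whether the pragma is paying for itself.</p>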

<p>By default, the loop counter and any variable you declare inside the loop will be private to that thread, but other variables will be shared, so if you want to change them you had better write a <code>critical</code> section to ensure only one thread at a time can get in:</p>

<pre class="programlisting">
  #pragma omp critical
  update_a_shared_variable();</pre>
  
<p><code>critical</code> is not needed if all you’re doing is writing to an array when the element number you write to is the item number you’re processing, as the other threads won’t be writing to the same element. But it is often needed in other shared-variable circumstances; you are going to have to think.</p>
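<p>The two cases can be seen side by side in the following sketch, where the function and variable names are made up for the purpose. The write to <code>is_odd[i]</code> needs no protection because only iteration <code>i</code> touches that element, whereas the shared counter does.</p>

```c
#include <assert.h>

/* Writes each item's parity to its own slot (no critical needed) and
   counts the odd items in a shared counter (critical needed). */
int mark_and_count_odd(const int *items, int *is_odd, int n_items)
{
    assert(n_items >= 0);
    int odd_count = 0;                  /* declared outside the loop: shared */
#pragma omp parallel for
    for (int i = 0; i < n_items; i++) {
        int odd = (items[i] % 2 != 0);  /* declared inside the loop: private */
        is_odd[i] = odd;                /* element i belongs to this iteration */
        if (odd) {
#pragma omp critical
            odd_count++;                /* shared: one thread at a time */
        }
    }
    return odd_count;
}
```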

<p>One pattern that is often seen in OpenMP programming is to check if a shared variable needs updating, then enter a <code>critical</code> section and repeat the check:</p>

<pre class="programlisting">
  if (shared_variable_needs_updating()) {
    #pragma omp critical
    {
      /* re-check inside: another thread may have got here first */
      if (shared_variable_needs_updating())
        update_a_shared_variable();
    }
  }</pre>
	  
<p>The second check is there in case another thread beats us to it with updating the shared variable. For example, this might be used if the shared variable is ‘best solution found so far’: just because we found a better solution outside the <code>critical</code> section doesn’t mean nobody else posted an even better one just before we entered it. We could save the extra comparison by entering the <code>critical</code> section unconditionally and THEN making the comparison, but that would be inefficient because it would hold up other threads unnecessarily.</p>
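<p>As a concrete sketch of the ‘best solution found so far’ case, the function below keeps the highest score seen. The names are hypothetical. One addition not in the text above: the unlocked first check goes through <code>#pragma omp atomic read</code> (and the update through <code>atomic write</code>), which assumes OpenMP 3.1 or later, so that reading the shared variable outside the <code>critical</code> section is well defined.</p>

```c
#include <assert.h>
#include <limits.h>

/* Returns the largest score, using the check / critical / re-check pattern. */
int best_score(const int *scores, int n_items)
{
    assert(n_items >= 0);
    int best = INT_MIN;     /* shared 'best solution found so far' */
#pragma omp parallel for
    for (int i = 0; i < n_items; i++) {
        int seen;
        /* Cheap first check, without entering the critical section. */
#pragma omp atomic read
        seen = best;
        if (scores[i] > seen) {
#pragma omp critical
            {
                /* Re-check: another thread may have posted a better one. */
                if (scores[i] > best) {
#pragma omp atomic write
                    best = scores[i];
                }
            }
        }
    }
    return best;
}
```

<p>Most iterations fail the first check and never wait for the lock, which is the point of the pattern.</p>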

<p>One trick that might be useful during debugging is to add <code>default(none)</code> to the end of the <code>parallel for</code> pragma. That tells OpenMP to refrain from its default behaviour of sharing every variable declared outside the loop, and forces you to declare the shared/private status of each such variable explicitly (variables declared inside the loop are still private to each thread). If you haven’t done so, you get some handy error messages pointing out each undeclared variable referred to from the parallel section. This can be very useful indeed when retro-fitting OpenMP to existing code and the loop is too large for you to be sure you’ve noticed everything.</p>
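<p>A minimal sketch of what the explicit declarations look like, with invented names (<code>scale_all</code>, <code>factor</code>): every outside variable used in the loop is listed in a <code>shared</code> clause, and deleting one from the list should turn into a compile-time error when OpenMP is enabled.</p>

```c
#include <assert.h>

/* Scales in[] by factor into out[]. With default(none), any outside
   variable used in the loop must appear in a data-sharing clause. */
void scale_all(const double *in, double *out, int n, double factor)
{
    assert(n >= 0);
#pragma omp parallel for default(none) shared(in, out, n, factor)
    for (int i = 0; i < n; i++)
        out[i] = in[i] * factor;
}
```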

<p><code>parallel for</code> can take only ‘normal’ <code>for</code> loops that count items as they go; trying to be more ‘clever’ with the <code>for</code> statement will not work with OpenMP. You may use the <code>continue</code> statement in a <code>parallel for</code>, but not <code>break</code> (unless it’s inside another loop etc. that’s nested inside the parallel one), and not <code>return</code>. This is for obvious reasons: there would be no way for the OpenMP libraries to make sure that <code>break</code> or <code>return</code> stops other iterations of the loop if some other thread is already running away with them.</p>

<p>By default, <code>parallel for</code> assumes that each loop iteration will take roughly the same time, and so it splits the number of required iterations evenly among the threads. You could instead add <code>schedule(dynamic)</code> to the pragma to take the alternative approach of sending just one iteration at a time to each thread: if there are four cores, for example, the first four iterations will be started on immediately, and as soon as one of the cores finishes its iteration it will be given the fifth iteration to do. That tends to work well only if each iteration is quite long; if iterations are short then the overheads of managing the <code>dynamic</code> schedule slow things down. You can however do your own scheduling: instead of using <code>parallel for</code>, just say:</p>

<pre class="programlisting">
  #pragma omp parallel
  some_function_or_block();</pre>
  
<p>which will run <em>N</em> identical copies of <code>some_function_or_block()</code>; these copies will then need to work out amongst themselves which one does which section of work. To help with this, <span class="filename">omp.h</span> defines the functions <code>omp_get_thread_num()</code> and <code>omp_get_num_threads()</code>: the thread number will be between 0 and threads-1 inclusive. Since I like to make sure my programs can still compile even if OpenMP is not present, I do this:</p>

<pre class="programlisting">
  #ifdef _OPENMP
  #include &lt;omp.h&gt;
  #else
  #define omp_get_num_threads() 1
  #define omp_get_thread_num() 0
  #endif</pre>
  
<p>You have to be careful, when dividing your work units by the number of threads, to make sure no work is left out due to the division result being rounded down. If your units are fairly even then it’s probably best just to use OpenMP’s own <code>parallel for</code> which does all the work for you.</p>
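<p>One way to avoid losing items to rounding is to compute both ends of every chunk with the same formula, so that each chunk begins exactly where the previous one ended. The sketch below does this; <code>chunk_bounds</code> and <code>sum_items</code> are hypothetical helpers, and the fallback macros are the ones shown earlier.</p>

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_num_threads() 1
#define omp_get_thread_num() 0
#endif

/* Half-open range [*start, *end) of items for thread t of n_threads.
   Both ends use the same formula, so adjacent chunks meet exactly and
   the last chunk always ends at n_items. */
void chunk_bounds(int n_items, int t, int n_threads, int *start, int *end)
{
    assert(n_threads > 0 && t >= 0 && t < n_threads);
    *start = (int)((long long)n_items * t / n_threads);
    *end = (int)((long long)n_items * (t + 1) / n_threads);
}

/* Sums items[] with hand-made scheduling: every thread runs the block. */
long sum_items(const int *items, int n_items)
{
    long total = 0;             /* shared */
#pragma omp parallel
    {
        int start, end;
        long partial = 0;       /* private: declared inside the block */
        chunk_bounds(n_items, omp_get_thread_num(), omp_get_num_threads(),
                     &start, &end);
        for (int i = start; i < end; i++)
            partial += items[i];
#pragma omp critical
        total += partial;
    }
    return total;
}
```

<p>With 10 items and 3 threads, for instance, the chunks come out as items 0 to 2, 3 to 5 and 6 to 9, so the odd item left over by the division goes to the last thread instead of being dropped.</p>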

<p>Signals are usually sent to an arbitrary thread, so the best thing to do in a signal handler is probably just to set a flag which all threads regularly check.</p>
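<p>A rough sketch of the flag approach follows. It assumes the standard <code>signal()</code> interface and a <code>volatile sig_atomic_t</code> flag; all function names are invented, and whether a flag set in a handler becomes visible promptly to every thread is taken on trust from the platform here, not guaranteed by the C standard. Iterations that see the flag skip their work with <code>continue</code>.</p>

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler, polled by every thread. */
static volatile sig_atomic_t stop_requested = 0;

static void on_interrupt(int signum)
{
    (void)signum;
    stop_requested = 1;     /* do nothing else inside the handler */
}

/* Installs the handler for SIGINT (Ctrl-C). */
void install_interrupt_flag(void)
{
    signal(SIGINT, on_interrupt);
}

/* Doubles items in place until a stop is requested; returns how many
   items were processed. */
int double_until_interrupted(int *items, int n_items)
{
    assert(n_items >= 0);
    int done = 0;           /* shared */
#pragma omp parallel for
    for (int i = 0; i < n_items; i++) {
        if (stop_requested)
            continue;       /* skip the remaining work */
        items[i] *= 2;
#pragma omp critical
        done++;
    }
    return done;
}
```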

<p>OpenMP works in C++ as well, but if you are using a lot of objects then you might need to be even more careful of where you put your <code>critical</code> sections.</p>

<p>Besides GCC, other compilers that support OpenMP include Visual C++ (from its 2005 version onward) and the Intel compiler, but I haven’t tried these. Clang 3.7 supports it, but some older Macs (e.g. OS X 10.7) have both Clang and GCC installed where the GCC supports OpenMP but the Clang does not. OpenMP implementations are generally limited to multicore CPUs with shared memory, as in a modern multicore desktop; more advanced approaches are needed if you’re running on a supercomputer or cluster that does not share its memory between all the cores, or if you want to run your processing on graphics cards (GPUs).</p>

<p>On slightly older Apple computers, there’s some strange bug that means you can’t call <code>memcpy()</code> from inside a function that uses OpenMP: you have to wrap that <code>memcpy()</code> into another function of your own and call that. But the function you wrap it in can be ‘inline’ so you don’t actually lose anything. If you get other problems on Apple, try:</p>

<pre class="programlisting">
  #define _FORTIFY_SOURCE 0</pre>
  
<p>as a workaround.</p>

<p>Finally, if you are cross-compiling for Windows using MinGW, you might want to use the <code>-static</code> flag to make sure the <span class="filename">.exe</span> file doesn’t depend on OpenMP and threading DLLs. Windows <span class="filename">.exe</span> files are easier to distribute if they don’t need DLLs.</p>
</p>
</div>
</channel>
</rss>
