    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: Python Streams vs Unix Pipes</title>
        <link>https://members.accu.org/index.php/journals/2319</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>


        <h2>Journal Articles</h2>


<div class="xar-mod-head"><span class="xar-mod-title">Overload Journal #136 - December 2016 + Programming Topics</span></div>

<table border="0" cellpadding="1" cellspacing="0">
    <tbody>
    <tr>
        <td valign="top">
            Browse in :
       </td>
       <td valign="top">

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c76/">Journals</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c78/">Overload</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c368/">o136</a>
                    (9)
<br />

                                            <a href="https://members.accu.org/index.php/journals/">All</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c13/">Topics</a>

                     &gt;                         <a href="https://members.accu.org/index.php/journals/c65/">Programming</a>
                    (877)
<br />

                                            <a href="https://members.accu.org/index.php/journals/c368-65/">Any of these categories</a>

                    -                        <a href="https://members.accu.org/index.php/journals/c368+65/">All of these categories</a>
<br />
</td>
   </tr>
   </tbody>
</table>




<div class="xar-error">
   <p>
 <strong>Note:</strong> when you create a new publication type,
the articles module will automatically use the templates
<em>user-display-[publicationtype].xt</em>
and <em>user-summary-[publicationtype].xt</em>.
If those templates do not exist when you try to preview or display a new article,
you'll get this warning :-)  Please place your own templates in themes/<em>yourtheme</em>/modules/articles . The templates will get the extension .xt there. </p>
</div>
<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;Python Streams vs Unix Pipes</h1>
<p><strong>Author:</strong>&nbsp;Martin Moene</p>
<p>
<strong>Date:</strong> 04 December 2016 20:44:20 +00:00 or Sun, 04 December 2016 20:44:20 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Dealing with an infinite sequence requires some thought. Thomas Guest presents various ways to approach such a problem.</p>
<p><strong>Body:</strong>&nbsp;<p>I chanced upon an interesting puzzle:</p>

<p class="blockquote">Find the smallest number that can be expressed as the sum of 5, 17, 563 and 641 consecutive prime numbers, and is itself a prime number.</p>

<p>Here, the prime numbers are an infinite steam:</p>

<p class="blockquote">2, 3, 5, 7, 11, 13 ...</p>

<p>and sums of <em>N</em> consecutive primes are similarly infinite. For example, the sum of 2 consecutive primes would be the stream:</p>

<p class="blockquote">2+3, 3+5, 5+7, 7+11, 11+13 ...</p>

<p>which is:</p>

<p class="blockquote">5, 8, 12, 18, 24 ...</p>

<p>and the sum of 3 consecutive primes is:</p>

<p class="blockquote">10 (=2+3+5), 15, 23, 31 ...</p>

<p>Had we been asked to find the smallest number which can be expressed as the sum of 3 consecutive primes <em>and</em> as the sum of 5 consecutive primes <em>and</em> is itself prime, the answer would be 83.</p>

<p class="blockquote">&gt;&gt;&gt; 23 + 29 + 31</p>
<p class="blockquote">83</p>
<p class="blockquote">&gt;&gt;&gt; 11 + 13 + 17 + 19 + 23</p>
<p class="blockquote">83</p>

<h2>Infinite series and Python</h2>

<p>My first thought was to tackle this puzzle using Python iterators and generators. Hereâ€™s the outline of a strategy:</p>

<ul>
	<li>starting with a stream of primes</li>
	<li>tee the stream to create 4 additional copies</li>
	<li>transform these copies into the consecutive sums of 5, 17, 563 and 641 primes</li>
	<li>merge these consecutive sums back with the original primes stream</li>
	<li>group the elements of this merged stream by value</li>
	<li>the first group which contains 5 elements must have occurred in every source, and is therefore a prime and representable as the consecutive sum of 5, 17, 563 and 641 primes</li>
	<li>which solves the puzzle!</li>
</ul>

<p>Note that when we copy an infinite stream we cannot consume it first. We will have to be lazy or weâ€™ll get exhausted.</p>

<p>Courtesy of the <em>Python Cookbook</em>, I already had a couple of useful recipes to help implement this strategy (see Listing 1).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
def primes():
  '''Generate the sequence of prime numbers: 2,
     3, 5 ... '''
  ....
def stream_merge(*ss):
  '''Merge a collection of sorted streams.
  Example: merge multiples of 2, 3, 5
  &gt;&gt;&gt; from itertools import count, islice
  &gt;&gt;&gt; def multiples(x): 
    return (x * n for n in count(1))
  &gt;&gt;&gt; s = stream_merge(multiples(2),
    multiples(3), multiples(5))
  &gt;&gt;&gt; list(islice(s, 10))
  [2, 3, 4, 5, 6, 6, 8, 9, 10, 10]
  '''
  ....
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<p>Both these functions merit a closer look for the cunning use they make of standard containers, but weâ€™ll defer this inspection until later. In passing, note that <code>stream_merge()</code>â€™s docstring suggests we might try using it as basis for <code>primes()</code>:</p>

<ol>
	<li>form the series of composite (non-prime) numbers by merging the streams formed by multiples of prime numbers;</li>
	<li>the primes remain when you remove these composites from the series of natural numbers.</li>
</ol>

<p>This scheme is hardly original â€“ itâ€™s a variant of Eratosthenesâ€™ sieve â€“ but if you look carefully youâ€™ll notice the self-reference. Unfortunately recursive definitions of infinite series donâ€™t work well with Python<a href="#FN01"><sup>1</sup></a>, hence <code>primes()</code> requires a little more finesse. Weâ€™ll take a look at it later.</p>

<p>Moving on, to solve the original puzzle we need a consecutive sum filter. Listing 2 will transform a stream of numbers into a stream of consecutive sums of these numbers.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
def consecutive_sum(s, n):
    '''Generate the series of sums of n
       consecutive elements of s
    Example: 0, 1, 2, 3, 4 ... =&gt; 0+1, 1+2,
       2+3, 3+4, ...
    &gt;&gt;&gt; from itertools import count, islice
    &gt;&gt;&gt; list(islice(consecutive_sum(count(), 2),
        10))
    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    '''
    lo, hi = itertools.tee(s)
    csum = sum(next(hi) for _ in range(n))
    while True:
        yield csum
        csum += next(hi) - next(lo)
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>Here we can think of the summed elements as lying within a sliding window: each time we slide the window an element gets added to the top and an element gets removed from the bottom, and we adjust <code>csum</code> accordingly.</p>

<p>So, now we have:</p>

<ul>
	<li>the series of prime numbers, <code>primes()</code></li>
	
	<li>a <code>stream_merge()</code> connector</li>
	
	<li>a <code>consecutive_sum()</code> filter</li>
</ul>

<p>The remaining stream adaptors come from the standard itertools module. Note that the <code>stream_merge()</code> works here since all the consecutive sum series are strictly increasing. Note also that the stream of prime numbers can be treated as <code>consecutive_sum(s=primes(), n=1)</code>, handling the â€˜and is itself a prime numberâ€™ requirement. (See Listing 3.)</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
&gt;&gt;&gt; lens = 1, 5, 17, 563, 641
&gt;&gt;&gt; N = len(lens)
&gt;&gt;&gt; from itertools import tee, groupby
&gt;&gt;&gt; ps = tee(primes(), N)
&gt;&gt;&gt; csums = [consecutive_sum(p, n) for p, n in zip(ps, lens)]
&gt;&gt;&gt; solns = (n for n, g in groupby(stream_merge(*csums)) 
            if len(list(g)) == N)
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>

<p>Here, <code>solns</code> is yet another stream, the result of merging the <em>N</em> input consecutive sum streams then filtering out the numbers which appear <em>N</em> times; that is, the numbers which can be expressed as sums of 1, 5, 17, 563 and 641 consecutive primes.</p>

<p>The first such number solves the original puzzle.</p>

<pre class="programlisting">
  &gt;&gt;&gt; next(solns)
  7002221</pre>

<p>Figure 1 is a picture of how these stream tools link up to solve this particular puzzle. The great thing is that we can reconnect these same tools to solve a wide range of puzzles, and indeed more practical processing tasks. To use the common analogy, we direct data streams along pipes.</p>

<table class="sidebartable">
	<tr>
		<td><img src="http://accu.org/content/images/journals/ol136/Guest/Guest-01.png" /></td>
	</tr>
	<tr>
		<td class="title">Figure 1</td>
	</tr>
</table>

<h2>Infinite series in other languages</h2>

<p>Python is the language I find most convenient most of the time, which explains why I reached for it first. Itâ€™s an increasingly popular language, which helps explain why I didnâ€™t need to write the tricky parts of my solution from scratch: theyâ€™d already been done. Python is also a language which makes compromises. Having used Python to find a solution to the puzzle I wondered if there wasnâ€™t some other language better suited to this kind of problem.</p>

<p>Haskell makes no compromises when it comes to functional programming. Its lazy evaluation and inductive recursion make it a perfect fit for this kind of puzzle â€“ but my approach of teeing, filtering and merging made me consider the Unix Shell. Now, I use Bash every day and page through its manual at least once a week. Scripting appeals and Iâ€™m comfortable at the command line. How hard could it be to solve this puzzle using Bash? After all, I already knew the answer!</p>

<h2>Partial sums</h2>

<p>Hereâ€™s a simple shell function to generate partial sums. Iâ€™ve used awk, a little language I gave up on a long time ago in favour of more rounded scripting languages like Perl and then Python. Now I look at it again, it seems to fill a useful gap. Awk processes a file sequentially, applying pattern-action rules to each line, a processing template which Iâ€™ve reinvented less cleanly many times. Despite my rediscovery of awk, Iâ€™ll be keeping its use strongly in check in what follows.</p>

<pre class="programlisting">
  $ psum() { awk '{ print s += $1 }'; }</pre>

<p>Much like Perl, awk guesses what you want to do. Here, it conjures the summation variable, <code>s</code>, into existence, assigning it a default initial value of 0. (Good guess!) Since weâ€™re doing arithmetic awk converts the first field of each input line into a number. We can test <code>psum</code> by using <code>jot</code> to generate the sequence 1, 2, 3, 4, 5 (this is on a Mac â€“ on a Linux platform use <code>seq</code> instead of <code>jot</code>).</p>

<pre class="programlisting">
  $ jot 5 | psum
  1
  3
  6
  10
  15</pre>

<h2>Consecutive sums</h2>

<p>You may be wondering why weâ€™ve bothered creating this partial sum filter since itâ€™s the sums of consecutive elements weâ€™re after, rather than the sum of the series so far. Well, notice that if <code>P[i]</code> and <code>P[i+n]</code> are two elements from the series of partial sums of S, then their difference, <code>P[i+n] - P[i]</code>, is the sum of the <code>n</code> consecutive elements from S.</p>

<p>So to form an n-element consecutive sum series we can <code>tee</code> the partial sums streams, advance one of these by n, then zip through them in parallel finding their differences. An example makes things clear â€“ see Listing 4.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
$ mkfifo pipe
$ jot 5 | psum | tee pipe | tail -n +2 | paste - pipe
3       1
6       3
10      6
15      10
        15
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>Here, <code>jot 5</code> generates the sequence 1, 2, 3, 4, 5, which <code>psum</code> progressively accumulates to 1, 3, 6, 10, 15. We then <code>tee</code> this partial sum series through two pipes: the first, <code>pipe</code>, is an explicitly created named pipe created by <code>mkfifo</code>, the second is implicitly created by the pipeline operator, <code>|</code>. The remainder of the command line delays one series by one (note that <code>tail</code> numbers lines from 1, not 0, so tail -n +1 is the identity filter) then pastes the two series back together.<a href="#FN02"><sup>2</sup></a></p>

<p>By appending a single <code>awk</code> action to the pipeline we get a consecutive sum series (see Listing 5).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
$ jot 5 | psum | tee pipe | tail -n +2 | paste - pipe | awk '{print $1 - $2}'
  2
  3
  4
  5
  15
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>The output 2, 3, 4, 5 is the series of consecutive sums of length 1 taken from the original series 1, 2, 3, 4, 5. The trailing 15 and the 1 missed from the start are edge case problems, and easily corrected.</p>

<p>Accumulating an increasing series of numbers in order to find the differences between elements lying a given distance apart on this series isnâ€™t a very smart idea on a computer with a fixed word-size, but itâ€™s good to know (e.g.) that awk doesnâ€™t stop counting at 32 bits. (See Listing 6.)</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
$ let &quot;N=1&lt;&lt;32&quot; &amp;&amp; echo $N | tee &gt;(awk '{print $1 * $1}')
  4294967296
  18446744073709551616
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<p>Exactly if and when awk stops counting, Iâ€™m not sure. The documentation doesnâ€™t say and I havenâ€™t looked at the source code.</p>

<h2>Bug fixes</h2>

<p>Letâ€™s capture these tiny functions and name them. In Listing 7, then, are revised <code>psum()</code> and <code>sdiff()</code> filters. The edge case problems should now be fixed.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
$ psum()  { awk 'BEGIN { print 0 }{print s += $1 }'; }
$ delay() { let &quot;n = $1 + 1&quot; &amp;&amp; tail +$n; } 
$ sdiff() { mkfifo p.$1 &amp;&amp; tee p.$1 | delay $1 | paste - p.$1 | \
            awk 'NF == 2 {print $1 - $2 }'; }
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>

<p>A quick test:</p>

<pre class="programlisting">
  $ jot 5 | psum | sdiff 3
  6
  9
  12</pre>

<p>The output is, as expected, the series of sums of consecutive triples taken from 1, 2, 3, 4, 5 (6=1+2+3, 9=2+3+4, 12=3+4+5).</p>

<p>Thereâ€™s a pernicious bug, though. These functions canâ€™t handle infinite series so they are of limited use as pipeline tools. For example, if we stream in the series 0, 1, 2, â€¦ (generated here as the partial sums of the series 1, 1, 1, â€¦) nothing gets output and we have to interrupt the process.</p>

<pre class="programlisting">
  # This command appears
  # to hang
  $ yes 1 | psum | sdiff 1
  ^C</pre>
  
<p>To work around this is, we can use Gnu <code>stdbuf</code> to prohibit <code>tail</code> and <code>paste</code> from using output buffers (see Listing 8).</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
$ psum()  { awk 'BEGIN { print 0 }{print s += $1 }'; }
$ delay() { let &quot;n = $1 + 1&quot; &amp;&amp; stdbuf -o 0 tail +$n; } 
$ sdiff() { mkfifo p.$1 &amp;&amp; tee p.$1 | delay $1 | \
            stdbuf -o 0 paste - p.$1 | \
            awk 'NF == 2 {print $1 - $2 }'; }
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 8</td>
	</tr>
</table>

<p>Now the data flows again:</p>

<pre class="programlisting">
  # Accumulate the stream 1 1 1 ... and print the
  # difference between successive elements
  $ yes 1 | psum | sdiff 1
  1
  1
  1
  1
  ^C</pre>

<h2>Merging streams</h2>

<p>The Unix shell merges streams rather more succinctly than Python. <code>Sort -m</code> does the job directly. Note that a standard <code>sort</code> cannot yield any output until all its inputs are exhausted, since the final input item might turn out to be the one which should appear first in the output. Merge sort, <code>sort -m</code>, can and does produce output without delay.<a href="#FN03"><sup>3</sup></a></p>

<pre class="programlisting">
  $ yes | sort
  ^C
  $ yes | sort -m
  y
  y
  y
  y
  y
  ^C</pre>

<h2>Generating primes</h2>

<p>No doubt itâ€™s possible to generate the infinite series of prime numbers using native Bash code, but I chose to reuse the <em>Python Cookbook</em> recipe (Listing 9) for the job.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
primes
#!/usr/bin/env python
import itertools

def primes():
  '''Generate the prime number series: 2, 3, 5 
  ... '''
  D = {}
  for n in itertools.count(2):
    p = D.pop(n, None)
    if p is None:
      yield n
      D[n * n] = n
    else:
      x = n + p
      while x in D:
        x += p
      D[x] = p

for p in primes():
  print(p)
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 9</td>
	</tr>
</table>

<p>This is a subtle little program which makes clever use of Pythonâ€™s native hashed array container, the dictionary. In this case dictionary values are the primes less than n and the keys are composite multiples of these primes. The loop invariant, roughly speaking, is that the dictionary values are the primes less than n, and the corresponding keys are the lowest multiples of these primes greater than or equal to n. Itâ€™s a lazy, recursion-free take of Eratosthenesâ€™ sieve.</p>
<p>For the purposes of this article the important things about this program are:</p>

<ul>
	<li>it generates an infinite series of numbers to standard output<a href="#FN04"><sup>4</sup></a>, making it a good source for a shell pipeline</li>
	
	<li>by making it executable and adding the usual shebang incantation, we can invoke this Python program seamlessly from the shell.</li>
</ul>

<h2>Pipe connection</h2>

<p>Recall the original puzzle:</p>

<p class="blockquote">Find the smallest number that can be expressed as the sum of 5, 17, 563 and 641 consecutive prime numbers, and is itself a prime number.</p>

<p>First, letâ€™s check the connections by solving a simpler problem which we can manually verify: to find prime numbers which are also the sum of 2 consecutive primes. As we noted before, this is the same as finding primes numbers which are the consecutive sums of 1 and 2 primes.</p>

<p>In one shell window we create a couple of named pipes, <code>c.1</code> and <code>c.2</code>, which weâ€™ll use to stream the consecutive sum series of 1 and 2 primes respectively. The results series comprises the duplicates when we merge these pipes.</p>

<pre class="programlisting">
  $ mkfifo c.{1,2}
  $ sort -mn c.{1,2} | uniq -d</pre>

<p>In another shell window, stream data into <code>c.1</code> and <code>c.2</code>:</p>

<pre class="programlisting">
  $ for i in 1 2; do (primes | psum | sdiff $i &gt;
  c.$i) &amp; done</pre>
  
<p>In the first window we see the single number 5, which is the first and only prime number equal to the sum of two consecutive primes.</p>
<p>Prime numbers equal to the sum of three consecutive primes are more interesting. In each shell window recall the previous commands and switch the 2s to 3s (a simple command history recall and edit, ^2^3^, does the trick). The merged output now looks like this:</p>

<pre class="programlisting">
  $ sort -mn c.1 c.3 | uniq -d
  23
  31
  41
  ...</pre>
  
<p>We can check the first few values:</p>

<pre class="programlisting">
  23 = 5 + 7 + 11
  31 = 7 + 11 + 13
  41 = 11 + 13 + 17</pre>
  
<p>At this point weâ€™re confident enough to give the actual puzzle a try. Start up the solutions stream.</p>

<pre class="programlisting">
$ mkfifo c.{1,5,17,563,641}
$ sort -mn c.{1,5,17,563,641} | uniq -c | grep &quot;5 &quot;</pre>

<p>Here, we use a standard shell script set intersection recipe: <code>uniq -c</code> groups and counts repeated elements, and the <code>grep</code> pattern matches those numbers common to all five input streams.</p>

<p>Now we can kick off the processes which will feed into the consecutive sum streams, which sort is waiting on.</p>

<pre class="programlisting">
  $ for i in 1 5 17 563 641; do (primes | psum | 
  sdiff $i &gt; c.$i) &amp; done</pre>
  
<p>Sure enough, after about 15 seconds elapsed time, out pops the result:</p>

<pre class="programlisting">
$ sort -mn c.{1,5,17,563,641} | uniq -c | grep &quot;5 &quot;
  5 7002221</pre>
  
<p>15 seconds seems an eternity for arithmetic on a modern computer (you could start up a word processor in less time!), and you might be inclined to blame the overhead of all those processes, files, large numbers, etc. In fact it took around 6 seconds for the Python program simply to generate prime numbers up to 7002221, and my pure Python solution ran in 9 seconds.</p>

<table class="sidebartable">
	<tr>
		<td>
			<p>I used a MacBook laptop used to run these scripts.</p>
			<table class="journaltable">
				<tr>
					<td>Model Name:</td>
					<td>MacBook</td>
				</tr>
				<tr>
					<td>Model Identifier:</td>
					<td>MacBook1,1</td>
				</tr>
				<tr>
					<td>Processor Name:</td>
					<td>Intel Core Duo</td>
				</tr>
				<tr>
					<td>Processor Speed:</td>
					<td>2 GHz</td>
				</tr>
				<tr>
					<td>Number Of Processors:</td>
					<td>1</td>
				</tr>
				<tr>
					<td>Total Number Of Cores:</td>
					<td>2</td>
				</tr>
				<tr>
					<td>L2 Cache (per processor):</td>
					<td>2 MB</td>
				</tr>
				<tr>
					<td>Memory:</td>
					<td>2 GB</td>
				</tr>
				<tr>
					<td>Bus Speed:</td>
					<td>667 MHz</td>
				</tr>
			</table>
		</td>
	</tr>
</table>

<h2>Portability</h2>

<p>One of the most convenient things about Python is its portability. I donâ€™t mean â€˜portable so long as you conform to the language standardâ€™ or â€˜portable if you stick to a subset of the languageâ€™. I mean that a Python program works whatever platform I use without me having to worry about it.</p>

<p>Non-portability put me off the Unix shell when I first encountered it: there seemed too many details, too many platform differences â€“ which shell are you using? which extensions? which implementation of the core utilities, etc, etc? Readily available and well-written documentation didnâ€™t help much here: generally I want the shell to just do what I mean, which is why I switched so happily to Perl when I discovered it.</p>

<p>Since then this situation has, for me, improved in many ways. Non-Unix platforms are declining as are the different flavours of Unix. Bash seems to have become the standard shell of choice and Cygwin gets better all the time. GNU coreutils predominate. As a consequence Iâ€™ve forgotten almost all the Perl I ever knew and am actively rediscovering the Unix shell.</p>

<p>Writing this article, though, I was reminded of the platform dependent behaviour which used to discourage me. On a Linux platform close to hand I had to use <code>seq</code> instead of <code>jot</code>, and <code>awk</code> formatted large integers in a scientific form with a loss of precision.</p>

<pre class="programlisting">
  $ echo '10000000001 0' | awk '{print $1 - $2}'
  1e+10</pre>
  
<p>On OS X the same command outputs 10000000001. I couldnâ€™t tell you which implementation is more correct. The fix is to explicitly format these numbers as decimal integers, but the danger is that the shell silently swallows these discrepancies and youâ€™ve got a portability problem you donâ€™t even notice.</p>

<pre class="programlisting">
  $ echo '10000000001 0' | awk '{printf &quot;%d\n&quot;, 
  $1 - $2}'
  10000000001</pre>

<h2>Stream merge</h2>

<p>I mentioned <code>stream_merge()</code> at the start of this article, a general purpose function written by Raymond Hettinger which I originally found in the Python Cookbook. As with the prime number generator, you might imagine the merge algorithm to be recursively defined:</p>

<ol>
	<li>to merge a pair of streams, take items from the first which are less than the head of the second, then swap;</li>
	
	<li>to merge N streams, merge the first stream with the merged (N-1) rest.</li>
</ol>

<p>Again the Python solution does it differently, this time using a heap as a priority queue of items from the input streams. Itâ€™s ingenious and efficient. Look how easy it is in Python (Listing 10) to shunt functions and pairs in and out of queues.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
from heapq import heapify, heappop, heapreplace

def stream_merge(*ss):
  '''Merge a collection of sorted streams.'''
  pqueue = []
  for i in map(iter, ss):
    try:
      pqueue.append((i.next(), i.next))
    except StopIteration:
      pass
  heapify(pqueue)
  while pqueue:
    val, it = pqueue[0]
    yield val
    try:
      heapreplace(pqueue, (it(), it))
    except StopIteration:
      heappop(pqueue)
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 10</td>
	</tr>
</table>

<p>A more sophisticated version of this code has made it into the Python standard library, where it goes by the name of <code>heapq.merge</code> (I wonder why it wasnâ€™t filed in <code>itertools</code>?)</p>

<h2>Alternative solutions</h2>

<p>As usual Haskell wins the elegance award, so Iâ€™ll quote in full a solution (Listing 11) built without resorting to cookbookery which produces the (correct!) answer in 20 seconds.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
module Main where

import List

isPrime x = all (\i -&gt; 0/=x`mod`i) $ takeWhile (\i -&gt; i*i &lt;= x) primes

primes = 2:filter (\x -&gt; isPrime x) [3..]

cplist n = map (sum . take n) (tails primes)

meet (x:xs) (y:ys) | x &lt; y = meet xs (y:ys)
                   | y &lt; x = meet (x:xs) ys
                   | x == y =  x:meet xs ys
main = print $ head $ \
(primes `meet` cplist 5) `meet` (cplist 17 `meet` cplist 563) `meet` cplist 641
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 11</td>
	</tr>
</table>

<p class="footnotes"></p>

<ol>
	<li><a id="FN01"></a>CPython, more precisely â€” I donâ€™t think anything in the Python language itself prohibits tail recursion. Even using CPython, yet another recipe from the online Python Cookbook explores the idea of an @tail_recursion decorator</li>
	
	<li><a id="FN02"></a>Tail is more commonly used to yield a fixed number of lines from the end of the file: by prefixing the line count argument with a + sign, it skips lines from the head of the file. The GNU version of head can similarly be used with a - prefix to skip lines at the tail of a file. The notation is {compact,powerful,subtle,implementation dependent}.</li>

	<li><a id="FN03"></a>Sort -m is a sort which doesnâ€™t really sort â€” its inputs should already be sorted â€” rather like the +n option turning tail on its head.</li>
	
	<li><a id="FN04"></a>The series is infinite in theory only: at time n the number of items in the <code>has_prime_factors</code> dictionary equals the number of primes less than n, and each key in this dictionary is larger than n. So resource use increases steadily as n increases.</li>
</ol>
</p>
<p><strong>Notes:</strong>&nbsp;</p>
<p><em>More fields may be available via dynamicdata ..</em></p>
</div>
</channel>
</rss>
