    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
     <channel>
        <title>ACCU  :: The Path of Least Resistance</title>
        <link>https://members.accu.org/index.php/articles/2749</link>
        <description>Professionalism in Programming</description>
        <dc:language>en-us</dc:language> 
        <dc:creator>Administrator</dc:creator> 
        <admin:generatorAgent rdf:resource="http://www.xaraya.org" /> 
        <admin:errorReportsTo rdf:resource="mailto:webeditor@accu.org" />
       <sy:updatePeriod>hourly</sy:updatePeriod>
       <sy:updateFrequency>1</sy:updateFrequency>
       <docs>http://backend.userland.com/rss</docs>




<div class="xar-mod-head"><span class="xar-mod-title">Design of applications and programs + Overload Journal #155 - February 2020</span></div>





<div class="xar-norm xar-standard-box-padding">
   <h1><strong>Title:</strong>&nbsp;The Path of Least Resistance</h1>
<p><strong>Author:</strong>&nbsp;Bob Schmidt</p>
<p>
<strong>Date:</strong> Tue, 04 February 2020 18:47:24 +00:00</p>
<p><strong>Summary:</strong>&nbsp;Python’s modules and imports can be overwhelming. Steve Love attempts to de-mystify the process.</p>
<p><strong>Body:</strong>&nbsp;<p>There is more to Python than scripting. As its popularity grows, people naturally want to do more with Python than just knock out simple scripts, and increasingly want to write whole applications in it. It’s lucky, then, that Python provides great facilities to support exactly that need. It can, however, be a little daunting to switch from using Python to run a simple script to working on a full-scale application. This article is intended to help with that, and to show how a little thought and planning on the structure of a program can make your life easier, and help you to avoid some of the mistakes that will certainly make your life harder.</p>

<p>We will examine modules and packages, how to use them, and how to break your programs into smaller chunks by writing your own. We will look in some detail at how Python locates the modules you import, and some of the difficulties this presents. Throughout, the examples will be based on the tests, since this provides a great way to see how modules and packages work. As part of that, we’ll get to explore how to test modules independently, and how to structure packages to make using them straightforward.</p>

<h2>A Python program</h2>

<p>We begin with that simple, single-file script, because it introduces some of the ideas that we’ll build on later. Listing 1 is a simple tool to take CSV data and turn it into JSON.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
import csv
import json
import sys

parser = csv.DictReader( sys.stdin )
data = list( parser )
print( json.dumps( data, sort_keys=True,
  indent=2 ) )
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 1</td>
	</tr>
</table>

<p>This program takes its input from <code>stdin</code>, and parses it as comma separated values. The <code>DictReader</code> parser turns each row of the CSV data into a <code>dict</code>.</p>

<p>Next, the entire input is read by creating a list of the dictionaries.</p>

<p>Lastly, the list is transformed into JSON, and the result (pretty) printed to <code>stdout</code>.</p>
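<p>The three steps can be tried without a terminal by feeding the same pipeline from an in-memory buffer. This is only a minimal sketch: <code>io.StringIO</code> stands in for <code>sys.stdin</code>, and the sample rows are invented for illustration.</p>

```python
import csv
import io
import json

# Hypothetical sample data; io.StringIO stands in for sys.stdin.
sample = io.StringIO("name,language\nAda,Python\nBob,C++\n")

parser = csv.DictReader(sample)   # each row becomes a dict keyed by the header
data = list(parser)               # reads the whole input
text = json.dumps(data, sort_keys=True, indent=2)
print(text)
```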

<p>The code itself is not really what this article is about. What’s more interesting are the <code>import</code> statements at the top of the program. All 3 of these modules are part of the Python Standard Library. The <code>import</code> statement tells the Python interpreter to find a module (or a package, but we’ll get to that) and make its contents available for use.</p>

<p>The fact that these are standard library modules means they’re always available if you have Python installed. The implication here is that you can share this program with anyone, and they’ll be able to use it successfully if <em>they</em> have Python installed (and it’s the right version, and you’ve told them <em>how</em> to use it).</p>

<p>That’s all well and good for such a simple program, but sooner or later (sooner, hopefully!) someone will ask “Where are the tests?”, followed quickly by “And what about error handling?” Error handling is left as an exercise, but testing allows us to explore more features of Python’s support for modules.</p>

<h2>Modularity 0.0</h2>

<p>The basic Python feature for splitting a program into smaller parts is the function. It may seem overkill for our tiny script with only 3 lines of working code, but one of the side-effects (bumph-tish!) of creating a function is that with a little care, you can make a unit-test to exercise it independently. ‘Independently’ has multiple levels of meaning here: independent of other functions in your program, independent of the file system or other external components like databases, independent of user input and, by the way, of screen output. Your unit-tests should work <em>as if</em> the computer is disconnected from the outside world: no keyboard, no mouse, no screen, no network, no disk.</p>

<p>Why is this important? Partly because disk and network access is slow and unreliable, and you want your tests to run quickly, and partly because your tests might be run by some automated system like a Continuous Integration system that might not have access to the same things you do on your workstation.</p>

<p>Our program so far doesn’t lend itself to being easily and automatically tested, so we’ll start with factoring out the code that parses CSV data into a list of dictionaries. Listing 2 shows an example.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
import csv
import json
import sys

def read_csv(input):
  parser = csv.DictReader(input)
  return list(parser)

data = read_csv(sys.stdin)
print(json.dumps(data, sort_keys = True, 
  indent = 2))
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 2</td>
	</tr>
</table>

<p>The function still uses a <code>DictReader</code>, but instead of directly using <code>sys.stdin</code>, it passes the argument it receives from the calling code.</p>

<p>The calling code now passes <code>sys.stdin</code> to the function and captures the result. The printing to screen remains the same.</p>
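<p>Because the function now accepts whatever it is given, a caller is no longer tied to <code>sys.stdin</code>. The sketch below repeats the function from Listing 2 and calls it with two stand-in inputs, both invented for illustration: a plain list of lines and an in-memory text buffer.</p>

```python
import csv
import io

def read_csv(input):
    # Same function as in Listing 2.
    parser = csv.DictReader(input)
    return list(parser)

# Any iterable that yields lines will do: a list of strings...
from_list = read_csv(["a,b", "1,2"])
# ...or a file-like object.
from_buffer = read_csv(io.StringIO("a,b\n1,2\n"))
print(from_list)
print(from_buffer)
```

<p>This is the property that makes the function testable without a keyboard or a file.</p>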

<p>A simple first test for this might be to check that the function doesn’t misbehave if the input is empty. Although Python has a built-in unit-testing facility, there are lightweight alternatives, and the examples in this article all use pytest. Pytest will automatically discover test functions with names prefixed with <span class="filename">test</span>, in files with names prefixed with <span class="filename">test_</span>. For this example, assume the program is in a file called <span class="filename">csv2json.py</span>, and the test file is <span class="filename">test_csv2json.py</span>, containing Listing 3.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
from csv2json import read_csv

def test_read_csv_accepts_empty_input():
  result = read_csv(None)
  assert result == []
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 3</td>
	</tr>
</table>

<p>The first line imports the <code>read_csv</code> function from our <span class="filename">csv2json</span> script. The import name <code>csv2json</code> is just the name of the file, without the <span class="filename">.py</span> extension.</p>

<p>To start with, we write a test that we expect to <em>fail</em>, just to ensure we’re actually exercising what we <em>think</em> we’re exercising. <code>csv.DictReader</code> works with <em>iterable</em> objects, but passing <code>None</code> should certainly cause an error.</p>

<p>Once again, it’s the import that’s interesting, because it shows that any old Python script is also a module that can be imported. There is nothing special about the code in Listing 2 to make it a module. A Python module is a namespace, so names within it must be unique, but can be the same as names from other namespaces without a clash. The syntax for importing shown here indicates that we only want one identifier from the <code>csv2json</code> module. Alternatively, we could use</p>

<pre class="programlisting">
  import csv2json</pre>

<p>and then explicitly qualify the use of the function with the namespace:</p>

<pre class="programlisting">
  result = csv2json.read_csv( None )</pre>

<p>We have a function, and a test with which to exercise it. Running that test couldn’t be easier. From a command prompt/shell, in the directory location of the test script, run <code>pytest</code> [<a href="#[pytest]">pytest</a>].</p>

<p>But wait! What’s this? (Figure 1 shows the output from pytest.)</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
==================== ERRORS ====================
____ ERROR collecting test_csv2json.py ____

test_csv2json.py:1: in &lt;module&gt;
    from csv2json import read_csv
csv2json.py:9: in &lt;module&gt;
    data = read_csv( sys.stdin )
csv2json.py:7: in read_csv
    return list( parser )
…

    &quot;pytest: reading from stdin while output is captured!  Consider using `-s`.&quot;
E   OSError: pytest: reading from stdin while output is captured!  Consider using `-s`.
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Figure 1</td>
	</tr>
</table>

<p>It looks like the test failed, but not in the way we expected. It demonstrates another aspect of Python’s import behaviour: importing a module <em>runs</em> the module. That’s obviously not what we intended. What we need is some way to indicate the difference between <em>running</em> a Python program, and <em>importing</em> its contents to a different program. Sure enough, Python provides a simple way to do this.</p>

<p>Each Python module in a program has a unique name, which you can access via the global <code>__name__</code> value. Additionally, if you invoke a program, then <em>its</em> name is always <code>__main__</code>. Listing 4 shows this in action.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
import csv
import json
import sys

def read_csv(input):
  parser = csv.DictReader(input)
  return list(parser)

if __name__ == '__main__':
  data = read_csv(sys.stdin)
  print(json.dumps(data, sort_keys = True,
    indent = 2))
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 4</td>
	</tr>
</table>

<p>Importing a module runs all the top-level code, which includes the <em>definition</em> of functions. The code within a function (or a method of a class) is syntax checked, but not invoked.</p>

<p>The test for <code>__name__</code> fails when the module’s code is being run as a result of an import, so the guarded block is skipped.</p>

<p>The function is invoked explicitly if the script is being run rather than imported.</p>
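<p>The two values of <code>__name__</code> can be observed directly. The following is a small sketch rather than part of the program: it writes a throwaway module (the name <code>whoami_demo</code> is invented) into a temporary directory, imports it, and then executes the same file as a script using the standard <code>runpy</code> module.</p>

```python
import importlib
import os
import runpy
import sys
import tempfile

# A throwaway module that simply records the name it was given.
source = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "whoami_demo.py")
    with open(path, "w") as f:
        f.write(source)

    # Make the temporary directory importable just long enough to import.
    sys.path.insert(0, tmp)
    try:
        imported = importlib.import_module("whoami_demo")
    finally:
        sys.path.remove(tmp)

    # run_path executes the file the way a script would be executed.
    as_script = runpy.run_path(path, run_name="__main__")

print(imported.seen_name)        # whoami_demo
print(as_script["seen_name"])    # __main__
```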

<p>Running the test now produces the output shown in Figure 2.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
=================== FAILURES ===================
____ test_read_csv_accepts_empty_input() ____

    def test_read_csv_accepts_empty_input():
&gt;       result = read_csv( None )

test_csv2json.py:4:
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
csv2json.py:6: in read_csv
    parser = csv.DictReader( input )
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = &lt;csv.DictReader object at 0x000001E882CD94A8&gt;, f = None, fieldnames = None, restkey = None, restval = None, dialect = 'excel'
args = (), kwds = {}

    def __init__(self, f, fieldnames=None,
                 restkey=None, restval=None,
                 dialect=&quot;excel&quot;, *args, **kwds):
        self._fieldnames = fieldnames   
          # list of keys for the dict
        self.restkey = restkey          
          # key to catch long rows
        self.restval = restval          
          # default value for short rows
&gt;       self.reader = reader(f, dialect, *args,
                      **kwds)
E       TypeError: argument 1 must be an iterator
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Figure 2</td>
	</tr>
</table>

<p>This time, the test has failed in the way we expected: <code>DictReader</code> is expecting an iterator. We can now alter the test to something we expect to <em>pass</em>, as shown here:</p>

<pre class="programlisting">
  def test_read_csv_accepts_empty_input():
    result = read_csv( [] )
    assert result == []</pre>

<p>And sure enough, it does.</p>

<pre class="programlisting">
  ==== test session starts ====
  platform win32 -- Python 3.7.2, pytest-5.3.2, 
  py-1.8.0, pluggy-0.13.1
  rootdir:
  collected 1 item
  
  test_csv2json.py .      [100%]
  
  ==== 1 passed in 0.04s ====</pre>

<p>The next steps would be to add more tests and some error handling, and to factor out the JSON export with its own tests, too. These are left as exercises.</p>

<h3>What have we learned?</h3>

<ol>
	<li>Simple Python modules are just files with Python code in them</li>
	<li>Importing a module runs all the top-level code in that module</li>
	<li>Python modules all have a unique name within a program, which is accessed via the value of <code>__name__</code></li>
	<li>The entry point of the program is a module called <code>__main__</code></li>
	<li>Unit tests are a great way of exercising your code, but they’re also great for exercising the <em>structure</em> of the code.</li>
</ol>

<table class="sidebartable">
	<tr>
		<td class="title">Namespaces</td>
	</tr>
	<tr>
		<td>
			<table class="journaltable">
				<tr>
					<td>
						<p>You may need to explicitly qualify function calls with namespacing, if, for example, you were importing another module or package that defined a <code>read_csv()</code> function. A full qualification includes the name of the package:</p>
						<pre class="programlisting">
import textfilters.csv2json
def test_read_csv_accepts_empty_input():
  result = textfilters.csv2json.read_csv( [] )
  assert result == []</pre>
						<p>It may be, however, that just the module name provides sufficient uniqueness in the name, and so a compromise is possible:</p>
						<pre class="programlisting">
from textfilters import csv2json

def test_read_csv_accepts_empty_input():
  result = csv2json.read_csv( [] )
  assert result == []</pre>
						<p>Don’t confuse this general idea with Python’s Namespace Packages <a href="#[NamespacePkg]">[NamespacePkg]</a>, which are used to split packages up across several locations.</p>
					</td>
				</tr>
			</table>
		</td>
	</tr>

</table>

<h2>Packages</h2>

<p>Having satisfied the requirement for tests, we may wish to embellish our little library to be a bit more general. One obvious thing might be to extend the functionality to handle other data formats, and perhaps to be able to convert in any direction. It wouldn’t necessarily be unreasonable to just add a new function to the module called <code>json2csv</code><a href="#FN01"><sup>1</sup></a>. However, if we want to add new text formats, the combinations become unwieldy, and would introduce an unnecessary amount of duplication.</p>

<p>The basis of the library is to take some input and parse it into a Python data structure, which we can then turn into an export format. Having gone to the trouble of separating the inputs and outputs, we can extend the idea and use a different module for each text format. Python provides another facility to bundle several modules together into a package.</p>

<p>As we’ve already explored, a Python module can be just a file containing Python code. A Python package is a module that gathers sub-modules (and possibly sub-packages) together in a directory<a href="#FN02"><sup>2</sup></a>, along with a special file named <span class="filename">__init__.py</span>. For the time being, this can be just an empty file, but we will revisit this file in a later article.</p>

<p>We’ll begin by just adding our existing module to a package called <code>textfilters</code>. Our source tree now should look something like this:</p>

<pre class="programlisting">
  root/
   |__ textfilters/
          |__ __init__.py
          |__ csv2json.py
   |__ test_csv2json.py</pre>

<p>The <code>import</code> statement in <span class="filename">test_csv2json.py</span> now needs to change as shown here:</p>

<pre class="programlisting">
from textfilters.csv2json import read_csv 

def test_read_csv_accepts_empty_input():
  result = read_csv( [] )
  assert result == []</pre>

<p>The <code>import</code> statement at the top says ‘import the <code>read_csv</code> definition from the <code>csv2json</code> module in the <code>textfilters</code> package’.</p>

<p>Note that the portion of the import line <em>after</em> the <code>import</code> keyword defines the namespace. In this example, the namespace consists only of the function name, and needs no further qualification when used.</p>

<p>So far, so good. Whichever method of importing the function you use, running the test works, and (hopefully!) your tests still pass.</p>
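<p>The package layout can be reproduced in a scratch directory to see what Python makes of it. The sketch below is illustrative only: the package name <code>pkgdemo</code> and the one-line function body are invented so that nothing clashes with real code.</p>

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Recreate the shape of the tree under a temporary root.
    pkg_dir = os.path.join(root, "pkgdemo")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()   # empty marker file
    with open(os.path.join(pkg_dir, "csv2json.py"), "w") as f:
        f.write("def read_csv(input):\n    return list(input)\n")  # stand-in body

    sys.path.insert(0, root)
    try:
        package = importlib.import_module("pkgdemo")
        module = importlib.import_module("pkgdemo.csv2json")
    finally:
        sys.path.remove(root)

# A package is a module that also carries a __path__ attribute.
print(package.__name__, hasattr(package, "__path__"))   # pkgdemo True
print(module.__name__, hasattr(module, "__path__"))     # pkgdemo.csv2json False
```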

<p>The next step is to split the functions into their separate modules. So far, we have two text formats to deal with: CSV and JSON. It makes some sense, therefore, to handle all the CSV functionality in a module called <code>csv</code>, and all the JSON in a module called <code>json</code>. Our original CSV function, <code>read_csv</code>, now has a name with some redundancy, duplicating as it does the name of the new module. I also decided that ‘read’ and ‘write’ weren’t really accurate names, implying some kind of file-system access, and decided upon <code>input</code> and <code>output</code>. Listing 5 shows the content of the new <code>csv</code> module called <span class="filename">csv.py</span>.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
import csv
from io import StringIO

def input(data):
  parser = csv.DictReader(data)
  return list(parser)

def output(data):
  if len(data) &gt; 0:
    with StringIO() as buffer:
      writer = csv.DictWriter(buffer, data[0].keys())
      writer.writeheader()
      for row in data:
        writer.writerow(row)
      return buffer.getvalue()
  return &quot;&quot;
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 5</td>
	</tr>
</table>

<p>As previously, it’s not really the implementation that’s interesting about this, it’s that first <code>import</code> statement. Remember, this is a file called <span class="filename">csv.py</span>, and so <em>its</em> module name is <code>csv</code>. It should be clear from the code that the intention is to import the built-in <code>csv</code> module, and that is indeed what will <em>probably</em> happen, due to a number of factors. It’s time to talk about how Python imports modules.</p>

<h2>The search for modules</h2>

<p>Importing a module (and remember, packages are modules too) requires the Python interpreter to locate the code for that module, and make its code available to the program by <strong>binding</strong> the code to a name. You can’t usually give an absolute path directly, but Python has a number of standard places in which it attempts to locate the module being imported.</p>

<p>Python keeps a cache of modules that have already been imported. Where an imported module is a sub-module, the parent module (and its parents) are also cached. This cache is checked first to see if a requested module has already been imported.</p>
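<p>That cache is exposed to programs as the dictionary <code>sys.modules</code>. As a quick illustration, importing a nested standard library module (<code>xml.dom.minidom</code>, chosen arbitrarily) leaves entries for the module and for each of its parents:</p>

```python
import sys
import xml.dom.minidom   # a standard library sub-module, chosen arbitrarily

# The sub-module and each of its parents now appear in the cache.
cached = [name in sys.modules for name in ("xml", "xml.dom", "xml.dom.minidom")]
print(cached)

# Importing again does not reload anything; the cached object is handed back.
import xml.dom.minidom as again
print(again is sys.modules["xml.dom.minidom"])
```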

<p>Next, if the module was not found in the cache, the paths in the system-defined <code>sys.path</code> list are checked in order. This sounds simple enough, but this is probably where the consequences of giving our own module the same name as a built-in one will become apparent. The contents of <code>sys.path</code> are initialized on startup with the following:</p>

<ol>
	<li>The directory containing the script being invoked, <em>or</em> an empty string to indicate the current working directory in the case where Python is invoked with no script (i.e. interactively).</li>
	
	<li>The contents of the environment variable <code>PYTHONPATH</code>. You can alter this to change how modules are located when they’re imported.</li>
	
	<li>System defined search paths for built-in modules.</li>
	
	<li>The root of the <code>site</code> module (more on this in a later article).</li>
</ol>
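<p>The resulting list is easy to inspect. The snippet below is a small sketch that prints the search path of whichever interpreter runs it, along with any entries that came from <code>PYTHONPATH</code>; the actual contents will differ from machine to machine.</p>

```python
import os
import sys

# The search path is an ordinary list, consulted in order.
for position, entry in enumerate(sys.path):
    print(position, repr(entry))

# Entries contributed by PYTHONPATH, if the variable is set at all.
from_env = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
print("PYTHONPATH entries:", from_env)
```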

<p>Let’s consider the directory layout containing our package:</p>

<pre class="programlisting">
  root/
   |__ textfilters/
          |__ __init__.py
          |__ csv.py
          |__ ...
   |__ test_filters.py</pre>

<p>If you invoke a Python script, or run the Python interpreter, in the <span class="filename">root/textfilters/</span> directory, the statement <code>import csv</code> will find the <code>csv</code> module in that directory first. Note that the current working directory is <em>not</em> used if a Python script is invoked. Conversely, invoking a Python script, or running the Python interpreter, in any other location would result in <code>import csv</code> importing the built-in <code>csv</code> module.</p>

<p>Back to our <span class="filename">csv.py</span> module, the statement <code>import csv</code> would indeed be recursive if you were to invoke the script directly, or any other script in the same directory that imported it. However, the usual purpose of a package is for it to be imported into a program, and the reason for making a directory for a package is to keep the package’s contents separate from other code.</p>
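<p>The shadowing effect can be reproduced safely in a scratch directory. This is a sketch under default interpreter settings: the file contents and the script name <code>main.py</code> are invented, and a fresh interpreter is started so that the experiment cannot disturb the current one.</p>

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A local file that happens to be called csv.py.
    with open(os.path.join(tmp, "csv.py"), "w") as f:
        f.write("WHO = 'local csv.py'\n")
    # A script in the same directory that asks for "csv".
    script = os.path.join(tmp, "main.py")
    with open(script, "w") as f:
        f.write("import csv\nprint(getattr(csv, 'WHO', 'standard library csv'))\n")

    # Run the script from a different working directory: it is the script's
    # own directory, not the working directory, that heads the search path.
    result = subprocess.run([sys.executable, script], cwd=os.path.dirname(tmp),
                            capture_output=True, text=True)

winner = result.stdout.strip()
print(winner)
```

<p>With default settings the script picks up the local file, which is exactly the accident that a package directory helps to avoid.</p>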

<p>To demonstrate this, let’s have a look at how the test script looks now it’s using the <code>textfilters</code> package in Listing 6.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
from textfilters import csv

def test_read_csv_accepts_empty_input():
  result = csv.input([])
  assert result == []
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 6</td>
	</tr>
</table>

<p>The <code>import</code> statement is explicitly requesting the <code>csv</code> module from the <code>textfilters</code> package.</p>

<p>As long as the <code>textfilters</code> package is found, then this script will use the <code>csv</code> module within it, and will never import the built-in module of the same name. In this instance, the invoked Python script is <span class="filename">test_filters.py</span>, and the search path will have its directory at the head of <code>sys.path</code>. The <code>textfilters</code> package is found as a child of that directory, and all is good with the world.</p>

<p>If the <code>textfilters</code> package were located somewhere else, away from your main program, you could add its path to the <code>PYTHONPATH</code> environment variable to ensure it was found. As previously mentioned, you don’t <em>directly</em> import modules using absolute paths, but the <code>PYTHONPATH</code> environment variable is one <em>indirect</em> way of specifying additional search paths. Requiring all users of your module(s) to have an appropriate setting for <code>PYTHONPATH</code> is a bit heavy-handed<a href="#FN03"><sup>3</sup></a>, but can be useful during development.</p>
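<p>As a rough illustration of the effect, the sketch below places a tiny package (the name <code>faraway_demo</code> is invented) in a temporary directory and starts a fresh interpreter with <code>PYTHONPATH</code> pointing at that directory. During development the variable would normally be set in the shell rather than from Python.</p>

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as elsewhere:
    # A package living away from the main program.
    pkg_dir = os.path.join(elsewhere, "faraway_demo")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("GREETING = 'found via PYTHONPATH'\n")

    # Copy the current environment, point PYTHONPATH at the directory that
    # contains the package, then start a fresh interpreter.
    env = dict(os.environ, PYTHONPATH=elsewhere)
    result = subprocess.run(
        [sys.executable, "-c", "import faraway_demo; print(faraway_demo.GREETING)"],
        env=env, capture_output=True, text=True)

found = result.stdout.strip()
print(found)
```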

<p>It needs to be said that deliberately naming your own modules to clash with built-in modules is a bad idea, because just relying on the search behaviour to ensure the right module is imported is sailing close to the wind, to say the least. However, things are never that simple: you cannot know what future versions of Python will call new modules, and you cannot know what other 3rd party libraries your users have installed. This is one reason why it’s a good idea to partition your code with packages, and be <em>explicit</em> about the names you import.</p>

<h3>What have we learned?</h3>

<ol>
	<li>A Python package is a module that has sub-modules. Standard Python packages contain a special file called <span class="filename">__init__.py</span>.</li>
	
	<li>The package name forms part of the namespace, and needs to be unique in a program.</li>
	
	<li>Python looks for modules in a few standard places, defined in a list called <span class="filename">sys.path</span>.</li>
	
	<li>You can modify the search path easily by defining (or changing) an environment variable called <code>PYTHONPATH</code>.</li>
	
	<li>You should partition your code with packages to minimize the risk of your names clashing with other modules.</li>
</ol>

<h2>Relativity</h2>

<p>One of the reasons for packaging up code is to make it easy to share. At the moment, we have a package, <code>textfilters</code>, and the test code for it lives <em>outside</em> the package. If we share the package, we should also share the tests, and having them <em>inside</em> the package means we can more easily share it just by copying the whole directory.</p>

<p>Look back at Listing 6, and note the <code>import</code> statement directly names the package. While this is fine (the tests should pass), it seems redundant to have to explicitly name it. Since this test module is now part of the same package as the <code>csv</code> module, what’s wrong with just <code>import csv</code>?</p>

<p>The problem with that is we will get caught out (again) by the Python search path for modules; <code>import csv</code> will just import the built-in <code>csv</code> module. Is there an alternative to explicitly having to give a full, absolute, name to modules imported within a package?</p>

<p>We previously learned that you can’t give an absolute path to the <code>import</code> statement, but you <em>can</em> request a <em>relative</em> path when importing within a package, i.e. when one file in a package needs to import another module <em>within the same package</em>. Listing 7 shows the needed invocation of import.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
from . import csv 

def test_read_csv_accepts_empty_input():
  result = csv.input( [] )
  assert result == []
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 7</td>
	</tr>
</table>

<p>The test file is now part of the <code>textfilters</code> package, and so uses <code>.</code> instead of the package name to indicate ‘within my package directory’.</p>

<p>As we’ve already noted, Python modules have a special value called <code>__name__</code>. We’ve looked at how this is set to <code>__main__</code> when a module is run as a script. When a module is imported, this value takes on the namespace name, which is essentially the path to the module relative to the application’s directory, but separated by <code>.</code> instead of <code>\</code> or <code>/</code>.</p>

<p>Python modules have another special value which is used to resolve relative imports within it. This is the <code>__package__</code> value, which contains the name of the package. If the module is <em>not</em> part of a package, then this value is the empty string. This is discussed in detail in PEP 366 [<a href="#[PEP366]">PEP366</a>], but the important thing to note here is that, since the testing module is now part of a package, the relative import path shown in Listing 7 uses the value of <code>__package__</code> to determine how to find the csv module to be imported.</p>
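<p>Both values can be seen at work in a scratch package. The sketch below is illustrative only: the package name <code>reldemo</code>, the module <code>user.py</code>, and the file contents are all invented. The package holds its own <code>csv</code> module and a second module that imports it relatively.</p>

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "reldemo")
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": "",
        "csv.py": "ORIGIN = 'package csv'\n",
        "user.py": "from . import csv\n",   # relative import of the sibling
    }
    for name, text in files.items():
        with open(os.path.join(pkg_dir, name), "w") as f:
            f.write(text)

    sys.path.insert(0, root)
    try:
        user = importlib.import_module("reldemo.user")
    finally:
        sys.path.remove(root)

print(user.__name__)       # reldemo.user
print(user.__package__)    # reldemo
print(user.csv.__name__)   # reldemo.csv
```

<p>The relative import resolves to the package’s own module, not to the standard library module of the same name.</p>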

<p>The package so far is a basic outline, with a simple API to translate one data format to another. We can extend this idea with facilities to transform the data as it passes through, perhaps to rename fields, or select only those fields we need. We could clearly just make a new package for these general utilities, but the intention is that they’re used in conjunction with the functions we’ve already created, so let’s instead make the new package a child of the existing one.</p>

<pre class="programlisting">
  root/
   |__ textfilters/
          |__ __init__.py
          |__ csv.py
          |__ ...
          |__ transformers/
              |__ __init__.py
              |__ change.py
          |__ test_filters.py</pre>
		  
<p>Relative imports also work for sub-packages. This is easily demonstrated with more tests, this time for the <span class="filename">transformers/change.py</span> module, as shown in Listing 8.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
# textfilters/transformers/change.py
def change_keys(data, fn):
  return { fn(k): data[k] for k in data }

# textfilters/test_change.py
from .transformers.change import change_keys

def test_change_keys_transforms_input():
  d = { 1: 1 }
  res = change_keys(d, lambda k: 'a')
  assert res == { 'a': 1 }
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 8</td>
	</tr>
</table>

<p>The test module imports the code under test from <span class="filename">change.py</span> using a relative import path prefixing the package and module names.</p>

<p>In this case, it’s exactly as if we’d written:</p>

<pre class="programlisting">
  from textfilters.transformers.change import change_keys</pre>

<p>By now, we’re collecting a few test modules in the base of the package directory, and we might want to think about more tidying up to gather all the tests together away from the actual package code. We can do this by creating a new sub-package called <code>tests</code> and moving all the test code into it. This must be a sub-package, and so requires an <span class="filename">__init__.py</span> of its own, and the relative imports need to change as shown in Listing 9.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
# textfilters / tests / test_change.py
from ..transformers.change import change_keys

def test_change_keys_transforms_input():
  d = { 1: 1 }
  res = change_keys(d, lambda k: 'a')
  assert res == { 'a': 1 }
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 9</td>
	</tr>
</table>

<p>The relative module imports now have an extra dot to indicate the <em>parent</em> package location.</p>

<p>Although this may look a bit like relative paths on a file-system, where <span class="filename">../</span> is the parent directory, it’s not quite the same. Packages can be arbitrarily nested, and to indicate the grand-parent directory on Linux you’d say <span class="filename">../../</span>, whereas in Python you just add another dot, giving <span class="filename">...</span> (three dots).</p>
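<p>One way to see how the dots are counted is to ask the standard library to resolve a relative name by hand. The sketch below uses <code>importlib.util.resolve_name</code>, whose second argument plays the role of the importing module’s <code>__package__</code>; nothing is actually imported. The nested package name <code>textfilters.tests.unit</code> is hypothetical, chosen only to show the three-dot case.</p>

```python
from importlib.util import resolve_name

# One dot: relative to the current package.
same = resolve_name('.transformers.change', 'textfilters')

# Two dots: relative to the parent of the current package.
parent = resolve_name('..transformers.change', 'textfilters.tests')

# Three dots: relative to the grand-parent package.
# 'textfilters.tests.unit' is a made-up package used only for illustration.
grandparent = resolve_name('...transformers.change',
                           'textfilters.tests.unit')

for name in (same, parent, grandparent):
    print(name)
```

<p>All three calls resolve to the same absolute name, <code>textfilters.transformers.change</code>: each additional dot simply strips one more trailing component from the package name before the rest of the path is appended.</p>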

<h3>What have we learned?</h3>

<ol>
	<li>Modules in a package can use relative imports to access code within the same package.</li>
	<li>Modules have a special value <code>__package__</code> which is used to resolve relative imports.</li>
	<li>Packages can have sub-packages, and these can be deeply nested – if you really want!</li>
</ol>

<h2>All together now</h2>

<p>We’ve demonstrated how to write test code in the package to exercise our little library, so now for completeness, it’s time to demonstrate a little running program<a href="#FN04"><sup>4</sup></a>. Listing 10 shows how it might be used.</p>

<table class="sidebartable">
	<tr>
		<td>
			<pre class="programlisting">
from textfilters import csv, json
from textfilters.transformers.change import change_keys
import sys

if __name__ == '__main__':
  def key_toupper(k):
    return k.upper()

  # Read CSV rows from standard input, upper-case
  # the column names, and write the result as JSON
  data = csv.input(sys.stdin)
  result = [change_keys(row, key_toupper)
            for row in data]
  print(json.output(result, sort_keys=True,
                    indent=2))
			</pre>
		</td>
	</tr>
	<tr>
		<td class="title">Listing 10</td>
	</tr>
</table>

<p>This short program essentially does what the code in Listing 1 did, with the added bonus of taking the column-names and changing them to upper case. Importing the required functions is the job of the first two lines, and for now, the <code>textfilters</code> package needs to be on Python’s search path. The simplest way to do <em>that</em> is to have it as a sub-directory of the main application folder.</p>
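<p>Because parts of the package were left as exercises, the program in Listing 10 can’t be run from the listings alone. The following self-contained sketch shows the same pipeline using the standard library’s <code>csv</code> and <code>json</code> modules in place of the package’s own <code>csv.input</code> and <code>json.output</code>. That substitution, the helper name <code>csv_to_upper_json</code>, and the sample data are assumptions made for illustration, not the package’s implementation.</p>

```python
import csv
import io
import json


def change_keys(data, fn):
    # Same behaviour as the function in Listing 8
    return {fn(k): data[k] for k in data}


def csv_to_upper_json(text):
    """Hypothetical helper: CSV text in, JSON text with upper-cased keys out."""
    # csv.DictReader stands in for the package's csv.input
    rows = csv.DictReader(io.StringIO(text))
    result = [change_keys(row, str.upper) for row in rows]
    # json.dumps stands in for the package's json.output
    return json.dumps(result, sort_keys=True, indent=2)


# Made-up sample input
sample = "name,age\nAda,36\nGrace,45\n"
print(csv_to_upper_json(sample))
```

<p>Reading from an in-memory string rather than <code>sys.stdin</code> keeps the sketch easy to try out; swapping <code>io.StringIO(text)</code> for <code>sys.stdin</code> would give the stream-based behaviour of the original program.</p>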

<p>Have we achieved what we set out to achieve? We had three goals in mind at the start:</p>

<ol>
	<li>to be able to test independent code independently</li>
	<li>to be able to split our programs into manageable chunks</li>
	<li>to be able to share those chunks with others.</li>
</ol>

<p>We have created a package, with its own set of tests. Those tests not only test each part of the package code independently of the others, but also – and importantly – independently of the code that uses it in the main application. This makes sharing the code with others much easier: the package is a small, self-contained parcel of functionality.</p>

<p>The import statements are a little verbose, due to the use of sub-packages, and the need to make the package directory a direct child of the main application is a little unwieldy.</p>
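<p>The layout requirement can be seen in a throwaway experiment. The sketch below builds a temporary application folder containing a stand-in package and a main script beside it, then runs the script in a fresh interpreter. The names <code>demo_pkg</code>, <code>greet.py</code> and <code>main.py</code> are invented for the demonstration and have nothing to do with <code>textfilters</code> itself.</p>

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp)

    # A tiny stand-in package, placed directly inside the application folder
    pkg = app / 'demo_pkg'
    pkg.mkdir()
    (pkg / '__init__.py').write_text('')
    (pkg / 'greet.py').write_text(
        "def hello():\n    return 'hello from demo_pkg'\n")

    # The main program sits next to the package directory
    main = app / 'main.py'
    main.write_text('from demo_pkg.greet import hello\nprint(hello())\n')

    # When a script is run, its own directory is placed first on sys.path,
    # so demo_pkg is found without any further configuration
    completed = subprocess.run(
        [sys.executable, str(main)],
        capture_output=True, text=True, check=True)

output = completed.stdout.strip()
print(output)
```

<p>Moving the package directory anywhere else, without telling Python where to look, would make the import in the main script fail, which is the unwieldiness referred to above.</p>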

<p>In the next installment, we will explore how to improve on both of those things so that your fellow users will really like using your library.</p>

<h2>References and resources</h2>

<p class="bibliomixed"><a id="[NamespacePkg]"></a>[NamespacePkg] Python Namespace Packages: <a href="https://packaging.python.org/guides/packaging-namespace-packages/">https://packaging.python.org/guides/packaging-namespace-packages/</a></p>

<p class="bibliomixed"><a id="[PEP366]"></a>[PEP366] ‘PEP 366, Main module explicit relative imports’ <a href="https://www.python.org/dev/peps/pep-0366/">https://www.python.org/dev/peps/pep-0366/</a></p>

<p class="bibliomixed"><a id="[pytest]"></a>[pytest] ‘A Python testing framework’ <a href="https://docs.pytest.org/en/latest/">https://docs.pytest.org/en/latest/</a></p>

<p class="footnotes"></p>

<ol>
	<li><a id="FN01"></a>It doesn’t always make sense to do this conversion due to the possibility of nesting in the JSON input, of course, but just for the exercise.</li>
	
	<li><a id="FN02"></a>This is a little simplistic, because there is no intrinsic requirement for modules or packages to exist on a file-system, but it suffices for now.</li>
	
	<li><a id="FN03"></a>You can also modify <code>sys.path</code> in code, since it’s just a list of paths, but your users will probably not thank you for it.</li>
	
	<li><a id="FN04"></a>Ok, not actually completeness, because some of the functionality that this uses was left as exercises. You did do the exercises?</li>
</ol>

<p class="bio"><span class="author"><b>Steve Love</b></span> is an independent developer constantly searching for new ways to be more productive without endangering his inherent laziness.</p>
</p>
</div>
</channel>
</rss>
