This is the second instalment, following on from the introduction to Python modules [Love20]. In that article, we looked at how to create your own modules, a little on how to split your program into modules to make sharing of the code easier, and how to structure packages to make testing them easier. In this article, we will take a more detailed look at making the packages you create easier to import and use. We will explore more ways to share your packages with others, and some ways of ensuring you can always have a dependable environment in which your code runs.
A little more on the import statement
In the previous article, we described a simple package with code to take input in one structured format, e.g. JSON or CSV, and turn it into another format, perhaps performing simple transformations on the way.
Listing 1 shows the basic usage of the code in our own package textfilters. For the sake of keeping the package contents tidy, we created some sub-packages so that the code to perform transformations was separate from the main package, and the tests for the package were all in one place, also separate. The package structure we ended up with is shown below.
<project root>/
|__ main.py
|__ textfilters/
    |__ __init__.py
    |__ csv.py
    |__ json.py
    |__ transformers/
        |__ __init__.py
        |__ change.py
        |__ choose.py
    |__ tests/
        |__ __init__.py
        |__ test_filters.py
        |__ test_change.py
        |__ test_choose.py
This structure explains the two import statements in Listing 1: the first such import brings in the main filters for taking (in this case) CSV input and turning it into JSON output. The second import is pulling a single function – change_keys – from a module called change. This module is in a package named transformers, which is a sub-package of the textfilters package.
from textfilters import csv, json
from textfilters.transformers.change import change_keys
import sys

if __name__ == '__main__':
    def key_toupper( k ):
        return k.upper()

    data = csv.input( sys.stdin )
    result = [ change_keys( row, key_toupper ) for row in data ]
    print( json.output( result, sort_keys=True, indent=2 ) )

Listing 1
As we mentioned in the previous article [Love20], there are a few ways we could arrange the import statements, with alterations to the usage. The portion of the import line after the import keyword effectively defines the namespace, so that first import line could be:
import textfilters.csv
And the corresponding use of the csv object would become:
data = textfilters.csv.input( sys.stdin )
This demonstrates why namespaces are so important. Python already has a built-in module named csv (which our package’s csv module uses), and it’s not unimaginable that you would want to import both of those. Explicitly fully naming the textfilters.csv module allows Python’s csv module to also be used alongside it.
Python provides a shortcut to import all the names from a module. Consider the following:
from textfilters.csv import *
data = input( sys.stdin )
The import statement here requests that all the names from the textfilters.csv module are imported into the current namespace. On the face of it, this seems great – we get to use the input function unadorned! However, there are pitfalls to this approach. Programming is more than a typing exercise, and names matter.
Whilst that import * directive did indeed bring the name of the function we wanted into the current scope, it also brought in every other name exported by the csv module (we will return to what ‘exported’ means later). This may or may not be what you intended. To see why it’s important, create a file called namespace.py with the code below (a cut-down version of the textfilters.csv contents).
import csv

def input( data ):
    return list( csv.DictReader( data ) )
Now run a Python interpreter session in the same directory, and try the following:
>>> csv = '1,2,3'
>>> csv
'1,2,3'
>>> from namespace import *
>>> csv
<module 'csv' from '...'>
Here, we’re creating a variable called csv, and assigning it a value. Importing * from the namespace module then over-writes that value. I’m sure you can guess why, but to make this completely clear, when the namespace module invokes import csv, it’s bringing the name csv into its scope as an exported name along with the name input. When you import the namespace module, any exported names are brought into your scope, over-writing your own variable names where they clash.
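The same effect can be reproduced without creating any files. The following is a minimal sketch, not part of the original package: it builds a stand-in for namespace.py in memory (the module name namespace_demo is an assumption made for illustration) and uses a plain dictionary to play the role of the importing script’s namespace.

```python
import sys
import types

# Hypothetical in-memory stand-in for the namespace.py file described above.
source = '''
import csv

def input(data):
    return list(csv.DictReader(data))
'''
module = types.ModuleType("namespace_demo")
exec(source, module.__dict__)
sys.modules["namespace_demo"] = module   # make it importable by name

# This dictionary plays the role of the user's own namespace.
user_namespace = {"csv": "1,2,3"}
before = user_namespace["csv"]

# The wild-card import brings in every public name, including 'csv'.
exec("from namespace_demo import *", user_namespace)
after = user_namespace["csv"]

print(type(before).__name__)   # str
print(type(after).__name__)    # module
```

The user’s string has been silently replaced by the standard csv module, which is exactly the clash described above.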
Of course, while you can be disciplined and always avoid the use of import *, you can’t very well impose that on everyone who might use your package. There are ways of helping to prevent your users from shooting themselves in their own feet.
Explicit is better than implicit
Private names
Not all names are imported when you use the from module import * form. Python has a convention for making names private to a module (or indeed, a class – the mechanism is the same) by prefixing it with an underscore. Consider the code in Listing 2.
import csv as _stdcsv
from io import StringIO as _StringIO

def input( data ):
    parser = _stdcsv.DictReader( data )
    return list( parser )

...

Listing 2
The import statement allows you to alter the names of things you import, and by renaming csv as _stdcsv, we make that name private to the module. If a user of this module now invokes from textfilters.csv import *, those names are not brought into scope. Note how this affects the usage within the module’s code. You can still explicitly request private names when you import from a module, because in Python, private doesn’t mean really private, it just means you have to try a little harder to get access to it.
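To see the convention at work, here is a small sketch under the same assumptions as before: the module is built in memory, and its name (private_demo) is invented for the example. It compares a wild-card import with an explicit request for the private name.

```python
import sys
import types

# Hypothetical in-memory module using the underscore convention from Listing 2.
source = '''
import csv as _stdcsv

def input(data):
    return list(_stdcsv.DictReader(data))
'''
module = types.ModuleType("private_demo")
exec(source, module.__dict__)
sys.modules["private_demo"] = module

# A wild-card import skips names that begin with an underscore...
star_namespace = {}
exec("from private_demo import *", star_namespace)

# ...but asking for the private name explicitly still works.
explicit_namespace = {}
exec("from private_demo import _stdcsv", explicit_namespace)

print("input" in star_namespace)         # True
print("_stdcsv" in star_namespace)       # False
print("_stdcsv" in explicit_namespace)   # True
```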
Define a public API
You can also limit the set of names brought into local scope when using from module import * by defining a module-level list of strings called __all__. If this value exists when from module import * is encountered, it is taken to mean ‘this is the list of all public names in the module’. It’s just a list of the names from the module you wish to be public. In the instance of the code in Listing 2, this would be defined as:
__all__ = [ 'input', 'output' ]
Adding this line to textfilters/csv.py will change the behaviour of import * for everyone so that only the names you defined will get imported.
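A quick way to convince yourself of this is the sketch below. It is an illustration only: the module all_demo and its helper function are assumptions, and the bodies of input and output are simplified stand-ins rather than the real textfilters code.

```python
import sys
import types

# Hypothetical module that declares its public API with __all__.
source = '''
import csv

__all__ = ['input', 'output']

def input(data):
    return list(csv.DictReader(data))

def output(rows):
    return repr(rows)

def helper():
    return 'public by name, but not listed in __all__'
'''
module = types.ModuleType("all_demo")
exec(source, module.__dict__)
sys.modules["all_demo"] = module

namespace = {}
exec("from all_demo import *", namespace)

# Ignore the dunder entries that exec() adds to the dictionary itself.
imported_names = sorted(name for name in namespace if not name.startswith("__"))
print(imported_names)   # ['input', 'output']
```

Neither csv nor helper appears, even though neither name starts with an underscore.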
What have we learned?
- Using import * imports all the public names from a module.
- You can rename imported things in the import statement.
- Prefixing names with an underscore makes them ‘private’, so import * does not import them.
- As the author of a module, you can also limit the names that * imports by defining a value for the special __all__ list.
- As the user of a module, avoid using import *, as it can bring in unexpected names that may hide names in your code.
Package initialization
In the previous instalment [Love20], we explored how packages are a special kind of Python module which can have sub-modules – some of which may also be packages. Python identifies a package by the existence of a file named __init__.py. What we didn’t mention was that this file gets ‘run’ by the Python interpreter when the package is imported, in much the same way that the top-level code of a simple module is run when imported.
This file can contain any Python code you like, but it’s useful for bringing sub-module names into a narrower scope. Consider again the directory layout of our package:
|__ textfilters/
    |__ __init__.py
    |__ csv.py
    |__ json.py
    |__ transformers/
        |__ __init__.py
        |__ change.py
        |__ choose.py
Functions inside the change.py sub-module of the sub-package transformers need to be fully qualified when they’re imported:
from textfilters.transformers.change import change_keys
This is a bit unwieldy, but arises from the physical separation of the change module from the choose module. That physical separation helps us as the package author to structure the code for ease of maintenance, but imposes some unnecessary complexity on the users of our package. Listing 3 shows how I’d prefer to present the API to users.
from textfilters import csv, json
from textfilters import reshape
import sys

if __name__ == '__main__':
    def key_toupper( k ):
        return k.upper()

    data = csv.input( sys.stdin )
    result = [ reshape.change_keys( row, key_toupper ) for row in data ]
    print( json.output( result, sort_keys=True, indent=2 ) )

Listing 3
I’ve already mentioned there is more to programming than typing, but there is more to this than reducing key-presses. Your public API needn’t be constrained by the physical structure of the code, and how you choose to lay out your package needn’t be limited by how you wish your users to use it. We can take advantage of the fact that Python, by default, exports all public names from a module – including the modules it imports.
In order to achieve my desired result, a couple of changes are required. The first is to the transformers/__init__.py file:
from .change import change_keys
This brings the name change_keys into the scope of the transformers namespace, and removes the need for users to name the intermediate change module explicitly.
The second alteration is to the top-level package __init__.py.
from . import transformers as reshape
This renames the namespace of transformers to be reshape. Naturally, you could just rename the transformers folder, but one reason you might not want to do that could be if you already have a version ‘in the wild’, but you’d like new users to have a new API, while still supporting existing users on the ‘old’ API.
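The two changes can be tried out in isolation. The sketch below writes a throw-away package to a temporary directory and imports it; the package name demo_filters and the body of change_keys are assumptions made for the example, since the real implementation isn’t shown in this article.

```python
import sys
import tempfile
from pathlib import Path

# Build a throw-away package that mirrors the layout discussed above.
root = Path(tempfile.mkdtemp())
package = root / "demo_filters"
(package / "transformers").mkdir(parents=True)

# Hypothetical implementation of change_keys, for illustration only.
(package / "transformers" / "change.py").write_text(
    "def change_keys(row, func):\n"
    "    return {func(key): value for key, value in row.items()}\n"
)
# First change: lift change_keys into the transformers namespace.
(package / "transformers" / "__init__.py").write_text(
    "from .change import change_keys\n"
)
# Second change: publish transformers under the name reshape.
(package / "__init__.py").write_text(
    "from . import transformers as reshape\n"
)

sys.path.insert(0, str(root))
import demo_filters
from demo_filters import reshape

result = reshape.change_keys({"name": "x"}, str.upper)
print(result)                                            # {'NAME': 'x'}
print(demo_filters.reshape is demo_filters.transformers)  # True
```

Both names refer to the same module object, which is what allows the ‘old’ and ‘new’ APIs to coexist.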
We can streamline the API even further. A common pattern when using complex modules is to import the whole package and have access to its contents, as in Listing 4.
import textfilters as tf
import sys

if __name__ == '__main__':
    def key_toupper( k ):
        return k.upper()

    data = tf.csv.input( sys.stdin )
    result = [ tf.reshape.change_keys( row, key_toupper ) for row in data ]
    print( tf.json.output( result, sort_keys=True, indent=2 ) )

Listing 4
As things stand, however, this will not work. You’ll get an error:
AttributeError: module 'textfilters' has no attribute 'csv'
A common mistake is to presume that importing a package causes Python to go and find all of its sub-modules and import the published names from them all. Such behaviour could be quite expensive! This is why the __init__.py file is so important – it is how a package defines all of its published names. In order to achieve what we want in Listing 4, we just need to bring the names csv and json into the package scope, using the top-level package’s __init__.py:
from . import transformers as reshape
from . import csv, json
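A small experiment makes the difference visible. This sketch creates two throw-away packages (bare_demo and exporting_demo, both hypothetical names) that differ only in the contents of their __init__.py, and checks whether the csv sub-module is reachable after a plain import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def make_package(root, name, init_source):
    """Create a package directory with the given __init__.py and a csv.py."""
    package = root / name
    package.mkdir()
    (package / "__init__.py").write_text(init_source)
    (package / "csv.py").write_text("def input(data):\n    return list(data)\n")

root = Path(tempfile.mkdtemp())
make_package(root, "bare_demo", "")                        # empty initialiser
make_package(root, "exporting_demo", "from . import csv\n")

sys.path.insert(0, str(root))
bare = importlib.import_module("bare_demo")
exporting = importlib.import_module("exporting_demo")

print(hasattr(bare, "csv"))        # False: nothing imported the sub-module
print(hasattr(exporting, "csv"))   # True: __init__.py published it
```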
A similar mistake is to presume that from textfilters import * would cause Python to automatically load all the sub-modules. For the same reason as above, it does not. Not even the top-level modules (csv and json). The documented behaviour is that this imports the textfilters package, but in our case, textfilters is ‘just’ a directory. It does, however, run textfilters/__init__.py and import any published names that result from that.
As with simple modules, packages also recognise the special __all__ value as a list of strings naming the sub-modules to import. It’s crucial to note, however, that using __all__ isn’t transitive. Suppose you have the following:
- In textfilters/__init__.py:
__all__ = [ 'transformers' ]
- In textfilters/transformers/__init__.py:
__all__ = [ 'change', 'choose' ]
If you invoke from textfilters import *, it will import the transformers sub-package, but the sub-modules named by the __all__ value in transformers/__init__.py will not be loaded. You would also need to invoke from textfilters.transformers import * to bring those names in as well.
You can’t use the top-level __all__ value to reach into sub-packages, either. For example, the following will not work:
- textfilters/__init__.py
__all__ = [ 'transformers', 'transformers.change' ]
The consequence of this is that defining the public API for a package is best done by importing or defining the names you want in __init__.py. It’s not necessary to also specify __all__, since importing * from a package won’t bring any unexpected names into scope, as it might with a simple module.
What have we learned?
- A package’s __init__.py file gets run when it’s imported, and this file can contain Python code.
- You can use the __init__.py to alter the public API of your package.
- Importing * from a package does not automatically bring in the names of its sub-modules, only what is defined in the __init__.py.
Creating an installable package
Sharing a package directly by copying the package directory, or even better, including it in a shared version control system, is sufficient in most cases. There can be benefits to having a cleaner separation between application and library code, however. One example might be that a package is used across multiple applications. In such a case, it is wasteful and error-prone to have the package sources duplicated in different repositories. It makes more sense to have the shared code separately version-controlled in its own shared repository.
Most modern version control systems have the facility to build a working copy from multiple repositories, so this shouldn’t present a problem. However, you can avoid the need for that by creating your own installable package. If you’ve used Python for anything more sophisticated than simple scripts, you’ll almost certainly have come across pip: the standard Python package installer (see footnote 1). In this section we’ll explore how to create a package that can be installed using pip.
The very simplest installable package just needs a file named setup.py, located in the parent directory of the package itself (i.e. in the same directory as main.py in the example). Listing 5 shows the bare minimum contents.
from setuptools import setup, find_packages

setup(
    name = 'TextFilters',
    version = '0.0.0.dev1',
    packages = find_packages(),
)

Listing 5
The name and version properties are used to create the file name of the package. The version number here follows the recommended practice that is based on Semantic Versioning (see [PEP440] and [SemVer]). The pre-release specifier (.dev1 in this case) departs from the Semantic Version spec, and is the format understood by pip, which – when installing from a shared package repository like PyPI – ignores pre-releases unless they’re explicitly requested.
The last line uses a tool which automatically detects and includes any sub-packages (directories containing __init__.py). The packages property is merely a list of package and module names to be included, so you could explicitly name them:
packages = [ 'textfilters', 'textfilters.transformers' ]
This invocation would exclude the tests sub-package, which might be what you intend. Note that sub-packages have to be explicitly named. If you have a large package with several sub-packages, the find_packages() utility is much more convenient. Note also that the file main.py will not be included. In our case, that’s intentional, because it’s not inside a package.
There are many more parameters accepted by the setup() function; we’ll examine a few of the common ones here, but a complete description, along with recommendations on version numbering schemes, and restrictions on things like the name property, can be found in the Python Packaging Guide [PPG]. Many of those properties are used by the Python Package Index, PyPI.
For now, we have the bare essentials needed to create an installable package. To build it, run this command within the directory containing setup.py:
python setup.py bdist_wheel
This invocation creates a ‘binary distribution’, also known in Python circles as a wheel (see [PEP427] for all the gory details). If all went well (see footnote 2), you will see a couple of new directories: build and dist, and the dist folder should have your installable package in it, named TextFilters-0.0.0.dev1-py3-none-any.whl. You can create ‘source distributions’, too, if the package is pure Python code, but it doesn’t have any real benefit over a wheel format package.
The components of the file name are partly taken from the name and version parameters given to the setup() function in setup.py (refer back to Listing 5). The last 3 parts identify the targeted Python language version (py3), the ABI (none, in this case) and the required platform (which we didn’t specify, and so is any). You can control these with other parameters to the setup() function, but for our purposes, the code in the package is indeed intended for Python 3, and is pure Python code, with no ABI or platform requirements, so the defaults are appropriate.
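As a rough illustration of that naming scheme, the sketch below splits a wheel file name into its five parts. The function and the WheelName type are inventions for this example, and it deliberately ignores the optional build tag that [PEP427] also allows, so treat it as a simplification rather than a complete parser.

```python
from typing import NamedTuple

class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str

def parse_wheel_name(filename: str) -> WheelName:
    """Split a simple wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        # Names carrying an optional build tag have six parts; not handled here.
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(*parts)

parsed = parse_wheel_name("TextFilters-0.0.0.dev1-py3-none-any.whl")
print(parsed.distribution, parsed.version)                      # TextFilters 0.0.0.dev1
print(parsed.python_tag, parsed.abi_tag, parsed.platform_tag)   # py3 none any
```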
The file itself is just a normal Zip file with a .whl extension, so you can examine the contents for yourself (I find 7-zip especially useful).
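If you’d rather stay in Python, the standard zipfile module can list the contents just as well. In the sketch below the archive is a hand-made stand-in (its name and contents are assumptions for the example), so that the code runs without a real wheel to hand; point list_wheel_contents at your own file in dist to inspect the real thing.

```python
import tempfile
import zipfile
from pathlib import Path

def list_wheel_contents(wheel_path):
    """Return the names of the files stored inside a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel_path) as archive:
        return archive.namelist()

# Hypothetical stand-in archive; a real wheel holds more metadata files.
demo_wheel = Path(tempfile.mkdtemp()) / "Demo-0.0.0-py3-none-any.whl"
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr("demo/__init__.py", "")
    archive.writestr("Demo-0.0.0.dist-info/METADATA", "Name: Demo\n")

contents = list_wheel_contents(demo_wheel)
for name in contents:
    print(name)
```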
Before we install our shiny new package, however, we should talk about segregation.
Partitioning and separation
Python comes with a rich standard library of tools, some of which our example package is using – csv and json. You can also install 3rd party modules, and our package is using pytest. In [Love20], we looked at how Python locates modules when they’re imported. As a reminder, here are the places Python searches for modules, in order:
1. The directory containing the script being invoked, or an empty string to indicate the current working directory in the case where Python is invoked with no script – i.e. interactively.
2. The contents of the environment variable PYTHONPATH. You can alter this to change how modules are located when they’re imported.
3. System defined search paths for built-in modules.
4. The root of the site module.
It’s number 4 we’re interested in now – the site module.
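To see how this search order looks on your own machine, you can print sys.path. The sketch below also picks out the entries ending in site-packages; that filter is an assumption that suits most installations, but some platforms use a different directory name, in which case the second list will simply be empty.

```python
import sys

# Each entry is a location Python searches, in order, when importing.
for entry in sys.path:
    print(repr(entry))

# Entries that look like a site-packages directory (naming varies by platform).
site_dirs = [entry for entry in sys.path if entry.endswith("site-packages")]
print(site_dirs)
```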
When you install a 3rd party package (such as pytest), it is installed into a directory named site-packages, which is a well-known location for the Python interpreter (the location may differ, depending on your platform). Whilst it is obviously convenient to have all the packages you want in one place, easily available for use in your Python programs, it can easily become cluttered. In particular, you might not want (or be able) to install the packages you create to the global site location, especially when they’re in early development.
One way to handle this might be to have multiple installations of Python, but this is wasteful unless you genuinely need multiple versions of Python available. A more light-weight way of handling it is to take advantage of Python’s virtual environments. A virtual environment is a fully-featured Python environment, cut back to the bare minimum needed. It doesn’t contain the 3rd party modules installed in the global Python install location (although you can choose to give a virtual environment access to those libraries), except for a few necessities, including the pip installer module. The important thing is that a virtual environment is entirely independent of all other virtual environments, with its own site-packages location.
The implication of this is that you can create Python virtual environments with different libraries for different needs. This is useful now as a way of quarantining our custom package so that it doesn’t interfere with either the installed Python instance, or anyone else’s virtual environments. You should consider creating your environment somewhere outside of your code folders, maybe by putting the code beneath a new directory (named something like src, for example), and using the parent to hold the new environment.
python -m venv localpy
On some platforms you may be prompted to install a package for venv to work. For example, on my Ubuntu-based Mint distribution, I had to install python3-venv.
This creates a new Python environment in a directory named localpy as a child of the current directory. You can choose wherever you like for it. If all’s gone to plan, you should now have a directory structure like this:
<project root>/
|__ src/
    |__ main.py
    |__ setup.py
    |__ textfilters/
        |__ ...
|__ localpy/
    |__ ...
The structure of the environment will differ, depending on your platform, but will contain Python itself (on Windows, in localpy/Scripts, on *nix it’s in localpy/bin), along with pip to install more libraries, and a script named activate.
The activate script ensures that the virtual environment’s Python and pip are at the front of the current session’s path. It’s not necessary to always activate a virtual environment, however: you can invoke the Python interpreter by fully-qualifying the directory name, and it will ‘just work’. This extends to using pip to install packages.
- Windows
.\localpy\Scripts\pip.exe install [package name]
- Mint (Ubuntu)
./localpy/bin/pip install [package name]
Python internally keeps track of where to find the platform-independent and platform-dependent files it needs in order to run, and where to find installed libraries. These are:
sys.prefix
sys.exec_prefix
When a virtual environment is in use (either by activation, or by running the environment’s Python program directly), these values will point to the respective locations within the virtual environment. When no virtual environment is in use, these values point to the respective locations of the Python installation. Furthermore, when a virtual environment is in use, two more values can be used to find the location of the Python install from which the virtual environment was created:
sys.base_prefix
sys.base_exec_prefix
These values enable the virtual environment to operate independently of the main Python installation(s), as well as any other virtual environments. You can find much more detailed information on how these things work in [venv] and [site], but for our purposes, all that remains is to install our local package into the independent environment. It’s as simple as (on Windows):
.\localpy\Scripts\pip install src\dist\TextFilters-0.0.0.dev1-py3-none-any.whl
If you now run a Python session using the virtual environment’s Python, you can import the textfilters package, and see from where it was imported:
>>> import textfilters
>>> textfilters
<module 'textfilters' from '\\path\\to\\localpy\\lib\\site-packages\\textfilters\\__init__.py'>
(This will look slightly different on non-Windows platforms, but the idea is the same).
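The prefix values described earlier also give a simple way to check, from inside a session, whether a virtual environment is in use. The helper name below is made up for the example; the comparison itself relies only on sys.prefix and sys.base_prefix.

```python
import sys

def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix

print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)
print("virtual environment in use:", in_virtual_environment())
```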
What have we learned?
- You can create your own installable package to make sharing code even easier.
- Python wheels are zip-files.
- The site module is where Python looks for installed packages for use in code.
- Python virtual environments are a powerful way of segregating requirements, each with its own, independent site module.
It depends
Know your dependencies
Sometimes, a package you create will require other packages to be installed. In the case of our package, it can be used without anything other than Python’s standard libraries, but it does have some tests. Whilst they don’t depend exclusively on pytest, which is the testing package we used in [Love20] (other frameworks are available, such as Nose2 [Nose2], which would also work just fine), we can use it to explore another feature of package creation.
In the setup.py file we created for our package, we can indicate that our package requires other libraries. In this case, we can tell the setup tools that the package pytest should also be installed when our package is installed.
Listing 6 shows a change to setup.py with the addition of a parameter to the setup() function named install_requires. This is a list of packages, which in this case has only one item, but you can specify as many as you need here.
from setuptools import setup, find_packages

setup(
    name = 'TextFilters',
    version = '0.0.0.dev1',
    packages = find_packages(),
    install_requires = [ 'pytest' ],
)

Listing 6
Now re-create the package, and re-install it with an upgrade:
localpy\scripts\python src\setup.py bdist_wheel
localpy\scripts\pip install --upgrade dist\TextFilters-0.0.0.dev1-py3-none-any.whl
You will see that pytest, along with its requirements, is also automatically installed.
Sometimes you need a particular version of a dependent package, or perhaps you’ve tested on a particular stable release, and wish to constrain the versions of your dependencies. This is also specified in setup.py (see footnote 3):
install_requires = [ 'pytest>=5.0' ],
You can also depend on specific versions of Python itself in the setup.py parameters. In the case of our package, we may well want to ensure our users are on Python v3 or above. There are many reasons to do this, but chief among them is that the code in a package depends on some feature that was introduced in a specific Python release.
python_requires = '>=3',
There is much more you can specify, and describe, about your package in the setup.py file, but you can find a wealth of documentation on that in the Python packaging guide ([PPG]). We do need to revisit one aspect we’ve already looked at briefly – the version number.
As we’ve already seen, the version number specified in setup.py gets used to generate the file name of the resulting package wheel. In our example, we marked the version with a trailing .dev1, which marks the package as a pre-release – specifically, still in development – which is used by pip when performing upgrades.
Given a package with a version number indicating it’s stable (e.g. 0.0.1), and a later version that’s marked as a pre-release (e.g. 0.1.0a1), when performing an upgrade, pip will by default give you the latest applicable stable release, which in this case is 0.0.1. You can explicitly request that pre-releases are considered by passing the --pre argument to pip on the command line, or by specifically requesting a pre-release version.
Whilst we’re in development mode, and installing specific locally-created wheels, this isn’t an issue for our package, of course, but it does make a difference for the dependent packages in the install_requires list.
It also makes a difference in a file that’s normally named requirements.txt (but needn’t be, necessarily), which is a file you can use alongside a virtual environment to have pip install a whole collection of packages. This is a useful technique for specifying the library contents of a virtual environment, with needed packages at specific versions. It’s common to want this to ensure, for example, that different developers on a team have identical environments; if one person is developing against version 1 of some package, and someone else is using version 2, chaos is bound to ensue! The requirements file provides a way of creating a coherent environment that the whole team can use.
The simplest way to create the requirements file is to have pip itself create one:
localpy\scripts\pip freeze > requirements.txt
The requirements file should contain something similar to this (truncated here for brevity):
...
pytest==5.4.1
six==1.14.0
TextFilters==0.0.0.dev1
...
Here, the file requires a specific version of each installed package. You can modify the version numbers if you need versions after a particular one, or within a range of versions, for example. Note that our own package, TextFilters, is explicitly naming the pre-release version. Suppose we had been working on the package for a while, and had a few releases available in our dist directory:
TextFilters-0.0.0.dev1-py3-none-any.whl
TextFilters-0.0.1-py3-none-any.whl
TextFilters-0.0.2.dev1-py3-none-any.whl
TextFilters-0.0.2a1-py3-none-any.whl
TextFilters-0.0.2-py3-none-any.whl
TextFilters-0.0.3a1-py3-none-any.whl
We have stable 0.0.1 and 0.0.2 versions, but only a pre-release for 0.0.3. Our requirements.txt file might have this line:
TextFilters>=0.0.1
We might create our virtual environment from scratch as follows:
python -m venv localpy
localpy\scripts\pip install -r requirements.txt -f src\dist
Here, the -r parameter to pip instructs it to read the list of packages to install from the indicated file. By default, pip looks on PyPI [PyPI] for packages, but we haven’t published our package there yet, so the -f parameter tells pip to find packages in the specified location (which might, for example, be a file share available to the team), and look in PyPI for packages not found there.
This would result in our new environment having version 0.0.2 of our TextFilters package, because it’s the latest stable version available. If we had also added the parameter --pre to the pip command line, the latest pre-release version – 0.0.3a1 – would have been installed.
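The selection rule can be sketched in a few lines. To be clear, this is not how pip is implemented: it is a deliberately simplified model that understands only plain release numbers with an optional dev, a, b or rc suffix, and every function name in it is an assumption made for the example. It does, however, reproduce the two outcomes just described.

```python
import re

# Ordering of release phases for the same release number: dev < a < b < rc < final.
_PHASE_RANK = {"dev": 0, "a": 1, "b": 2, "rc": 3, None: 4}
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:\.?(dev|a|b|rc)(\d+))?$")

def sort_key(version):
    """Turn a simple version string into a tuple that sorts correctly."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    number = int(match.group(3)) if match.group(3) else 0
    return (release, _PHASE_RANK[match.group(2)], number)

def is_prerelease(version):
    return sort_key(version)[1] < _PHASE_RANK[None]

def pick(versions, minimum, allow_pre=False):
    """Choose the newest version >= minimum, skipping pre-releases by default."""
    candidates = [
        v for v in versions
        if sort_key(v) >= sort_key(minimum) and (allow_pre or not is_prerelease(v))
    ]
    return max(candidates, key=sort_key) if candidates else None

available = ["0.0.0.dev1", "0.0.1", "0.0.2.dev1", "0.0.2a1", "0.0.2", "0.0.3a1"]
print(pick(available, "0.0.1"))                  # 0.0.2
print(pick(available, "0.0.1", allow_pre=True))  # 0.0.3a1
```

For real version handling, the rules in [PEP440] cover many more cases than this sketch does.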
What have we learned?
- An installable package can explicitly define other packages upon which it depends.
- The pip installer makes sophisticated use of the version numbers exposed by a package to determine how to install requirements.
- You can easily create a canned fully-working virtual environment by using a library requirements file.
A wider audience
In this article we’ve explored in more detail the idea of Python ‘namespaces’, and how you can take advantage of package initialization to make using your package easier for your users. We’ve looked at some of the pitfalls of wild-card imports, and highlighted the benefits of creating a public API for your modules that might not match its physical structure. We also explored virtual environments, and how to create and install your own package ‘wheels’, and looked at why this segregation is important. Finally we looked at package dependencies, and how to manage them in concert with virtual environments and the pip installer.
Taken all together, these things will help you structure your packages so they can be shared easily, and your users will find your packages easier to install and use as a result.
There is more you can do with your own packages. For example, in the previous article we looked at the pytest unit-testing framework, and in this article we’ve looked at Python’s venv. Both of these are installable modules that can be run, e.g.:
python -m venv
This is achieved by adding another special file to the package: __main__.py, which is executed when the package is run in this way (see footnote 4).
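As a taste of how that works, the following sketch creates a throw-away package containing a __main__.py and runs it with the -m switch. The package name runnable_demo and its one-line program are assumptions for the example; nothing here comes from the textfilters package itself.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Build a minimal, hypothetical package with a __main__.py entry point.
root = Path(tempfile.mkdtemp())
package = root / "runnable_demo"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "__main__.py").write_text("print('running as a program')\n")

# Equivalent to typing "python -m runnable_demo" in the directory above it.
completed = subprocess.run(
    [sys.executable, "-m", "runnable_demo"],
    cwd=root,
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout.strip())   # running as a program
```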
The ultimate sharing of packages with the wider community means publishing it to the Python Package Index ([PyPI]). There is excellent documentation on this in the Python packaging guide ([PPG]). Taking this extra step involves some extra responsibility, of course, in maintaining and documenting your package.
These things – and more! – I leave for you to discover.
References
[Love20] Steve Love (2020) ‘The path of least resistance’ in Overload 155, February 2020, https://accu.org/index.php/journals/2749
[Nose2] Nose2: https://docs.nose2.io/en/latest/
[PEP440] ‘Python Version Identification and Dependency Specification’, https://www.python.org/dev/peps/pep-0440/
[PEP427] ‘The Wheel Binary Package Format’ (PEP 427), https://www.python.org/dev/peps/pep-0427/
[PPG] The Python packaging guide, ‘Packaging and distributing projects’ at https://packaging.python.org/guides/distributing-packages-using-setuptools/
[PyPI] The Python Package Index, https://pypi.org/
[SemVer] ‘Semantic Versioning Scheme Specification’, https://semver.org/
[site] Python Documentation – Site specific configuration hook, https://docs.python.org/3/library/site.html
[venv] Python Documentation – Creation of virtual environments, https://docs.python.org/3/library/venv.html
Other resources
‘Packaging a Python library’, https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
Footnotes
1. pip comes as part of the Python install for versions later than 3.4.
2. You may need to install the wheel package from PyPI.
3. Setting an upper limit on the version is possible too, but be careful of that. If you tie down your requirements too tightly, it might make your package unusable.
4. I wanted to explore this a bit more in the example package, but was defeated by the fact I’d (deliberately) used names that clashed with built-in Python modules. Another example of why not to do that!
is an independent developer constantly searching for new ways to be more productive without endangering his inherent laziness.
Overload Journal #156 - April 2020 + Programming Topics