Variables in Python generally have a lifetime of their own. Or rather, the Python runtime interpreter handles object lifetime with automated garbage collection, leaving you to concentrate on more important things. Like Resource lifetime, which is much more interesting.
Python provides some facilities for handling the deterministic clean-up of certain objects, because sometimes it’s necessary to know that it has happened at a specific point in a program. Things like closing file handles, releasing sockets, committing database changes – the usual suspects.
In this article I will explore Python’s tools for managing resources in a deterministic way, and demonstrate why it’s easier and better to use them than to roll your own.
Why you need it
Python, like many other languages, indicates runtime errors with exceptions, which introduces interesting requirements on state. Exceptions are not necessarily visible directly in your code, either. You just have to know they might occur. Listing 1 shows a very basic (didactic) example.
    import sqlite3

    def addCustomerOrder( dbname, customer, order ):
        db = sqlite3.connect( dbname )                      # (1)
        db.execute( 'INSERT OR REPLACE INTO customers \
            (id, name) VALUES (?, ?)', customer )           # (2)
        db.execute( 'INSERT INTO orders (date, custid, \
            itemid, qty) VALUES (?, ?, ?, ?)', order )      # (3)
        db.commit()                                         # (4)
        db.close()                                          # (5)
Listing 1
If an exception occurs between lines (1) and (3), the data won’t get committed to the database, and the connection to the database will not be closed and will therefore ‘leak’. This could be a big problem if this function or other functions like it get called frequently, say as the backend to a large web application. This wouldn’t be the best way to implement this in any case, but the point is that the db.execute() call can raise all kinds of exceptions.
You might then try to explicitly handle the exceptions, as shown in Listing 2, which ensures the database connection is closed even in the event of an exception. Closing a connection without explicitly committing changes will cause them to be rolled back.
    def addCustomerOrder( dbname, customer, order ):
        db = sqlite3.connect( dbname )
        try:
            db.execute( 'INSERT OR REPLACE \
                INTO customers \
                (id, name) VALUES (?, ?)', customer )
            db.execute( 'INSERT INTO orders \
                (date, custid, itemid, qty) \
                VALUES (?, ?, ?, ?)', order )
            db.commit()
        finally:
            db.close()
Listing 2
It is a bit messy, and introduces some other questions such as: what happens if the sqlite3.connect() function raises an exception? Do we need another outer try block for that? Or should we expect clients of this function to wrap it in an exception handler?
Fortunately, Python has already asked, and answered, some of these questions with the Context Manager. This allows you to write the code shown in Listing 3.
    def addCustomerOrder( dbname, customer, order ):
        with sqlite3.connect( dbname ) as db:
            db.execute( 'INSERT OR REPLACE \
                INTO customers \
                (id, name) VALUES (?, ?)', customer )
            db.execute( 'INSERT INTO orders \
                (date, custid, itemid, qty) \
                VALUES (?, ?, ?, ?)', order )
        db.close()
Listing 3
The connection object from the sqlite3 module implements the Context Manager protocol, which is invoked using the with statement. This introduces a block scope, and the Context Manager protocol gives objects that implement it a way of defining what happens when that scope is exited.
In the case of the connection object, that behaviour is to commit the (implicit, in this case) transaction if no errors occurred, or roll it back if an exception was raised in the block.
Note the explicit call to db.close() outside of the with statement’s scope. The only behaviour defined for the connection object as a Context Manager is to commit or roll back the transaction when the scope is exited. This construct doesn’t say anything at all about the lifetime of the db object itself. It will (probably) be garbage collected at some indeterminate point in the future.
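The commit-or-roll-back behaviour, and the fact that the connection stays open afterwards, can be checked with a small experiment. This is only a minimal sketch using an in-memory database: the table layout and the try_insert helper are invented for illustration and are not part of the customer library discussed here.

```python
import sqlite3

db = sqlite3.connect( ':memory:' )
db.execute( 'CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)' )
db.commit()

def try_insert( db, customer, fail=False ):
    # Hypothetical helper: insert inside a with block, optionally failing
    try:
        with db:
            db.execute( 'INSERT INTO customers (id, name) VALUES (?, ?)',
                        customer )
            if fail:
                raise RuntimeError( 'simulated failure' )
    except RuntimeError:
        pass    # the with block has already rolled the insert back

try_insert( db, ( 1, 'Alice' ) )              # committed on normal exit
try_insert( db, ( 2, 'Bob' ), fail=True )     # rolled back on exception

# The connection is still open and usable after both with blocks
names = [ row[0] for row in
          db.execute( 'SELECT name FROM customers ORDER BY id' ) ]
db.close()    # closing remains our responsibility
```

Only the first insert should survive, and the final query runs on the same connection object that was used in both with blocks.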
You can do it too
This customer database library might have several functions associated with it, perhaps including facilities to retrieve or update customer details, report orders and so on. Perhaps it’s better represented as a type, exposing an interface that captures those needs. See Listing 4 for an example.
    class Customers( object ):
        def __init__( self, dbname ):
            self.db = sqlite3.connect( dbname )

        def close( self ):
            self.db.close()

        def addCustomerOrder( self, customer, order ):
            self.db.execute( 'INSERT OR REPLACE \
                INTO customers (id, name) \
                VALUES (?, ?)', customer )
            self.db.execute( 'INSERT INTO orders \
                (date, custid, itemid, qty) \
                VALUES (?, ?, ?, ?)', order )

        # Other methods...

    with Customers( dbname ) as db:
        db.addCustomerOrder( customer, order )
    db.close()
Listing 4
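Unfortunately, the line containing the with statement provokes an error similar to this (Python 3.11 and later raise a TypeError here instead, reporting that the object does not support the context manager protocol):
      File "customerdb.py", line 21, in <module>
        with Customers( dbname ) as db:
    AttributeError: __exit__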
You can’t use with on just any type you create. It’s not a magic wand, either: the changes won’t get committed to the database if commit() is not called! However, the Context Manager facility isn’t limited to just those types in the Python Standard Library. It’s implemented quite simply, as seen in Listing 5.
    class Customers( object ):
        def __init__( self, dbname ):
            self.dbname = dbname

        def __enter__( self ):
            self.db = sqlite3.connect( self.dbname )
            return self

        def __exit__( self, exc, val, trace ):
            if exc:
                self.db.rollback()
            else:
                self.db.commit()
            return None

        def close( self ):
            self.db.close()

        def addCustomerOrder( self, customer, order ):
            self.db.execute( 'INSERT OR REPLACE \
                INTO customers (id, name) \
                VALUES (?, ?)', customer )
            self.db.execute( 'INSERT INTO \
                orders (date, custid, itemid, qty) \
                VALUES (?, ?, ?, ?)', order )

        # Other methods...

    with Customers( dbname ) as db:
        db.addCustomerOrder( customer, order )
    db.close()
Listing 5
The __init__() method is still there, but just saves the name away for later use. When the with statement is executed, it calls the object’s __enter__() method, and binds the return value to the name in the as clause if there is one: in this case, the db variable. The main content of the original construction method has been moved to the __enter__() method. Lastly, when the with statement block scope is exited, the __exit__() method of the managed object is called. If no exceptions occurred in the block, then the three arguments to __exit__() will be None. If an exception did occur, then they are populated with the type, value and traceback object associated with the exception. This implementation essentially mimics the behaviour of the sqlite3 connection object, and rolls back if an exception occurred.
Returning a false-value indicates to the calling code that any exception that occurred inside the with block should be re-raised. Returning None counts, and is only explicitly specified here for the purposes of explaining it: a Python function with no return statement implicitly returns None. Returning a true-value indicates that any such exception should be suppressed.
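The effect of the __exit__() return value is easy to see in isolation. The Guard class below is a hypothetical context manager written purely for this illustration; it records the exception type it was given and returns whatever it was told to.

```python
class Guard( object ):
    # Hypothetical context manager whose __exit__ result is configurable
    def __init__( self, suppress ):
        self.suppress = suppress
        self.seen = None

    def __enter__( self ):
        return self

    def __exit__( self, exc, val, trace ):
        self.seen = exc          # None if the block completed normally
        return self.suppress     # a true-value swallows the exception

with Guard( suppress=True ) as quiet:
    raise ValueError( 'swallowed' )
# Execution continues here because __exit__ returned a true-value

propagated = False
loud = Guard( suppress=False )
try:
    with loud:
        raise ValueError( 're-raised' )
except ValueError:
    propagated = True    # __exit__ returned a false-value
```

In both cases __exit__() receives the exception type; only the return value decides whether the caller ever sees the exception.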
Consistent convenience
Having to explicitly close the connection after the block has exited is a bit of a wart. We could decide that our own implementation of the __exit__() method invokes close() on the connection object having either committed or rolled back the changes, but there is a better way.
The contextlib module in the Python Standard Library provides some convenient utilities to help with exactly this, including the closing function, used like this:
    from contextlib import closing

    with closing( Customers( dbname ) ) as db:
        db.addCustomerOrder( customer, order )
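It will automatically call close() on the object to which it’s bound when the block scope is exited. Note that closing only calls close(): it does not invoke the wrapped object’s own __enter__() or __exit__() methods, so it suits a class like the one in Listing 4, whose constructor opens the connection, and any commit must still be made explicitly.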
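What closing does, and does not do, can be seen with a stand-in object. The Resource class here is hypothetical and simply records which of its methods get called.

```python
from contextlib import closing

class Resource( object ):
    # Hypothetical stand-in for any object with a close() method
    def __init__( self ):
        self.events = []

    def __enter__( self ):
        self.events.append( 'enter' )
        return self

    def __exit__( self, exc, val, trace ):
        self.events.append( 'exit' )

    def close( self ):
        self.events.append( 'close' )

with closing( Resource() ) as res:
    res.events.append( 'body' )

events = res.events
```

The wrapped object's own __enter__() and __exit__() are never called; closing supplies its own, and the only thing it does on exit is call close().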
Python file objects also have a Context Manager interface, and can be used in a with statement too. However, their behaviour on exit is to close the file, so you don’t need to use the closing utility for file objects in Python.
    with open( filename ) as f:
        contents = f.read()
So much for consistency! It’s a little odd having to know the internal behaviour of a given type’s Context Manager implementation (and the documentation isn’t always clear on which types in the Standard Library are Context Managers), but sometimes the price of convenience is a little loss of consistency.
To reiterate the point about lifetime, even though the connection and file objects in the previous two examples have been closed, the lifetimes of the objects themselves have not been affected.
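The distinction between a closed resource and a live object can be demonstrated with a file. This sketch creates its own temporary file so that it is self-contained; the variable names are illustrative only.

```python
import os
import tempfile

# Create a small temporary file to read back
fd, path = tempfile.mkstemp()
with os.fdopen( fd, 'w' ) as out:
    out.write( 'hello' )

with open( path ) as f:
    contents = f.read()

# The resource has been released...
was_closed = f.closed
# ...but the object is still alive and bound to f
name_still_known = ( f.name == path )

try:
    f.read()
    read_failed = False
except ValueError:      # reading from a closed file is an error
    read_failed = True

os.remove( path )
```

The file object outlives the with block and can still be inspected, but any attempt to use the underlying resource fails.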
When one isn’t enough
Sometimes it’s useful to associate several resources with a single Context Manager block. Suppose we want to be able to import a load of customer order data from a file into the database using the facility we’ve already made.
In Python 3.1 and later, this can be achieved like this:
    with closing( Customers( dbname ) ) as db, \
         open( 'orders.csv' ) as data:
        for line in data:
            # parseOrderData is assumed to return a (customer, order) pair
            db.addCustomerOrder( *parseOrderData( line ) )
If you’re stuck using a version of Python earlier than that, you have to nest the blocks like this:
    with closing( Customers( dbname ) ) as db:
        with open( 'orders.csv' ) as data:
            for line in data:
                # parseOrderData is assumed to return a (customer, order) pair
                db.addCustomerOrder( *parseOrderData( line ) )
Either syntax gets unwieldy very quickly with more than two or three managed objects. One approach to this is to create a new type that implements the Context Manager protocol, and wraps up multiple resources, leaving the calling code with a single with statement on the wrapping type, as shown in Listing 6.
    class WrappedResources( object ):
        def __init__( self, dbname, filename ):
            self.dbname = dbname
            self.filename = filename

        def __enter__( self ):
            self.db = sqlite3.connect( self.dbname )
            self.data = open( self.filename )
            return self     # bound to the name in the as clause

        def __exit__( self, *exceptions ):
            if not any( exceptions ):
                self.db.commit()
            self.close()    # closing without a commit rolls back

        def close( self ):
            self.data.close()
            self.db.close()

        def addCustomerOrder( self, customer, order ):
            pass # do the right thing here

    # closing() would bypass __enter__(), so use the type directly
    with WrappedResources( dbname, fname ) as res:
        for line in res.data:
            # parseOrderData is assumed to return a (customer, order) pair
            res.addCustomerOrder( *parseOrderData( line ) )
Listing 6
That really is a little clunky, however you look at it, since it’s fairly obvious that the class has multiple responsibilities, and exposes the managed objects publicly, amongst other things. There are better ways to achieve this, and we will return to this shortly.
Common cause
Having implemented a (basic) facility to import data from a file to our database, we might like to extend the idea and optionally read from the standard input stream. A simple protocol for this might be to read sys.stdin if no filename is given, leading to code like this:
    with options.filename and \
         open( options.filename ) or sys.stdin as input:
        pass # do something with the data
That’s all very well, but is a little arcane, and closing the standard input handle when it completes might be considered bad manners. You could go to all the bother of reinstating the standard input handle, or redirecting it some other way, but that too seems more complicated than what is required.
Python’s contextlib module has another handy utility, the contextmanager decorator, to allow you to use a generator function as a Context Manager, without going to the trouble of creating a custom class to implement the protocol. It is used to decorate a function, which must yield exactly one value to be bound to the as clause of a with statement. Actions to perform when the block is entered are put before the yield, and actions to perform when the block is exited are put after the yield. It follows the basic pattern shown in Listing 7:
    import contextlib

    @contextlib.contextmanager
    def simpleContext():
        doPreActionsHere()      # (1)
        yield managed_object
        doPostActionsHere()     # (2)
Listing 7
(1) will be called when the with statement is entered. It’s the equivalent of the __enter__() method.
(2) will be called when the block is exited normally. It’s the equivalent of the __exit__() method. If the block raises, the exception is re-raised at the yield, so (2) must be placed in a finally clause to run in that case too.
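When the exit actions must run even if the block raises, the yield needs to sit inside a try/finally, because the exception from the block is re-raised inside the generator at the point of the yield. A minimal sketch follows; the tracked function and the log list are hypothetical names used only to make the order of events visible.

```python
import contextlib

log = []

@contextlib.contextmanager
def tracked( name ):
    # Hypothetical example: record entry and exit in a list
    log.append( 'enter ' + name )
    try:
        yield name
    finally:
        # Runs on normal exit and when the block raises
        log.append( 'exit ' + name )

with tracked( 'ok' ) as n:
    log.append( 'body ' + n )

try:
    with tracked( 'bad' ):
        raise KeyError( 'oops' )
except KeyError:
    log.append( 'caught' )
```

The exit action is recorded for both blocks, and the exception from the second block still propagates to the caller because the generator does not catch it.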
This allows us to define a couple of factory functions for our inputs, as shown in Listing 8.
    import contextlib
    import sys

    def openFilename():
        return open( options.filename )

    @contextlib.contextmanager
    def openStdIn():
        yield sys.stdin

    opener = options.filename and openFilename \
             or openStdIn

    with opener() as f:
        pass # Use f
Listing 8
Since opening a ‘real’ file returns an object that is already a Context Manager, the function for that isn’t decorated. Likewise, since we do not want to perform any action on the sys.stdin object on exit, that function has no behaviour after the yield.
It should be clear that the Context Manager protocol is more general purpose than just for performing some clean-up action when leaving a scope. Exception safety is the primary purpose of Context Managers, but the __enter__() and __exit__() methods can contain any arbitrary behaviour, just as the decorated function can perform any actions before and after the yield statement. Examples include tracking function entry and exit, and logging contexts such as those Chris Oldwood shows in C# [Oldwood].
Many and varied
As previously mentioned, it’s sometimes necessary to manage multiple resources within a single block. Python 3.1 and later support this by allowing multiple Context Manager objects to be declared in a single with statement, but this becomes cluttered and unmanageable quickly. You can, as we demonstrated, create your own Context Manager type, but that too can be less than ideal. Once again, Python 3.3 answers the question with another contextlib utility, the ExitStack.
It manages multiple Context Manager objects, and allows you to declare them in a tidy (and indentation-saving) manner. See Listing 9.
    with contextlib.ExitStack() as stack:
        f = stack.enter_context(
            open( options.filename ) )
        db = stack.enter_context(
            sqlite3.connect( options.dbname ) )
Listing 9
Objects have their __exit__() method called, in the reverse of the order in which they were added, when the block is exited.
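The unwinding order can be observed directly. In this sketch the resource function is a hypothetical context manager that only records when it is entered and exited; stack.callback() is the ExitStack method for registering a plain function to run on exit.

```python
import contextlib

order = []

@contextlib.contextmanager
def resource( name ):
    # Hypothetical resource that records when it is acquired and released
    order.append( 'open ' + name )
    try:
        yield name
    finally:
        order.append( 'close ' + name )

with contextlib.ExitStack() as stack:
    first = stack.enter_context( resource( 'file' ) )
    second = stack.enter_context( resource( 'db' ) )
    # A plain function can be registered to run on exit as well
    stack.callback( order.append, 'callback' )
```

Whatever was registered last is unwound first, which mirrors the behaviour of nested with statements.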
The ExitStack can manage a runtime-defined collection of context managers, such as this example taken directly from the Python 3.4 documentation [Python]:
    with ExitStack() as stack:
        files = [ stack.enter_context( open( fname ) )
                  for fname in filenames ]
        # All opened files will automatically be closed
        # at the end of the with statement, even if
        # attempts to open files later in the list raise
        # an exception
Conclusion
Python’s Context Managers are a convenient and easy-to-use way of managing Resource Lifetimes, but their utility goes beyond that, due to the flexible way they are provided. The basic idea is not a new one – even in Python, where it was first introduced in version 2.5 – but some of these facilities are only available in later versions of the language. The examples given here were tested using Python 3.4.
Exception safety facilities like the Python Context Manager are common to many languages that feature the use of exceptions to indicate errors, because this introduces the need for some local clean-up in the presence of what is (in effect) a non-local jump in the code. They are, however, useful for things beyond this need, and Python provides several useful utilities to help manage the complexity this brings.
References
[Python] Python 3.4 Documentation. https://docs.python.org/3.4/library/contextlib.html
[Oldwood] Oldwood, Chris. Causality. http://chrisoldwood.com/articles/causality.html
Overload Journal #133 - June 2016 + Programming Topics