Title: Testing Times (Part 2)

Author: Bob Schmidt

Date: Thu, 08 March 2018 17:14:48 +00:00

Summary: Pete Goodliffe continues the journey into software testing.

Body:

If you don’t care about quality, you can meet any other requirement.
~ Gerald M. Weinberg

In the previous column, we started a journey into the world of software testing; we considered why it’s important, why it should be automated, and who should be doing this. We looked at the types of test we perform, when we write them, and when we run them.

So, let’s round this off by looking at what should be tested, and what good tests look like.

What to test

Test whatever is important in your application. What are your requirements?

Your tests must, naturally, test that each code unit behaves as required, returning accurate results. However, if performance is an important requirement for your application, then you should have tests in place to monitor the code’s performance. If your server must answer queries within a certain time frame, include tests for this condition.
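As a rough illustration of a timing requirement expressed as a test, the sketch below measures one operation against a time budget. The answerQuery method, the input size, and the two-second budget are all hypothetical stand-ins, and the check is written without a test framework so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ResponseTimeCheck {
    // Hypothetical operation standing in for "answering a query".
    static List<Integer> answerQuery(int size) {
        List<Integer> values = new ArrayList<>();
        for (int i = size; i > 0; i--) {
            values.add(i);
        }
        Collections.sort(values);
        return values;
    }

    // Runs the operation once and returns the elapsed wall-clock time in milliseconds.
    static long elapsedMillis(Runnable operation) {
        long start = System.nanoTime();
        operation.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long budgetMillis = 2_000; // assumed requirement, deliberately generous
        long elapsed = elapsedMillis(() -> answerQuery(100_000));
        if (elapsed > budgetMillis) {
            throw new AssertionError(
                "query took " + elapsed + " ms, budget is " + budgetMillis + " ms");
        }
        System.out.println("query answered within budget");
    }
}
```

Timing checks like this depend on the machine they run on, so a generous budget (or a dedicated performance environment) helps to keep them from failing intermittently.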

You may want to consider the coverage of your production code that the test cases execute. You can run tools to determine this. However, this tends to be an awful metric to chase after. It can be a huge distraction to write test code that tries to laboriously cover every production line; it’s more important to focus on the most important behaviours and system characteristics.

Good tests

Writing good tests requires practice and experience; it is perfectly possible to write bad tests. Don’t be overly worried about this at first – it’s more important to actually start writing tests than to be paralysed by fear that your tests are rubbish. Start writing tests and you’ll start to learn.

Bad tests become baggage: a liability rather than an asset. They can slow down code development if they take ages to run. They can make code modification difficult if a simple code change breaks many hard-to-read tests.

The longer your tests take to run, the less frequently you’ll run them, the less you’ll use them, the less feedback you’ll get from them. The less value they provide.

I once inherited a codebase that had a large suite of unit tests; this seemed a great sign. Sadly, those tests were effectively worse legacy code than the production code. Any code modification we made caused several test failures in hundreds-of-lines-long test methods that were intractable, dense, and hard to understand. Thankfully, this is not a common experience.

Bad tests can be a liability. They can impede effective development.

These are the characteristics of a good test:

Short, clear name, so when it fails you can easily determine what the problem is (e.g., new list is empty)
Maintainable: it is easy to write, easy to read, and easy to modify
Runs quickly
Up-to-date
Runs without any prior machine configuration (e.g., you don’t have to prepare your filesystem paths or configure a database before running it)
Does not depend on any other tests that have run before or after it; there is no reliance on external state, or on any shared variables in the code
Tests the actual production code (I’ve seen ‘unit tests’ that worked on a copy of the production code – a copy that was out of date. Not useful. I’ve also seen special ‘testing’ behaviour added to the SUT (system under test) in test builds; this, too, is not a test of the real production code.)

These are some common descriptions of badly constructed tests:

Tests that sometimes pass, sometimes fail (often this is caused by the use of threads, by racy code that relies on specific timing, by reliance on external dependencies, by the order in which tests run in the test suite, or by shared state)
Tests that look awful and are hard to read or modify
Tests that are too large (large tests are hard to understand, and the SUT clearly isn’t very isolatable if it takes hundreds of lines to set up)
Tests that exercise more than one thing in a single test case (a ‘test case’ is a singular thing)
Tests that attack a class API function by function, rather than addressing individual behaviours
Tests for third-party code that you didn’t write (there is no need to do that unless you have a good reason to distrust it)
Tests that don’t actually cover the main functionality or behaviour of a class, but that hide this behind a raft of tests for less important things (if you can do this, your class is probably too large)
Tests that cover pointless things in excruciating detail (e.g., property getters and setters)
Tests that rely on ‘white-box’ knowledge of the internal implementation details of the SUT (this means you can’t change the implementation without changing all the tests)
Tests that work on only one machine
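Several of the smells above come from state shared between tests. As a small sketch of the alternative, each check below builds its own object, so the checks give the same result in any order. The Counter class is a hypothetical unit under test, and the check helper is a framework-free stand-in for an assertion.

```java
class IndependentChecks {
    // Hypothetical unit under test.
    static class Counter {
        private int count;

        void increment() {
            count++;
        }

        int value() {
            return count;
        }
    }

    // Minimal stand-in for a framework assertion.
    static void check(boolean condition, String fact) {
        if (!condition) {
            throw new AssertionError("not true: " + fact);
        }
    }

    // Each check creates its own Counter; nothing is shared between checks.
    static void newCounterStartsAtZero() {
        Counter counter = new Counter();
        check(counter.value() == 0, "new counter starts at zero");
    }

    static void incrementAddsOne() {
        Counter counter = new Counter();
        counter.increment();
        check(counter.value() == 1, "increment adds one");
    }

    public static void main(String[] args) {
        // Running the checks in either order gives the same outcome.
        newCounterStartsAtZero();
        incrementAddsOne();
        incrementAddsOne();
        newCounterStartsAtZero();
        System.out.println("all checks passed");
    }
}
```

Had the two checks shared one static Counter, the outcome would depend on which ran first, which is exactly the kind of intermittent failure described in the list.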

Sometimes a bad test smell indicates not (only) a bad test, but also bad code under test. These smells should be observed, and used to drive the design of your code.

What does a test look like?

The test framework you use will determine the shape of your test code. It may provide a structured set-up and tear-down facility, and a way to group individual tests into larger fixtures.

Conventionally, each test begins with some preparation, then performs an operation, and finally validates the result of that operation. This is commonly known as the arrange-act-assert pattern. For unit tests, at the assert stage we typically aim for a single check – if you need to write multiple assertions then your test may not be performing a single test case.

Listing 1 is an example Java unit test method that follows this pattern, with the key stages marked:

@Test
public void stringsCanBeCapitalised()
{
  String input = "This string should be uppercase.";    // <1>
  String expected = "THIS STRING SHOULD BE UPPERCASE.";

  String result = input.toUpperCase();                  // <2>

  // Assuming JUnit: the expected value comes first, then the actual result
  assertEquals(expected, result);                       // <3>
}

Listing 1

<1> Arrange: we prepare the input
<2> Act: we perform the operation
<3> Assert: we validate the results of that operation

Maintaining this pattern helps keep tests focused and readable.

Of course, this test alone does not cover all of the potential ways to use and abuse String capitalisation. We need more tests to cover other inputs and expectations. Each test should be added as a new test method, not placed into this one.
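As a sketch of what those extra tests might look like, the checks below each cover one further input in a method of its own. They are written without a test framework so that they run standalone; with JUnit, each method would instead carry the @Test attribute. The checkEquals helper and the method names are illustrative assumptions.

```java
class CapitalisationChecks {
    // Minimal stand-in for a framework's assertEquals(expected, actual).
    static void checkEquals(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "expected \"" + expected + "\" but was \"" + actual + "\"");
        }
    }

    static void emptyStringStaysEmpty() {
        String input = "";                      // arrange
        String result = input.toUpperCase();    // act
        checkEquals("", result);                // assert
    }

    static void uppercaseStringIsUnchanged() {
        String input = "ALREADY UPPERCASE";
        String result = input.toUpperCase();
        checkEquals("ALREADY UPPERCASE", result);
    }

    static void digitsAndPunctuationAreUnchanged() {
        String input = "1234 !?";
        String result = input.toUpperCase();
        checkEquals("1234 !?", result);
    }

    public static void main(String[] args) {
        emptyStringStaysEmpty();
        uppercaseStringIsUnchanged();
        digitsAndPunctuationAreUnchanged();
        System.out.println("all checks passed");
    }
}
```

Each method states one fact about capitalisation, so a failure points directly at the behaviour that broke.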

Test names

Focused tests have very clear names that read as simple sentences. If you can’t easily name a test case, then your requirement is probably ambiguous, or you are attempting to test multiple things.

The fact that the test method is a test is usually implicit (because of an attribute like the @Test we saw earlier), so you needn’t add the word test to the name. The preceding example need not be called testThatStringsCanBeCapitalised.

Imagine that your tests are read as specifications for your code; each test name is a statement about what the SUT does, a single fact. Avoid ambiguous words like ‘should’, or words that don’t add value like ‘must’. Just as when we create names in our production code, avoid redundancy and unnecessary length.

Test names need not follow the same style conventions as production code; they effectively form their own domain-specific language. It’s common to see much longer method names and the liberal use of underscores, even in languages like C# and Java where they are not idiomatic (the argument being that strings_can_be_capitalised requires less squinting to read).

The structure of tests

Ensure that your test suite covers the important functionality of your code. Consider the ‘normal’ input cases. Consider also the common ‘failure cases’. Consider what happens at boundary values, including the empty or zero state. It’s a laudable goal to aim to cover all requirements and all the functionality of your entire system with system and integration tests, and cover all code with unit tests. However, that can require some serious effort.
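To make the normal, failure, and boundary cases concrete, here is a small sketch built around a hypothetical parsePercentage method that accepts whole numbers from 0 to 100. The method, the helpers, and the check names are assumptions made for illustration, written without a framework so that the example runs on its own.

```java
class PercentageChecks {
    // Hypothetical unit under test: parses a whole-number percentage from 0 to 100.
    static int parsePercentage(String text) {
        int value = Integer.parseInt(text.trim()); // NumberFormatException if not a number
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return value;
    }

    // True if the input is refused. NumberFormatException is a subclass of
    // IllegalArgumentException, so both kinds of refusal are caught here.
    static boolean isRejected(String text) {
        try {
            parsePercentage(text);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    static void check(boolean condition, String fact) {
        if (!condition) {
            throw new AssertionError("not true: " + fact);
        }
    }

    // Normal case
    static void typicalValueIsParsed() {
        check(parsePercentage("50") == 50, "typical value is parsed");
    }

    // Boundary values
    static void lowerBoundIsAccepted() {
        check(parsePercentage("0") == 0, "lower bound is accepted");
    }

    static void upperBoundIsAccepted() {
        check(parsePercentage("100") == 100, "upper bound is accepted");
    }

    // Failure cases, including the empty state
    static void valueAboveRangeIsRejected() {
        check(isRejected("101"), "value above range is rejected");
    }

    static void emptyInputIsRejected() {
        check(isRejected(""), "empty input is rejected");
    }

    public static void main(String[] args) {
        typicalValueIsParsed();
        lowerBoundIsAccepted();
        upperBoundIsAccepted();
        valueAboveRangeIsRejected();
        emptyInputIsRejected();
        System.out.println("all checks passed");
    }
}
```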

Do not duplicate tests: it adds effort, confusion, and maintenance cost. Each test case you write verifies one fact; that fact does not need to be verified again, either in a second test, or as part of the test for something else. If your first test case checks a precondition after constructing an object, then you can assume that this precondition holds in every other test case you write – there is no need to reproduce the check every time you construct an object.

A common mistake is to see a class with five methods, and think that you need five tests, one to exercise each method. This is an understandable (but naïve) approach. Function-based tests are rarely useful, as you cannot generally test a single method in isolation. After calling it, you’ll need to use other methods to inspect the object’s state.

Instead, write tests that go through the specific behaviours of the code. This leads to a far more cohesive and clear set of tests.
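A brief sketch of behaviour-based tests follows, with each check named after one fact rather than after one method. Here java.util.ArrayList merely stands in for a class of your own (you would not normally test library code), and the check helper is a framework-free substitute for an assertion.

```java
import java.util.ArrayList;
import java.util.List;

class ListBehaviourChecks {
    static void check(boolean condition, String fact) {
        if (!condition) {
            throw new AssertionError("not true: " + fact);
        }
    }

    static void newListIsEmpty() {
        List<String> list = new ArrayList<>();
        check(list.isEmpty(), "new list is empty");
    }

    // Exercises add() but observes the result through get():
    // the behaviour spans more than one method.
    static void addedItemCanBeRetrieved() {
        List<String> list = new ArrayList<>();
        list.add("item");
        check("item".equals(list.get(0)), "added item can be retrieved");
    }

    static void removingTheOnlyItemLeavesTheListEmpty() {
        List<String> list = new ArrayList<>();
        list.add("item");
        list.remove("item");
        check(list.isEmpty(), "removing the only item leaves the list empty");
    }

    public static void main(String[] args) {
        newListIsEmpty();
        addedItemCanBeRetrieved();
        removingTheOnlyItemLeavesTheListEmpty();
        System.out.println("all checks passed");
    }
}
```

Note that there is no ‘testAdd’ or ‘testRemove’ here; each check reads as a statement about what the list does.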

Maintain the tests

Your test code is as important as the production code, so consider its shape and structure. If things get messy, clean it, and refactor it.

If you change the behaviour of a class so its tests fail, don’t just comment out the tests and run away. Maintain the tests. It can be tempting to ‘save time’ near deadlines by skipping test cleanliness. But rushed carelessness here will come back to bite you.

On one project, I received an email from a colleague: ‘I was working on your XYZ class, and the unit tests stopped working, so I had to remove them all.’ I was rather surprised by this, and looked at what tests had been removed. Sadly, these were important test cases that were clearly pointing out a fundamental problem with the new code. So I restored the test code and ‘fixed’ the bug by backing out the change. We then worked together to craft a new test case for the required functionality, and then reimplemented a version that satisfied the old tests and the new.

Maintain your test suite, and listen to it when it talks to you.

Picking a test framework

The unit or integration test framework you use shapes your tests, dictating the style of assertions and checks you can use, and the structure of your test code (e.g., are the test cases written in free functions, or as methods within a test fixture class?).

So it’s important to pick a good unit test framework. It doesn’t need to be complex or heavyweight. Indeed, it’s preferable to not choose an unwieldy tool. Remember, you can get very, very far with the humble assert. I often start testing new prototype code with just a main method and a series of asserts.
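A minimal sketch of that main-method-and-asserts approach in Java might look like this. The clamp method is a hypothetical piece of prototype code, invented here only to give the asserts something to check.

```java
class PrototypeChecks {
    // Hypothetical prototype code under test: limits a value to the range low..high.
    static int clamp(int value, int low, int high) {
        return Math.max(low, Math.min(high, value));
    }

    public static void main(String[] args) {
        // Java assertions are disabled by default; run with: java -ea PrototypeChecks
        assert clamp(5, 0, 10) == 5 : "value inside the range is unchanged";
        assert clamp(-3, 0, 10) == 0 : "value below the range is raised to the lower bound";
        assert clamp(42, 0, 10) == 10 : "value above the range is lowered to the upper bound";
        assert clamp(10, 0, 10) == 10 : "upper bound itself is unchanged";
        System.out.println("all checks passed");
    }
}
```

Java skips assert statements unless assertions are enabled with the -ea flag, so without it this program prints its message without checking anything.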

Most test frameworks follow the ‘xUnit’ model which came from Kent Beck’s original Smalltalk SUnit. This model was ported and popularised with JUnit (for Java), although there are broadly equivalent implementations in almost every language, for example NUnit (C#) and CppUnit (C++). This kind of framework is not always ideal; xUnit style testing leads to non-idiomatic code in some languages (in C++, for example, it’s rather clumsy and anachronistic; other test frameworks can work better, so check out Catch as a great alternative [1]).

Some frameworks provide pretty GUIs with red and green bars to clearly indicate success or failure. That might make you happy, but I’m not a big fan. I think you shouldn’t need a separate UI or a different execution step for development tests. They should ideally be baked right into your build system. The feedback should be reported instantly like any other code error.

System tests tend to use a different form of framework, where we see the use of tools like Fit [2] and Cucumber [3]. These tools attempt to define tests in a more humane, less programmatic manner, allowing non-programmers to participate in the test/specification-writing process.

No code is an island

When writing unit tests, we aim to place truly isolated units of code into the ‘system under test’. These units can be instantiated without the rest of the system being present.

A unit’s interaction with the outside world is expressed through two contracts: the interface it provides, and the interfaces it expects. The unit must not depend on anything else – specifically not on any shared global state or singleton objects.

Global variables and singleton objects are anathema to reliable testing. You can’t easily test a unit with hidden dependencies.

The interface that a unit of code provides is simply the methods, functions, events, and properties in its API. Perhaps it also provides some kind of callback interface.

The interfaces it expects are determined by the objects it collaborates with through its API. These are the parameter types in its public methods or any messages it subscribes to. For example, an Invoice class that requires a Date parameter relies on the date’s interface.

The objects that a class collaborates with should be passed in as constructor parameters, a practice known as parameterise from above. This allows your class to eschew hard-wired internal dependencies on other code, instead having the link configured by its owner. If the collaborators are described by an interface rather than a concrete type, then we have a seam through which we can perform our tests; we have the ability to provide alternative test implementations.

This is an example of how tests tend to lead to better factored code. It forces your code to have fewer hardwired connections and internal assumptions. It’s also good practice to rely on a minimal interface that describes a specific collaboration, rather than on an entire class that may provide much more than the simple interface required.

Factoring your code to make it ‘testable’ leads to better code design.
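The idea of passing collaborators in through the constructor can be sketched as follows. This is one possible shape, not a prescribed design: the DateSource interface, the fields of Invoice, and the isOverdue method are hypothetical names chosen for illustration.

```java
import java.time.LocalDate;

class InvoiceExample {
    // Minimal interface describing the one collaboration Invoice needs (hypothetical).
    interface DateSource {
        LocalDate today();
    }

    // Hypothetical unit under test: its collaborator is handed in by its owner
    // rather than being created or looked up internally.
    static class Invoice {
        private final DateSource dates;
        private final LocalDate dueDate;

        Invoice(DateSource dates, LocalDate dueDate) {
            this.dates = dates;
            this.dueDate = dueDate;
        }

        boolean isOverdue() {
            return dates.today().isAfter(dueDate);
        }
    }

    public static void main(String[] args) {
        LocalDate due = LocalDate.of(2018, 3, 8);

        // Test implementations of the interface: fixed dates, not the system clock.
        DateSource dueDay = () -> due;
        DateSource dayAfterDue = () -> due.plusDays(1);

        if (new Invoice(dueDay, due).isOverdue()) {
            throw new AssertionError("invoice is not overdue on its due date");
        }
        if (!new Invoice(dayAfterDue, due).isOverdue()) {
            throw new AssertionError("invoice is overdue the day after its due date");
        }
        System.out.println("all checks passed");
    }
}
```

In production code the owner could pass LocalDate::now as the DateSource; the interface is the seam that lets a test substitute a fixed date instead.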

When you test an object that relies on an external interface, you can provide a ‘dummy’ version of that interface in the test case. Terms vary in testing circles, but often these are called test doubles. There are various forms of doubles, but we most commonly use:

Dummies
Dummy objects are usually empty husks – the test will not invoke them, but they exist to satisfy parameter lists.
Stubs
Stub objects are simplistic implementations of an interface, usually returning a canned answer, perhaps also recording information about the calls into it.
Mocks
Mock objects are the kings of test double land, a facility provided by a number of different mocking libraries. A mock object can be created automatically from a named interface, and then told up-front about how the SUT will use it. A SUT test operation is performed, and then you can inspect the mock object to verify the behaviour was as expected.
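To give a feel for the stub and mock roles, here is a hand-written sketch. A mocking library would normally generate the recording double and its verification for you; the version below only approximates that by hand. The Converter class, its two interfaces, and every other name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TestDoubleExample {
    // The interfaces the SUT expects (hypothetical names).
    interface RateSource {
        double rateFor(String currency);
    }

    interface AuditLog {
        void record(String message);
    }

    // Hypothetical SUT: converts an amount and reports what it did.
    static class Converter {
        private final RateSource rates;
        private final AuditLog log;

        Converter(RateSource rates, AuditLog log) {
            this.rates = rates;
            this.log = log;
        }

        double convert(double amount, String currency) {
            double result = amount * rates.rateFor(currency);
            log.record("converted " + currency);
            return result;
        }
    }

    // Stub: a simplistic implementation that returns a canned answer.
    static class FixedRate implements RateSource {
        public double rateFor(String currency) {
            return 2.0;
        }
    }

    // Hand-written stand-in for a mock: records calls so the test can verify them.
    static class RecordingLog implements AuditLog {
        final List<String> messages = new ArrayList<>();

        public void record(String message) {
            messages.add(message);
        }
    }

    public static void main(String[] args) {
        RecordingLog log = new RecordingLog();
        Converter converter = new Converter(new FixedRate(), log);

        double result = converter.convert(10.0, "EUR");

        if (result != 20.0) {
            throw new AssertionError("expected 20.0 but was " + result);
        }
        if (log.messages.size() != 1 || !"converted EUR".equals(log.messages.get(0))) {
            throw new AssertionError("unexpected log calls: " + log.messages);
        }
        System.out.println("all checks passed");
    }
}
```

Because the SUT depends only on two small interfaces, the test needs no real rate service and no real log.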

Different languages have different support for mocking frameworks. It’s easiest to synthesize mocks in languages with reflection.

Sensible use of mock objects can make tests simpler and clearer. But, of course, you can have too much of a good thing. Tests that are encumbered by complex use of many mock objects can become very tricky to reason about, and hard to maintain. Mock mania is another common smell of bad test code, and may highlight that the structure of the SUT is not correct.

Conclusion

Tests help us to write our code. They help us to write good code. They help maintain the quality of our code. They can drive the code design, and serve to document how to use it. But tests don’t solve all problems with software development. Edsger Dijkstra said: ‘Program testing can be used to show the presence of bugs, but never to show their absence.’

No test is perfect, but the existence of tests serves to increase confidence in the code you write, and in the code you maintain. The effort you put into developer testing is a trade-off; how much effort do you want to invest in writing tests to gain confidence? Remember that your test suite is only as good as the tests you have in it. It is perfectly possible to miss an important case; you can deploy into production and still let a problem slip through. For this reason, test code should be reviewed as carefully as production code.

Nonetheless, the punchline is simple: if code is important enough to be written, it is important enough to be tested. So write development tests for your production code. Use them to drive the design of your code. Write the tests as you write the production code. And automate the running of those tests.

Shorten the feedback loop.

Testing is fundamental and important. This column can only really scratch the surface, encourage you to test, and prompt you to find out more about good testing techniques.

Questions

How can you best introduce test-driven development into a codebase that has never received automated testing? What kind of problems would you encounter?
Investigate behaviour-driven development. How does it differ from ‘traditional’ TDD? What problems does it solve? Does it complement or replace TDD? Is this a direction you should move your testing in?
If you don’t already, start to write unit tests for your code today. If you already use tests, pay attention to how they inform and drive your code design.

References

[1] Catch: http://github.com/philsquared/Catch

[2] Fit: http://fit.c2.com/

[3] Cucumber: http://cukes.info
