Title: Writing Maintainable Code

Author:

Date: Thu, 01 April 2004 22:53:48 +01:00

Summary:

Recently, I've been thinking hard about what makes code maintainable, and how to write code to be maintainable. This interest has partly been driven by the mentoring of those starting out in C++ that I've been doing, both through the ACCU mentored developers program, and for work.

The principles I've identified have never really been hidden: they've been widely documented for years, and they're things that most good developers do as a matter of course. However, as with many things, you don't necessarily realize their benefits until you rediscover them yourself.

Body:

Long Functions

Some of the things that have shown themselves to be useful are relatively obvious when you think about them, but very easy to neglect. For example, long functions can easily make code very hard to understand, especially if they run to many times the number of lines visible on one page in your editor. In fact, at modern screen resolutions a function that fills one page is probably too long (there are 70 lines visible at once in my editor at the screen resolution I use, for example). However, it is very easy to write long functions, especially when maintaining old code - you need to add some extra processing here, but you don't want to disturb the structure of the code too much, so you just write it as part of the same function. Alternatively, you start off with a switch statement that has a few case labels, each of which does something small, and over time it grows into a huge monolithic monster with lots of case labels, each of which has a page of code attached.

Long functions often arise when a function has too much responsibility - it is trying to do more than one thing. If this is the case, then it will probably be relatively easy to split it into several smaller functions: do this, then that, and possibly that as well. Taking the monolithic switch statement as an example, it would probably be simpler to understand if each case just invoked an appropriate doXXX function which contained the attached page of code. It might even be improved by replacing it with a table-driven approach.
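As a minimal sketch of the table-driven approach, the following replaces a switch statement with a lookup table of handler functions. The command codes, the handler names and their return values are all hypothetical, chosen only for illustration; each handler stands in for the page of code that used to sit under one case label.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical command codes, standing in for the case labels
// of a large switch statement.
enum Command { cmdOpen, cmdSave, cmdClose };

// Each handler holds what used to be the body of one case label.
std::string doOpen()  { return "opened"; }
std::string doSave()  { return "saved"; }
std::string doClose() { return "closed"; }

typedef std::string (*Handler)();

// Build the lookup table once; adding a command means adding one entry.
const std::map<Command, Handler>& handlerTable() {
  static const std::map<Command, Handler> table = {
    { cmdOpen,  &doOpen },
    { cmdSave,  &doSave },
    { cmdClose, &doClose }
  };
  return table;
}

// Replaces the switch: look the command up and invoke its handler.
std::string dispatch(Command c) {
  std::map<Command, Handler>::const_iterator it = handlerTable().find(c);
  assert(it != handlerTable().end());  // every command must have a handler
  return it->second();
}
```

With this arrangement, dispatch(cmdSave) simply calls doSave, and the dispatching function stays the same length however many commands are added.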

Multiple Responsibilities

It is not just functions that can have too much responsibility; the same applies to classes and modules as well. It is a common scenario for the same class to be responsible for processing data, displaying it to the user, and handling user input events, but such a mix can easily lead to spaghetti code where it is hard to see where the boundaries lie. This is what is referred to when people talk about the cohesion of the code - code with high cohesion has a clearly defined responsibility, and it is clear how it manages that responsibility. Having multiple responsibilities can also affect exception safety - it is very hard to write exception-safe code that manages two resources, unless you divide up the responsibility and have two objects (maybe member sub-objects of the original), each of which manages one resource. This is the idea behind the adage "Do one thing well", the side effect of which is that the code becomes more reusable - there is more chance that you will need code that does precisely the same thing elsewhere (whether in the same application, or in another application) if the thing that it does is small and well-defined.
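The exception-safety point can be sketched as follows, assuming a hypothetical Resource type that counts its live instances and can be told to fail on construction. The Pair class does not manage either resource directly; each std::unique_ptr member manages exactly one, so a failure while acquiring the second cannot leak the first.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical resource that tracks how many instances are alive.
struct Resource {
  static int liveCount;
  explicit Resource(bool fail) {
    if (fail) throw std::runtime_error("acquisition failed");
    ++liveCount;
  }
  ~Resource() { --liveCount; }
};
int Resource::liveCount = 0;

// Each member manages exactly one resource. If constructing 'second'
// throws, 'first' is already a fully constructed member, so its
// destructor runs and nothing leaks.
class Pair {
public:
  explicit Pair(bool failSecond)
    : first(new Resource(false)),
      second(new Resource(failSecond)) {}
private:
  std::unique_ptr<Resource> first;
  std::unique_ptr<Resource> second;
};

// Returns the number of live resources after a failed construction.
int liveAfterFailedConstruction() {
  try {
    Pair p(true);
  } catch (const std::runtime_error&) {
    // construction of 'second' failed; 'first' has been released
  }
  return Resource::liveCount;
}
```

Had Pair held two raw pointers and called new twice in its constructor body, the first Resource would have been leaked when the second acquisition threw.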

As an example, I recently found that managing Windows Combo Box controls was so much easier once I had written a few small functions to populate the boxes and retrieve their data. All these functions did was wrap the two or three API calls required in each case, but they greatly simplified the code that uses them, and they are used everywhere in my applications, wherever Combo Box controls are used. (Aside: the main application where I made this change was using MFC; given that MFC is such a huge framework, it surprises me that these things are still so complicated.)

Don't Repeat Yourself

This example also illustrates another principle - duplicate code is bad. Often quoted as "Don't repeat yourself" (DRY), or "Once and only once" (OAOO), the idea here is that having the same code in more than one place is just asking for problems - if you make a mistake, and the code needs fixing, then you have to remember to fix it everywhere. Assuming you duplicated it correctly in the first place, that is. If instead you create a class or function that does what you want, and use it everywhere it is needed, then the benefits are twofold - not only is there only one place to fix the code if there are any problems, but you've increased the expressiveness of the places that use it (assuming you've chosen a sensible name for your class or function). Rather than each occurrence of the code being complicated by the actual technical details of the duplicate code, there is instead a function call or a class object that documents the intent of what is to be achieved. Oh, and it probably makes the client code have shorter functions, too.

A consequence of removing duplication ruthlessly is that you end up with less code overall, and what remains is clearer, so the whole codebase is easier to understand.

Many refactorings (see later) assist with duplication removal - by Extracting a Method you can then call it from other places that do similar things, for example.
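A small sketch of the result of Extracting a Method, using hypothetical names throughout: the range-limiting logic that might otherwise be written out at each call site lives in one function, and the callers state their intent by name.

```cpp
#include <cassert>

// Extracted helper: the same comparison logic used to be written out
// at every call site. A fix here now fixes every caller.
int clampToRange(int value, int low, int high) {
  assert(low <= high);  // callers must pass a valid range
  if (value < low) return low;
  if (value > high) return high;
  return value;
}

// Two hypothetical call sites that now state their intent directly.
int volumeFromInput(int requested) {
  return clampToRange(requested, 0, 10);
}

int brightnessFromInput(int requested) {
  return clampToRange(requested, 1, 100);
}
```

If the clamping rule ever turns out to be wrong, there is exactly one place to correct it, and both callers have become one-line functions.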

A Rose by any other name...

Once you've broken down your classes and functions into lots of small classes and functions that Do One Thing Well, it is important to give them good names. Names are a form of documentation that is part of the code. They are terser than comments, but they can be just as helpful in aiding the understanding of code as a well written comment or API document. Good naming might include providing named constants for intermediate steps in a complex expression, rather than just relying on temporaries.
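To illustrate naming the intermediate steps, here is a sketch with a hypothetical price calculation written twice: once as a single expression, and once with named constants. The function and parameter names are assumptions made for the example.

```cpp
#include <cassert>

// Without names, the whole calculation is one opaque expression.
double totalPriceTerse(double unitPrice, int quantity, double taxRate) {
  return unitPrice * quantity
       + unitPrice * quantity * taxRate;
}

// With named constants, each intermediate step documents itself.
double totalPrice(double unitPrice, int quantity, double taxRate) {
  assert(quantity >= 0);
  const double netPrice = unitPrice * quantity;
  const double tax      = netPrice * taxRate;
  return netPrice + tax;
}
```

Both functions compute the same value, but only the second tells the reader what each part of the sum represents.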

A rose by any other name might smell as sweet, but you don't want to have to smell it to find out.

Be Assertive

Another really helpful tool is the use of assertions. Filling your code with assertions serves double duty:

If the assertion is violated, you've just found a bug.

The assertion provides additional documentation to the maintainer. For example:

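void foo(int i) {
  ASSERT(i>0);          // pre-condition: i is at least 1
  ASSERT(i<maxVal);     // pre-condition: i is below maxVal
  bar(i);
  baz(someArray[i-1]);  // so the index lies in [0, maxVal-2]
}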

Here, the assertions tell the maintainer about the allowed range of values for i, thus enabling them to reason better about the code, and verify that the access to someArray won't go out of bounds, for example.

The asserts shown in the example define a contract. This is one of the tools used by the Design By Contract (DBC) technique of defining pre-conditions and post-conditions for every function, which are then rigidly enforced. Programming languages such as Eiffel include support for DBC as part of the language, but other languages rely on assertions.
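As a sketch of a function that checks both a pre-condition and a post-condition, consider the hypothetical function below. It uses the standard assert macro from <cassert> rather than the ASSERT macro of the earlier example, purely so that it is self-contained.

```cpp
#include <cassert>
#include <vector>

// Returns the largest element of a non-empty vector.
int largest(const std::vector<int>& values) {
  assert(!values.empty());  // pre-condition: there is something to search

  int result = values[0];
  for (std::vector<int>::size_type i = 1; i < values.size(); ++i) {
    if (values[i] > result) result = values[i];
  }

  // post-condition: no element exceeds the result
  for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
    assert(values[i] <= result);
  }
  return result;
}
```

The pre-condition documents what the caller must guarantee, and the post-condition documents what the function promises in return. Because assert compiles away when NDEBUG is defined, the post-condition loop need not cost anything in a release build.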

Improving existing code

Code isn't always in the form you might desire, even if you intended it to be, and it started out that way; requirements changes, bug fixes and new features often require modifications that aren't entirely within the spirit of the existing design, so the code starts to get ugly. Left unchecked, code can get very ugly indeed, so how can we get it back under control? The answer is, of course, Refactoring - changing the code in ways that improve the structure and maintainability of the code without changing its behaviour.

Ideally, you do refactoring in small steps, as you develop the code, with a comprehensive suite of automatic tests to ensure you don't break anything. However, you can refactor any code that's less than ideal, even if it is old and large and very ugly, with no automatic tests; you just have to do it in small steps.

Firstly, it is very important that you don't add new functionality whilst refactoring, since this just leads to confusion. Refactor the code first to make the new functionality easy to add, add the functionality, then refactor afterwards to tidy up, but don't do both at the same time.

Secondly, refactor in small steps; take a long function and break it down into smaller functions one at a time, for example. If you have automatic tests, run them after every change to ensure you haven't broken anything. If you don't have automatic tests, consider adding them, and then refactoring; you definitely want to be extra careful in this scenario, since it is not immediately obvious if you've broken something - was the function you just renamed virtual? Did you rename the function in the base class and the derived classes as appropriate?

There are many resources available on refactoring, not least Martin Fowler's book.

Testing, Testing, 1, 2, 3

As mentioned above, refactoring is easier if you've got automatic tests, and most code needs refactoring as new features are added, or bugs are fixed, in order to keep it tidy.

One way to ensure you have a comprehensive suite of automatic tests is to write the tests first. Don't write a line of production code until you have a test to verify it does what is expected. Indeed, that is how you identify what is expected - you think of something the code should do, and write a test to show how it should happen from the client point of view, then you write the code that does it. This often means you are writing tests for non-existent classes and functions, because you haven't written them yet.

Such a technique is called Test-First Development, or Test-Driven Development, and is often advocated by agile development methodologies; it is one of the core practices of eXtreme Programming (XP), for example. This also involves step-wise design - your design evolves as you think through what is required for each test, rather than designing the whole architecture up front. If your test requires functions or classes that aren't there, or access to data that isn't directly available, then you add the missing features to make the test pass, and then refactor the code so it's tidier. You should never have more than one failing test at a time unless you're feeling really brave.
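A minimal sketch of one test-first cycle, with a hypothetical initials function: the test is conceived first, from the client's point of view, and the production code is only as much as the test demands. Plain assert stands in for a unit-testing framework here, and the function is defined above the test only because C++ requires it to be declared before use.

```cpp
#include <cassert>
#include <string>

// Step 2: the minimal production code, written only after the test
// below existed and failed to compile.
std::string initials(const std::string& fullName) {
  std::string result;
  bool atWordStart = true;
  for (std::string::size_type i = 0; i < fullName.size(); ++i) {
    if (fullName[i] == ' ') {
      atWordStart = true;
    } else if (atWordStart) {
      result += fullName[i];
      atWordStart = false;
    }
  }
  return result;
}

// Step 1: the test, written first from the client's point of view.
void testInitials() {
  assert(initials("Ada Lovelace") == "AL");
  assert(initials("Plato") == "P");
  assert(initials("") == "");
}
```

Each new requirement, such as handling hyphenated names, would start as another assertion in testInitials that fails until the code is extended.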

There are two key benefits to this technique: you have a comprehensive test suite for your code, so the chances of unwittingly adding a bug when you make a modification are small, and as a consequence of this, you gain more confidence when refactoring so it is easier to keep the code tidy, well factored and maintainable.

Of course, writing tests first isn't the only way to get a comprehensive automatic test suite, but it is certainly the easiest way I've found - it doesn't actually add that much time to the development process, and you're less likely to forget. It also ensures a high degree of coverage, provided you don't write any code not required by a test. Such coverage is hard to achieve if you write tests afterwards, not least because code written test-first has been written with testing in mind, so is inherently easier to test. Anyone who has tried to add tests to legacy code knows how difficult it can be, due to the interdependencies between things.

If you write the tests first, it is also hard to skimp on tests when the deadline looms and the pressure is on, since you already have them written. If I had a penny for every time a project had its testing time cut or dropped altogether due to schedule pressure, I would be a rich man. If you're writing tests after the code, it is hard to have the discipline to ensure they are all written with sufficient coverage. Also, if you are writing tests for code that you've already written, and which you feel "works", then it feels more like drudgery than writing the tests first.

Refactoring in small steps and testing often has an additional advantage: the code is always in a releasable state, since you are never more than a few changes away from a version that passes all the tests. It might be that the code is half one design and half another, as you are midway through a large refactoring (one small step at a time), but because all the tests pass you know the application works as far as it goes, so you can release it easily. If you make large changes before testing, or allow multiple tests to be broken for a while, then you either have a lot of work to finish, or a lot of work to undo before you can get to a releasable state.

Keep it Simple, Stupid!

Simple things are easier to maintain. That means using simple algorithms and facilities until it is proven that a more complex one is necessary - don't use a relational database when a simple text configuration file will suffice. It also means using the appropriate abstractions, and making good use of the available library facilities. Anyone who writes a sorting algorithm when using C++ had better have a jolly good reason not to use std::sort or std::stable_sort, for example.
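As a sketch of leaning on the library, the following sorts a hypothetical Employee record by department with std::stable_sort. The record type and the comparison function are assumptions made for the example; the post-condition check uses std::is_sorted, which requires C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type for illustration.
struct Employee {
  std::string name;
  int department;
};

bool byDepartment(const Employee& a, const Employee& b) {
  return a.department < b.department;
}

// std::stable_sort keeps employees with equal departments in their
// original relative order; std::sort makes no such promise.
void sortByDepartment(std::vector<Employee>& staff) {
  std::stable_sort(staff.begin(), staff.end(), byDepartment);
  assert(std::is_sorted(staff.begin(), staff.end(), byDepartment));
}
```

The choice between the two algorithms comes down to whether the existing order of equal elements matters: if the list was already alphabetical, std::stable_sort keeps it alphabetical within each department.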

Actually, writing simple code can be quite difficult; it requires a good domain knowledge to be able to choose appropriate abstractions, and a willingness to refactor continuously as your knowledge improves - both knowledge of the domain in general and of the customer's requirements in particular. It also requires discipline - it is very easy to over-engineer, with the idea "I'll need this later". A key idea that comes from the Agile methodologies is "You Aren't Gonna Need It" (YAGNI) - only add features and infrastructure if you actually need them for what you're currently doing, rather than in anticipation of future requirements. The chances are that when the future requirements come, if they come at all, they are sufficiently different from what you anticipated that the infrastructure you were going to build would have been insufficient, or addressed the wrong areas.

Conclusion

Writing maintainable code is not hard; it just requires careful thought. The key recommendation that I have is to Keep it Simple - most of the others flow from there, such as short functions, single responsibilities, well chosen names and doing things Once and Only Once. Refactoring is the means to keep things simple in the light of new requirements, and asserts and automated tests give you confidence that refactoring isn't going to break anything, whilst documenting how the system is to be used and the constraints placed on it.

References and Further Reading

[Fowler] Martin Fowler, Refactoring: Improving the design of existing code, Addison Wesley.

[Astels] Dave Astels, Test Driven Development: A Practical Guide, Prentice Hall PTR.

[YAGNI] http://www.c2.com/cgi/wiki?YouArentGonnaNeedIt
