Journal: CVu 153
Column: Professionalism
Title: Professionalism in Programming #20
Author: Administrator
Date: Fri, 06 June 2003 13:15:58 +01:00
Summary: Software evolution or software revolution
"Change in all things is sweet." [Aristotle]
If only software grew like plants. You'd put the seed of an idea into some fertile programming soil, add a little water and keep the conditions just right. Maybe at first you'd have to do some work to tend it: put a little rod in to help the thing grow upright, and cover it to keep the birds off. In time a seedling would sprout and grow, and when the program plant was big enough you'd be able to release it to the world. If it needed a little extra functionality you'd just keep watering it, perhaps add some fertiliser, and it would continue to develop. The trunk would strengthen in order to support the new branches and the plant would keep in perfect balance. If it was growing in a direction you didn't like then a little pruning would soon set it straight.
Ah, if only it was an ideal world. Sadly it's not. Not by a long chalk.
The truth of the matter is that software is a live entity. OK, we're not quite at the point where it's sentient or organic, but it has a life of sorts. It is conceived, develops steadily, then comes of age and is sent out into the big wide world to make its living, hopefully garnering respect, admiration, and ultimately fulfilment. It may continue to develop, perhaps to the point where it gains a bit of a middle age spread and no longer has the handsome looks of its youth. Over time it gets tired and old, and is eventually retired, put out to pasture in the digital knacker's yard where it can gracefully die.
That's an idealistic view of the lifetime of a piece of software, but it's reasonably accurate. We need to look at how we cultivate our programs, especially after the initial round of development is done and dusted. Unfortunately, programs require thoughtful tending and seldom receive the care and attention they really deserve. What can we do to prevent early death from a slowly spreading code cancer?
To answer this we'll work backwards. We'll take a look at the symptoms of bad code growth, explore how we grow our code, and use this to determine some strategies for developing healthier software.
Bad things happen to good code. No matter how well you start off, no matter how good your intentions are, no matter how pure your design and how clean the first release's implementation, time will warp and twist your masterpiece. Never underestimate the ability of code to acquire warts and blemishes during its life.
The 'maintenance' phase[2] of software development is always the longest, and overall where most of the effort goes - even if this effort is not scrunched into the compact, focused ball that the initial design/development work is. We must explode the myth that software only develops during its initial stages of life. Boehm states that as much as 40-80% of total development time is spent in maintenance [Boehm].
During the initial development stages you can keep a firm grip on the code and work it around as much as you like, within the available time constraints. After it has been released you're generally more restricted. These restrictions may be practical:
- changes have to be minimised as much as possible to reduce their impact on the carefully tested code,
- APIs have been published, and once used by third parties have become much harder to modify, or
- the UI is known by users and can't be changed gratuitously.
These restrictions may also be psychological, arising from the developers' (erroneous?) preconceptions:
- it works this way, so we can't change it like that,
- it's too much work to revise the architecture at this late stage in the game, or
- it's not worth making this modification 'properly' now, the product won't be around for very much longer (as if).
Or the restriction may simply be a lack of understanding: another programmer may not understand the original author's mental model of the code, and this prevents them from making the most appropriate modifications.
Despite all this, code is never expected to stand still after a release. No matter how well it was tested there will always be odd faults that crop up which will need fixing. Customers demand that new features are added. Requirements change under the development team's feet. Assumptions that were made during development prove to be incorrect in the Real World, and require adjustments.
The bottom line is that more code gets written after you think the project is completed. Where the fine line lies between 'maintaining an existing product' and working on new development of the 'next version' is a moot point. Whatever you call it, the original code base gets modified. Sometimes by the original author, often not.
And this is where the rot sets in. In fact, it's a damned-if-you-do-damned-if-you-don't scenario. If you never touch the code again, if you don't keep the program up to date with fixes and modifications, the program will degrade. In the worst case it will stop working as the platform it runs on changes, or as the assumptions it was built on become out of date. The 'Y2K' bug is a glorious example of this. Maybe the program will just putrefy as competing solutions develop more features and gain more popularity. Untouched code slowly rots away.
However, if you do make extensions and fixes, the code doesn't only grow. It also rots. Fixing a fault usually sees the programmer introduce more faults as a side effect. Brooks found that as many as 40% of fixes introduced new faults [Brookes]. The Programmer's Drinking Song (sung to the tune of '100 Bottles of Beer'), written by a minstrel unknown, sums this up neatly:
99 little bugs in the code,
99 bugs in the code,
Fix one bug, compile it again,
101 little bugs in the code.
(Repeat until BUGS == 0)
It's not only newly inserted faults that can rot our code. Even functionally working modifications can cause code blight. Quick and dirty fixes pile on top of one another, putting nail after nail into the original neat design's coffin. The more you maintain carelessly and degrade the structure of code, the harder future maintenance gets. The plant analogy above is a good one. Many, many quick modifications don't make the trunk grow any stronger. The more heavy branches you add to the top of the system, the less stable the entire code base is. Eventually it totters over.
Does all this sound unduly pessimistic? Surely code won't 'rot' if you're careful? Adequate care does not seem to be taken in today's software industry. It's a culture thing. Programs have a habit of hanging around much longer than they were ever intended to. So many quick hacks live on, well past their expected life.
We should be on the lookout for rotten code. Beware of the telltale signs of bad code, whether you're writing it now or stumbling across someone else's mess. Rot sets in with any change that leads to a lack of clarity, or makes the system more complex. Unnecessary complexity comes in many guises. Here are some of the flashing red lights and klaxon calls:
- Many separate bits of code crop up to do the same thing (e.g. to convert string formats, or display warning messages).
- It's no longer clear where to go to find a certain bit of functionality.
- A piece of data keeps getting converted between different type representations as it works its way through the system (e.g. display data is transferred between std::string, char*, Unicode, UTF-8 and back again).
- APIs get 'blurred'; once neat interfaces are now too broad in scope, with new features being thoughtlessly added.
- Bits of private API leak out to be public, to allow other quick hacks to work. Private implementation member variables get exposed.
- New features are added with no documentation.
- There are functions with hulking great parameter lists.
- A function requires many parameters that it never uses itself; they just get passed through to a subordinate function.
- APIs change rapidly between code revisions.
- There are many big complex classes, or long functions.
- You find code that's too scary to even think about improving.
- Function names are misleading.
- The code is littered with incredibly complex functions, with many nested loops and special-case handlers.
- Complex module interconnections and dependencies mean that a small change in one place ripples out across the entire code. Think about the surface of a liquid; how viscous is your code? Does a small change in one secluded tributary disrupt something on the other side of the lake?
- The code compiles 'noisily', with many warnings generated.
- The code is littered with workarounds, fixes for symptoms not causes, which hide the real problem. The edges of the system get cluttered up with these, rather than the fault being fixed at the centre.
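To make the parameter-list warning signs concrete, here is a minimal C++ sketch. All the names (showWarningOld, formatWarning, DisplaySettings and so on) are hypothetical, invented for illustration. It shows one common remedy, grouping related settings into a single type, rather than the only possible cure.

```cpp
#include <string>

// The warning sign (hypothetical names): showWarningOld() never uses
// fontSize, colour or language itself; it only forwards them, so every new
// display option widens both signatures and every call site.
std::string formatWarningOld(const std::string& text, int fontSize,
                             const std::string& colour,
                             const std::string& language) {
    return "[" + language + "/" + colour + "/" + std::to_string(fontSize) +
           "] " + text;
}

std::string showWarningOld(const std::string& text, int fontSize,
                           const std::string& colour,
                           const std::string& language) {
    return formatWarningOld(text, fontSize, colour, language);
}

// One possible remedy: bundle the related settings into a single type.
// The intermediate function still forwards something, but it is one object
// it need not inspect, and a new option becomes a new field rather than a
// new parameter threaded through every signature.
struct DisplaySettings {
    int fontSize = 12;
    std::string colour = "red";
    std::string language = "en";
};

std::string formatWarning(const std::string& text, const DisplaySettings& s) {
    return "[" + s.language + "/" + s.colour + "/" +
           std::to_string(s.fontSize) + "] " + text;
}

std::string showWarning(const std::string& text, const DisplaySettings& s) {
    return formatWarning(text, s);
}
```

Both versions produce the same output; the difference only shows up when the next display option arrives and has to be added to one struct instead of to every function on the call path.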
Many of these forms of rot are particularly visible in the code, and can be seen with a quick inspection or even the use of certain tools. However, there is a class of more subtle 'invisible' degradations that usually occur at a higher level than the syntactic gunk. Modifications that fudge the original code architecture or subtly work around normal program conventions are much harder to spot until you're immersed deeply in the system.
Code rot often sets in more voraciously when the original author of a piece of code leaves the company or project. Although code ownership is not necessarily a good thing, and is seldom written in stone in a company culture, the creator often takes a pride in their work and does housekeeping on their source files, even when other people make modifications. Once they've gone, this maintenance role slips away, and their files begin to rot more quickly.
Why do we make such a big mess of code? The answer is simple: complexity. A program is a huge collection of information organised on several levels: the overall architecture, its design in a particular language, the interface mechanisms to the outside world, the actual implementation of each little bit of code, and so on. That's a lot to understand before you start working with a chunk of code. There is seldom enough time to work out how a few lines actually work, let alone how they fit into the overall picture. We haven't yet learnt to manage this vast complexity.
No code development really follows the classic model of: lock down all requirements, design completely, code completely, integrate, test, release. Unexpected modifications happen to an existing code base. New pieces get grafted in somehow. It's an incremental development cycle towards ever shifting goalposts.
In reality code growth happens by one of the following mechanisms, loosely ranked in order of disgust.
- Luck
The most frightening way to make code, and not as uncommon as it should be. Code that grows by luck never had any design. It is modified without thinking. Its structure is down to happenstance, and the fact it works is attributable to a miracle.
Even if code was originally designed carefully, a lot of maintenance modifications can follow this happy-go-lucky approach. Hit-and-hope fixes may not be real solutions; they may just mask the immediate problem and make the real fix harder later on.
- Accretion
We need a new feature added. Doing it properly would probably involve ripping up the interfaces between a few key modules and revising a lot of code. There's no time to do all this, and it would probably be too complicated for us, anyway. We'll just graft on another clump of code. It'll hang off one of the existing modules, well, perhaps a few of them, and use its own protocol to talk to them, rather than any existing mechanism. We'll have developed something demonstrable really quickly.
Forget the fact that it's a monstrous kludge. Never mind that the performance will be awful. It's not important that the modules no longer have any clear roles and responsibilities. The system won't have a neat design any more, in fact it will look a lot more philosophical, and maintaining this in the future will be a nightmare. But we'll get this version out quickly, and we don't have any time to do the work properly now, anyway.
Maybe later on we'll come back and do it properly…
- Rewrite
What happens when you recognise that a bit of code is truly awful, not easily understood, or fragile and can't be extended? It needs a rewrite. If this is done based on what was learnt the first time around, it's usually much quicker to complete and of a much higher quality. However, rewrites rarely get done.
Rewrites get riskier as you attack more at once. Rewriting a whole product is a different thing from rewriting a troublesome function or class.
Good modularity and separation of concerns should mean you don't have to rewrite a whole system, just a module, keeping the original interface. If the interface sucks, or the reason you need a rewrite is that the system isn't modular enough anyway, then it's a different story.
- Refactor
A formalised cousin of rewrite. If your code is mostly OK but bits of it need some work, you can refactor these unpleasant parts. Refactoring is a technique for making changes to a body of code in order to improve its internal structure without changing its external behaviour. It improves the design so you can change it more easily in the future. It's never about performance improvement, just design enhancement. Not as drastic as a complete rewrite, refactoring is a series of gentle massages of what you already have.
In many ways this is a fancy name for particular kinds of improvement. Martin Fowler has formalised it and described a systematic improvement process documenting a number of small, understandable steps [Fowler].
- Design for growth
If you have some understanding of the ways your code will expand in the future, say some features have been deferred until the next release, you can carefully design the system so it's easy to make these future additions. Most of the time this doesn't make the job much harder.
Even if you don't know the set of future additions, careful design can factor in an amount of room for growth. A good extensible system needs clearly defined interfaces and hinge points for new functionality to be plugged in. Be careful that this isn't an exercise in chasing the wind[3] though, trying to guess the future when you don't have a clue how the system will expand. Any extra design features come at the cost of complexity. If you correctly guess where this complexity is needed you win; if you guess incorrectly you'll make an unnecessarily complex system. This is the danger of over-design, and it's especially likely when design occurs by committee.
There is a school of thought, in Extreme Programming for example, that insists on the absolute simplest design that can possibly work in any given situation. This is (or sometimes just appears to be - jad) at odds with the design for growth mentality. Exactly how much design for growth you should employ can be a hard balance to strike.
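Refactoring, as described above, changes structure without changing behaviour. A minimal C++ sketch of one of the simplest such steps, extracting a function, might look like this. The function names are hypothetical, and the check at the end is only a crude stand-in for the proper regression tests a real refactoring would lean on.

```cpp
#include <cassert>
#include <string>

// Before (hypothetical): the trimming rule is buried inline, so any other
// function that needs it is tempted to grow its own copy.
std::string greetingBefore(const std::string& rawName) {
    const auto first = rawName.find_first_not_of(" \t");
    if (first == std::string::npos) return "Hello, stranger!";
    const auto last = rawName.find_last_not_of(" \t");
    return "Hello, " + rawName.substr(first, last - first + 1) + "!";
}

// After: the same rule extracted into one named, reusable function.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

std::string greeting(const std::string& rawName) {
    const std::string name = trim(rawName);
    if (name.empty()) return "Hello, stranger!";
    return "Hello, " + name + "!";
}

// Refactoring must not change external behaviour: the old and new versions
// should agree on every input we try.
void checkRefactoringPreservesBehaviour() {
    const std::string inputs[] = {"", "   ", "Ann", "  Ann ", "\tBob Smith\t"};
    for (const auto& input : inputs) {
        assert(greeting(input) == greetingBefore(input));
    }
}
```

Once the check passes, greetingBefore() can be deleted; the design gain is that the next piece of code needing the trimming rule can call trim() instead of adding yet another copy.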
The problem with writing code is that doing it well takes a long time and a lot of effort. You may make false starts down wrong design alleyways, and encounter flaws in the original plan that need piloting around, whilst putting up with huge product redefinitions en route. There is never enough time to accommodate all this, so we try to shoehorn as much as we can into the limited time available. Something has to give, and it's usually the purity of the code.
So code is shaped by design, yes, but also by the organisation that built it and its life history. A case in point: We have some particularly baroque user interface code. It works (mostly), but is pretty much unfathomable unless you devote a significant portion of your life to meditation at the temple of complex code. It's just an intense lump of intertwining logic with no discernible architecture. And it's like that for a reason.
The code was initially created as a simple one-off television UI for a single customer, and was only ever specified to be a minimal closed system. It used the simplest communication protocol possible with the main system, and was as lightweight as it could be. However, it was then sold as a feature to a second customer, who wanted it to look different. A skin feature was hacked on. Then it was sold to another customer in a different country. Internationalisation was hacked on. Then it was sold to another customer, who wanted some new UI facilities, so these were hacked in. This story continued. For a long time. Today this simple UI component is unrecognisable from its former self. It's pretty much unmaintainable; each addition has been a quick hack since the whole thing has always needed rewriting.
If the initial design had incorporated all these features the code would still be lean and logical. However, it would have been too much work upfront and the company probably would never have started the project. Pity the poor programmers who work with this.
Perhaps the reason we see so much bad code and so many quick hacks is the mistaken belief that it takes longer to do the job properly. When you factor in the time spent debugging, and the ease of making further modifications later on, this proves to be a wrong assumption. You may be able to close a single fault report quickly by hacking out a fix, but it's not a complete solution. As professional programmers, we need to be aware of this, and take responsibility for what we do to code.
In the corporate world, there is often a management expectation of quick fixes. It's reasonably easy to show a manager that a five tonne block of concrete stuck on top of a flimsily erected flagpole won't stay up for very long. It's harder to make them stand underneath the thing. And it's much harder to get the same message across when we're talking about software. They just don't get it. As far as most managers are concerned programmers are magicians. They practise dark mystical arts and have limitless powers. Managers just tell them what to do and when to do it by, and it will happen, however many all-nighters they need to pull.
And, being gifted and dedicated, sometimes we come through on this assumption. Doing so can actually make matters worse as management will now expect that this tactic will always work, and that it's our fault when it doesn't. Sadly, there comes a time when that kind of hacked up software just cannot be made to expand any more, when it really just wants to keel over and find its final resting place in a quiet dark corner somewhere. Management will not be happy[4].
Code growth is easier if the culture of a company is to develop software in small iterative steps. This way evolution is almost built into the design strategy, and rewriting some code parts to accommodate change is implied. The alternative, when you have to attack a monolithic code edifice with a small pickaxe in twenty seconds flat, is nowhere near as reasonable.
Now that we've identified some of the problems of working with an evolving code base, how do we manage the mess? What strategies can we adopt to avoid some of this?
The first and most important thing is to have recognised the problem. Too many programmers hack away without thinking about what they're doing to the quality of their code. As long as they silence the users' screams in the shortest time possible they don't care what state they leave the code in. Someone else can deal with it next time.
Before we think about working with existing code, there are a few considerations for the creation of new code that will greatly aid later maintenance. Extension and malleability need to be designed in, but as we've seen, not at the expense of complexity. Modern component/object based paradigms promise greater reuse and extensibility. They do give us clear interface points between code modules. However, if the interfaces don't support later extensions then something has to budge. Think very carefully about your system interfaces as you create them. It's hard to design for change, so don't necessarily try to support kitchen-sink functionality.
Simple things like writing neat, clear code that can easily be understood and worked with, accompanied by good documentation and well-defined and clearly documented APIs are key. Consider using literate programming tools to document interfaces. This increases the chance that the documentation will be updated when the programmatic interface evolves.
Modularity and information hiding are the cornerstones of modern software engineering. Try to isolate any likely changes to a small part of the system, making your system more viscous and therefore stable under change.
Consider the interconnection of modules, and try not to make every module link to every other module. Inter-module dependencies can take several forms: making function calls, getting notifications, using header files, opening network connections, and so on. It's advisable to avoid having one central module that every other module depends on, since a single change there will affect every other module in the system.
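As a rough illustration of information hiding, here is a hypothetical C++17 module whose storage representation is private. The class and method names are invented for this sketch; the point is that callers depend only on set() and get(), so replacing the std::map with another store touches one class rather than every module that uses it.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical module: other code sees only this narrow interface.
class ConfigStore {
public:
    void set(const std::string& key, const std::string& value) {
        values_[key] = value;
    }

    // Returns std::nullopt for an unknown key instead of exposing the container.
    std::optional<std::string> get(const std::string& key) const {
        const auto it = values_.find(key);
        if (it == values_.end()) return std::nullopt;
        return it->second;
    }

private:
    // The likely change point (say, moving to a file or a database) is hidden
    // here. Because no caller can reach this member, changing it does not
    // ripple out across the system.
    std::map<std::string, std::string> values_;
};
```

get() deliberately returns a copy wrapped in std::optional rather than a reference or iterator into the map, so no caller can come to depend on the container's internals.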
KISS: Keep It Simple, Stupid. Don't over-complicate, don't overengineer. Optimise an algorithm only when you know that there are performance issues, not just because you think you know a good way to make it run faster. Simplicity is nearly always more desirable than performance, and it certainly makes later maintenance easier.
There is a difference between maintaining good code and maintaining bad code. With the former, you must carefully preserve the integrity of the design and ensure you don't introduce anything that is out of place with the system as a whole. With the latter you have to do your best not to make the mess any worse, and if at all possible try to improve things on your way through. If you can't get as far as a rewrite of the offending sections, a little refactoring can go a long way.
Before you even begin to touch any code, a couple of organisational issues should be considered:
- Prioritise any changes that are needed. Balance importance against complexity of work and decide what should get done first. What early changes will impact later work?
- Only change what's necessary. If it ain't broke, don't fix it. Don't gratuitously 'improve' bits of code because you think they need it - only make the changes that are really required.
- Keep an eye on how many modifications are being made at once. Certainly making several parallel modifications yourself is either incredibly clever or completely foolish, most likely the latter. Do one thing at a time. Carefully. If several people are working on the code at once, be aware of what's changing. There is a danger of too many separate hacks causing odd conflicts. Methodical change by a single developer gives visibility of where the code is being stretched, and where most care is needed. Several simultaneous modifications might make the code pull thin without anyone understanding or noticing.
- Just as the initial code should be reviewed during its development, subsequent changes should also be reviewed. Organise formal reviews, and try to include the original code reviewers. It's very easy to introduce subtle new bugs with small code extensions; reviews will prevent many of these kinds of error.
As well as guarding against the warning signs we saw in a previous section, here are some practical suggestions to help you when working with existing code:
- When you come to make modifications, quickly inspect the code to get a feel for its quality. This is surprisingly easy to do, and you can rapidly get a feel for how easy the code is going to be to work with. Collate all the documentation so you know what's available, then start to digest it. You may find it helpful to use tools to visualise the code. A picture conveys a thousand words, and perhaps several thousand code statements. Use metrics to gauge the quality of the code. This will make you wary of the places where hidden gotchas could be lurking.
- If you are fixing a fault, do you really understand the cause? If you can write test code to trigger it, this will prove that your fix works and help you verify it in all conditions. Once you have made a successful fix, look around the related parts of the system for similar faults. This often-overlooked step can make a big difference. Many problems hang around in packs, and it's much easier to defeat them in one crushing blow than slowly chip away at them as they each manifest themselves.
- When maintaining any code, retain the programming style of the source files you are working with, even if it's not your style or the house style. A file with code in several formats is confusing and hard to work with. Apply tidy-ups as you go if they're not too gratuitous, but be aware that doing so will make source code diffs across versions much harder to read.
- Before you modify a particular file or module of code there are a few key things to understand: where the code sits in the whole system, what interdependencies it has (so you know what other components might be affected by a change), what assumptions were made when the code was created (hopefully documented in the various related specifications), and the history of modifications that have already been made.
- Use tests to check you've not broken anything. Exhaustive regression testing is the only real way to be secure about the changes you've made. This is a key point, and often carelessly overlooked. Ensure you have an adequate test suite, and run it regularly.
- Adopt the correct attitude. Avoid that 'just one more hack' mentality. Don't dismiss code, thinking that in the future it will be thrown away or rewritten. I've been there. It won't. No, don't argue: it won't.
- Learn when you need to do some redesign work. Don't be afraid to redo something if necessary. For 'legacy' code this may be considered uneconomical. Sadly, it's legacy code that makes cash and is unlikely to be phased out.
- Try not to introduce extra dependencies with newly added code. An increase in coupling makes code more complicated and harder to change.
- If you make a wrong change, back it out quickly. Don't litter code with unnecessary dead wood.
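The advice on understanding faults and keeping regression tests can be sketched in a few lines of C++. The scenario is hypothetical: assume a fault report said average() misbehaved on an empty list because it divided by zero (undefined behaviour in C++, commonly a crash). The sketch also assumes 0 is an acceptable answer for an empty list; returning std::optional<int> would be another reasonable choice.

```cpp
#include <cassert>
#include <vector>

// Hypothetical fixed function. The (assumed) original computed
// sum / values.size() unconditionally, dividing by zero for an empty list.
int average(const std::vector<int>& values) {
    if (values.empty()) return 0;  // the fix: handle the reported input explicitly
    long long sum = 0;
    for (int v : values) sum += v;
    return static_cast<int>(sum / static_cast<long long>(values.size()));
}

// Regression test: keep it in the suite and run it on every change.
void regressionTestAverage() {
    assert(average({}) == 0);            // the input from the fault report
    assert(average({4}) == 4);           // behaviour that already worked
    assert(average({1, 2, 3, 4}) == 2);  // integer division: 10 / 4 truncates to 2
}
```

The first assertion reproduces the reported input, so it should fail loudly if the fix is ever lost. Having fixed it, this is also the moment to search for other unguarded divisions by a container's size elsewhere in the code, since faults like this tend to travel in packs.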
As professional programmers, we should naturally shy away from the pressure to do a quick bodge job. Sadly we don't work in ivory towers, and sometimes compromise is required; it's not always a commercial reality to complete a task in the theologically 'correct' way. It's unprofessional to flatly refuse to extend code in a distasteful manner. My experience with the TV UI is a good example of why. This explains why so much code is brittle, flaky and dangerous. But it also explains why there's any code out there at all. If there wasn't the commercial drive to get it shipped, programmers would spend forever tweaking code to get it just right, writing and rewriting, by which time the company would have totally collapsed around them.
I'm not sure whether I agree with Aristotle. Change can be a right pain in the rear end. We should manage code changes carefully. That way we can evolve our good programs into something greater, rather than degrade them into an unstable mess.
Perhaps not understanding how to maintain software well and expand it correctly is one of the reasons software development is not yet a true engineering discipline like mechanical or structural engineering.
[Boehm] Barry Boehm. "Software Engineering." In: IEEE Transactions on Computers, Volume C-25, No 12. Pages: 1226-1241. Available from: http://www.computer.org/tc/
[Brookes] Frederick P. Brooks. The Mythical Man-Month, Anniversary Edition. Addison-Wesley, 1995. ISBN: 0201835959.
[1] Well, I've seen a lot of revolting code in my time...
[2] That is, work done after initial delivery which isn't considered a major new release.
[3] Ecclesiastes 2:11
[4] Of course this is a gross generalisation, but not too inaccurate. Many managers used to be programmers themselves, and know the tensions they must balance. A good manager should listen to the programmers' objections. A good programmer will make their boss listen. Too often, neither happens. Software suffers.