Title: Lies, Damn Lies, and Statistics

Author: Ric Parkin

Date: Sun, 10 October 2010 14:00:00 +01:00

Summary: Making a good decision is vital. Ric Parkin looks at what information we use.

Body:

Our boss has been asking that really horrible question: 'When will the new release be out?' This is a non-trivial question to answer at the best of times, so I started to think about what we actually knew, to work out what influences the answer. And it's really not encouraging.

I've worked through many release cycles and, while they all have many differences, certain things recur often enough to suggest there are some lessons we can learn and keep an eye out for.

The first is that most estimates are almost certainly wrong. This is not just me being cynical - there are many reasons for errors to creep in, and they usually err in the undesirable direction of being unduly optimistic. An obvious one is that estimates tend to be made at the start of a project, because that is when the information is needed to decide whether to approve the project, estimate its likely budget, and plan coordinated activities such as marketing efforts. Furthermore, software development is in many ways a learning activity - collecting requirements and finding out how to turn them into a working system - so by definition you don't know in advance what you'll find out, and therefore cannot reliably estimate how long it will take! Taking previous experience into account can help a lot, but unless your project is similar to ones you've done before, the uncertainties remain high. You could ask the 'experts', but there can be conflicts of interest (e.g. they're the ones proposing a costly development, so they will tend to be overly optimistic), and senior developers are often promoted into management roles, which can leave them knowing less about how a system actually works in practice than a programmer who works on it every day. So is there a better way of estimating, one that takes as much information as possible into account?

I recently got around to reading the whole of The Wisdom Of Crowds [Surowiecki], which suggested to me an interesting approach to this sort of complex estimation. Instead of asking an expert for an opinion, ask everybody and aggregate the answers. The reasoning behind this is that no one knows everything, but everyone knows something about the system. By combining the guesses, say by averaging the estimates of project time, the idea is that you capture what people know while their individual errors tend to cancel out (this follows from the Central Limit Theorem [CLT], subject to the usual caveats about the circumstances under which it holds). Note that this is by no means an excuse to avoid taking responsibility for the final estimate! But I think it could be a useful exercise to find out what the group as a whole expects, and not just the 'experts'. For this to work you need a diverse and independent group to sample: asking only one subset fails to capture other opinions, and each person must be asked privately so that they are not influenced by the others. In particular, the group should include a wide range of people who wouldn't normally be involved in estimation exercises, such as testers, technical writers working on the documentation, and the teams who polish off the product and make it ready for release. It is often these people, slogging through the bug list and getting reports from users, who really understand how much effort goes into those final stages, often better than the architects and developers do. This is largely because, while the developers might design and write the framework and the bulk of the functionality, they will often have moved on to design and write the next project, leaving others to finish off the release, even though that effort can take a similar amount of time again.
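As a rough illustration of the aggregation step, here is a minimal Python sketch. The respondent labels and figures are made up, and `aggregate_estimates` is a hypothetical helper. It takes the mean, as described above, and also reports the median (a swapped-in, more outlier-resistant alternative, not something the crowd approach requires) and the spread between the lowest and highest answers.

```python
from statistics import mean, median


def aggregate_estimates(estimates: dict[str, float]) -> dict[str, float]:
    """Combine independently collected estimates (e.g. in weeks).

    `estimates` maps a respondent (hypothetical labels) to their private
    guess. Each person should be asked separately so answers stay independent.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    values = list(estimates.values())
    return {
        "mean": mean(values),                 # the simple crowd average
        "median": median(values),             # less sensitive to one wild guess
        "spread": max(values) - min(values),  # how much people disagree
    }


if __name__ == "__main__":
    # Made-up answers from a deliberately diverse group.
    crowd = {
        "architect": 10,
        "developer": 14,
        "tester": 20,
        "tech_writer": 18,
        "release_engineer": 22,
    }
    print(aggregate_estimates(crowd))
```

With these sample figures the mean comes out at 16.8 weeks and the median at 18, both well above the architect's 10.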

I'm reminded of an old project where I was involved in maintaining and releasing several versions of a product while the next major rework was being designed and developed. Having had to evolve the previous code to reflect what was actually required, I had a good gut feel for how much complexity was really present, and how much effort was needed to take a functionally complete project to production quality. My estimate of how long it would take was three times the figure that had been used to approve the project. I take no pleasure in noting that I turned out to be optimistic. Nor is this an isolated example: it has happened severely, on projects of tens of man-years, at least twice in my personal experience, and to some extent on most projects. While a large company can probably take the hit, for a small start-up it can be devastating to both its balance sheet and its customers' trust.

Another problem is that the cost of the later stages often depends on how well things have gone before, what utilities people have put together, and how stable some key code is. None of this can be known up front, so project plans and estimates should take it into account by becoming vaguer and more conservative the further into the future they look. This implies that detailed planning and estimation should happen on a rolling basis, and that shorter release cycles are encouraged. In other words, big over-detailed plans are discouraged, and short agile cycles become the norm, allowing plans to respond to circumstances far more flexibly.

Once the initial estimate has been made and the project starts, it is important to keep track of progress and update the estimated completion date. The usual purpose of this is to spot problems that could cause overruns as early as possible, but it also gives the team a sense of progress, which can be vital for keeping morale up and momentum going. An interesting question is how finely you should measure progress. I often think that people tend to be unrealistically precise, for example estimating individual tasks and measuring progress in hours, often encouraged by their project planning tools. I've had good results by breaking tasks down only to the granularity of a day, or even a week, and measuring only in terms of Not Done/In Progress/Done, as that allows some flexibility and doesn't burden people with tiresome paperwork and endless Gantt chart updates.

Another good trick was to get three estimates instead of one: as well as the usual 'How long will this task take?', you also ask for best- and worst-case estimates. Taking a weighted average (with ratios such as 1:4:1, the weighting used in the classic PERT three-point estimate, which assumes a bell-shaped beta distribution of outcomes) gives you an expected time that tends to be a bit longer than the most likely estimate. This is because if things go well a task will be a bit quicker, but if they go badly it will take a lot longer. For example, I might reckon a task will probably take 10 days, but could be as short as 8 or as long as 18. My expected time is then (8 + 4 × 10 + 18)/6 = 66/6, which is 11 days. That means that if you use only your original estimate, you will on average fall behind by about a day every two weeks. This exercise also gives you a clue to how well understood the development is: very large estimates, or a wide spread between best and worst case, are a sign that there is a lot of unknown risk, and that the area should be investigated further. And returning to the wisdom of crowds, perhaps you should get several people to estimate and combine their answers.
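The three-point calculation is simple enough to sketch in a few lines of Python. This is a minimal illustration assuming estimates in days; `ThreePointEstimate`, `combined_expected` and the `RISK_THRESHOLD` cut-off for a 'wide spread' are hypothetical names and values chosen for the example, not part of any particular planning tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ThreePointEstimate:
    best: float    # optimistic case, in days
    likely: float  # the usual "how long will this take?" answer
    worst: float   # pessimistic case

    def expected(self) -> float:
        # 1:4:1 weighting: (best + 4 * likely + worst) / 6
        return (self.best + 4 * self.likely + self.worst) / 6

    def relative_spread(self) -> float:
        # Width of the best-to-worst range relative to the likely value;
        # a large value hints that the task is poorly understood.
        return (self.worst - self.best) / self.likely


def combined_expected(estimates: list[ThreePointEstimate]) -> float:
    """Average the expected times from several people's estimates."""
    return mean(e.expected() for e in estimates)


RISK_THRESHOLD = 0.5  # hypothetical cut-off; tune it against your own history

if __name__ == "__main__":
    mine = ThreePointEstimate(best=8, likely=10, worst=18)
    print(mine.expected())                          # 11.0
    print(mine.relative_spread() > RISK_THRESHOLD)  # True: (18 - 8) / 10 = 1.0
```

Averaging each person's expected value, rather than their raw 'likely' figure, keeps the pessimistic tail in the combined answer.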

After a while you've slogged through the feature list and the project is complete. Not so fast! It still has to be polished off ready for release. This is the point where I start to look at graphs of bug numbers. Hopefully you'll already have been using a bug tracking system to capture defects as they are found and schedule them to be fixed. (This is yet another source of project slippage: people say a task is complete, and so it is closed. But bugs are sure to be found, and fixing them eats into time that was meant for some other task, making that one late too.) During the main development I try not to worry too much about the total number and its fluctuations, as they tend to be dominated by one-off factors. For example, early on not many people will be actively using a fast-changing system, so only major bugs get reported. When functionality has settled, more people test it in detail and start reporting smaller UI glitches, and your bug count will go through the roof. But eventually people run out of new areas to report on and things stabilise, and then it's worth using your tools to investigate the trends. Recently I've been tracking the total number of open bugs, and the number of open bugs assigned to the current sprint. I've thought about using a total weighted by how difficult the bugs are thought to be (or by the Story Points used in agile planning), but I worry that the overhead of keeping such information correct could make the results unreliable. I'd be interested to hear if anyone does something like this, though.
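To make the two series concrete, here is a small Python sketch that derives them from a list of bug records. The `Bug` record and its fields are hypothetical stand-ins for whatever your bug tracker exports, and the sprint field is assumed to hold each bug's current assignment, so sprint counts for past days are only approximate.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Bug:
    opened: date
    closed: Optional[date]  # None while the bug is still open
    sprint: Optional[str]   # sprint the fix is currently scheduled in, if any


def is_open(bug: Bug, day: date) -> bool:
    """A bug counts as open on `day` if it was opened by then and not yet closed."""
    return bug.opened <= day and (bug.closed is None or bug.closed > day)


def daily_counts(
    bugs: List[Bug], start: date, end: date, current_sprint: str
) -> List[Tuple[date, int, int]]:
    """Return (day, total_open, open_in_current_sprint) for each day in range."""
    rows = []
    day = start
    while day <= end:
        open_bugs = [b for b in bugs if is_open(b, day)]
        in_sprint = sum(1 for b in open_bugs if b.sprint == current_sprint)
        rows.append((day, len(open_bugs), in_sprint))
        day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    bugs = [
        Bug(date(2010, 10, 1), None, "sprint-7"),
        Bug(date(2010, 10, 1), date(2010, 10, 2), "sprint-7"),
        Bug(date(2010, 10, 2), None, None),
    ]
    for row in daily_counts(bugs, date(2010, 10, 1), date(2010, 10, 3), "sprint-7"):
        print(*row)
```

Feeding these rows into any plotting tool gives the total and sprint graphs; the per-day rescan is quadratic but fine for the few thousand bugs a typical project accumulates.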

Then you have to interpret the graphs. This will depend a lot on your local situation, as every team has different patterns of bug reporting and speed of fixing issues, but I think there are some common things to look out for. Ideally your sprint graphs should look nicely triangular, with a fairly steady slope down to zero at the target. If you repeatedly miss, it could be a sign that you need to adjust how much you put into each sprint. The total is a bit trickier. I've found that to start with it will lurch up as people suddenly test a new area with lots of bugs, and down when people fix a lot of simple small tweaks. Apart from this chaotic churn, overall numbers tend not to change very much, as test and fix resources are applied or eased off accordingly. What you want to look for is The Corner, where the lurching has died down and a solid downward trend appears. What has happened is that, despite a continued test effort, it is proving hard to find anything new, yet bugs are still being fixed at the same rate. This is good news. Once that trend has settled in, you can look at the slope and estimate when it will reach zero (although remember that the last few bugs will tend to be the difficult ones, so the slope will level off a bit at the bottom until they're fixed or you decide they are not release-stopping bugs). Congratulations! You can now say with some confidence what the release date will be. Unfortunately it will be in the near future, so your oracular powers of prediction won't be appreciated as highly as you'd like.
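Estimating from the slope after The Corner can be done by eye, but fitting a least-squares straight line is one simple, mechanical way to do it. The sketch below assumes you pass in only the daily totals recorded after The Corner; `projected_zero_day` is a hypothetical helper, and because the last few hard bugs flatten the curve, its answer is best treated as an optimistic lower bound.

```python
from typing import Optional, Sequence


def projected_zero_day(counts: Sequence[float]) -> Optional[float]:
    """Fit a least-squares line to daily open-bug totals taken after
    The Corner and return the (fractional) day index where it hits zero.

    Day 0 is the first entry in `counts`. Returns None if the fitted
    trend is flat or rising, i.e. The Corner has not really arrived.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts))
    slope = sxy / sxx
    if slope >= 0:
        return None  # no downward trend yet: keep waiting for The Corner
    intercept = y_mean - slope * x_mean
    return -intercept / slope


if __name__ == "__main__":
    print(projected_zero_day([100, 95, 90, 85, 80]))
```

For daily totals of 100, 95, 90, 85 and 80, the fitted line falls by 5 bugs a day and reaches zero at day 20, i.e. 16 days after the last data point.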

And finally, I found an interesting take on an old problem: what does being 'professional' mean, and should there be a formal body to enforce standards? In computing there are certain bodies that have been granted a charter to confer such professional status, but many companies don't insist on it, trusting people to be professional in their dealings rather than requiring membership of a formal professional body. Well, it turns out that Canadian engineers have been doing both. There is the usual professional body that grants Professional Engineer status, but individuals can also go through The Ritual Of The Calling Of An Engineer [Ritual] (created by Rudyard Kipling, no less), where they are presented with an Iron Ring to wear on the little finger [Ring] to remind them of the responsibility and humility their professional dealings demand. This ritual is a more recent counterpart to older ones such as the Hippocratic Oath [Hippocrates], which are intended to impress upon people the serious nature of their calling and to establish a basis for ethical standards. I thought it was a wonderful idea, very much in keeping with what we as an industry ought to aspire to.

[CLT] http://en.wikipedia.org/wiki/Central_limit_theorem

[Hippocrates] http://en.wikipedia.org/wiki/Hippocratic_Oath

[Ring] http://en.wikipedia.org/wiki/Iron_Ring

[Ritual] http://www.ironring.ca/

[Surowiecki] James Surowiecki, The Wisdom of Crowds, ISBN 0385721706, http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds
