Title: Debugging – What Has Changed in the Last Decade?
Author: Martin Moene
Date: 03 July 2016 20:57:24 +01:00
Summary: Neil Horlock travels through time in search of bugs.
Body:
Ten years ago I read an interesting article in CVu entitled ‘A review of debugging tools’. It introduced me to the concept of reversible debugging, and to a company called Undo Software, whose founders Greg Law and Julian Smith co-authored the piece. The idea of reversible debugging was understood well enough, but the tooling was not yet where it needed to be. Still, my interest had been piqued, and I kept track of the technology’s progress.
Obviously, a lot has happened in the last decade when it comes to software development. Therefore I thought it would be a good time to review what has changed in debugging, particularly reversible debugging, over that time, and how it relates to people developing and deploying large systems in the real world.
To start with, let’s look at what software life was like ten years ago. I was working for the same international bank as I am now, and I had similar issues to many other people in my position. The software my team developed was mission-critical and highly complex, so stability was, and remains, a key tenet of our production platforms; having good tools and strong practices is part of our bank’s culture. Back then, standard operational practice in case of any issue was of course to review the ubiquitous log files. These gave a glimpse of what was happening – hopefully the right glimpse – but certainly not the full picture, making it difficult to see exactly what had happened before a crash occurred.
Again, like a lot of my peers, I found that the sheer size and complexity of the environment made bugs harder to find. As an international bank we had a highly dispersed team, with development taking place in various centres around the globe, often remote from the actual deployment of the product. Everyone would be using different machines with varying specifications, adding another challenge when trying to recreate bugs from machine to machine in order to track them down. Anyone who has debugged code will fully understand that a lot of time and resource can be spent re-running code that has crashed once, waiting for it to fail again to get a better understanding of what caused the issue.
This is what first got me interested in reversible debugging. For those who haven’t come across it, the concept is simple: a reversible debugger records everything a program does – every memory access, every computation, every call to the operating system – and then lets you rewind and replay the run. That colossal amount of data is presented through a powerful metaphor: travelling backward in time (and forward again) while inspecting the program state. This makes it much simpler to pinpoint what was happening in the run-up to a bug striking, and hence to fix problems faster.
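By way of a concrete (and entirely illustrative) sketch, consider the classic case where the corruption happens long before the crash. The C++ program and variable names below are invented for the example; the GDB commands in the trailing comment are GDB’s real built-in recording commands, though the session itself is a sketch of how such tools behave.

    #include <cstdio>

    // Illustrative only: corruption happens long before the crash.
    // The off-by-one loop below silently zeroes the adjacent 'divisor'
    // member, and the program only fails much later, at the division.
    struct State {
        int buf[4];
        int divisor = 7;   // clobbered by the overflow below
    };

    int main() {
        State s;
        for (int i = 0; i <= 4; ++i)   // bug: should be i < 4
            s.buf[i] = 0;              // i == 4 writes over s.divisor

        // ...imagine thousands of unrelated operations here...

        std::printf("%d\n", 100 / s.divisor);  // divide by zero: SIGFPE
    }

    // A forward debugger shows you the crash site; a reversible one lets
    // you run *backwards* from it to the cause, e.g. with GDB's recorder:
    //
    //   (gdb) start
    //   (gdb) record                # begin recording execution
    //   (gdb) continue              # run until SIGFPE at the division
    //   (gdb) watch s.divisor       # watchpoint on the corrupted value
    //   (gdb) reverse-continue      # travel back to the write that changed it
    //                               # ...stops at 's.buf[i] = 0' with i == 4

The same backwards-from-the-failure workflow applies however far apart cause and effect are, which is precisely what makes it so much more direct than reading logs after the event.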
Fast forward to 2016
So what has changed in the last decade? The issues of complexity, time zones, and different machines for development/deployment remain, and have been joined (and exacerbated) by five others.
1 Greater requirements
Back in 2006, a European bank would typically be trading with 10–15 different markets across the continent. That figure has grown to 80 or more in the intervening decade. The roughly eightfold increase in the number of systems we have to interface with has led to a growth in the resources we have at our disposal – however, it has increased pressure on productivity and introduced a need to automate many more activities to make the best use of those resources. Everyone will no doubt recognise the need to find ways to do much more with similar or lower headcounts and budgets.
2 Higher staff turnover
Software developers are highly prized, and competition to recruit and retain them is fierce. Inevitably this means that these days team turnover is higher than it might have been in the past. The knock-on effect of this is that a person debugging the code may be less familiar with the software itself and its place in the scheme of things. Therefore they can’t rely on innate knowledge to understand ‘how something happened’ – this is much harder without the deeper experience that comes with time spent working with particular systems.
3 Software development has become more mature
Ten years ago concepts such as agile and unit testing were buzzwords that were beginning to gain traction. They are now a core part of what we do, with all our processes built around standard practice. This is obviously a major step forward for the industry, as we now rely on more mature engineering practices. However, it also puts the spotlight even more firmly on debugging across the whole build. Software is much more complex, meaning that a series of interactions can trigger a bug. You therefore need to have a record of the complete chain of events if you want to speed up solving the issue.
4 Moore’s Law – for good and bad
Thanks to Moore’s Law, our hardware has changed out of all recognition. Whereas before our development and deployment systems ran on top-spec Solaris servers with 64 CPUs, we can now get similar compute power from a $5k Xeon machine. However, we still potentially have differences between machines – with a production or user-test machine being of higher specification and larger scale than one used by developers. This makes recreating bugs more difficult, but on the flip side the increase in performance also makes using powerful debugging tools much more practical, as any overheads are dramatically reduced.
5 Regulatory change
The banking industry has seen significant regulatory change since 2007, so not only are we dealing with many more trading markets, but there is also a need to demonstrate compliance in new ways for each of them, leading to even more complexity. We therefore have a greater duty of care about how we carry out engineering and ensure releases are problem-free. This means we have to be able to trace all code changes as part of our compliance efforts.
Where reversible debugging helps
As I’ve said, a significant part of any debugging exercise is recreating the problem in a repeatable fashion. Reversible debugging eliminates much of that need, at least from the perspective of observing the bug, though the scenario still needs to be reproduced to adequately test the fix.
In the same way that software development has moved on in the last decade, so has reversible debugging, and there are now many more products in the space, each providing its own way of solving problems. These include rr (record and replay), Chronon and Time Machine for .NET, as well as Undo. There are also many improved ways of instrumenting live running code, from detailed tracing tools to products that can record live running processes without a debugger attached. In these cases, once the recording is saved, it can be shared with your development team for offline analysis. To illustrate: if we found bugs when testing code in London, we could simply send the recording back to the developers to replay, as sketched below. No matter where they are, or what machines they have available, they can precisely reconstruct the program’s original behaviour and step backwards as well as forwards through the code to find the root cause of the bug. This overcomes one of the main issues of first-generation reversible debuggers – the difficulty of recreating bugs on the often different machines used in development, QA testing and production.
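Here is roughly how that hand-off looks with the open-source rr tool. The rr commands in the comments are real (rr record, rr pack and rr replay exist, and rr stores traces under ~/.local/share/rr by default), but the program, host names and archive name are invented for the example.

    #include <cstdio>
    #include <cstdlib>

    // Illustrative only: a failure that depends on the environment - the
    // kind that refuses to reproduce on the developer's own machine.
    int main() {
        const char* feed = std::getenv("MARKET_FEED");  // set on some hosts only
        std::printf("connecting to %s\n", feed);  // feed may be a null pointer:
                                                  // undefined behaviour, often a crash
    }

    // On the machine where it fails (say, a QA box in London):
    //
    //   qa$  rr record ./feed-gateway      # capture the failing run
    //   qa$  rr pack                       # make the latest trace self-contained
    //   qa$  tar czf trace.tgz -C ~/.local/share/rr feed-gateway-0
    //
    // Anywhere else, on any machine:
    //
    //   dev$ tar xzf trace.tgz
    //   dev$ rr replay feed-gateway-0      # bit-for-bit replay under gdb...
    //   (rr) reverse-continue              # ...with the usual reverse commands

The important property is that the replay is deterministic: the developer sees exactly the execution the QA machine saw – environment variables, inputs, scheduling and all.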
Using reversible debugging helps in four main ways:
- The ability to record and rewind gives much deeper insight into what caused a crash.
- The ability to collect evidence on one machine and replay an exact copy of it on another helps overcome the different machines/configurations used in development/testing/deployment. This allows faster tracking of small issues.
- It also works well in more structured testing, ensuring larger bugs are found.
- In User Acceptance Testing (UAT) systems. These are one step away from production, so are much more complex, with multiple connections to other systems; here, we’re looking at how reversible debugging can help us in the future as the technology develops and performance further improves.
Looking to the future
While reversible debugging and the use of live recordings offer great productivity gains today, I expect the technology to improve greatly in the coming years, in two key areas:
- Performance
Since first trying reversible debugging ten years ago, we’ve seen a better than tenfold reduction in the performance overhead, through a combination of improvements to the software and the greater processing power we now possess. If the overhead continues to fall significantly, it opens up new possibilities in terms of the types of systems and software where we can use reversible debugging. The holy grail here would be to have recording always on.
- Multi-language
The days when organisations developed in a single, mandated language are increasingly behind us. This is particularly true as we move to a mobile-first world and developers adopt the best language for individual applications or needs. For example, Undo currently supports C++ on Linux and Android, and I expect this range to expand, driven by customer and market demand.
As I said, it is now ten years since I read that article in CVu and came across reversible debugging and Undo. Since then I’ve watched the market change as reversible debugging has really gained momentum. There are now more entrants in the market, new ideas appearing and, more importantly, much wider recognition of the innovation.
For example, my bank is a strong supporter of Accenture’s FinTech Innovation Lab programme, a twelve-week initiative that introduces tech companies to stakeholders in leading financial institutions. Demonstrating the importance of software quality to the world’s leading banks, Undo beat hundreds of other entrants to win the programme, giving it the chance to showcase its technology to more than 300 industry leaders. To me this shows the value of disruptive technologies such as reversible debugging in an era when code quality and security have never been more important. It endorses my original interest, and shows how banks such as mine can support and help innovation grow and flourish.
Ten years is a long time in software development, so I won’t risk predicting what things will look like in 2026. However, it is fair to say that we’ll be developing and deploying in ever more complex environments, with software central to the operations of all organisations, whatever sector they are in. Bugs will still be with us – and are likely to be more difficult to track down than ever. Expect more innovation from the likes of Undo to make debugging easier and to increase productivity – here’s to the next ten years.