Journal Articles > Overload > 92
Title: Code Rot
Author: webeditor
Date: Sun, 09 August 2009 09:57:00 +01:00
Summary: Maintaining code is vital to keep it working. Tom Guest explores what happens when you neglect it.
Body:
Those of us who have to tiptoe around non-standard or ancient compilers will know that template template parameters are off limits.
- Hubert Matthews [Matthews03]
Dvbcodec fail
Long ago, way back in 2004, I wrote an article for Overload [Guest04] describing how to use the Boost Spirit [Spirit] parser framework to generate C++ code which could convert structured binary data to text. I went on to republish this article on my website, where I also included a source distribution.
Much has changed since then. The C++ language hasn't, but compiler and platform support for it has improved considerably. Boost survives - indeed, many of its libraries will feed into the next version of C++. Overload thrives, adapting to an age when print programming magazines are all but extinct. My old website can no longer be found: I've changed hosting company and domain name, and I've shuffled things around more than once. But you can still find the article online if you look hard enough, and recently someone did indeed find it. He, let's call him Rick, downloaded the source code archive, dvbcodec-1.0.zip [DVBcodec], extracted it, scanned the README and typed:
$ make
... and discovered the code didn't even build.
At this point many of us would assume (correctly) the code had not been maintained. We'd delete it and write off the few minutes it took to evaluate it. Rick decided instead to contact me and let me know my code was broken. He even offered a fix for one problem.
Code rot
Sad to say, I wasn't entirely surprised. I no longer use this code. Unused code stops working. It decays.
I'm not talking about a compiled executable, which the compiler has tied to a particular platform, and which therefore progressively degrades as the platform advances. (I've heard stories about device drivers for which the source code has long gone, and which require ever more elaborate emulation layers to keep them alive.) I'm talking about source code. And the decay isn't usually literal, though I suppose you might have a source listing on a mouldy printout, or on an unreadable floppy disk.
No, the code itself is usually a pristine copy of the original. Publishers often attach checksums to source distributions so readers can verify their download is correct. I hadn't taken this precaution with my dvbcodec-1.0.zip but I'm certain the version Rick downloaded was exactly the same as the one I created 5 years ago. Yet in that time it had stopped working. Why?
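For illustration, a checksum of the kind publishers attach takes only a few lines to compute. The sketch below is the bitwise CRC-32 used inside zip archives (reflected polynomial 0xEDB88320); the function name crc32 is our own. Published source distributions more often quote an MD5 or SHA digest, and a CRC only guards against accidental corruption, not deliberate tampering.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
// XOR both 0xFFFFFFFF), the variant stored for each entry of a zip archive.
// Short but slow; table-driven versions are usual for large files.
std::uint32_t crc32(std::string const & data)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data)
    {
        crc ^= byte;
        for (int bit = 0; bit != 8; ++bit)
        {
            // Shift right; if the bit shifted out was set, fold in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the conventional check input "123456789" this returns 0xCBF43926; a reader comparing the value computed from their download against a published one would spot a corrupted archive.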
Standard C++
As already mentioned, this was C++ code. C++ is backed by an ISO standard, ratified in 1998, with corrigenda published in 2003. You might expect C++ code to improve with age: compiling and running more quickly, and becoming less likely to run out of resources.
Not so. My favourite counter-example comes from a nice paper 'CheckedInt: A policy-based range-checked integer' published by Hubert Matthews towards the end of 2003 [Matthews03], which discusses how to use C++ templates to implement a range-checked integer. The paper includes a code listing together with some notes to help readers forced to 'tiptoe around non-standard or ancient compilers' (think: MSVC6). Yet when I experimented with this code in 2005 I found myself tripped up by a strict and up-to-date compiler (see Figure 1).
$ g++ -Wall -c checked_int.cpp
checked_int.cpp: In constructor `CheckedInt::CheckedInt(int)':
checked_int.cpp:45: error: there are no arguments to `RangeCheck' that depend on a template parameter, so a declaration of `RangeCheck' must be available
checked_int.cpp:45: error: (if you use `-fpermissive', G++ will accept your code, but allowing the use of an undeclared name is deprecated)
Figure 1
I emailed Hubert Matthews using the address included at the top of his paper. He swiftly and kindly put me straight on how to fix the problem.
What's interesting here is that this code is pure C++, just over a page of it. It has no dependencies on third party libraries. Hubert Matthews is a C++ expert and he acknowledges the help of two more experts, Andrei Alexandrescu and Kevlin Henney, in his paper. Yet the code fails to build using both ancient and modern compilers. In its published form it has a brief shelf-life.
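Figure 1 is a symptom of two-phase name lookup, which GCC enforces from the 3.4 series onwards. Matthews' listing is not reproduced here, so the sketch below is a hypothetical minimal reproduction: it assumes RangeCheck is a static member of a range-checking policy inherited as a base class, and every name in it is illustrative. An unqualified call whose arguments do not depend on a template parameter is looked up where the template is defined, and a dependent base class is not searched at that point.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical range-checking policy (illustrative names, not Matthews' code).
template <int Min, int Max>
struct ThrowOnOutOfRange
{
    static void RangeCheck(int value)
    {
        if (value < Min || value > Max)
            throw std::out_of_range("CheckedInt: value out of range");
    }
};

// Hypothetical policy-based checked integer: the policy is a dependent base.
template <typename RangePolicy>
class CheckedInt : private RangePolicy
{
public:
    explicit CheckedInt(int value) : value_(value)
    {
        // RangeCheck(value_);    // rejected by strict compilers: the argument
        //                        // is not dependent, so the name is looked up at
        //                        // definition time and the dependent base is not searched.
        this->RangeCheck(value_); // accepted: "this->" makes the name dependent,
                                  // so lookup waits until RangePolicy is known.
    }

    int get() const { return value_; }

private:
    int value_;
};
```

Qualifying the call as RangePolicy::RangeCheck(value_) would work equally well here; the article does not record which spelling Hubert Matthews recommended.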
Support rot
Code alone is of limited use. What really matters for its ongoing health is that someone cares about it - someone exercises, maintains and supports it. Hubert Matthews included an email address in his paper and I was able to contact him using that address.
How well would my code shape up on this front? Putting myself in Rick's position, I unzipped the source distribution I'd archived 5 years ago. I was pleased to find a README which, at the very top, shows the URL for updates, http://homepage.ntlworld.com/thomas.guest. I was less pleased to find this URL gave me a 404 Not Found error. Similarly, when I tried emailing the project maintainer mentioned in the README, I got a 550 Invalid recipient error: the attempted delivery to thomas.guest@ntlworld.com had failed permanently.
Cool URIs don't change [W3C] but my old NTL home was anything but cool; it came for free with a dial-up connection I've since happily abandoned. Looking back, maybe I should have found the code a more stable location. If I'd created (e.g.) a SourceForge project then my dvbcodec project might still be alive and supported, possibly even by a new maintainer.
How did this ever compile?
These wise hindsights wouldn't fix my code. If I wanted to continue I'd have to go it alone. Figure 2 is what the README had to say about platform requirements.
REQUIREMENTS and PLATFORMS

To build the dvbcodec you will need Version 1.31.0 of Boost, or later.
You will also need a good C++ compiler. The dvbcodec has been ...
Figure 2
A 'good C++ compiler', eh? As we've already seen, GCC 3.3.1 may be good but my platform has GCC 4.0.1 installed, which is better. If my records can be believed, this upperCase() function (see Listing 1) compiled cleanly using GCC 3.3.1 and MSVC 7.1.
std::string upperCase(std::string const & lower)
{
    std::string upper = lower;
    for (std::string<char>::iterator cc = upper.begin();
         cc != upper.end(); ++cc)
    {
        * cc = std::toupper(* cc);
    }
    return upper;
}
Listing 1
Huh? std::string is a typedef for std::basic_string<char> and, as GCC 4.0.1 says, there's no such thing as a std::basic_string<char><char>::iterator:
stringutils.cpp:58: error: 'std::string' is not a template
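The typedef relationship the compiler is pointing at can be confirmed with a couple of compile-time checks. This sketch uses C++11's static_assert and std::is_same, so it would not have been available to the compilers discussed in this article.

```cpp
#include <string>
#include <type_traits>

// std::string names the specialisation std::basic_string<char>; it is not
// itself a template, so its nested types are spelled std::string::iterator.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is std::basic_string<char>");
static_assert(std::is_same<std::string::iterator,
                           std::basic_string<char>::iterator>::value,
              "std::string::iterator is the iterator type Listing 1 wanted");
```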
The simple fix is to write std::string::iterator instead of std::string<char>::iterator. A better fix, suggested by Rick, is to use std::transform(). I wonder why I missed this first time round? (See Listing 2.)
std::string upperCase(std::string const & lower)
{
    std::string upper = lower;
    std::transform(upper.begin(), upper.end(),
                   upper.begin(), ::toupper);
    return upper;
}
Listing 2
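One caveat applies to both listings: std::toupper and ::toupper take an int whose value must be representable as unsigned char (or equal EOF), so passing a plain char that holds a negative value, as can happen with accented characters in an 8-bit encoding, is undefined behaviour. A commonly recommended variant converts each character to unsigned char first. The sketch below is our own and uses a C++11 lambda, so it is not a drop-in for the compilers of the time.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Upper-case a copy of the input. Each char is converted to unsigned char
// before calling std::toupper, because passing a negative char value
// (other than EOF) to std::toupper is undefined behaviour.
std::string upperCase(std::string const & lower)
{
    std::string upper = lower;
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return upper;
}
```

In the default "C" locale, upperCase("dvbcodec-1.0.zip") gives "DVBCODEC-1.0.ZIP".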
Boost advances
GCC has become stricter about what it accepts even though the formal specification of what it should do (the C++ standard) has stayed put. The Boost C++ libraries have more freedom to evolve, and the next round of build problems I encountered relate to Boost.Spirit's evolution. Whilst it would be possible to require dvbcodec users to build against Boost 1.31 (which can still be downloaded from the Boost website) it wouldn't be reasonable. So I updated my machine (using MacPorts) to make sure I had an up-to-date version of Boost, 1.38 at the time of writing.
$ sudo port upgrade boost
Boost's various dependencies triggered an upgrade of boost-jam, gperf, libiconv, ncursesw, ncurses, gettext, zlib, bzip2, and this single command took over an hour to complete.
I discovered that Boost.Spirit, the C++ parser framework on which dvbcodec is based, has gone through an overhaul. According to the change log the flavour of Spirit used by dvbcodec is now known as Spirit Classic. A clever use of namespaces and include path forwarding meant my 'classic' client code would at least compile, at the expense of some deprecation warnings (Figure 3).
Computing dependencies for decodeout.cpp...
Compiling decodeout.cpp...
In file included from codectypedefs.hpp:11,
                 from decodecontext.hpp:10,
                 from decodeout.cpp:8:
/opt/local/include/boost/spirit/tree/ast.hpp:18:4: warning: #warning "This header is deprecated. Please use: boost/spirit/include/classic_ast.hpp"
In file included from codectypedefs.hpp:12,
                 from decodecontext.hpp:10,
                 from decodeout.cpp:8:
Figure 3
To suppress these warnings I included the preferred header. I also had to change namespace directives from boost::spirit to boost::spirit::classic. I fleetingly considered porting my code to Spirit V2, but decided against it: even after this first round of changes, I still had a build problem.
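A namespace alias is one way to confine this kind of change to a single line of client code. The sketch below uses an invented library namespace, toylib, with a placeholder parse() function, instead of Boost: it is meant to show the technique, not Spirit's real headers. Client code refers to the library only through the alias, so a move such as boost::spirit to boost::spirit::classic would mean editing just the alias.

```cpp
#include <cassert>
#include <string>

// Invented stand-in for a library that has moved its old API into a nested
// "classic" namespace (loosely modelled on the Spirit change; not Boost code).
namespace toylib {
    namespace classic {
        // Placeholder "parser": succeeds on any non-empty input.
        inline bool parse(std::string const & text) { return !text.empty(); }
    }
}

// Client code names the library through one alias; a future namespace move
// means changing only this line.
namespace parsing = toylib::classic;

bool parseDescription(std::string const & text)
{
    return parsing::parse(text);
}
```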
Changing behaviour
Actually, this was a second level build problem. The dvbcodec build has multiple phases (Figure 4):
Figure 4
- it builds a program to generate code. This generator can parse binary format syntax descriptions and emit C++ code which will convert data formatted according to these descriptions
- it runs this generator with the available syntax descriptions as inputs
- it compiles the emitted C++ code into a final dvbcodec executable
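To make the first two phases a little more concrete, here is a toy generator in the same spirit. It reads lines of the form "field_name bit_count", like those in the section format quoted later, and emits C++ source for a struct with one field per entry. The input handling, the function name generateStruct and the output layout are all our own invention; the real dvbcodec generator is built on Spirit and parses a much richer syntax.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Toy code generator: turns "field_name bit_count" lines into C++ source for
// a struct with one integer member per field. Lines that do not fit that
// shape (headers, braces, fixed bit patterns such as '0') are skipped.
std::string generateStruct(std::string const & name, std::string const & syntax)
{
    std::istringstream in(syntax);
    std::ostringstream out;
    out << "struct " << name << "\n{\n";
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        std::string field;
        unsigned bits = 0;
        bool const isField = (fields >> field >> bits)
            && (std::isalpha(static_cast<unsigned char>(field[0])) || field[0] == '_');
        if (isField)
        {
            // unsigned long long holds at least 64 bits, enough for DVB-style fields.
            out << "    unsigned long long " << field << "; // " << bits << "-bit field\n";
        }
    }
    out << "};\n";
    return out.str();
}
```

Phase three would then compile the emitted text alongside the rest of the program; the toy stops at producing the source.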
I ran into a problem during the second phase of this process. The dvbcodec generator no longer parsed all of the supplied syntax descriptions. Specifically, I was seeing this conditional test raise an exception when trying to parse section format syntax descriptions.
if (!parse(section_format, section_grammar, space_p).full)
{
    throw SectionFormatParseException(section_format);
}
Here, parse is boost::spirit::classic::parse, which parses something - the section format syntax description, passed as a string in this case - according to the supplied grammar. The third parameter, boost::spirit::classic::space_p, is a skip parser which tells parse to skip whitespace between tokens. The call returns a parse_info struct whose full field is a boolean, set to true only if the input section format has been fully consumed.
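The difference between matching and fully consuming the input, and the way a skipper interacts with the end of the input, can be illustrated without Spirit. The toy matcher below is our own and makes no claim about how Spirit is implemented: it matches a fixed sequence of tokens, skipping whitespace only before each token, and reports both whether the tokens matched (hit) and whether all the input was consumed (full). Because nothing skips whitespace after the final token, trailing spaces leave full false, which is exactly the kind of symptom described next.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified analogue of a parse result: did the tokens match, and was
// all of the input consumed?
struct ToyParseInfo
{
    bool hit;
    bool full;
};

// Match the expected tokens in order, skipping whitespace before each one.
// Whitespace after the last token is deliberately NOT skipped.
ToyParseInfo toyParse(std::string const & input, std::vector<std::string> const & tokens)
{
    std::size_t pos = 0;
    for (std::string const & token : tokens)
    {
        while (pos < input.size() && std::isspace(static_cast<unsigned char>(input[pos])))
            ++pos;                                    // pre-skip whitespace
        if (input.compare(pos, token.size(), token) != 0)
            return ToyParseInfo{false, false};        // token mismatch
        pos += token.size();
    }
    return ToyParseInfo{true, pos == input.size()};
}
```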
I soon figured out that the parse call was failing to fully consume section format syntax descriptions with trailing spaces, such as the one shown below.
" program_association_section() {"
" table_id 8"
" section_syntax_indicator 1"
" '0' 1"
....
" CRC_32 32"
" } "
If I stripped the trailing whitespace after the closing brace before calling parse() all would be fine. I wasn't fine about this fix though. The Spirit documentation is very good but it had been a while since I'd read it and, as already mentioned, my code used the 'classic' version of Spirit, in danger of becoming the 'legacy', then 'deprecated' and eventually the 'dead' version. Even after re-reading the documentation, it wasn't clear to me exactly what the correct behaviour of parse() should be in this case. Should it fully consume trailing space? Had my program ever worked?
I went back in time, downloading and building against Boost 1.31, and satisfied myself that my code used to work, though maybe it worked due to a bug in the old version of Spirit. Stripping trailing spaces before parsing allowed my code to work with Spirit past and present, so I curtailed my investigation and made the fix.
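The fix amounts to trimming each description before handing it to parse(). The article does not show the code actually used, so what follows is a minimal sketch of such a helper, with a name of our own choosing.

```cpp
#include <cassert>
#include <string>

// Return a copy of text with trailing whitespace (space, tab, newline,
// carriage return, form feed, vertical tab) removed. Leading whitespace
// is left alone, since the skip parser already copes with it.
std::string stripTrailingSpace(std::string const & text)
{
    std::string::size_type const last = text.find_last_not_of(" \t\n\r\f\v");
    return last == std::string::npos ? std::string() : text.substr(0, last + 1);
}
```

Trimming before parsing, rather than relying on either version's end-of-input skipping, is what lets the same call behave identically against old and new Spirit.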
(Interestingly, Boost 1.31 found a way to warn me I was using a compiler it didn't know about.
boost_1_31_0/boost/config/compiler/gcc.hpp:92:7: warning: #warning "Unknown compiler version - please run the configure tests and report the results"
I ignored this warning.)
Code inaction
Apologies for the lengthy explanation in the previous section. The point is that few software projects stand alone, and that changes in any dependencies, including bug fixes, can have knock-on effects. In this instance, I consider myself lucky; dvbcodec's unusual three-phase build enabled me to catch a runtime error at build time. Of course, to actually catch that error, I needed to at least try building my code.
Put more simply: if you don't use your code, it rots.
Rotten artefacts
It wasn't just the code which had gone off. My source distribution included documentation - the plain text version of the article I'd written for Overload - and the Makefile had a build target to generate an HTML version of this documentation. This target depended on Quickbook, another Boost tool. Quickbook generates Docbook XML from plain text source, and Docbook is a good starting point for HTML, PDF and other standard output formats.
This is quite a sophisticated toolchain. It's also one I no longer use. Most of what I write goes straight to the web and I don't need such a fiddly process just to produce HTML. So I decided to freshen up dead links, leave the original documentation as a record, and simply cut the documentation target from the Makefile.
Stopping the rot
As we've seen, software, like other soft organic things, breaks down over time. How can we stop the rot?
Freezing software to a particular executable built against a fixed set of dependencies to run on a single platform is one way - and maybe some of us still have an aging Windows 95 machine, kept alive purely to run some such frozen program.
A better solution is to actively tend the software and ensure it stays in shape. Exercise it daily on a build server. Record test results. Fix faults as and when they appear. Review the architecture. Upgrade the platform and dependencies. Prune unused features, splice in new ones. This is the path taken by the Boost project, though certainly the growth far outpaces any pruning (the Boost 1.39 download is 5 times bigger than its 1.31 ancestor). Boost takes forwards and backwards compatibility seriously, hence the ongoing support for Spirit Classic and the compiler version certification headers. Maintaining compatibility can be at odds with simplicity.
There is another way too. Although the dvbcodec project has collapsed into disrepair the idea behind it certainly hasn't. I've taken this same idea - of parsing formal syntax descriptions to generate code which handles binary formatted data - and enhanced it to work more flexibly and with a wider range of inputs. Whenever I come across a new binary data structure, I paste its syntax into a text file, regenerate the code, and I can work with this structure. Unfortunately I can't show you any code (it's proprietary) but I hope I've shown you the idea. Effectively, the old C++ code has been left to rot but the idea within it remains green, recoded in Python. Maybe I should find a way to humanely destroy the C++ and all links to it, but for now I'll let it degrade, an illustration of its time.
Is it possible that software is not like anything else, that it is meant to be discarded: that the whole point is to see it as a soap bubble?
Alan J. Perlis
Thanks
I would like to thank Rick Engelbrecht for reporting and helping to fix the bugs discussed in this article. My thanks also to the team at Overload for their expert help.
References
[DVBcodec] Download of the DVBcodec is available from: http://wordaligned.org/docs/dvbcodec/dvbcodec-1.0.zip
[Guest04] Thomas Guest, 'A Mini-project to Decode a Mini-language - Part One', Overload #63, October 2004. Available from: http://accu.org/index.php/journals/241
[Matthews03] Hubert Matthews, 'CheckedInt: A Policy-Based Range-Checked Integer', Overload #58, December 2003. Available from: http://accu.org/index.php/journals/324
[Spirit] 'Spirit User's Guide'. Available from: http://www.boost.org/doc/libs/1_39_0/libs/spirit/classic/index.html
[W3C] 'Cool URIs don't change'. Available from: http://www.w3.org/Provider/Style/URI