Journal Articles

CVu Journal Vol 12, #6 - Dec 2000 + Professionalism in Programming, from CVu journal

Title: Professionalism in Programming #5

Author: Administrator

Date: 06 December 2000 13:15:41 +00:00

Summary: 

Documenting Code

Body: 

Do we write code for computers or humans?

Well, presumably we write code for 'customers'[1], so we are hopefully aiming for the latter. My question more specifically is: do we write code for it to be read by computers or humans?

We certainly write code in order to instruct a computer what to do. But we are actually writing code to express the concepts in our heads, to describe the way we want to mould the flow of execution and how to manipulate the data structures we have generated.

I think it is fair to say that machine code is supposed to be read by computers - it is the language of the CPU. You cannot talk to a microchip in English, after all. However, any higher-level language like C or C++ is removed from the machine code by a number of levels[2]. When we write code in these languages do we really write for the computer? It is more accurate to say we write code for both ourselves and the computer.

However, we are writing the code in a more human-friendly way, primarily for ourselves. We then get a compiler to 'translate' our human-friendly language into computer-friendly language the CPU can understand. In this respect, the programming language we choose has been designed so that we can talk to the machine. But it has been created mainly so that we can convey our thoughts and concepts in a natural manner for us.

So then, programming languages are how we express our programs, both for the author and for the (human) reader. We use programming languages to prevent ambiguity. We use them as concrete documentation of the algorithms we choose.

Self-documenting code

So if we are writing code for humans to read, we should be taking care to make the code as clear as possible, and as easy to understand as possible. By necessity the code is something that more people than just the author must be able to understand. This actually is not that hard to do, and takes little effort. By improving the ease of understanding of the code we write we gain a number of benefits:

  • We are less likely to make mistakes, since errors are more explicit

  • Maintaining the code is cheaper - it takes less time to 'get into it'

Remember that computer programs are inherently much harder to read than they are to write. Anyone who has used Perl will understand this. Perl has been described as the ultimate write-once language. If you come back to some of your old Perl code, or look at someone else's, you'll be lucky if you are able to understand it.

It has been argued that the only document that explains some code completely and correctly is the code itself. This is one of the Extreme Programmer's axioms [BECK]. It is certainly true, but it does not necessarily follow that it is the best description possible. Whether you are using that as an excuse (sorry, reason) not to write a specification or code comments, or believe that we should still write supporting documentation, it is clear that we will benefit from writing code that is as self-documenting as possible.

By 'self-documenting' code, I mean easily readable code that stands comprehensible on its own. We can improve the clarity of our code in many ways. Some are very basic and have been drilled into us since we were taught to program. Others are more subtle and come with experience. For example:

Choosing meaningful names

Variable, function and type names, as well as file names, should be meaningful and not misleading. Their names should describe what they contain or define. If you cannot name something meaningfully, do you really understand what it is doing? Function names describe what the function does. Because of this they generally begin with a verb.
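As a small illustration of the difference a name makes, here is the same calculation written twice. Every name in it (f, calculateSalesTax, netPrice, taxRatePercent) is invented for this sketch.

```cpp
// Unclear: what do f, a and b stand for?
double f(double a, double b)
{
    return a * b / 100.0;
}

// Clearer: the verb-first name says what the function does,
// and the parameter names say what they contain.
double calculateSalesTax(double netPrice, double taxRatePercent)
{
    return netPrice * taxRatePercent / 100.0;
}
```

Both functions compile to the same thing; only the second one tells the reader what it is for.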

It takes a bit of experience to be able to pick the correct balance of name length vs. readability. Purists argue that you should not abbreviate the words in your variable names at all. If we wanted a fight, now would be a good time to mention Hungarian notation.

Back in the days of yore, name choice was harder because compiler technology limited the number of characters available and file systems imposed rigid constraints. However, in these enlightened times, this kind of problem is practically non-existent, and there is no reasonable excuse for sloppy naming.

Choose meaningful types

As far as possible, describe constraints with the available language features. For example, if you are defining a value that will never change (either a file-static constant or an intermediate value of a calculation) enforce it with const. If a variable cannot have a valid negative value, make its type unsigned. Use enums to define several related values.
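A minimal sketch of those three suggestions in one place. The Colour enum and the countGreen function are hypothetical, made up purely to show the language features at work.

```cpp
#include <cstddef>

// Several related values, defined together as an enum.
enum Colour { red, green, blue };

// A count can never be negative, so its type is unsigned (std::size_t).
std::size_t countGreen(const Colour* pixels, std::size_t numPixels)
{
    std::size_t greenCount = 0;
    for (std::size_t i = 0; i < numPixels; ++i)
    {
        // const: this intermediate value never changes once computed.
        const bool isGreen = (pixels[i] == green);
        if (isGreen)
            ++greenCount;
    }
    return greenCount;
}
```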

Provide meaningful comments

Clear code contains an appropriate amount of commenting. It is actually harder than would first appear to describe exactly what that 'appropriate' amount is.

Adding comments to code does not really make it self-documenting. The code and the comments can all too easily get out of sync. Even so, meaningful comments add much value.

Name constants

'Magic numbers' are evil. They hide meaning. Writing 10 in the statement if (10 == count) is not very explicit - what are you actually checking for? Writing static const size_t MAX_NO_BANANAS = 10; (or even better, avoid tripping over the pre-processor by using some lower case letters in that identifier) and then if (MAX_NO_BANANAS == count) is much clearer. It also has the benefit that if you use the value 10, sorry MAX_NO_BANANAS, a lot in your code and you need to change the definition, you can do it just once at the definition line, rather than perform an error-prone search for every 10 in the project.
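Written out in full, the banana example might look like the sketch below. The lower-case spelling follows the suggestion about the pre-processor; the basketIsFull function is a hypothetical wrapper added only so that the comparison has somewhere to live.

```cpp
#include <cstddef>

// Lower-case letters keep the name clear of upper-case pre-processor macros.
static const std::size_t max_no_bananas = 10;

// Returns true when the basket holds the maximum number of bananas.
bool basketIsFull(std::size_t count)
{
    return max_no_bananas == count;
}
```

If the limit ever changes, the definition line is the only place that needs editing.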

Emphasise important code

When writing your code, take care to make the important stuff stand out from the mundane stuff. A good example of this would be the ordering of definitions in a class. The public information should come first, since this is what the class user needs to see. Order the private, implementation details last since they are less important to the reader.

Wherever possible hide all the non-essential information. Use the Pimpl idiom [MEYERS] to hide implementation details.
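Here is one way the idiom can look, shown as a single listing for brevity. The Gadget class and its members are hypothetical, and the use of std::unique_ptr is my assumption (a raw pointer with a hand-written destructor works just as well). In a real project the first part would live in the header and the second part in the implementation file.

```cpp
#include <memory>
#include <string>

// --- What would go in gadget.h: only what the class user needs to see ---
class Gadget
{
public:
    explicit Gadget(const std::string& name);
    ~Gadget();
    std::string name() const;

private:
    class Impl;                  // defined out of sight of the user
    std::unique_ptr<Impl> impl_; // the only data member the header exposes
};

// --- What would go in gadget.cpp: the hidden implementation detail ---
class Gadget::Impl
{
public:
    explicit Impl(const std::string& n) : name(n) {}
    std::string name;
};

Gadget::Gadget(const std::string& name) : impl_(new Impl(name)) {}
Gadget::~Gadget() {} // defined here, where Impl is a complete type
std::string Gadget::name() const { return impl_->name; }
```

The public interface comes first and the header says nothing about how a Gadget stores its data, so the implementation can change without the reader of the header ever noticing.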

Also avoid hiding important code. Only put one statement per line. You can write very clever for loops with all the logic on one line using commas, but it is not easy to read.
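To make the point concrete, here are two hypothetical functions that reverse an array in place. The first crams everything into the for statement with commas; the second spends a few more lines and puts one statement on each.

```cpp
#include <cstddef>
#include <utility>

// Clever: the swap and both counter updates hide inside the for statement.
void reverseClever(int* values, std::size_t length)
{
    for (std::size_t lo = 0, hi = length; lo + 1 < hi; std::swap(values[lo], values[hi - 1]), ++lo, --hi) {}
}

// Clear: one statement per line, and the important swap stands out.
void reverseClear(int* values, std::size_t length)
{
    if (length < 2)
        return;

    std::size_t front = 0;
    std::size_t back = length - 1;
    while (front < back)
    {
        std::swap(values[front], values[back]);
        ++front;
        --back;
    }
}
```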

Limit the number of nested conditional statements. The important handling of conditions becomes hidden by a nest of ifs and braces.
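One common way to flatten a nest of ifs is to dismiss each failing condition early. That is my choice of technique for this sketch, not the only option, and the function and parameter names are invented for illustration.

```cpp
// Nested: the decision that matters is buried three levels down.
bool mayLogInNested(bool accountExists, bool passwordMatches, bool accountLocked)
{
    bool allowed = false;
    if (accountExists)
    {
        if (passwordMatches)
        {
            if (!accountLocked)
            {
                allowed = true;
            }
        }
    }
    return allowed;
}

// Flattened: each failing condition is dealt with on its own line.
bool mayLogIn(bool accountExists, bool passwordMatches, bool accountLocked)
{
    if (!accountExists)
        return false;
    if (!passwordMatches)
        return false;
    if (accountLocked)
        return false;
    return true;
}
```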

Do not clutter code with revision history information in comments when you can get it all from your source management system (cvs, rcs, sccs etc).

Group related information

It is much clearer to present all related information in one place. The API for a single module should be presented in one file. If there is so much related information that it becomes messy to present it together, have you designed correctly?

If at all possible the grouping should be enforced by language construct. For example, in C++ we can group items within a namespace. Related values can be defined in an enum.
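A minimal sketch of grouping by language construct. The audio namespace and everything inside it are hypothetical names chosen for the example.

```cpp
namespace audio
{
    // Related values defined together in an enum.
    enum Channel { left, right, centre };

    // The constant that belongs with the enum sits right beside it.
    const unsigned numChannels = 3;

    // True for the two channels of a plain stereo pair.
    inline bool isStereoChannel(Channel channel)
    {
        return channel == left || channel == right;
    }
}
```

A reader who meets audio::centre elsewhere in the code knows immediately where to look for its relatives.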

Provide a file header

Placing a comment header at the top of a file describing its contents and the project to which it belongs takes little effort, but can make a big difference when someone comes to maintain that file in the future. Even writing the author's name can be useful.

Handle errors gracefully

Whatever the error mechanism you are using (which will most likely be related to the language you are using) you should handle every possible error condition that can occur. That's just common sense. But you should also handle errors in their appropriate context. For example, if there is a disk I/O problem you should handle it in the code that accesses the disk. Perhaps handling this error would mean raising a different error (like a 'could not load file' exception) to a higher level. This means that at each level in the program an error is an accurate description of what the problem is.

Self-documenting code helps the reader to understand where an error came from, and what its implications are for the program.
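The disk example could be sketched roughly as follows, assuming C++ exceptions are the error mechanism in use. The CouldNotLoadFile class and the loadFirstLine function are hypothetical names invented for this illustration.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// The higher-level error: a description the caller can make sense of.
class CouldNotLoadFile : public std::runtime_error
{
public:
    explicit CouldNotLoadFile(const std::string& path)
        : std::runtime_error("could not load file: " + path) {}
};

// Reads the first line of a file. The disk I/O problem is detected here,
// in the code that touches the disk, and re-expressed for the level above.
std::string loadFirstLine(const std::string& path)
{
    std::ifstream file(path.c_str());
    if (!file)
        throw CouldNotLoadFile(path);

    std::string line;
    std::getline(file, line);
    return line;
}
```

The caller never sees a stream state flag; it sees an error that accurately describes the problem at its own level.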

If your language allows, declare variables at first use, and initialise them

OK, so you cannot really do this in C[3]. It is much clearer to declare any variable as close as possible to its first use. If you declare it at the top of a function and first use it three quarters of the way down, the code loses clarity. Any other programmer who comes back to your code will see the variable used, but to find out what type it is they have to scroll all the way back to the beginning of the function.

If the variable has an important initial value, then it should be initialised in the declaration to prevent confusion. Initial values are another reason to declare near first use: separating the initial value from its first use can prevent the reader from following the code properly.
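A short sketch of the idea in C++. The calculateMean function is hypothetical; the thing to look at is where each variable is introduced.

```cpp
#include <cstddef>

// Returns the mean of the values, or 0 for an empty array.
double calculateMean(const int* values, std::size_t length)
{
    if (length == 0)
        return 0.0;

    // 'total' is declared next to the loop that fills it in, and its
    // initial value is visible exactly where it matters.
    long total = 0;
    for (std::size_t i = 0; i < length; ++i) // 'i' lives only as long as the loop
        total += values[i];

    // 'result' is declared once its value is known, so it can be const.
    const double result = static_cast<double>(total) / length;
    return result;
}
```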

Write short functions

They are just so much easier to understand. It is easier to understand a complex algorithm if it is broken into pieces with descriptive names than if it is just a sprawling morass of code on the page.
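As an illustration, here is a palindrome check broken into small pieces. The task and all the function names are my own invention; the point is only that the top-level function reads like a description of the algorithm.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a copy of the text with everything except letters removed.
std::string keepOnlyLetters(const std::string& text)
{
    std::string letters;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        if (std::isalpha(static_cast<unsigned char>(text[i])))
            letters += text[i];
    }
    return letters;
}

// Returns a lower-case copy of the text.
std::string toLowerCase(const std::string& text)
{
    std::string lowered(text);
    for (std::string::size_type i = 0; i < lowered.size(); ++i)
        lowered[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(lowered[i])));
    return lowered;
}

// Returns true if the text is identical when read from either end.
bool readsSameBackwards(const std::string& text)
{
    return std::equal(text.begin(), text.end(), text.rbegin());
}

// The whole algorithm, stated in one line of named steps.
bool isPalindrome(const std::string& phrase)
{
    return readsSameBackwards(toLowerCase(keepOnlyLetters(phrase)));
}
```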

Know when to break the rules

The experienced programmer knows when the rules are getting in the way of doing a proper job. For example, it is important to give variables meaningful names, but when you have a loop counter for a single line loop, a variable name n may be greatly preferable to indexIntoWidgetTable, especially when your code needs to fit into a certain number of columns per line.

These are some simple ways of writing your code for humans to read. Please write in with suggestions of other ways to make your code self-documenting. However, there are some more elaborate alternatives...

Literate programming

Literate programming is a term coined by the renowned computer scientist Donald Knuth; he wrote a book by this title [KNUTH] describing it. In many ways it is a quite radical alternative to the traditional programming model, although some think that the literate programming episode of Knuth's career was a large and unfortunate sidetrack. Even if it is not the One True Way to code, it does not mean that there is nothing we can learn from the concept.

The idea of literate programming is simple: not only is the source code written for humans to read, it is embodied in the system documentation. The documentation language is bound up tightly with the programming language. You write one file that is primarily a description of what is being programmed, but that also happens to compile into the program. The source code is the documentation.

It is written almost as a story, so it is easy for the human reader to follow (perhaps even enjoyable to read). It is not ordered or constrained in any way to make it easy for the computer to parse. This is more than just a language with inverted comments! It is hard to describe while we think in our traditional file-based programming paradigm; literate programming is a whole different way of thinking about programming.

Knuth originally mixed TeX (a document typesetting mark-up language) and Pascal in a system called WEB; its successor, CWEB, does the same for C. A literate programming tool parses the source file, and generates either formatted documentation (printable or HTML, for example) or source code that can be fed to a traditional compiler.

Of course, since this is just another programming technique, like structured programming or object oriented programming, it still does not guarantee quality documentation. That is, as ever, up to the programmer. However, the technique shifts emphasis onto writing a description of the program rather than just writing code that implements it.

Literate programming tends towards keeping the documentation consistently accurate - it is updated with code, since it is right beside the code. Many programmers will tell you from first hand experience how easy it is to forget to update a separate specification when you modify some code.

Literate programming also encourages the inclusion of items not normally found in source comments, for example a description of algorithms used, proofs of correctness, and justification for design decisions. This sort of documentation can be included without cluttering source with random comments or bogging down a specification with many issues about specific bits of code.

During the maintenance phase of a product literate programming really comes into its own. With good quality (and quantity) documentation directly on hand it is much easier and cheaper to maintain the source.

Some literate programming tools can include pictures and charts in the source. They allow you to do clever things like describe a C++ class in its entirety (i.e. both interface documentation and implementation documentation) in one source file, and have the tool split it out into the .h and .cpp files for you.

In the third article in this series [GOODLIFFE] we discussed software specifications. So how does literate programming relate to specifications? A literate program will never replace a functional specification describing what work needs to be done. However, it may be possible to develop a literate program from such a specification. The literate program really is more of a combination of the traditional code with the design and implementation specification. See [MALL] for more information on literate programming.

Documentation tools

There is a breed of programming tool that is a sort of halfway house between writing separate specifications and the literate programming approach. They are rapidly gaining in popularity. These are tools that generate documentation from your source code. The documentation is pulled out from blocks of specially formatted comments that you write. This has perhaps been particularly fashionable since Sun introduced the Javadoc™ program [SUN] as a core component of the Java platform. All of the Java API documentation is created automatically by Javadoc™.

For example, to document a Widget class, you would write something like:

/**
 * This is the documentation for the Widget
 * class. The tool knows this because the
 * comment started with the special '/**'
 * identifier.
 * @author Author name here
 * @version Version number here
 */
class Widget {
public:
    /**
     * This is the documentation for a method.
     */
    void method();
};

A documentation tool will parse the file, and extract the documentation information, building a cross-referenced database of all the information it finds on the way. Then it will spit out a document (either online or hard copy) containing this information. This is most likely API documentation rather than implementation documentation.

There are a number of such tools available today. Many of them are open source and widely used. I list below a number of them and provide a simple comparison of features (E&OE!). I have never used any commercial documentation packages - the free ones are so good I would like to know what the commercial offerings provide (apart from fancy GUI front-ends). If any readers have experience with them then I would be glad to hear about it.

It is worth noting that most of these tools do not support Java since the language already comes with a de facto tool.

Doc++

Available from: www.linuxsupportline.com/~doc++/

This is a widely used tool. It creates well-structured documentation with good cross-referencing and automatic class graph generation. However, these graphs come in the form of unwieldy Java applets for HTML output. The documentation is not the most elegant to look at since it is quite spread out. The special comment syntax is not completely consistent which makes it a bit odd to read in the source file. The tool provides a good range of special fields (for example, the @author tag).

CcDoc

Available from: www.joelinoff.com/ccdoc/

CcDoc™ is designed to be a Javadoc™ for C++. Its comment syntax is close to Javadoc's, but not identical. It has a shareware licence, but its source is open. It costs $20.00 per copy. The output is just like Javadoc™ from the 1.0.2 JDK days and looks a bit dated.

Doxygen

Available from: www.stack.nl/~dimitri/doxygen/

Doxygen™ is a very nice program. It produces really cute output with good cross-referencing. The pages have nice class diagrams at the top, which unlike Doc++™ are not Java applets. Doxygen™ is also very flexible - it can accept a number of different comment styles, including Javadoc style, Qt style and a one-line style. Output is similarly customisable.

Doxygen comes with a GUI front-end (written using the Qt library) too. It is a powerful tool.

Cocoon

Available from: www.stratasys.com/software/coccon/

Cocoon™'s output can be 'flavoured', but is fairly lumpy to look at. Its HTML uses frames, but is not particularly elegant. The comment syntax is also a bit curious, and the program requires that you put the keywords class, private, protected and public in the first text column. Cocoon™ does not support any extra information tags.

The Cocoon™ licence is freeware. The author explicitly states that it is not open source, but you can download the source code and build it yourself.

KDOC 2

Available from: www.ph.unimelb.edu.au/~ssk/kde/kdoc/

This is easily my favourite documentation tool. It was written specifically for the purpose of documenting the KDE™ libraries. Classes can be grouped into libraries and very easily cross-referenced (a very useful facility). It supports Qt™ signals and slots [TROLLTECH] - none of the other tools do. It is also very easy to use.

KDOC 2™ is the most stable tool I have found, and the output is (in my opinion) the nicest - as the author puts it, it does not generate 'too much extraneous fluff'. It is written in Perl and so should work on Windows as well as Unix.

If you intend to document a project using a tool such as these then it is advisable to do so right from the very start. It can be a lot of work to put good quality documentation into an existing code base. I would suggest looking at either Doxygen™ or KDOC 2™.

Name      Generates                             Licence     Output quality
Doc++     HTML, TeX                             GNU GPL     OK
CcDoc     HTML                                  Shareware   OK
Doxygen   HTML, PDF, man, LaTeX, RTF            GNU GPL     Very good
Cocoon    HTML                                  Freeware    Naff
KDOC 2    HTML, man, LaTeX, Texinfo, DocBook    GNU GPL     Very good

Conclusion

Writing code is not just about typing fors and switches into a text file. And it is not all about the high level design, either. A good programmer who takes pride in their professionalism is careful to write code that can be easily read, aiming for it to be self-documenting code. We write code primarily to communicate.

Literate programming is one (quite extreme) method of writing self-documenting code. Another less extreme method involves using documentation tools. These tools can generate API documentation very easily, but they do not necessarily take the place of a well-written specification or other descriptive document.

References

[BECK] Kent Beck. Extreme Programming Explained. Addison Wesley Longman. (0 201 6164 1).

[MEYERS] Scott Meyers. Effective C++. Item 34: Minimize compilation dependencies between files. Addison Wesley Longman. (0 201 92488 9).

[KNUTH] Donald Knuth. Literate Programming. University of Chicago Press. (0 93707 380 6).

[GOODLIFFE] Pete Goodliffe. Professionalism in Programming #3: Being Specific. C Vu, Volume 12 No 4. (1354-3164).

[MALL] Daniel Mall. Literate Programming Homepage. URL: www.literateprogramming.com.

[SUN] Sun Microsystems, Inc. Javadoc Tool Homepage. URL: java.sun.com/products/jdk/javadoc/.

[TROLLTECH] Trolltech A.S. Qt library. URL: www.trolltech.com/products/qt.qt.html.



[1] Or just plain users, if it is a freeware or open source project.

[2] This level may be different for C compared to C++; many people see C as a kind of 'portable' assembler.

[3] Well, actually you can with judicious use of braces.
