Title: Professionalism in Programming #19
Author: Administrator
Date: 05 April 2003 13:15:56 +01:00
Summary:
A passing comment
Body:
Comments are free but facts are sacred. (Charles Prestwich Scott)[1]
We all like comments, don't we? We all know that comments are a Good Thing, don't we? Comments are rather like opinions. You're free to make them, but just because they're expressed, it doesn't necessarily mean they're right. In this article we'll spend a little time thinking about the details of writing these things.
From the moment you were taught to program you learnt to write comments. You were told that comments aid the readability of code, and were probably encouraged to write lots of them. But in this game we need to be thinking more about quality than quantity. Comments are our lifeline, memory jog and guide through code. We should treat them with the respect they deserve.
I set my syntax highlighting code editor to display comments in green. That's my thing. I get an immediate feeling of the quality of a bit of code, and how easy it's going to be to work with, as soon as I load up a source file. A nice proportion of green spread through in the right pattern makes me feel good about the world. The opposite makes me stroll to the kitchen for a strong coffee before going any further.
Comments can make the difference between bad code and good code, between a grossly complex and unfathomable morass of logic, and clear algorithms. But let's not overstate the case - there are things far more important than comments to get right. When your code is in the right state your comments are the "icing on the cake", delicately placed to add aesthetics and value, rather than liberally slapped on to cover over all the cracks and blemishes.
In this sense, good code commenting is a strategy to avoid writing intimidating code. Comments will rarely be a magic addition that will turn sour code sweet.
Don't skip this section! OK, so this is an excruciating place to start from. We all know what a comment is, surely? But this question is more philosophical than you might think.
Syntactically, a comment is a block of source that the compiler will just plain ignore. Put what you like in it, the names of your grandchildren or the colour of your favourite mackerel, the compiler won't give two hoots as it merrily parses its way through the file[2].
Semantically, a comment is the difference between a dingy dirt track and a well-lit highway. The comment is an annotation of the code it's situated by. You can use it as a highlighter to make a particular problem area stand out or as a documentation medium in your header file. You might use comments to describe the shape of an algorithm aiding the maintenance programmer (which may be you later on) or might just place comment blocks as a marker between each function to help you navigate a source file more quickly.
Notice in all of this that comments are entirely aimed at the human reader, not the computer. In this sense, comments are the most human-focused brick in the programming wall. It's the kind of brick with ornate moulding as opposed to the surrounding functional breezeblocks. If we want to improve the quality of our comments we need to look at what the human really needs as they read code and address that.
Code comments are not the only documentation your code should ever get. Comments are not specifications. They are not design documents. They are not API references. However, they are an invaluable form of documentation that will always be physically attached to the code (unless someone maliciously hits delete). Their close proximity means they're more likely to be updated, and more likely to be read in context. It's an internal documentation mechanism.
As responsible programmers we have a duty to comment well.
What do comments look like? Well, they're green, aren't they? Or at least they are for me. Traditional C comments come in blocks between /* and */, and can span any number of lines. C++, C99 and Java add the single line comment that follows //. Other languages provide similar block and line type comment facilities, but with different syntaxes.
Again, this is elementary subject matter, but you'll often see these different types of comment block used in subtly different ways. We'll see some of this as we go along. However, any commenting scheme that makes too cute a use of subtle differentiations based on syntax should be viewed warily.
Having established that we need to work at quality, not quantity, we must ask ourselves how many comments we really need. The answer depends heavily on what goes into the comments - so the next section will have a large bearing on this. Preview: favour the minimalist comment strategy.
Student programmers are taught to write comments, and lots of them. As a case in point, the university Computer Science course I took had an automated code assignment marker. To be fair, it was quite a clever piece of work. Almost. Part of the grade awarded was determined by the ratio of comments to code. This didn't encourage students to write meaningful comments, just to add large blocks of nonsense to increase their grade. The text of the Jabberwocky in one assignment managed to get me full marks, but was about as inappropriate as you can get.
There is such a thing as too much commenting. Just as bad comments can be worse than no comments at all, so can too many comments. They can easily hide the real meaning of code, especially if you have to spend more time trawling through complex paragraphs of comment than the actual code that you need to read.
I liken this skill to being a good musician. When playing in a band it's not about how much noise you can add in at every conceivable opportunity. The more you play your instrument, the more complex the overall sound, the worse the music. Too many comments muddy the code. A good musician doesn't have to think "When should I stop playing and let someone else have a chance?" They should naturally only play when it will really add something. It's about playing the minimum you can to create the best sound possible. A lot of the beauty is in the space. We should only be writing comments when it really adds something.
The kind of people who will read your comments can also read the code, so prefer to document as much as possible in the code itself rather than in comments. It's what they'll believe, anyway. Comments have a nasty tendency to lie. Consider your code statements as the 'first level' of comment, and make them self-documenting.
How do you do this? Code self-documents through intuitive function and variable naming, through good structure and control-flow, and through clear and logical indenting with layout that reflects the program structure, rather than by using elaborate comments to make up for poor algorithm design. Use named constants instead of 'magic numbers' which need explaining. Choose types well so that it's obvious what the reasonable values are - enumerations are good for this. Sacrifice small optimisations if they'll confuse the way the code reads. If you see a comment stating something that could be enforced by the language itself (e.g. "// this variable should only be accessed by class foo") you should be worried.
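As a minimal sketch of those points (every name here is invented for illustration, nothing comes from a real code base), a named constant and an enumeration can carry information that would otherwise need a comment:

```cpp
#include <string>

// Hypothetical example: all names are made up for illustration.

// A named constant replaces the magic number 3, and with it the
// comment that would be needed to explain what 3 means.
const int maxLoginAttempts = 3;

// An enumeration makes the reasonable values obvious. A plain int
// with a "0 = red, 1 = amber, 2 = green" comment would not.
enum class SignalColour { red, amber, green };

bool mayTryAgain(int attemptsSoFar)
{
    return attemptsSoFar < maxLoginAttempts;
}

std::string describe(SignalColour colour)
{
    switch (colour)
    {
        case SignalColour::red:   return "stop";
        case SignalColour::amber: return "get ready";
        case SignalColour::green: return "go";
    }
    return "unknown"; // not reached for a valid enumerator
}
```

Neither function needs a comment to say what its arguments may be; the types and names already do that job.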
Bad code can be at worst wrong. Bad comments can lie and that can be a whole lot nastier. The fewer comments you write the less chance you have of writing bad comments!
Aside from the tedious language-level concerns of what characters we can put inside our comment delimiters, what should we be writing in comments?
Here are a few basic things to consider that can drastically improve the quality of your comments:
- Only include the truth
When is a comment not a comment? When it's a lie. OK, you'll probably not deliberately type in lies but it's easy to accidentally introduce errors, especially when modifying code that has already been commented. Try to avoid writing any comments that may go out of date if you know some future modifications may be made. See the "Working with comments" section below.
- Only include comments of worth
Little witty cryptic comments may be witty, and they might only be little, but just don't put them in. They get in the way. They confuse. In-jokes that you and one other programmer get should not enter into the code. Neither should expletives or comments that are unnecessarily critical - you can never tell where your code will end up in a month or year's time...
- Don't describe the code
Worthless descriptive comments range from the elementary example of "++i; // increment i" to a description of an algorithm followed by the exact code for the algorithm. There is no need to restate the code laboriously in English unless you're documenting a really complex algorithm that is impenetrable without it. And then you should probably worry more about rewriting the algorithm than the comment.
- Be clear
Your comment serves to annotate and explain the code. Don't be ambiguous. Be as specific as you can (without writing a thesis about each line of a function). If someone reads your comment and wonders what it means you have made the code worse, and slowed down their comprehension.
- Explain why not how
This is a key point. Read this paragraph twice. Then eat the page. Your comments shouldn't be describing how the code works. You can see that by reading the code. After all, the code is the definitive description of how the code works. And the code has been written clearly and comprehensibly. Hasn't it? You should focus more on describing why something is written the way it is, or what the next block of statements ultimately achieves. Constantly check whether you're writing "/* update WidgetList structure from GlbWLRegistry */" or "/* cache widget information for later */". They might mean the same thing, but the second conveys the intent of the code it refers to, while the first just tells you what it's doing.
- Write comprehensibly
You don't necessarily need to write complete grammatically correct English sentences inside every comment you write. However, the comment must be readable. Cute abbreviations of words usually serve to confuse the reader too much - especially if English is not their first language.
- End of blocks?
You will see that some programmers have a habit of commenting the end of every control block, for example putting "// end if (a == 1)" after the closing brace of an if statement. This is usually a sign of a novice programmer following the advice of a well-meaning teacher. It is a redundant form of comment, and adds noise to the code that needs to be filtered out before real comprehension can occur. The bottom of a block should be viewable from the same page as the top, and the code layout should make the loop/conditional block clear. All extra verbiage should be avoided. There is a similar practice for #ifdef blocks; although the argument for it is slightly more compelling, I think it is redundant there too. If you have so many nested #ifdefs that you need this kind of documentation then you have bigger problems with your code to sort out.
- The unexpected
If any bits of the code you write are unusual, unexpected or surprising then document them with a comment. You will thank yourself when you come back having forgotten all about the problem. If there are specific workarounds, say for an operating system issue, then mention this in a comment.
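To make the "why not how" and "the unexpected" points concrete, here is a small sketch. Widget, the registry and the rule that id 0 marks a placeholder are all assumptions invented for the example; the point is only the kind of thing each comment says.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: Widget, the registry and the "id 0 means
// placeholder" rule are all invented for illustration.

struct Widget
{
    int id;
};

std::vector<Widget> cacheWidgets(const std::vector<Widget> &registry)
{
    // A "how" comment here would read "copy registry entries into a
    // vector", which the code already says. The "why" is more use:
    //
    // Cache widget information for later, so that drawing never has
    // to go back to the registry.
    std::vector<Widget> cache;

    for (std::size_t i = 0; i < registry.size(); ++i)
    {
        // The unexpected: id 0 marks a placeholder the registry hands
        // out before a widget is fully set up. Caching one would draw
        // an empty box, so skip it.
        if (registry[i].id == 0)
        {
            continue;
        }
        cache.push_back(registry[i]);
    }
    return cache;
}
```

Without the second comment, a maintenance programmer could reasonably "tidy away" the id check and reintroduce the problem it works around.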
So let's begin to pull this together and look in more detail at how and when to write a good code comment.
We've seen that well written code shouldn't really need comments, that variable names should be self-explanatory. Function names like f() and g() scream out for comments to describe them, but someGoodExample() doesn't ask for it at all. You can see it's a good example function name.
Here's a little worked example to illustrate some of these principles of commenting. Consider the following snippet of C++ code. Aside from other idiomatic criticisms, it's not entirely clear what's being done.
    int j = wlst.sz(); for (int i = 0; i < j; ++i) k(wlst[i]);

Euch. There's some room for improvement here, so let's improve. The code can be made an awful lot clearer by applying some sensible layout rules and adding a few comments.

    // Find the number of widgets in the widget list
    int j = wlst.sz();

    // Iterate over all widgets
    for (int i = 0; i < j; ++i)
    {
        // Print out this widget
        k(wlst[i]);
    }
Much better! Now it's entirely clear what the code snippet is supposed to be doing. I'm still not entirely happy, though. By giving the functions and variables appropriate names we no longer need any comments at all, the code describes itself:
    int const numWidgets = widgets.size();
    for (int i = 0; i < numWidgets; ++i)
    {
        printWidget(widgets[i]);
    }
Note that I didn't rename i to something more long winded and tedious. It's a loop variable with a very small scope. Calling it loopCounter would have been overkill, and would arguably make the code harder to read.
Although this is a small and reasonably tedious example, now scale it up. Imagine functions much larger and several times more complex. It doesn't take much to see the difference good commenting strategies make. I've seen plenty of code like that first example above, and had to improve it by taking it to the next stage. When modifying existing code you often don't have the luxury of being able to rename all the functions and class names to more sensible choices.
Beware of the warning signs. If you end up writing reams of comments that explain how a complex algorithm is working, stop. First pat yourself on the back for thinking about documenting what's going on. But then consider whether you could change the code or the algorithm to make it clearer. Remember you don't need to prematurely optimise (and obfuscate). Perhaps you could split the code into several well named functions, rename the variables etc. As Kernighan and Plauger said: "Don't document bad code - rewrite it." [Kernighan-]
If you find yourself using comments to describe use of variables, you probably need to rename the variable. If you are documenting certain conditions that should always hold, perhaps you should be writing an assertion.
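A minimal sketch of that last suggestion, using an invented averageOf function: the condition that used to live in a comment becomes an assert, so a debug build checks it on every call instead of trusting the reader to notice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: averageOf is invented for illustration.

// Before, the condition lived only in a comment, so nothing checked it:
//
//     // values must never be empty when we get here
//
// After, the same condition is written as an assertion.
double averageOf(const std::vector<double> &values)
{
    assert(!values.empty());

    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        total += values[i];
    }
    return total / static_cast<double>(values.size());
}
```

Bear in mind that assert compiles away when NDEBUG is defined, so this documents and checks a programming error; it is not a substitute for handling bad input at run time.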
You'll hear people religiously tout how you should format your comments. I'm not going to prescribe any One True Way to format them here. But there are a few things to consider. Interpret them as guidelines according to your personal taste rather than dictates.
Commenting should be clear and consistent. Make a point of choosing a way of laying out your comments, and use it throughout. Every programmer has a different sense of aesthetics, so choose what works for you. Do use a house style if one exists, or examine (good) existing code and follow the styles you see there.
Many small formatting issues in comment writing may seem trivial - for example should each comment start with a capital letter or not? However, if all your comments are randomly capitalised it will convey the sense of a lack of cohesion in the code, like the programmer didn't really think all that carefully as he crafted his code, or perhaps that the code grew by accretion rather than by design.
You can tell that I like my syntax colouring editor, but there can almost be too much reliance on syntax colouring. Consider that your code may be read from a monochrome printout or viewed quickly in an editor without syntax colouring. The commenting should still be readable.
A few strategies can help here, especially regarding block comments. Placing the start and end markers (e.g. /* and */ in C and C++) on their own line makes them stand out. Placing a margin character down the left-hand side of a block comment also helps to make it appear as a single item, for example:
    /*
     * This is much more readable
     * as a block comment in the midst
     * of a whole pile of code
     */
than
    /*
      a comment that might span a few lines
      but without any margin character.
    */
At the very least line up the comment text so it's not a jagged mess.
A comment shouldn't cut across the code and break up the logical flow. Keep it at the same level of indentation as the rest of the code around it. That way the comment appears to apply to the correct 'level' of the code. Maybe it's a personal problem but I always have to stare hard at code like this:
    void strangeCommentsAhoy()
    {
        for (int n = 0; n < JUST_ENOUGH_TIMES; ++n)
        {
    // this is a meaningful comment about the next line
            doSomethingMeaningful(n);
    // frankly it's confusing the pants off of me
            anotherUsefulOperation(n);
        }
    }
In a loop without braces (which I'm not convinced is a good idea anyway) don't put a comment above the single looped statement - there be dragons. If you want a comment in there, wrap the whole lot up in braces. It's a safer strategy.
Comments usually sit on their own line. Sometimes a short single line comment can follow a statement. However, it's good practice to space the comment away from the code so that it stands clearly apart, for example:
    class HandyExample
    {
        ... some nice public stuff ...
    private:
        int  numApples;        // end of line comments:
        bool oldManADustman;   // make them stand out
        int  transactionID;    // from the code
    };
The above is a good example of using comment layout carefully to improve the appearance of the code. If each end of line comment came directly after the appropriate variable declaration they would look jagged, rather messy and require more squinting to read.
Comments are usually written above the code that they describe rather than below it. This way the entire source code reads downwards, almost like a book. The comment serves to prepare the reader for what is to come. Used with whitespace, commenting helps to break the code up into 'paragraphs'; a comment introduces a few lines explaining what they intend to achieve, these lines immediately follow, then a blank line, then the next block. This is such a convention that a comment with a blank line before it feels like a paragraph start, whereas a comment sandwiched in the middle of two lines of code feels more like a statement in brackets or a footnote.
It's sensible to choose a low maintenance comment style. For example you'll see people write C style comment blocks that don't only have left asterisks as a margin, but also include a row of right margin asterisks. Whilst this arguably looks very pretty, the amount of work required to adjust a paragraph of text within such margins is ludicrous. When you could have moved on to the next task in hand, you have to instead waste effort carefully lining up all the asterisks on the right again. If the style involves using tabs things get even nastier: someone with a different sized tab stop opens the file and wonders what the original programmer was on since all the comments look incredibly ugly and badly lined up.
Note that the end of line comments we saw above are an example of reasonably hard to maintain code. How much effort you're prepared to spend is up to you. There is always a balance between good looking source code and maintenance effort. I prefer a little bit of effort to ugly code.
Comments are often used as breakwaters between different sections of code. Programmers use different schemes to differentiate 'major' comments (this is a new section of code) from 'minor' comments (this describes a couple of lines of a function). This is perhaps where different people's aesthetic hackles really rise. It's not uncommon to see a C++ source file which contains the implementation of several classes with something like this between each section:
    /**************************************************
     * class Foo implementation
     **************************************************/
Some people really go for large blocks of comment between functions, or even a single long comment line as a rule between them. I tend to place a couple of blank lines between functions and that's good enough for me. If you have functions large enough that you really need help to see where they start and end you may need to revise your code.
Try to avoid using these large rules to emphasise every comment in sight. Otherwise nothing gets emphasised. Good indentation and structure should group code together, not impressive comment ASCII art. However, well-chosen breakwater comments can help you to quickly navigate around a file.
Comments can also be used as inline 'flags' in the code. There are a number of conventions for these flags. It's common to see "// XXX" (no, not an 'explicit code' warning!) or "// TODO" littered through files which are still work in progress. Good syntax highlighting editors display these comments prominently by default. The former flag is used to mark troublesome code or something that needs rework. TODO often marks missing pieces of functionality for a later return.
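For illustration, here is how the two flags might look in a work-in-progress function. The function and the gaps it admits to are invented for the example.

```cpp
#include <string>

// Hypothetical example: the function and its gaps are invented.

std::string greetingFor(const std::string &name)
{
    // TODO: pick the greeting from the user's locale; English only for now.
    std::string greeting = "Hello, ";

    // XXX: an empty name yields "Hello, !" - decide what callers expect.
    return greeting + name + "!";
}
```

Both flags say what is wrong or missing rather than just shouting, which makes them far more useful when you search for them later.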
Each file should begin with a comment block that describes the contents of the source file. It's just a quick overview, a preface, providing some essential information that you always want displayed as soon as a file is opened. If such a header exists then another programmer who opens this file will feel safer about the contents; it shows the file was thoughtfully created rather than just hacked up as a dumping ground for some new code.
Some people advocate that this header should provide a list of every function, global variable etc defined, but I think that this is a maintenance disaster, and such a comment would rapidly get out of date.
The kind of information this file header should contain is:
- the purpose of the file (e.g., implementation of foo interface),
- the date the file was created (not last modified - this would fail to get updated and so become misleading),
- the author(s) and any modifiers, and
- a copyright statement describing ownership and copying rights.
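A header along those lines might look like the sketch below. The file name, the field labels and the function underneath are assumptions made for the example, and the angle-bracket placeholders are deliberately left unfilled; use whatever layout your house style dictates.

```cpp
/*
 * widget_format.cpp
 *
 * Purpose:   Implementation of the (hypothetical) widget formatting
 *            interface.
 * Created:   <date the file was created>
 * Authors:   <original author>, <later modifiers>
 * Copyright: <owner and copying terms>
 */

#include <string>

// The rest of the file follows the header. This function exists only
// to give the example some code to introduce.
std::string formatWidget(int id)
{
    return "widget #" + std::to_string(id);
}
```

Note what is missing: no list of functions and no modification history, so there is very little here that can drift out of date.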
Specifically this header should not contain a source file history describing every modification ever made. This kind of information exists in your source control system and doesn't need duplicating here. Indeed this is not just an issue of duplication; if you have to scroll through ten pages of modification history to get to the first line of code the file becomes tedious to work with. This has caused some to advocate moving the history to the end of the file, but it would still make the file slow to load and bothersome to work with.
If a source file is automatically generated by some tool during the build process, then you must arrange for this file to receive a comment header that states very clearly (in BIG SCARY CAPITAL LETTERS) where it originated from. This should prevent someone mistakenly editing it, only to have the contents regenerated at the next build.
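One way to arrange this is for the generating tool to write the banner itself. The sketch below assumes a generator written in C++ and invents the helper name, its parameters and the wording of the banner; treat it as an outline rather than a recipe.

```cpp
#include <string>

// Hypothetical sketch: a helper that a code generator might call to
// produce the warning banner for each file it writes. The name and
// parameters are invented for illustration.
std::string generatedFileBanner(const std::string &toolName,
                                const std::string &sourceName)
{
    return "/*\n"
           " * WARNING: THIS FILE IS AUTOMATICALLY GENERATED. DO NOT EDIT.\n"
           " *\n"
           " * Generated by: " + toolName + "\n"
           " * Generated from: " + sourceName + "\n"
           " *\n"
           " * Any changes made here will be lost at the next build.\n"
           " */\n";
}
```

Naming both the tool and its input tells the reader where to make the change they were about to make in the wrong place.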
Another practice you'll see is placing comments where faults have been fixed. For example, you may come across code like this in the middle of a function:
    // <bug reference> - changed to use blah.foo2()
    // method because the old code didn't handle
    // <some condition> properly
    blah.foo2();
Although these are entered with the best intentions, to help you (or any newcomer) see what's happened in the course of development, they often do more harm than good. To understand the real problem being fixed you'd now have to look up the fault in your fault tracking system. Most of these comments are useless unless you pull out the previous revision of the file from source control to investigate what changed, by which time you may as well have retrieved the check-in comment anyway.
Comments like this quickly proliferate during the later stages of development and maintenance, and end up littering the source code with sidelines, stale information, and distractions from the main thread of execution. This is another of those political issues. There is often an argument for inserting a comment when you make a non-obvious fix, to prevent someone later revising the code and reintroducing the bug. However, in these well-chosen cases you are documenting the unexpected rather than placing a bug-fix notice.
You can use comments as a working tool in a number of useful ways as you go about writing code. But we also need to be careful we don't abuse them.
A common approach when starting to write a new routine is to write out the structure of what needs to be achieved in English comments first, then start to fill the code in underneath each comment line.
If you do this, then when you've finished ask whether the remaining comments are still useful. Evaluate them against the criteria above, and revise or remove them if necessary. Don't just leave them and move on.
The alternative is to write the new routine without all the comment rigmarole, and then come back and add the necessary comments afterwards. However, the experienced programmer will comment as they go along. Experience shows you the right amount of commenting to use. Coming back to comment something either doesn't happen, or doesn't lead to the most appropriate comments: by then you know the code so well that the non-obvious bits seem all too obvious.
Don't be afraid of using the flags we saw above, like TODO, whilst coding as markers to yourself. It's a good technique to avoid the embarrassment of forgetting to complete dusty code corners. You can easily search your entire codebase for these flags to find what still needs completing.
Comments rot. Well, all carelessly maintained code tends to rot, acquiring unsightly blemishes and losing the original neat design. However, comments seem to rot much more quickly than any other piece of code. They have a tendency to get out of date with the code they describe. This can quite quickly cause profoundly annoying results. For example, I was recently working on a section of code where a file contained the comment "features A and B not yet implemented". I needed both these facilities, so I went about implementing them. Only after having done this did I discover that feature B had in fact already been implemented and I had just reinvented a wheel. Feature A was redundant since the implementation of B had handled it as well. If the person who did this had removed the incorrect comment I would have been saved a lot of work.
The simple solution is this: when you fix, add, or modify any code, fix, add, or modify any comments around it. Don't just quickly hack a couple of random lines to get them to work and move on. Make sure that any code changes don't turn comments into lies. The corollary of this is that we must make comments easy to keep up to date, or they won't be updated. Comments must be clearly related to the correct section of code, not placed in random locations.
Another bad habit to avoid is leaving code commented out. This will bite you when you come back in a year's time, or when any other programmer stumbles across it. If you encounter some code that has been left in the source file but in a comment block you'll wonder why it's there. Was it a fix that was never completed? Is it work in progress? Did the code never work? Is the rest of the code functionally complete? Either leave a note explaining why you have commented the code out or take it out completely. You can always get it back from the source control system, after all. Even if you only think you're knocking something out temporarily, leave yourself a note. It's so easy to forget to come back and finish off a job.
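If the code really must stay, a note like the one sketched here answers the questions a later reader will ask. The function, the disabled line and the reason given are all invented for the example.

```cpp
// Hypothetical example: names and the reason given are invented.

int applyDiscount(int priceInPence)
{
    // Loyalty discount disabled deliberately: the rules for it are
    // assumed (for this example) to be still undecided. Either
    // reinstate the line below once they are agreed, or delete it;
    // source control keeps the history either way.
    //
    // priceInPence -= priceInPence / 10;

    return priceInPence;
}
```

The note says that the code is out on purpose, why, and what should happen to it next, which is everything the bare commented-out line leaves you guessing.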
As a maintenance programmer, it's best not to remove any inane comments you find, unless they are downright dangerous. Leave them as a warning for future maintenance programmers. Learn the interesting area flags like 'XXX' and treat them with respect and caution. Also notice printf or other output statements in the code that have been commented out. These are a sure sign that there has been a problem area here in the past; treat the code with care! Be aware of comment rot; for example, just because the comment says "this defined in foo.c" it doesn't mean it is any more. Always have faith in code and doubt comments.
I can't write about code comments without touching on this subject. Earlier on I said that comments are not API documents. However, there is one case when they can be. Literate programming tools, like Doxygen [Doxygen] and Javadoc [Javadoc], can read specially formatted comments in your source code and produce neatly formatted documentation from them[3]. I regard this as better than separately written API specs.
Now, if you already have a separate API document then keep it - and consider it the definitive guide. If a code comment and this specification differ, the specification is the one that should be right, since that is a document that will have been reviewed, discussed, agreed with a customer, etc.
However, for any new project I strongly recommend using a literate programming tool to document your code. Public interfaces need very careful specification documents to describe preconditions and explain the exact semantics of each function. By putting this in comments right there in the code, the documentation stands a much better chance of staying up to date, since an API change merely requires the change of a header file comment, rather than an off-line word processor document edit. You also have the documentation easily to hand whilst you're coding - it's conveniently there in the header file.
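As a rough sketch of what such comments look like, here is a small class documented in Doxygen's Javadoc-style markup. WidgetList is invented for the example; the @brief, @param, @return, @pre and @throws commands are standard Doxygen.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical example: WidgetList is invented for illustration.

class WidgetList
{
public:
    /**
     * @brief Adds a widget identifier to the end of the list.
     *
     * @param id  Identifier of the widget to add.
     * @pre   id is not negative.
     * @throws std::invalid_argument if id is negative.
     */
    void add(int id)
    {
        if (id < 0)
        {
            throw std::invalid_argument("negative widget id");
        }
        ids.push_back(id);
    }

    /**
     * @brief Reports how many widgets are held.
     * @return The number of widgets added so far.
     */
    std::size_t size() const
    {
        return ids.size();
    }

private:
    std::vector<int> ids;
};
```

The precondition and the failure behaviour sit directly above the function they describe, so whoever changes the one is looking straight at the other.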
Documentation has a habit of not getting done. This is a simple and effective way to tackle the problem. Programmers also get a buzz out of running their neat code through a magic tool and getting high level documentation seemingly for free.
We write a lot of comments. That's because we write a lot of code. Learning to write the right sort of comment is important or our code may keel over under the weight of inappropriate and outdated commenting.
Comments are no more important than the code they annotate - you can't make bad code good using comments. Your aim should be self-documenting code that requires no comments at all.
I'd welcome any comments on this article...
[Kernighan-] B.W. Kernighan, P.J. Plauger. The Elements of Programming Style. McGraw Hill, 1978. ISBN: 0-07-034199-0.
[Doxygen] Doxygen. Available from: http://www.doxygen.org/
[Javadoc] Javadoc. Available from: http://java.sun.com/j2se/javadoc/
[1] C.P. Scott (1846-1932) was an eminent British journalist who edited The Guardian for 57 years. He pursued a consistently radical liberal editorial stance, even in the face of public hostility.
[2] Of course, what chews up and spits out the comments differs with the kind of language you're using. In C/C++ the monstrous preprocessor beast devours comments before the compiler proper ever gets a look in. In other compiled languages the compiler itself throws away comments as it tokenises the source. In interpreted languages your intense commenting may slow down execution of the program as the interpreter has to do the work of jumping over the colour of your favourite mackerel.