Journals > CVu > 012
Title: Commenting programs, how and why
Author: Martin Moene
Date: Sun, 20 June 2010 08:59:00 +01:00
Summary: A letter has flooded in from Colin Masterson, the author of our series on structured programming:
Body:
I hope that readers will bear with me as I once more bang the drum about one of my 'hot topics', that of commenting programs; to be specific, commenting C programs.
As Mark Burgess seemed to suggest in a recent article in PCW, perhaps the time has come to standardise, in some way, a level of commenting which, together with structuring, will improve the useability (if I may use that word) of our software. As I see it, when we write a function, no matter how small, we have three clear obligations.
These may be summarised as follows:
1. Primary Obligation. To other users (and the author):
   a. To provide sufficient information to allow anyone to call the function correctly and to make sense of any returned value.
   b. To inform the user WHAT action is performed.
2. Secondary Obligation. To the author (and further details for the user):
   a. To present a structural summary.
   b. To note any peculiarities of the method employed, such as recursion or calls to BIOS, DOS or I/O, which may lead to portability restrictions.
3. Tertiary Obligation (if the function is complex or part of a suite):
   a. Indicate the calling function and/or related functions.
   b. Indicate the function's level in the program hierarchy.
   c. Indicate version, amendments and revision history.
   d. Indicate any test or QA programs.
I shall shortly propose a function header consisting of three parts corresponding to these three obligations, each part comprising a number of 'fields'.
Not all parts will be relevant in every case. The style and layout of the header may be left to the individual, but some points are worth noting. Consideration should be given to a standard form which would enable utilities and filters to perform certain documentation and indexing tasks automatically (e.g. through the use of GREP and the like).
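To illustrate the kind of indexing a standard form makes possible, here is a minimal sketch of such a filter. It assumes the field layout of Listing 1 (a field tag such as NAME at the start of a header line, followed by a colon); index_names() is a hypothetical helper, not part of any library, and is written in ANSI C rather than the K&R style used elsewhere in this letter.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical filter: scan C source text on 'in' for header lines of
   the form "NAME : text" (leading spaces allowed, as in Listing 1) and
   write "lineno: text" to 'out', much like grep -n would.
   Returns the number of NAME fields found.
   Limitation: lines longer than the buffer are split, which throws the
   line count off. */
int index_names(FILE *in, FILE *out)
{
    char line[512];
    long lineno = 0;
    int found = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        char *p = line;

        lineno++;
        while (isspace((unsigned char)*p))      /* skip indentation */
            p++;
        if (strncmp(p, "NAME", 4) != 0)
            continue;
        p += 4;
        while (*p == ' ' || *p == '\t')         /* padding before ':' */
            p++;
        if (*p != ':')
            continue;                           /* e.g. "NAMES", not a field */
        p++;
        while (*p == ' ' || *p == '\t')         /* padding after ':' */
            p++;
        fprintf(out, "%ld: %s", lineno, p);     /* p keeps its newline */
        found++;
    }
    return found;
}
```

A main() that calls index_names(stdin, stdout) turns this into a filter; in practice grep -n 'NAME *:' *.c gives much the same listing, which is exactly why a consistent field layout pays off.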
Since most people make use of a text editor with block copy and insert facilities, it makes sense to have a single dummy header and insert this into the source as required.
Furthermore, the NAME and CALL METHOD fields can often be copied directly from the function definition, avoiding repeated typing. (Ed: an example of the header is given later on.)
The need for the information noted under obligation 1 must surely be clear to everyone. For complex functions the explanation of purpose and parameters may be quite lengthy; for simple functions, very brief. The need for the other parts may be less obvious. In general, though, it ought to be possible to understand fully the implications of using a function without having to study the code itself. Recursive functions, or ones with a large number of local variables, place a heavy demand on the stack. This is worth noting.
If calls are made to system-specific routines such as software interrupts, this affects portability and should also be noted. For more detailed study, or for more involved functions, a few lines of pseudo code and a note on peculiarities are always appreciated.
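As an illustration of the kind of note meant here, the header below flags recursion and its stack cost in the STRUCTURAL NOTES field. The tree type and count_nodes() are hypothetical examples, written in ANSI C; only the field names follow Listing 1.

```c
#include <stddef.h>

struct node {                       /* hypothetical binary-tree node */
    struct node *left, *right;
};

/*-----------------------------------------------------------------------
 NAME        : count_nodes - count the nodes in a binary tree.

 CALL METHOD : size_t count_nodes(const struct node *t)
               t - root of the tree; NULL means an empty tree

 RETURNS     : Number of nodes in the tree (0 for NULL).

 STRUCTURAL
 NOTES       : RECURSIVE. Each level of the tree costs one stack frame,
               so stack use grows with the height of the tree. A
               degenerate (list-shaped) tree of n nodes needs about n
               nested calls, which may exhaust a small stack.
-------------------------------------------------------------------------*/
size_t count_nodes(const struct node *t)
{
    if (t == NULL)
        return 0;
    return 1 + count_nodes(t->left) + count_nodes(t->right);
}
```

The code itself is trivial; the point is that a caller on a machine with a small stack learns about the recursion from the header, without reading the body.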
Admittedly, it is often difficult to know what may be termed 'peculiar' in a C function. But if you shut your eyes and squint at the function, it doesn't take long to spot the parts for which "he'll wonder what on earth I'm doing that for" becomes an appropriate comment.
Finally, the hierarchy. This becomes very important once the number of functions breaks the 10 or 20 barrier; it's simply too much to remember what calls what.
I'll reiterate my suggestions for levels later (see my recent letter in PCW) and use the adventure parser by Martin Houston as an example. (Absolutely no offence intended.)
Before I get called down for asking everyone to write a novel at the start of a three statement function let me at once suggest that these three obligations may in many cases be reduced to just one: the primary. It is always essential that we present the name and parameters to prospective users.
The function definition itself may sometimes be considered sufficiently clear to explain the action and calling requirements. I would argue, however, that this is rarely the case. If our functions are written in a generalised form (as they should be, to allow re-use), then the exact purpose of each parameter must be clearly stated.
For (what I call) Service Level functions at the bottom of the hierarchy, using only simple data types, it may be quite plain what is required. But is it?
Consider the case:
    int strcount(s,c)
    char *s;
    int c;
It is our responsibility to ourselves, and to other first-time readers, to provide at least a minimum of explanation. For these low-level functions, this is acceptably done with brief comments around the definition:
    int strcount(s,c)   /* returns number of times 'c' is in 's' */
    char *s;            /* the string being scanned */
    int c;              /* char to count */
This should be sufficient for a user to make use of the function and to understand its returned value. We have fulfilled our obligations 1a and 1b, albeit somewhat briefly.
Notice that, even in this simple case, coded as:
    {
        int i;

        for (i = 0; *s; s++)
            i += (*s == c);
        return(i);
    }
we have made no statement about boundary conditions: what if 'c' is the NUL character ('\0'), what if 's' is zero length, what if no occurrences of 'c' are found, and what is the maximum length of 's'?
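One way to answer those boundary questions is to state them in the header and let the code follow. The sketch below is a modern ANSI C rewrite, not the letter's original: it assumes a const-qualified prototype, a size_t result (so the count can never overflow, since it cannot exceed the length of 's'), and an assert() for the NULL precondition. Strictly, external names beginning with "str" and a lowercase letter are reserved for future library use, so a present-day project might choose a different name.

```c
#include <assert.h>
#include <stddef.h>

/*-----------------------------------------------------------------------
 NAME        : strcount - count occurrences of a character in a string.

 CALL METHOD : size_t strcount(const char *s, int c)
               s - NUL-terminated string being scanned; must not be NULL
               c - character to count (converted to char, as strchr does)

 RETURNS     : Number of times c occurs in s. Returns 0 if s is empty,
               if c does not occur, or if c is '\0' (the terminating
               NUL is never counted).
-------------------------------------------------------------------------*/
size_t strcount(const char *s, int c)
{
    size_t n = 0;

    assert(s != NULL);                  /* precondition stated above */
    for (; *s != '\0'; s++)
        n += (*s == (char)c);           /* comparison yields 0 or 1 */
    return n;
}
```

For example, strcount("banana", 'a') is 3, and strcount("abc", '\0') is 0 because the loop stops before examining the terminator.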
However, even fairly inexperienced users would probably be happy with the above comments.
Excusing this function from further commenting is justified on the grounds that the data types involved are simple, an array of characters being considered moderately simple. Were the function to take structures or unions as parameters, to involve more complex indirection, or to manipulate global variables, this simplification would no longer be valid. In such cases at least the NAME, CALL METHOD, GLOBALS and RETURN fields should be completed.
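As an illustration of that minimum, here is a small function that takes a structure and modifies a global, with the four fields filled in. The struct item type, the items_seen global and add_item() are all hypothetical, invented for this sketch in ANSI C.

```c
#include <string.h>

struct item {                       /* hypothetical record type */
    char name[32];
    long offset;
};

int items_seen = 0;                 /* hypothetical program-wide counter */

/*-----------------------------------------------------------------------
 NAME        : add_item - record an item found in the file.

 CALL METHOD : int add_item(struct item *dst, const char *name, long offset)
               dst    - record to fill in; must not be NULL
               name   - item name; truncated to fit dst->name
               offset - file offset of the item

 GLOBALS     : items_seen - incremented on every call.

 RETURNS     : 1 if the name was stored in full, 0 if it was truncated.
-------------------------------------------------------------------------*/
int add_item(struct item *dst, const char *name, long offset)
{
    size_t n = strlen(name);
    int complete = n < sizeof dst->name;   /* room for name plus NUL? */

    if (!complete)
        n = sizeof dst->name - 1;          /* keep space for the NUL */
    memcpy(dst->name, name, n);
    dst->name[n] = '\0';
    dst->offset = offset;
    items_seen++;
    return complete;
}
```

Without the GLOBALS line, a reader of the call site would have no hint that add_item() changes program state beyond its first argument.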
I don't deny that adding comments at the head of a function may take a few extra moments, but it is well worth it. In these days of 1 Mbyte floppies and 40 Mbyte hard discs, the space consumed by such comments is not worth mentioning. I am also aware that, in the eagerness to get something going, there will be a reluctance to add comments during early development. That is fair enough. But every now and then, before it's too late, take the time to go back through the code and add a short header to the start of each function.
As an example here is a header for a function from a program I am currently working on:
    /*-----------------------------------------------------------------------
     NAME        : locate_item - locate next item in file.

     CALL METHOD : long locate_item(pge,fp)
                   int  *pge;    address of present page number
                   FILE *fp;     file being read

     GLOBALS     : end_of_file - set TRUE if EOF reached.
                   token       - used to hold next character.
                   limiter     - current item delimiter char.

     ASSUMPTIONS : pagestr has been set up with valid newpage sequence.
                   File is opened in BINARY mode.

     RETURNS/
     EXIT CODES  : Offset in file of start of next item, 0L if EOF

     STRUCTURAL
     NOTES       : Locates the next item in the file using the current
                   limiter char. Check all the time for a possible
                   newpage code and keep a copy of it if it looks as
                   though it could be one.
                   Steps are:
                   While not EOF_CHAR
                   [
                      If token could be newpage then see if its last one
                      needed. If so then bump page counter, else just
                      save token.
                      If token isn't a delimiter then we've found the
                      next item - return offset.
                   ]
                   get next token
                   If EOF_CHAR then return error flag and set end_of_file

     VERSION     : 1.01
     AMENDMENTS  :
     LEVEL       : SELECT <- HANDLER <- CONTROL
     CALLER      : sort_file <- main
    -------------------------------------------------------------------------*/
Listing 1
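The STRUCTURAL NOTES in Listing 1 are close enough to pseudo code that a function can be sketched from them alone, which is a fair test of a good header. The version below is an interpretation, not the author's code (which is not given): the globals end_of_file, token, limiter and pagestr come from the header, but their types and initial values, the use of EOF for EOF_CHAR, and the handling of a partial newpage match are assumptions. It uses ANSI prototypes rather than the K&R form in the listing.

```c
#include <stdio.h>
#include <string.h>

/* Globals named in Listing 1; types and initial values are assumed. */
int end_of_file = 0;                /* set to 1 when EOF is reached */
int token;                          /* most recent character read */
int limiter = ' ';                  /* current item delimiter (assumed) */
const char *pagestr = "\f";         /* newpage sequence (assumed) */

/* Sketch of locate_item(): returns the offset of the next item and
   bumps *pge for every complete newpage sequence skipped.  As in the
   listing, 0L signals EOF, which cannot be told apart from an item at
   offset 0.  Assumes fp was opened in binary mode, so ftell() gives
   byte offsets. */
long locate_item(int *pge, FILE *fp)
{
    size_t plen = strlen(pagestr);
    size_t matched = 0;             /* chars of pagestr matched so far */
    long pending = -1L;             /* offset where a possible newpage began */

    for (;;) {
        long offset = ftell(fp);    /* offset of the token about to be read */

        token = getc(fp);           /* get next token */
        if (token == EOF) {
            end_of_file = 1;
            return 0L;
        }
        if (plen > 0 && token == (unsigned char)pagestr[matched]) {
            if (matched == 0)
                pending = offset;   /* keep a copy: it could be a newpage */
            if (++matched == plen) {    /* last one needed: bump page */
                (*pge)++;
                matched = 0;
                pending = -1L;
            }
            continue;
        }
        if (pending >= 0)           /* partial match was really item text */
            return pending;
        if (token != limiter)       /* not a delimiter: next item found */
            return offset;
    }
}
```

With the assumed defaults, a file containing two spaces, a form feed, two spaces and "abc" yields offset 5 and advances the page counter by one.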
Now, here are the hierarchical levels into which I divide my programs, illustrated with reference to the code by Martin Houston in C VU.
CONTROL
PREPARATION
HANDLER
SELECT
DATA
SERVICE
CONTROL LEVEL
The top-level function: main() itself, or one called by main(). If any I/O is carried out at this level, it will be only to obtain high-level command choices, passing control to the next level (HANDLER). Initialisation is carried out by calling functions at the PREPARATION level. In the case of 'adshell', main() is the CONTROL level function, which calls the HANDLER level function parse(). In a real adventure, a PREPARATION level function would probably be called before the while() loop, and the entire while loop, including the request for user input, might well be contained within a CONTROL level function called from main().
HANDLER LEVEL
A supervisor-type level: here, parse() determines which action function should be called to actually do something. Like an executive in a multitasking system, HANDLER level functions may do little themselves, instead passing control to the functions which arrange for the work to be done.
SELECT LEVEL
help() and fexit() are SELECT level functions. They control the real work being done: at this level, the function calls an actual data-gathering or data-issuing function to perform an action. A SELECT level function may call any of several DATA level functions, depending on conditions on entry. In this case, printf() is a DATA level function, sending out actual data.
DATA LEVEL
Stand-alone functions which accept fairly simple data types and carry out work under the control of the SELECT level. scanf(), printf() and your own particular input routines are examples of DATA level functions.
SERVICE LEVEL
The tools and dirty-work level. Library functions (like strcount) are examples of this level. They operate only on fundamental data types and do not rely on global variables unique to a particular program. They should be entirely portable and form the 'glue' which allows DATA level (and possibly SELECT level) functions to do their job.
I have said enough, but look forward to comments from others.
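To make the levels concrete, here is a skeleton laid out along those lines. The 'adshell' code itself is not reproduced in this letter, so every body below is invented for illustration: only the names parse(), help() and fexit() come from the text, while chop(), prepare() and control() are hypothetical. Following the suggestion in the CONTROL LEVEL description, the command loop lives in a CONTROL level function called from main(). Written in ANSI C.

```c
#include <stdio.h>
#include <string.h>

/* SERVICE level: portable, simple types, no program-specific globals. */
static void chop(char *s)
{
    s[strcspn(s, "\n")] = '\0';     /* drop a trailing newline, if any */
}

/* SELECT level: help() and fexit() do the real work by calling the
   DATA level function printf(). */
static int help(void)
{
    printf("Commands: help, quit\n");
    return 1;                       /* 1 = keep going */
}

static int fexit(void)
{
    printf("Goodbye.\n");
    return 0;                       /* 0 = stop the command loop */
}

/* HANDLER level: parse() only decides which action function to call. */
static int parse(char *cmd)
{
    chop(cmd);
    if (strcmp(cmd, "quit") == 0)
        return fexit();
    return help();                  /* "help" and anything unrecognised */
}

/* PREPARATION level: one-off set-up before the command loop. */
static void prepare(void)
{
    printf("Welcome.\n");
}

/* CONTROL level: the command loop, called from main().
   Returns the number of commands read. */
int control(FILE *in)
{
    char line[80];
    int n = 0;

    prepare();
    while (fgets(line, sizeof line, in) != NULL) {
        n++;
        if (!parse(line))
            break;
    }
    return n;
}
```

A main() of the form int main(void) { control(stdin); return 0; } completes the program. Each function calls only downwards, so the LEVEL and CALLER fields of its header are easy to fill in.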
Yours faithfully, Colin Masterson.