Journal Articles

CVu Journal Vol 8, #1 - Feb 1996

Browse in :

All > Journals > CVu > 081 (7)

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Returning Early and Taking a Break

Author: Administrator

Date: 07 February 1996 13:15:26 +00:00 or Wed, 07 February 1996 13:15:26 +00:00

Summary:

Body:

I read the rather over long 'Caught in the Net' column in C Vu 7.6 with rising frustration because there was far too much 'I do it this way' instead of 'The advantage/disadvantage of doing that is ...' It reminded me strongly of one of my Computer Science lecturers declaring 'No globals - they are bad'. When I asked him what I should do with data that was essentially program wide such as 'errno' (or now-a-days, such C++ objects as cout and cin, the reply suggested that he had no idea as to what was wrong with using global variables. His answer was to create a data structure that could be passed as an argument to every function/procedure. I contend that is a cure that is worse than the disease.

The real problem with globals is that they extend the test environment for every line of code. The programmer must consider possible effects that the state of every variable may have on her code. For example any line using a math function in C may change the state of errno (but as you never check, it won't worry you ;-). cout/cin may go into a non-functioning state as a result of any use. If you never check, your program may behave oddly. In other words the state of cout/cin is relevant throughout the program. Putting such items in an auto variable in main() and then relaying it everywhere does absolutely nothing to help. It actually hinders, because passing the data-structure will be an overhead of at least one pointer per function call (probably eating up precious local memory resources) as well as time taken to pass the value. Worse, errno will not work in this way. Think about it, it is designed to communicate between the maths library and users of the library. I know that we have alternatives such as exceptions in C++, but my point is that they are alternatives and intelligent programmers must choose the best compromise for the task they are handling. Trite phrases such as 'Globals are bad' do nothing to improve programming.

While I am on about some of the silly advice that gets thrown around let me add a quick one about choice of identifiers. The primary guide should be 'whatever makes code more readable.' The programming analogues of 'nouns', 'verbs', 'adverbs' etc. should be easy to distinguish in the context of their use. Parameters in prototypes should have identifiers that make their use clear to the client (user of the function). Parameters in definitions should be named to help the implementor (and future maintainers) express algorithms etc. cleanly. There is nothing wrong, per se, with short variable names, and in many cases there is a lot wrong with overly long ones - they can obscure the underlying structure of source code. Think of single letter variables as akin to pronouns in natural languages - only to be used where it is plain what they refer to.

On the subject of making code readable, it is worth reflecting on the common commenting guidelines that are dear to the hearts of so many. I think comments should be like footnotes, non-intrusive extra information for the less well informed and incidental information that all will need but that does not fit into the main flow. If your code needs more than a comment every dozen lines then it is poorly written. Experienced programmers should not need to have common algorithms identified and the need to comment individual identifiers is a sure indication of poorly chosen ones. Even C programmers should avoid declaring variables remote from the point of first use. A variable's lifetime should extend from the point of first use (or just a bit earlier) to the point of last use. Generally the scope (i.e. where the identifier is visible) should be less than a dozen lines of code and if it is more than a couple of dozen lines you should be re-examining your coding style (sometimes large blocks are necessary, but often they just represent the programmers fuzzy focus on a problem).

You should always comment on your reasons for making choices between alternatives - e.g. your reason for selecting a specific sort algorithm should be given even if it is just that you could code it easily. You should also use comments to mark sensitive parts of your code, or places that will need care in future. For example C++ class designers should always comment where they are allowing compiler provision of default constructors etc. because the decision may need to be reviewed later on.

Too many comments are counter productive - they are not read and often not updated. Like noisy compilers that give too many warnings and thereby conceal the important ones, too many comments hide the ones that matter.

Anyone who has had to endure the traditional style of formal geometrical proof will know of their format of assertions followed by justification. Actually the form of proof that was acceptable earlier this century would have made Euclid's toes curl. As we mature in our understanding of a subject we need fewer pointers to help us follow it. But there is an important difference between mathematical proofs and writing code, the former skips forward in as large a step as the reader can manage with the comment supporting the step. But code necessarily has to be complete - we haven't yet got to compilers that fill in our intentions by reading our comments.

Returning

'return' and 'break' are controlled 'gotos'. By the way, I believe that all programming languages need a global goto mechanism. Pascal provides one which Donald Knuth uses in both his TEX and METAFONT programmes in order to handle programs in a 'panic' state - i.e. when the program has detected that it is no longer behaving reliably. C provides setjmp and longjmp to handle such circumstances, while C++ provides exceptions. I do not know of any instance where a local goto (such as that provided by C) is necessary, though I can speculate that in the early days having such a mechanism might have helped the then existing compiler technology. Using a goto to jump out of a nest of structures has always seemed suspect to me as indicative of a poor implementation.

First let me dismiss the spurious claim that allowing multiple returns somehow increases the complexity of a piece of code - on the contrary avoiding it can sometimes increase complexity by introducing an otherwise unnecessary state variable (basically one that says 'The task is complete. I am just passing through to get to the end.') As far as I am concerned you should return from a function precisely at the point at which its task has been completed (do not forget that any function should only supply functionality for a single concept). (Francis has passed back a comment from Sean Corfield on this style which I will tackle at the end of this article.)

By returning at the point of completion those reading the code can see your intent and beliefs. Nothing is worse than maintaining code where one has to wade through reams of code only to discover that a particular thread was already complete.

Let me give you a short example where it is easy to code with either a single return or with two.

void replace(char * destination, const char * source){
  if(destination == source) return;
  free (destination);
  destination = (char *)malloc(strlen(source)+1);
  strcpy(destination, source);
  return;
}

versus:

void replace(char * destination, const char * source){
  if (destination != source) { 
    free (destination);
    destination = 
      (char *)malloc(strlen(source)+1);
    strcpy(destination, source);
  }
  return;
}

(I assume that you can read these without comment, clearly destination must be a pointer to dynamic memory. Oddly there is no way that I know of whereby I can make that constraint statically enforceable)

Should I check source is not a null pointer? I think, on reflection, that I should.

What is the difference between the two implementations? In the first I explicitly check for the exceptional condition (that destination and source are aliases for the same string) and clearly exclude it from further processing. In the second case, I check for the norm and thereby implicitly exclude the exception case. I believe that the first style is easier to follow but in the final analysis it is like the difference between English and German (where do you put the primary clause in a multi-clause sentence?). On second thoughts, perhaps it is more like the difference between writing paragraphs composed of short sentences instead of single monolithic sentences.

Note that my focus is on making code intelligible to a reason-ably experienced reader. This allows intent to be maintained in future revisions. I know some (perhaps many) of the experts will completely disagree but I ask them to reflect on the variety of writing styles that co-exist for natural language.

My simple guideline is 'return when you have finished'. Of course like all guidelines for writing it has to be used intelligently, but for me the test is whether I can read the code with understanding rather than struggle to use it as a guide to the inner workings of the programmer's mind. Note that in a world of reusable code, people really should not be changing working code. If we can teach programming styles that help the programmer to get it right first time then the needs of future maintainers are of much less importance. It is a pre-requisite of code reuse that we have correct code to reuse.

In a recent discussion with Francis about code complexity he suggested a criterion for choosing between 'if ... else' constructs and 'switch ... case' ones. He suggested that 'if ... else' should be restricted to genuine conceptual boolean decisions while 'switch ... case' should be used in all cases where there were potentially more than two choices even if only one case clause was actually written. I think this has much to commend it. 'switch ... case' constructs are far easier to maintain by adding, modifying or removing cases. I think that I would always include a default clause even if it was a null statement. Nested ifs always leave me feeling less than secure because they are so easy to get wrong, none-the-less they are frequently necessary, but much less often than they appear in current code.

Let us have a couple of pieces of code. First a case that is often treated as boolean but isn't:

unsigned char retire_at(char gender){
  switch (gender) {
    case 'm' : return male_retirement_age;
    case 'f' : return female_retirement_age;
    default : return handle_unknown_gender();
  }
}

Some compilers may demand an extra return statement because they cannot do a flow analysis to determine that all actual paths through the code terminate with a return statement. Note that the default clause calls a function whose value is returned. This function might not return because it calls abort(), exit() or uses longjmp or (in C++) throws an exception. It is not the task of the retire_at() function to determine the correct action if gender does not contain an appropriate value it just needs to provide a peg on which the solution can be hung. This assists code reuse because different handle_unknown_gender() functions can be provided as appropriate to the application being written.

Note the multiple return statements. Of course I could write the code something like:

unsigned char retire_at(char gender){
  unsigned char return_value;
  switch (gender) {
    case 'm' : return_value = male_retirement_age;
      break;
    case 'f' : return_value = female_retirement_age;
      break;
    default : return_value = handle_unknown_gender();
  }
  return return_value; 
}

However this code cannot be more efficient, is probably less efficient, and is vulnerable to forgetful programming (forgetting a break). I believe that the coding style of the latter is defective because it adds unnecessary detail and potential breakage points.

Writing this function with 'if ... else' we have:

unsigned char retire_at(char gender){
  unsigned char return_value;
    if (gender == 'm') return_value = male_retirement_age;
    else if (gender =='f') return_value = female_retirement_age;
    else return_value = handle_unknown_gender();
  }
  return return_value;
}

This is even more defective because it requires more care from the programmer to get it right. Just think how easy it is to patch the first example to handle 'M' and 'F':

unsigned char retire_at(char gender){
  switch (gender) {
    case 'M' :
    case 'm' : return male_retirement_age;
    case 'F' :
    case 'f' : return female_retirement_age;
    default : return handle_unknown_gender();
  }
}

Or what about the change to manage potential use in non-English applications?

enum Gender { male='m', female = 'f'};
unsigned char retire_at(char gender){
  switch (gender) {
    case male : return male_retirement_age;
    case female : return female_retirement_age;
    default : return handle_unknown_gender();
  }
}

Now only the enum Gender needs to be touched to support characters that would be more logical in other languages. (Of course male_retirement_age and female_retirement_age are probably also provided via an enum though there are other options that would support late determination of these values).

Now to a case that is essentially boolean:

void something(int size){
  assert (size>0);
  {
    int * handle;
    handle = (int *)malloc (size * sizeof(int))
    if (!handle) out_of_memory((void *)handle, size*sizeof(int));
    /* do rest of process */
  }
}

Once again I call a function to deal with the problem case. I do not know if this function can fix the problem for me and then return to continue the process so I pass it enough information so that a fix can be provided if possible. Most likely the program will terminate through a mechanism that is appropriate to whichever of C or C++ that it is being compiled as.

Breaking early

While I am happier with switch() statements that either consistently finish each case with return, or finish each with break I do not rule out mixtures where some cases logically terminate the functions intent while others leave more to do. I would strongly advocate some sensible layout convention to make mixtures manifest (i.e. stand out). It is the use of break in iterations that frequently causes debate. I like to view it as akin to a split infinitive in English, undesirable as a rule but tolerable in extremis. Removing split infinitives often requires substantial reworking of a sentence, removing break from iterations often means reworking code. If you really cannot find a readable alternative that avoids the use of break, use it but discuss it with other programmers to see if you simply have a coding blind-spot.

On the subject of iterations, try to use the three options provided by C to distinguish different intents. The for(...;...;...) is best reserved for counted situations. Yes, it can be used to the exclusion of the other alternatives but that often hides your intent. If you want to iterate something n times use a for(i=0; i<n; n++), if you want to run some code as long as a condition is true use a while(...). If the code must be run once then use a do {...} while(...); statement. Examine the following code fragment:

/* other code */
char c=1;
while (c!='Q') {
  c=getc();
  /*iteration process */
}

Why has the c been initialised? To ensure that the keyboard is read at least once. But consistent use of the most appropriate construct will make this intent much clearer even if you elect to initialise c as a matter of principle and safer when you forget to initialise c.

char c=0;
do {
  c=getc();
  /* iteration process */
}while (c!='Q');

What I am saying is that you should choose code constructs that help readers understand what you intend to do. There is much more to writing readable code than following some naming conventions and layout rules.

What About `continue`?

Well what about it? If you have understood what I have written so far you will realise that my guideline is simply 'Use continue when its use will help make code easier to understand. Don't use it just to get out of a complicated spot.'

Pre-conditions and Post-conditions

When Francis mentioned the gist of my first draft to Sean Corfield, Sean quite correctly pointed out that the reason for having a single return point in a function is to provide a point where post-conditions can inserted by those maintaining or reworking code. The problem with this is that it does not work in C++ because the existence of exceptions may cause early return from almost any code segment. I think it was Francis who suggested that we could make use of a 'atreturn()' function to do for single functions what atexit() does for programs.

If you want to provide pre-conditions and post-conditions you must examine other coding techniques. First you should consider non-invasive methods (i.e. don't touch working code) such as wrapping the function in another one. You will still have to make some changes - you may need to rename the original function and some value parameters may need to be converted to reference ones (if you are using C++ to support that). Let me give a very simple example:

int fn (int i);

It does not matter what this function does, but suppose that you wish to apply a pre-condition that the argument is positive and a post-condition that the return value is between 1 and 100 and that i is in the range -50 to 50.

In the implementation file for fn() add the following:

int fn (int i) {
  assert (i>0);
  int j=oldfn(i);
  assert (i<51 & i >-51);
  assert (j>0 & j<101);
  return j;
}

Change the original definition of fn() so that the first line is now:

static inline int oldfn(int & i) {

As long as you only want to impose conditions on the parameters and return value this mechanism will work fine in C++. It is not so easy in C because of the lack of reference variables makes checking post conditions on parameters difficult.

If you want to apply post-conditions to internal variables of a function you have no option but to touch the internals of the function. If you want to make the post-conditions robust against exceptions you have to start exploring such things as local classes with destructors that handle the post-conditions but that is another story.

Actually, I think retro-fitting post-conditions that are applied to local variables is generally a poor idea. Organising code for such purposes requires a somewhat different style from scratch. Perhaps what I am saying is that when you write code for a safety critical (or just mission critical) product you should not expect to reuse code that was written in a less demanding environment. In general code should be written to criteria that is appropriate to its purpose. Just as applying the standards of games programming to writing code for a fly-by-wire aircraft is inadequate, applying the standards appropriate to writing code for the control systems of a nuclear generator to a word-processor is probably a costly error. 'You know what I mean.' will do in some circumstances while 'I need know you know what I mean.' is a minimal requirement in others.

Good programming requires the use of an appropriate coding style. A good programmer understands this a poor one does everything the same way. Let me finish with a short Zen story.

It is said that one of the great Zen masters always answered questions by holding up his index finger. One of his students noted this and decided to adopt this deep and wise response to questions. One day the master asked the student a question. When the student raised his index finger, the master cut it off.

Rules of thumb are fine as long as you understand their meaning. They are dangerous (even deadly) if you use them without understanding.

Notes: