Title: Francis' Scribbles

Author:

Date: 06 August 2003 13:15:58 +01:00 or Wed, 06 August 2003 13:15:58 +01:00

Summary:

Body:

Concerning Complexity

Recently there was a long thread about the issue of whether we should use more than one return statement in a function. I think the most important thing is that we should try to understand the issues.

It is easy to say 'functions should have at most one return statement'. Actually if they are truly functions they need at least one return statement because the correct computer science term for something that does not return something is 'procedure'. But let me put that to one side and focus on the concept of a function in the context of C and C++.

There is certainly a correlation between bugs and multiple returns. Functions with more than one return have statistically more bugs. However there is a similar correlation between bugs and number of statements in a function. Functions with many statements tend to have more bugs than those with only a few statements. Finally there is a correlation between number of statements and number of returns; functions with many statements also tend to have multiple returns.

What I am suggesting is that the fault may not lie with multiple returns but with writing complicated code. We should avoid writing code that conceptually has multiple points of return even if we hide that by introducing extra variables or structures. The problem lies not with the multiple returns but with code whose design has more than one logical exit point.

But sometimes an algorithm has logically more than a single exit point. In such cases jumping through hoops to hide this feature breaks what I consider to be the prime directive for programming: 'Code should be a clear expression of the solution of a problem.'

There is a second programming directive 'Stop when you know the answer.' Let me give a couple of examples. The first is admittedly contrived but only to the extent that I want some simple code to illustrate a general principle.

Stop When You Have the Answer

Suppose that you have a two dimensional array of int (perhaps one representing the colours of pixels in a window) and you want to discover if a specific value is used anywhere in the array. For convenience let me assume that the data is stored in a std::vector<std::vector<int> >.

Here is a simple function to supply that answer:

bool contains
    (vector<vector> > const & array,
     int value) {
  int const rows(array.size());
  int const columns(array[0].size());
  for(int row(0); row != rows; ++row) {
    for(int col(0); col != columns; ++col) {
      if(array[row][col] == value) {
        return true;
      }
    }
  }
  return false;
}

Now if you are a believer that functions should never have more than a single return you have a problem because however you reorganise your code the requirement is for two distinct exit conditions. The only ways these can be combined in a single return statement require either continuing processing after you know the answer or increasing the perceived complexity of the code. Increasing perceived complexity is a dangerous game because it increases the probability of bugs.

Yes, I am entirely capable of writing a definition of contains() so that there is exactly one return-statement but the cost will be an extra variable (trivial) and a more complicated condition in each of the forstatements. I happen to think that that violates my prime directive; it makes it harder to see the code as a clear expression of the solution to the problem. To justify that breach I need some compensatory benefit. I remain unconvinced that there is any. But maybe you know otherwise.

A Clear Expression of the Solution

Here is a function from a program of mine that implements Dr Conway's Life automata game:

bool will_be_alive(life_universe const & data,
                   int i, int j) {
  int const diagonal_neighbours(
      data[i-1][j-1] + data[i-1][j+1] +
      data[i+1][j-1] + data[i+1][j+1]);
  int const orthogonal_neighbours(
      data[i-1][j]
      + data[i][j-1] + data[i][j+1] +
      data[i+1][j]);
  int const live_neighbours
      (diagonal_neighbours
      + orthogonal_neighbours);
  if(live_neighbours == 3) return true;
  if(live_neighbours == 2) return data[i][j];
  return false;
}

There are many things about this code that might be worth discussing. One of them is the elimination of magic expressions by using const qualified definitions to provide names for the computational steps. Another is that I have consciously avoided the temptation to write the evaluation of the live neighbours as a pair of nested loops. I want a clear expression of the solution to the problem. I think that my solution is sufficiently clear that even those who know nothing about Life will be able to work out the rule for determining the status of a cell in the next program cycle.

I have not avoided writing the evaluation as a pair of nested loops for efficiency purposes (though I might get that as well) but for clarity of expression. In addition, I have laid out the source for the first two definitions (diagonal_neighbours and orthogonal_neighbours) to assist the clarity and not because of the constraints of the column width of C Vu.

Now look at the final three statements in that definition. Of course I can write them with a single return-statement:

return(live_neighbours == 3) ? true
: (live_neighbours == 2) ? data[i][j]
: false;

I do not think that the use of the conditional operator adds anything to the clarity of the original. I would expect any reasonable compiler to generate identical code for a release version though perhaps not for a debug version.

Subjective v Objective

For the teacher or instructor a rule such as 'functions may only have one return-statement.' is easy to measure but it does not result in good code in and of itself. What the student needs is a clear understanding of quality and I am afraid that that is something much more subjective. Most of us recognise good quality code when we see it but I doubt that we can exactly pin down what it is. Perceived complexity is a function of many things (one of them being the individual reader). Consider the following functionally equivalent implementation of will_be_alive():

bool wba(vector<bitset<512> > const & d,
           int x, int y) {
  int const nd(d[x-1][y-1] + d[x-1][y+1] +
               d[x+1][y-1] + d[x+1][y+1]);
  int const no(d[x-1][y] + d[x][y-1] +
               d[x][y+1] + d[x+1][y]);
  int const l(nd+no);
  if(l == 3) return true;
  if(l == 2) return d[x][y];
  return false;
}

in which I have limited changes to names and layout, isn't that code harder to follow? And what about:

bool will_be_alive(
life_universe const & data,
                int i, int j) {
  int neighbours(-data[i][j]);
  for(x(i-1); x < i+2; ++x) {
    for(y(j-1); y < j+2 ++y) {
      neighbours += data[x][y];
    }
  }
  return(neighbours == 3) ? true
  : (neighbours == 2) ? data[i][j]
  : false;
}

Is that more or less complex? I know what I think, but if you think it is fine, consider how you would change the above code if the rule for life gave different weights to diagonal and orthogonal neighbours, or if the rule included orthogonal neighbours that were two cells away from the cell being considered. Those nested for-statements coupled with the odd initialiser for neighbours just add fog.

Being Helpful

One of the by-products of writing a book is to discover how helpful people can be. Of course I will be giving full credit to the main players in the book's acknowledgements section but as the printed copy won't be around till early December and its target readership has little if any overlap with the membership of ACCU I thought that I should take time out to express my gratitude to several people who have provided a lot of added value.

Because he is a fellow author, and most particularly because one of his books might be considered a competitor (Teach Yourself C++ 7ed) I have to give pride of place to Al Stevens, a well known columnist for Dr Dobbs Journal.

Not only has he made his beginners IDE (Quincy) available to all including rival authors but he also takes the trouble to maintain it and try to address any problems with it that are raised by its users. Giving a product away is one thing but adding in support is another. Both should be appreciated.

Then I emailed him to enquire about his recommendations for providing an install mechanism for a CD. Straight away he responded saying that I was welcome to use the source code provided on the CD that comes with his book. That source code provides a simple little installer and uninstaller that is exactly what computer tyros need. Al is clearly that exceptional author who tries to meet the needs of his readers. I may not agree with all that he writes and my book takes a very different tack from his but my life and the lives of my readers will have been made much easier by his generosity.

Then there is all the work that Garry Lancaster has contributed in refining the original Playpen concept and implementing it so effectively. Again, without Garry's work my book would be a much poorer product. When we have checked that there are no hidden surprises I will arrange for my Playpen Library to go on our website where it will be free for anyone to use. I think it will be useful to any novice who wants to experiment with graphics, a mouse and direct access to a keyboard. For the time being they will be restricted to MS Windows OSs.

However, as soon as I have the time to test it out I have an X Window/Posix version of Playpen, which should considerably extend the portability of Playpen based code. Jean-Marc Bourguet has done that work.

Then there is a little article written by James Holland about a flood-fill algorithm he originally used in Pascal. The algorithm dates from 1988, so that explains why I missed it as all my graphics algorithms come from earlier times. I hope the Editor gets to publish that article because I have a response lined up generalising James' contribution to work in full colour.

Help Needed

Before I can add Linux users to the target readership I still need a simple IDE for that platform. Note the word 'simple' because I consider that essential. Learning to program is intellectually demanding and so we should avoid adding anything that can possibly be avoided.

Perhaps you have some expertise at customising one of the emacs type editors so most of the functionality can be suppressed. Novices do not write highly complicated code where such things as resource editors, class browsers and the like prove helpful. They want to have automated project management. In other words they should be able to determine what source code files and libraries will be used and where the user written header files will be found. They need a single hotkey to compile a source code file, they need a single hotkey to build a project. Finally they need to be able to use the compiler generated error messages to locate the relevant point in their source code.

What I need is a simple coupling between a public domain editor and GCC (preferably 3.2 or later). I am sure there are many people out there for whom this would be a wet Friday afternoon job (i.e. effectively a nobrainer). The real problem is finding one who does not immediately want to add a whole bundle of goodies.

Learning C++ using an MS Windows OS?

I have no doubt that if that describes you that you should seriously consider using the combination of MinGW and Quincy. That combination will allow you to keep focused on you programming rather than juggling with a whole bundle of gadgets.

For those that do not know, MinGW stands for minimalist GCC for Windows and Quincy is Al Stevens' cat.

Problem 10

int foo(bool read_all) {
  if(read_all) {
    string line;
    getline(cin, line);
    return atoi(line); }
  else {
    int i;
    cin >> i;
    return i; }
}

There are quite a few things wrong with the above function definition. Assume that all the appropriate headers have been included, what particular feature of C++ input makes it completely unusable?

Commentary on Problem 9

int foo(int i) {return ++i;}
int bar(int i) {return i;}
int main() {
  int i(21);
  cout << (foo(i) + bar(i++));
}

That one gives us a straightforward case of undefined behaviour because the call to foo() reads the value of i for no other purpose than because it needs the value. On the other hand the call of bar(i++) both reads and writes i. The Standard very definitely states that such mixture of uses (a plain read and a distinct write) without an unavoidable sequence point results in undefined behaviour. Note that the rules change if i is a user defined type. That means that it can matter if a template such as std::vector<> uses a plain pointer as an iterator or not. For example:

std::vector<int> vec(10);
std::vector<int>::iterator i = vec.begin();
std::cout << *i + *i++;

may or may not have undefined behaviour. If std::vector<int>::iterator is a user defined type the above simply has unspecified behaviour (either of the two sub expressions can be evaluated first. But if plain pointers are used then the result is undefined behaviour.

int fooref(int &) {return ++i;}
int barref(int const &) {return i;}
int main() {
  int i(21);
  cout << (fooref(i) + barref(i++));
}

Notice the difference, the fooref(i) does not need to evaluate (read) i, it just binds the parameter to the argument. The internal increment is fine because it is protected by the sequence points after the evaluation of the arguments and before the return. barref(i++) binds an rvalue (so it reads i) to the int const & parameter. As the only unprotected read of i is for the purpose of determining what should be written by the post increment we no longer have undefined behaviour. I think that if you check it carefully you will also discover that the result is still unspecified because the result depends on which function is evaluated first.

Cryptic Clues for Prizes

This time there were several offers of solutions. Perhaps the older readers remember playing with their electronic calculators with seven segment displays. For the younger readers there was a fad in the seventies for making up calculations that resulted in displays that could be read like words (sometimes upside down). In case you did not work it out, the solution to It looks like a call for help. is 505.

There are many potential numbers that can be clued this way (upside down is more productive) and we sometimes have to tolerate mixtures of upper and lower case letters. So we have 'Almost a capital example of what might be sold on an Antipodean shore.' in six digits is 577345. You need to check how a seven-segment display handles 4 to understand that one.

I am sending (when he jogs my memory) a copy of the C++ Pocket Reference to Bob Adler whose response had the added value of referring to his days in the dojo.

Now if I gave you 'That foolish day' as a clue you might quite reasonably think of 1st April. But what might 'An English programmer gets pieces of eight for the day of fools.' give as a two digit number? As an added clue an American would get a three-digit answer.

Again I will give a small prize to someone who supplies the English answer.

Notes:

More fields may be available via dynamicdata ..