
CVu Journal Vol 13, #6 - Dec 2001

Title: Summing Arrays

Author: Administrator

Date: Mon, 03 December 2001 13:15:48 +00:00


"What's one and one and one and one and one and one and one?" Lewis Carroll had Alice ask. In his real life as the Reverend Charles L Dodgson, an accomplished mathematician and tutor at Christ Church College in the University of Oxford, Dodgson was no doubt aware of the problems this question would cause his students. But he could not be expected to have imagined the difficulty that finding the sum of a list of numbers would cause today's computer programmers.

Let me reword the opening question in a form which you could go away and program: write a function which takes a pointer, 'a', to an integer array of length 'len_a', and returns the sum of the elements as its function value.

What I am going to talk about in this column is the pitfalls you can get into when trying to answer this question, and I will suggest some ways of getting the utmost out of the limited computer resources you have available to solve the problem.

The First Attempt

The solution initially seems easy enough. Use a local variable sum_a to hold the running total of the elements. Then, should not *a and sum_a be of the same type, which is int? And len_a must be of type size_t because, being used as an array index, it is conceptually of the same type as that returned by the sizeof operator.

Look at:

#include <stddef.h>

int sum_1 (int *a, size_t len_a)
{
  int     sum_a = 0;
  size_t  i;
  for (i = 0; i < len_a; i++)
    sum_a += a[i];
  return (sum_a);
}

Certainly I have written code like that before now.

But of course, a variable of type int, or any other type for that matter, can only hold a certain range of numbers. If you want to know what this range is, it is defined by macros in the standard header <limits.h> for the integral types and <float.h> for the floating types.
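If you are curious about your own machine, a throwaway program can print these limits (a minimal sketch using only the standard macros):

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main (void)
{
  /* The range of 'int' on this implementation. */
  printf ("INT_MIN = %d, INT_MAX = %d\n", INT_MIN, INT_MAX);
  /* The range of 'long int'. */
  printf ("LONG_MIN = %ld, LONG_MAX = %ld\n", LONG_MIN, LONG_MAX);
  /* And the largest finite 'double'. */
  printf ("DBL_MAX = %g\n", DBL_MAX);
  return 0;
}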

So sooner or later, depending on what is in the array a, the variable sum_a will overflow. That is, the sum of the elements added up so far will be too big to be held in sum_a. If you are lucky, the computer will stop your program and tell you this. More likely, it will merely store an incorrect answer and another famous "computer error" is born.

This may happen far sooner than you think. Assuming a 16-bit signed integer, and that the first two entries in the array are both over 16383, sum_a will be incorrectly set after only the first two entries of a have been summed. This is not very much if the things being summed are entries, in pence, in a bank account.

The first attempt to fix the problem with sum_1 might be to change the type of sum_a from int to long int. Because a long int can hold a far larger number than a plain int, the sum is much less likely to overflow. There is a performance penalty to be paid for this in terms of a longer run time (yes, probably, if a 16-bit int is the native type for the hardware, but not likely otherwise - Francis), but that is a small price for ensuring you get the right answer, and may even be nothing on some of the bigger microprocessors now around.

What this has done is to introduce some extra bits into the sum which guard against the overflow. If an overflow of a 16-bit add were to occur, there is somewhere for the computer hardware to safely record this fact without having to potentially return an incorrect answer.
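In code, the change is a single-type amendment to sum_1 (a sketch of what I have just described; the name sum_1l is mine):

#include <stddef.h>

/* As sum_1, but with a 'long int' accumulator whose extra
 * bits guard against overflow of the partial sums. */
long sum_1l (int *a, size_t len_a)
{
  long    sum_a = 0;
  size_t  i;
  for (i = 0; i < len_a; i++)
    sum_a += a[i];
  return (sum_a);
}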

Now a long int also has a limit to the largest number it can correctly hold. Assuming a 32-bit long int, that limit is 65536 times bigger than that of a 16-bit int. This means that if the values in a are all as big as they can possibly be, at 32767, the length of a must be 65539 or more before any overflow can occur.

Of course, if the values in a are smaller, a proportionally longer array can be summed before there is a problem. And if positive and negative values occur with equal probabilities, you can expect that it will take about 65539 squared times as long before an overflow occurs. That is over four thousand million times longer, so is extremely unlikely to happen.

But now what happens if the type of the array a is itself a long int? sum_a is already a long int, so you are back to the original situation: after summing just two entries you might have an integer overflow to contend with.

To circumvent the problem this time, you cannot make sum_a a long long int because the C Standard does not provide such a type (it did not at the time of writing; as of 1999 it does, but that only moves the problem up a level). Now there is no real reason why an enterprising compiler author could not implement a long long int. But if you were to use it you might subsequently find that only his old, extremely expensive, compiler would build your code, and then not for the new microprocessor you wanted to use.
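For what it is worth, under the 1999 standard the same guard-bit idea can be pushed up a level (a sketch, assuming a C99 compiler, where long long is guaranteed to be at least 64 bits):

#include <stddef.h>

/* C99 only: a 'long long' accumulator guards a sum of
 * 'long' values just as 'long' guarded a sum of 'int'. */
long long sum_1ll (long *a, size_t len_a)
{
  long long  sum_a = 0;
  size_t     i;
  for (i = 0; i < len_a; i++)
    sum_a += a[i];
  return (sum_a);
}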

Using a floating type (i.e. float, double, long double) is not really the way to add up a long list of integers, even if the latter two types will generally give a greater accuracy than a long int.

So what are you to do?

By thinking about the sort of numbers you have to sum, you can decide what is going to go wrong. Then you can choose between several careful ploys (i.e., tricks) to postpone the problem as long as possible, hopefully to the point where the sum has been found before any overflow occurs. Knowing where the numbers came from will help you do this. This may sound a bit like needing to know the answer before you can find the answer, but life is like that.

Try Biasing the Array

If the numbers are all roughly the same (and what "roughly" means is all relative), you might try subtracting an estimate of the mean (or average) from all the numbers before you sum them. Then, of course, the total of the numbers is the sum of the numbers after the mean has been subtracted plus the product of that mean and len_a. Now if the original sum is too big for an int, so too will be that final calculation. So both the estimated mean and the sum of the biased numbers must be returned to the caller.

You have to estimate the quantity to subtract from the numbers without finding their sum, and a fairly safe way to do this is to take the average of the first, middle and last elements in the array. I hold this quantity in bias_a and hope that its calculation does not overflow. The example then becomes:

#include <stddef.h>

int sum_2 (int *a, size_t len_a)
{
  size_t  i;
  int bias_a = (a[0] + a[len_a/2] + a[len_a-1]) / 3;
  int sum_a = 0;
  for (i = 0; i < len_a; i++)
    sum_a += a[i] - bias_a;
  /* Deliberately simplistic: this final sum may overflow
   * (see the text below). The cast keeps the arithmetic
   * in 'int' rather than converting 'bias_a' to 'size_t'. */
  return (sum_a + (bias_a * (int) len_a));
}

I return the sum directly here, knowing it may overflow, rather than returning bias_a and sum_a separately, but I think you will see what I mean. And although intended for summing large arrays, the example will also work for small values of len_a, down to 3 or less: always check that your code works for arrays of length one and does something tidy for lengths less than one.
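For the record, a version that really does hand both values back might pass the bias out through a pointer (a sketch; the name sum_2p and its out-parameter are my own invention):

#include <stddef.h>

/* As sum_2, but stores the estimated mean through 'bias_p'
 * and returns only the biased sum. The caller reconstructs
 * the total as (*bias_p * len_a) + result, using whatever
 * wider arithmetic it has available. */
int sum_2p (int *a, size_t len_a, int *bias_p)
{
  size_t  i;
  int bias_a = (a[0] + a[len_a/2] + a[len_a-1]) / 3;
  int sum_a = 0;
  for (i = 0; i < len_a; i++)
    sum_a += a[i] - bias_a;
  *bias_p = bias_a;
  return (sum_a);
}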

If the sum of the numbers exceeds the range of an integer, ultimately you have to accept that fact. But is there something one can do if, despite the idea of sum_2, the partial sum exceeds the legal range, but the final sum does not?

Well, yes there is, and this is my second trick.

Ask What is Going On

Usually when summing an array, a few positive numbers are added to the partial sum, advancing it towards the positive limit. Then, with luck, several negative numbers are encountered which pull the partial sum back towards the negative limit. That is, the partial sum alternately approaches the maximum and minimum integer values. Disaster occurs when this random variation carries the partial sum across one of those bounds.

The trick is to postpone this disaster as long as possible and the surprising way to do this is to sort the numbers before summing. Then add up all the numbers, starting from one end of the sorted array. The partial sum will almost certainly overflow before long, but you can check for that and stop just before it occurs. Then start accumulating numbers from the other end of the array. These should have the opposite sign to those just added in, so will move the partial sum quickly away from the danger zone of one limit until the other limit is reached.

Repeat this, switching ends each time an overflow is about to occur. And so the algorithm proceeds with the running total lurching from one extreme to the other, until all the numbers have been summed.

Of course, if your luck runs out, and the remaining numbers at both array ends have the same sign, this algorithm fails, but then the sum exceeds that which can be held in an integer anyway.

Look at the following example to see how this is done.

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <limits.h>

int compar (const void *ap, const void *bp)
{
  int a = * (const int *) ap;
  int b = * (const int *) bp;
  if (a == b) return 0;
  return ((a < b) ? -1 : +1);
}

int o_flow (int sum, int next)
{
  /* Determine if adding 'next' to 'sum' would
   * cause an overflow. */
  return (next < 0 ?
    sum < INT_MIN - next :
    sum > INT_MAX - next
  );
}

int sum_3 (int *a, size_t len_a)
{
  int *al, *au, sum_a, working;

  /* Sort the data into ascending order. */
  qsort (a, len_a, sizeof(a[0]), compar);

  /* Set up the summation. */
  sum_a = *a;
  al = a + 1;
  au = a + len_a;

  /* Loop from both ends inwards until all data
   * summed. */
  while (al < au) {
    working = 0;

    /* Accumulate from the bottom end up until a
     * further accumulation would cause a negative
     * overflow. */
    while (al < au) {
      if (o_flow (sum_a, *al))
        break;
      else {
        sum_a += *al++;
        working = 1;
      }
    }

    /* Loop from the top down until a further item
     * would cause a positive overflow. */
    while (al < au) {
      if (o_flow (sum_a, *(au - 1)))
        break;
      else {
        sum_a += * --au;
        working = 1;
      }
    }

    /* If neither of the inner loops did anything,
     * the sum will not fit an 'int' and so we
     * cannot find the sum. */
    if (! working) {
      fputs ("'sum_3' failed.\n", stdout);
      exit (EXIT_FAILURE);
    }
  }

  /* Successful completion. */
  return (sum_a);
}

#define LEN_A_MAX       16000
int     a[LEN_A_MAX];

/* From an earlier article; this prototype is assumed. */
extern int get_int (const char *prompt, int low, int high);

int main (void)
{
  size_t  len_a, i;
  int     sum;
  unsigned int    seed;
  len_a = get_int ("Array length to sum: ", 1,
                   LEN_A_MAX);
  seed = get_int (" Random number seed: ", 0,
                  RAND_MAX);
  srand (seed);
  for (i = 0; i < len_a; i++) {
    a[i] = rand ();
    if (rand () > RAND_MAX/2) a[i] = -a[i];
  }
  sum = sum_3 (a, len_a);
  printf ("'sum' = %d\n", sum);
  return EXIT_SUCCESS;
}

The summation function is called sum_3. It uses the Standard function qsort together with the local function compar to sort the array.

Note that the body of 'compar' cannot be simply:

 return (* (const int *) ap - * (const int *) bp);

because if both *ap and *bp are large numbers of opposite sign, this too will overflow. Instead, I compare the numbers and return a constant indicating the sign of the difference. I then rely on the compiler correctly comparing two numbers irrespective of their values. That is what it is for.
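An equivalent overflow-free formulation, which some programmers prefer, is the well-known (a > b) - (a < b) idiom; each comparison yields 0 or 1, so the subtraction can never overflow:

int compar_alt (const void *ap, const void *bp)
{
  int a = * (const int *) ap;
  int b = * (const int *) bp;
  /* Each comparison is 0 or 1, so the result is
   * -1, 0 or +1, with no arithmetic overflow. */
  return ((a > b) - (a < b));
}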

As an aside on compar, note that although I know the two arguments point to integers, they must be declared as pointers to const void, not pointers to int, because that is what the definition of the qsort function requires. compar must then explicitly cast every use of the arguments to a pointer to const int. This generally means that a well-written compar will either copy the arguments to local variables of the wanted pointer type or, as I have done here, immediately cast the arguments to the correct pointer type and dereference them with the * operator to obtain the values pointed to.

I use al in sum_3 to point to the next element to be summed at the lower end of the array and au to point to the last element already summed at the array upper end. I do things this way round because the C Standard guarantees that the address of the element just past the array end must exist (even though its value does not) but does not require that the address of the element just before the first must exist. This becomes important when checking to see if all array elements have been summed.
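That same guarantee underlies the familiar pointer-walking loop, in which the bound a + len_a is formed and compared but never dereferenced (a minimal sketch; count_negatives is merely an illustration):

#include <stddef.h>

size_t count_negatives (int *a, size_t len_a)
{
  int     *p;
  int     *end = a + len_a;  /* legal to form and compare,
                              * never to dereference */
  size_t  n = 0;
  for (p = a; p < end; p++)
    if (*p < 0)
      n++;
  return (n);
}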

The summation is more complex, having an outer loop which runs two inner loops in turn. The first uses al to sum from the array's lower end upwards and the second uses au to sum from the upper end downwards. The array has been completely summed when au and al point to the same object.

The function o_flow is used to see if adding next to the partial sum would result in an overflow. This it has to do without itself causing such a catastrophe. The solution is to test whether next is positive or negative, and hence whether it is the most positive or the most negative limit that is threatened. Subtracting next from that limit gives a value closer to zero, so the subtraction itself stays within bounds. sum must not exceed this adjusted bound (in the appropriate direction) if adding next to it is not to cause an overflow.

The variable working is used to check that at least one number was added in on each pass of the outer loop. This traps the possibility that al and au - 1 both point to numbers of the same sign, where an overflow would occur if either were added in. As I said earlier, this happens only if the sum cannot fit within an integer. When it does occur, I report the failure and call exit().

The outer loop exits when all array elements have been summed, and that sum is returned as the function value.

My main function is a test for sum_3. It uses the get_int function I wrote about in an earlier article to get from the user an array length and a seed for the pseudo-random number generator, rand. An array of numbers is generated, summed, and that sum printed.

Normally, one would test a function with small data sets, or array lengths in this case. But because here I am relying on the randomness of the test data to enable the function to keep going, I found that array lengths of 100 or more might be successfully summed by sum_3, whilst arrays only 25 long would often fail, even though the longer array started with the same numbers as the shorter array. When summing a few random numbers, there is too great a chance that their sum is well distant from zero.

When All Else Fails

If neither of the above ideas enables you to find the array total without problems, about the only option left is to resort to an arbitrary precision arithmetic package. These allow you to do arithmetic to whatever accuracy is necessary to solve the problem to hand. But describing one is well outside the scope of this article, and would probably justify a book in its own right.

Outstanding Points

I have left in a nasty shock for the user of sum_3. Because I sort the array a in place, the array is not the same after sum_3 returns as it was when sum_3 was called. This is an unreasonable side-effect, and I leave it to you to fix for me, though one possible repair is sketched below.
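One possible repair is to work on a private copy, so that the caller's array comes back untouched (a sketch; sum_3_copy is my name for it, and the price is a second array of the same size):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A non-destructive wrapper: sort and sum a copy of 'a',
 * leaving the caller's array in its original order. */
int sum_3_copy (int *a, size_t len_a)
{
  int sum;
  int *copy = malloc (len_a * sizeof(a[0]));
  if (copy == NULL) {
    fputs ("'sum_3_copy': out of memory.\n", stdout);
    exit (EXIT_FAILURE);
  }
  memcpy (copy, a, len_a * sizeof(a[0]));
  sum = sum_3 (copy, len_a);
  free (copy);
  return (sum);
}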

A point that some may like to look at is that the way I set a[i] in main results in it being assigned (on my 16-bit system) only 16384 different values. What should be done to make a[i] potentially take any of the 65536 values allowed for a 16-bit integer?

Despite my earlier claims, neither sum_2 nor sum_3 works if handed an array length of zero. You might like to look at this too. But there is no need to check for a negative array length, because of the definition of size_t.
