Title: Student Code Critique Competition 37

Author:

Date: 06 December 2005 05:00:00 +00:00 or Tue, 06 December 2005 05:00:00 +00:00

Summary:

Body:

Prizes provided by Blackwells Bookshops & Addison-Wesley

Please note that participation in this competition is open to all members. The title reflects the fact that the code used is normally provided by a student as part of their course work.

This item is part of the Dialogue section of C Vu, which is intended to designate it as an item where reader interaction is particularly important. Readers' comments and criticisms of published entries are always welcome, as are possible samples.

Before We Start

Remember that you can get the current problem set in the ACCU website (http://www.accu.org/journals/). This is aimed at people living overseas who get the magazine much later than members in the UK and Europe.

Student Code Critique 35 - Further Entry

Following on from the entry for SCC 35 in the last issue, Jim Hyslop <<jhyslop@dreampossible.ca>> wrote:

I'd like to congratulate Simon Farnsworth on a well thought out critique. I agree with his contention that '"GetReport() is a mess." He is also correct in his final statement that the function should throw an exception. To me, that is the first and most important point I’d discuss, not the last one. Null references are not allowed, period. They invoke the dreaded Undefined Behaviour. Do your best not to accidentally create them, and never, ever deliberately create them.

My take on the class is the opposite of Simon's: it seems to me that Report is intended to add vector-like behaviour to a map, i.e. given an index n, find the nth entry in the map. Report IS-A map, with additional features. So, given our two different interpretations of the intent of the class, the next important improvement is to beef up the class's comments to explain in more details the "why" of the class, so that we don't have to guess at the class's purpose.

One important objective to keep in mind when critiquing other people's code, whether you are an instructor, mentor, colleague, or whatever, is not only to say what needs to be done, but why. For example, Simon recommends using iter-> rather than (*iter). Knowing why you should do this is important. Similarly, why should one use pre-increment over post-increment? I agree with the recommendation, and with his reasoning, but his reasoning is a little shallow: in what way is the code simpler? (I leave answering these questions, as well as the question: "why is preferring pre-increment over post-increment not a premature optimization?", as an exercise to the reader :-).

I do, however, disagree with his statement that "the optimizer should fix it for you". The optimizer can only fix this for builtin types. The exact type of a container's iterator is implementation-defined, so an implementation is not required to actually use a pointer: it could use a class or struct. gcc, for example, uses a struct as its iterator. In that case, the compiler cannot arbitrarily substitute pre-increment for post-increment.

Now, back to the original submission.

I'm assuming that the member function definitions are shown inline as an SCC convention, in order to conserve space. Member functions should not, of course, be written as inline until and unless profiling shows them to be a bottleneck.

Retrieving the nth element in a map can be simplified in user code by taking advantage of the distance properties of iterators:

Report & Reports::GetReport( int index )
{
  if ( index <0 || index >=size() )
    throw out_of_range;
  return (begin() + index)->second;
}

- From 17 lines down to 3. Not bad.

Note that the iterator's op+ does not do any range checking. If index is negative, or exceeds the size, you will invoke undefined behaviour (a crash, if you're lucky) so we need to do strict checking on index before we do anything else.

This is not necessarily more efficient than the original code, since iterator's operator +(int distance) (which is implemented in terms of operator +=) basically boils down to:

  if ( distance >= 0 )
    while ( distance-- ) this->operator++();
  else
    while ( distance++ ) this->operator--();

but it is easier to read and follow in user code.

If profiling indicates we're spending a lot of time in the increment, then this can be optimized by determining whether index is closer to begin() or to end():

Report & Reports::GetReport( int index )
{
  if ( index <0 || index >=size() )
     throw out_of_range;
  if ( index > size()/2 )
     return (end() - (size() - index))->second;
  return (begin() + index)->second;
}

On variable naming: What is nIndex? Well, clearly it's an index, but what is its purpose? If one is going to use Hungarian Notation then that can be clearly indicated by using the proper prefix i rather than the generic n (remember, in true, Simonyi notation^[1], i does not mean integer, it means index). iter is, similarly, clearly an iterator, but again: what is its purpose? The variable names do not convey any meaning to the reader. The author is also using a mix of Hungarian Notation, and non-HN. Pick one or the other, and stick to it (as an aside, the iter highlights the problems with using Hungarian Notation in C++ programming: each time you create a new type, you need to invent a new prefix. Personal preference: avoid HN, as it adds very little value.).

So, without good, descriptive variable names, we resort to studying the code to determine what the variables are. As Simon pointed out, the member variables seem to be an optimization (side note: in addition to the robustness concerns he noted, the optimization is neither thread-safe nor reentrant, although I suppose that could all be lumped under 'susceptible to changes between calls'. Side note 2: the author of the code seems to have recognized at least some of the problems Simon pointed out, since there is a test to see if nIndex exceeds the size of the map, implying that the author knows some elements could be removed between calls to GetReport). So, um... where was I? Oh, yeah. The next thing to do is give the member variables meaningful names, make them private,

private:
  int lastIndexRetrieved;
  iterator lastElementRetrieved;
and, of course, initialize them in the ctor:
Reports() : lastIndexRetrieved( 0 ),
   lastElementRetrieved( begin() ) {}

Notice that I did not recommend adding a comment to explain their purpose, I specifically recommended changing the variable names.

If the robustness concerns Simon noted are addressed (not likely, but possible if Reports is populated once and not modified), and profiling indicates we are spending a lot of time incrementing/decrementing iterators, then GetReport can be optimized as follows:

Report & Reports::GetReport( int index )
{
  if ( index < 0 || index >= size() )
     throw out_of_range;
  int distance = index - lastIndexRetrieved;
  lastIndexRetrieved = index;
  lastElementRetrieved += distance;
  return lastElementRetrieved->second;
}

A variation on the begin()/end() optimization shown above can be applied, by determining which is the closest iterator to the desired index: begin(), lastElementRetrieved, or end(). If, for example, there are 1000 elements in the map, the last element retrieved was 14, and we want element 990, then it is faster to count back from end() rather than forward from lastElementRetrieved. But again, only if profiling indicates we're still spending a lot of time iterating through the map. You may end up spending more time figuring out which element is closest, than actually spinning through the list!

ClearAll() appears to me to be a convenience function, to allow the user to reset all data in the reports with a single function call, for subsequent re-population. Given the ownership issues Simon pointed out, we must assume that some external mechanism is responsible for adding and removing Report objects to and from the map, and for cleaning up the data on program termination. There are similar ownership issues around the key, Data *, which we again must assume some other object will take care of. Examining these lifetime issues would be part of a general code review.

In ClearAll(), I would suggest the programmer consider get in the habit of using std::transform() and std::for_each() instead of manually iterating through loops.

Student Code Critique 36 Entries

Here is a C program generating a couple of prime numbers as part of an exercise on encoding/decoding with public and private keys. There are two bugs with the program: it produces the same output each time it is run with one compiler (msvc) and it loops forever with another (gcc). Please critique the code to help the student resolve both these problems with the algorithm. Additionally suggest any improvements to the coding style and point out any other issues with the algorithms used. You can also broaden the critique to include a C++ solution if this may assist the student with their original task.

#include <stdlib.h>
#include <stdio.h>
int main()
{
//need to generate number, then find out
//whether it is a prime, twice. Then need to
//generate e and see if it is a factor of n.
  int i1, rem1, i2, rem2, i3, rem3, rem4;
  int p, q;
  int n, phi, e;
  //These are the two prime numbers output
  int m, d;
  i1 = 0;
  i2 = 0;
  i3 = 0;
  while(i1!=1)
  {
    p = 100 + 99*rand()/((double)RAND_MAX+1);
    //p is random number between 100 and 200.
    i1=p-1;
    rem1 = p%i1;
    //find out whether the number is prime
    while(rem1!=0)
    {
      i1--;
      rem1 = p%i1;
    }
  }
  while(i2!=1)
  {
    q = 100 + 99*rand()/((double)RAND_MAX+1);
    i2=q-1;
    rem2 = q%i2;
    while(rem2!=0)
    {
      i2--;
      rem2 = q%i2;
    }
  }
  n = p*q;
  phi = (p-1)*(q-1);
  // phi is the number of primes less than n!
  // e picked such that gcd(e, phi) = 1
  while(i3!=1)
  {
    e = phi*rand()/((double)RAND_MAX+1);
    //e is a random number between 0 and phi. 
    i3=e;
    rem3 = phi%i3;
    rem4 = 1;

    //this loop finds the highest value of i3 
    //which divides both numbers. It needs to
    // be 1, so they are relatively prime
    while(rem4!=0 )
    { 
      i3--;
      rem3 = phi%i3;
      
      if(rem3==0) 
        rem4 = e%i3;
    }
  }

  //the loop will find the value of m such 
  //that e*d mod phi = 1.
  m = 0;        
  while((e*d)%phi !=1)
  {
    d = (m*phi+1)/e;
    m++;
  };
  printf( "(m,d) is (%i,%i)\n", m,d );
  return 0; 
}

From Seyed H. Haeri (Hossein) `<shhaeri@math.sharif.edu>`

When I start scanning this code, the first thing which meets my eyes is its appropriate commenting. So, one positive point for the student!

Next, I see the variable declarations at the beginning of the main() function. As mentioned in the problem specification, this is a C code, so the student has probably not had any other option. In case he/she is about to move to C++, which the spec lets us to assume so, the programmer should postpone them as late as possible.

Another point which arises whilst moving from C to C++, is that because there is no more any need to use printf(), and we use cout in return, it suffices to replace the top #includes with a mere #include <iostream>. (Note that I've dismissed the tailing .h on purpose.)

A negative point which I give to the student is because of the lack of good interaction with the user (of the programmer); the program outputs "(m,d) is …" without saying what m and d are. The program is supposed to say that beforehand. More explanation of what the program is about to do is not that bad. Furthermore, (m,d) is not a good representation of the purpose. If we suppose that somebody may some day use this program, that's mathematicians. Mathematicians are very likely to misunderstand (,) with GCD - as is usual in Number Theory.

I'm not sure whether this is an assignment in which the student is supposed to implement the algorithm for producing two random prime numbers. Assuming it is not, it suffices to use the Boost stuff to do that within much less lines. I don't know for when this code is, but if it is for recent years, the program is, therefore, overkill.

But what has caused the bug reported by the student? Well, a known problem: Random Number Generation. It turns out that built-in random generators such as rand(), although really produce random numbers, don't produce what we human-beings may consider so. What has caused the students report is also the same. He/she has encountered the problem because rand() produces the same number in consecutive invocations. That is, rand() of MSVC is always producing the same prime number, and that of GCC is always producing the same non-prime number. Therefore, MSVC always outputs the same number, and GCC goes in an infinite loop. (Note that this is a consequence of program's logic.)

Another important point about this code is the lack of good modularisation in it; none of those loops should be more than a function call. Assuming that the programmer knows how to implement each of below functions, (and he/she adds appropriates comments as well) he/she should have written something like this:

#include <iostream>
…
int main()
{
  int p = gimmmeARandom(100, 200);
  int q = gimmmeARandom(100, 200);
  int n = p * q;
  int phi = (p - 1) * (q - 1);
  int e;
  do
  {
    e = gimmmeARandom();
  }
  while(gcd(e, phi) != 1);
  int m = 0, d = 0;
  while((e * d) % phi != 1)
  {
    d = (m * phi + 1) / e;
    ++m;
  }
  cout << "Two random prime numbers
     : " << m << d;
  return 0;
}

Finally here are a few minor points:

The programmer has always used the postfix ++ operator, whilst in all of them, the prefix version does the job. So, he/she should replace all of them.
The variable d is not initialised. I don't know about C, but in C++, local variables are not automatically initialised. Therefore, the student should manually initialise that.

I suggest the student should read about the Boost Random Number Generation library.

From Ken Duffill `<ken@kendee.co.uk>`

On first look this code is quite busy and difficult to follow. There is a bunch of declarations, all with pretty meaningless and very similar names. Then there are four loops, obviously this is where the work goes on, but neither the original description, nor the comments, nor the code itself make clear what is meant to be going on.

This kind of code is very common with beginners, and even experienced developers like myself (I have been programming for nearly thirty years) may produce code like this during some quick prototyping excercise. The next phase of development, though, MUST be to get the code into some sort of order.

I will go through the process step by step so as to try to show what changes I would like to make and why I think they are important. Once we have done this it is possible that at least some of the bugs will have become obvious and been resolved along the way and the code is definitely going to be more easy to understand so if there is a fundamental flaw in the design we should be able to pick it up.

Right from the start we see that the code is one long function called main. Even though it isn't that long; one long function always leads to trouble. It is difficult to tell where the variables have any meaning. They are all declared in the same place and then some are used and forgotten, others are used to carry data from one part of the program to another, but it is not easy to tell which is which.

So, the first step is to put each of the four loops into separate functions, and only leave those variables declared in the main function that are needed to carry data between the functions.

Looking at the first loop, it seems that the variable p is the only data that is needed once the loop has finished. So we create a function (we will call it function1 for now, once we understand its real purpose later on we will think of a more descriptive name) that returns an int. We cut the loop from main and paste it into the new function, replacing the original with the assignment p = function1();

Now we remove i1 and rem1 from the declarations in main and put them into function1 and add a new int p declaration in function1, and return p when the function is complete.

Compile the code using both compilers, to make sure we haven't broken anything.

Oops! The compilers baulk at a statement in main i1 = 0; So we cut that from main and paste it into function1. At this point one of my personal preferences takes over for a moment and instead of having the first few lines of function1 look like this:

int i1, rem1;
int p;
i1 = 0;

I change it to:

int i1 = 0;
int rem1;
int p;

That is to say one declaration per line and if a variable needs initialising, do it in the declaration so you don't forget.

Compile the code using both compilers, to make sure we haven't broken anything. That's better!

Run both versions, to make sure we haven't changed its behaviour.

Next we do the same for the second loop, calling this function2.

Now we can see that functions one and two are only different in that one has variables i1, rem1 and p and the other variables i2, rem2 and q. If we change i1 and i2 to i and rem1 and rem2 to rem in both functions and then change q to p in the second function, the functions become identical, so function2 can be deleted and main can make two calls to function1, one assigning the result to p, the other assigning the result to q, instead.

In examining the next loop in main to decide how to turn it into a function we notice that the two lines of code before the loop include one that assigns n to p*q, and yet n is never used in the code. It is only referred to in the comment that states, "phi is the number of primes less than n!"

Just to make sure that we haven't missed something we delete this line, and the declaration of n and recompile. Sure enough, the compiler is happy. Now, if we have any "domain knowledge" we will understand that the value n = p * q is a very important part of the cryptographic algorithm so we may choose to leave it in for future use. If we do then we should place a comment to this affect. If we decide to remove it we need to change the comment to "phi is the number of primes less than n!. Where n was the product of p and q."

We now create a function (function3, for now) that takes an integer parameter and returns an integer result. Cut the loop from main and paste it into the new function, replace the cut code in main with a call to function3 passing in phi as calculated just before the loop, move all appropriate declarations from main to function3. Exactly as we did for function1 and function2. We then compile and run to show we haven't broken anything.

Ooops, exactly as happened for function1 and function2 the compiler baulks because there is an assignment of i3 = 0 left in main. Move this assignment into function3, taking this opportunity to implement my preferences (one declaration per line and if a variable needs initialising, do it in the declaration so you don't forget).

Compile again, and run.

Don't despair that we haven't fixed the bug yet, gcc still produces an exe that loops forever, and MSVC still gives us the same answer every time we run it. That's OK we haven't tried to fix anything yet we are just getting the code 'in shape'.

We repeat the refactoring process one last time, for the last loop. This time function4 needs to accept two integer parameters (e and phi) and doesn't return anything, it just prints the results. Note the initialisation of m = 0 just before the loop, let's do that in the declaration of m in the function this time.

Now function4 is very small, and we notice straight away that there is an unnecessary semicolon at the end of this while loop, which we remove. Isn't it much easier to see these things in small functions? Also we see that there is an int d that is declared and then used, in the conditional of the while loop, before it has been initialised. Could this be our bug?

So following my preferences we want to initialise d in its declaration, but what do we initialise it to? Well in the body of the loop d is set to (m*phi+1)/e so lets use that. We know that m is initialised to zero so the initialisation of d becomes 1/e because m*phi is zero. Now we see that in the loop, once d has been set m is incremented so maybe, because we have set d in the initialisation, m should be initialised to 1 rather than zero. Essentially we have done the first pass through the loop in our initialisation, at the same time as making sure we don't get any surprises if the runtime environment doesn't initialize in the way we expected.

Was this our bug? Well, no. Having compiled and run the code we get the same behaviour as before. But the code quality has improved.

Now our main is looking much smaller and more understandable. If we adopt my preferences for declarations we have a main that looks like this:

int p = function1();
int q = function1();
int phi = (p-1) * (q-1);
int e = function3(phi);
function4(e, phi);

with a few blank lines and comments in between these lines of code.

Now we can see that if the variable and function names made some more sense we might actually be able to see what is going on!

Just by looking at the comments we can get some better variable names in main.

p and q are random numbers that are prime.
phi is the number of primes less than n! Where n was the product of p and q. I could change this name to xPrimes or something similar, but it may well be that the algorithm being implemented is well known and EVERYBODY who knows it expects to use phi, in which case it would be better to leave it.
I am still not sure what e is so we will leave that.
function1 clearly calculates a random number that is prime, cos that is what p and q are.

Also, it seems as if e and phi are only used as transports between function3 and function4, so by moving the call of function3 into the beginning of function4 we can get main to look like this:

int p = FindRandomPrime();
int q = FindRandomPrime();
function4(( p - 1 ) * ( q - 1 ));

We could go one stage further now and lose p and q altogether the whole of main just becomes:

function4(( FindRandomPrime() - 1 ) *
 ( FindRandomPrime() - 1 ));

But that may be a step too far. Again, with some "domain knowledge" we will find that p and q are important numbers in the whole cryptographic process, so we will leave them be.

Our bug is still present though, and we need to look at the functions more closely.

The FindRandomPrime function and function3 are very similar. Inside a loop we get a random number in a given range. Let's see if we can factor this bit of code out.

We create a function that returns an int and takes two int parameters. It is quite obvious now what this function is going to do, so we will give it a useful name straight away. The prototype will be something like this:

int GetRandomInRange( int from, int to );

We cut the line from FindRandomPrime and paste it into GetRandomInRange, and refactor it to account for the parameterisation so it becomes:

return from + (to-from-1)
*rand()/((double)RAND_MAX+1);

replacing that line in FindRandomPrime with a call to GetRandomInRange thus:

p = GetRandomInRange(100, 200);

We almost don't need the comment anymore. We replace the line in function3 similarly:

e = GetRandomInRange(0, phi);

Comple and run... The bug is still there.

Now we ask ourselves does GetRandomInRange work? It is a one line function, so we should be able to understand it now. rand() returns a random integer between 0 and RAND_MAX, and this function appears to try to convert this to a random integer between from and to.

With this implementation there is a mixture of implicit and explicit casting in this expression. Now, as I have said earlier, I have been developing software for nearly thirty years, more than fifteen of which have been in C. I have the C standard on my desk and I still can't tell for sure how this expression will evaluate. Some integers get cast to doubles, and then the doubles get cast back to integers, but which, where and why? I dunno.

It is my belief that, even if I knew that this expression could be relied upon in all implementations to give me the correct answer, I cannot rely on everybody who may need to look at this code in the future to be able to, and if they can't they are going to break it someday (if it isn't already broken, of course). So as a matter of principal, I don't like expressions that mix explicit and implicit casts.

If we rewrite this expression as two expressions so as to avoid implicit casts altogether we get:

double range = ( double )rand()
  /((double)RAND_MAX + (double)1.0 );
return from + (int)(
  (double)(to-from-1)*range);

When we compile and run we now do not loop indefinitely in the gcc version. Hey, success! Of course, it gives us the same answer every time we run the application, but at least our two compilers now produce an application with the same behaviour.

I am still not entirely sure that this function actually delivers what it seems to want to in extreme cases, and there may be a hidden bug in the application as a result.

I would prefer to see:

double range = (double)rand() /
  (double)RAND_MAX;
return from + (int)((double)( to - from )
  * range );

Now that this code is contained in one simple function, some unit tests should be written to prove the answer is correct in all cases (or fix the code until it does). I will leave that as an exercise for the reader.

So, why are our random numbers not random. A quick look at the standard (or any C standard library reference) tells us the answer lies with rand. rand doesn't actually produce random numbers; rather, it selects numbers from a sequence of pseudo-random numbers. The starting point of this sequence will always be the same unless the rand library is properly initialised. srand should be called somewhere in the application before the first call to rand. The parameter to srand should be some value that is different every time the application runs. It is quite usual to use a value from the system timer for this purpose. So I will use a call to time thus:

srand((int)time(NULL));

This, of course will only work so long as the application is not run more frequently than once per second. Whilst srand is called at the application level this is reasonable, because we humans are slow beasts, but care must be taken if this code is refactored into libraries that may be called rapidly from other code.

We now have tidied our code up considerably, found the two causes of the problems observed and two other problems that had not yet produced observable errors. But there is still more that can be done. With a little more "domain knowledge" we can find better names for function3 and function4.

I did some internet searches based upon the comments like "e picked such that gcd(e, phi) = 1" and "this loop finds the highest value of i3 which divides both numbers. It needs to be 1, so they are relatively prime" and "the loop will find the value of m such that e*d mod phi = 1". From information gleaned, I refactored function3 into FindRandomRelativePrime that calls functions called Euclid and FindLargestFactor. I am not a Crypto expert and I still can't find a better name for function4. I have attached the code so maybe someone else can finish the job.

Now that all the functionality is refactored into small, well named, functions whose purpose is clear and unambiguous then each function can be tested with unit tests, preferably automatically built and run as part of the development environment every time the code is changed.

Who knows what other problems will have been identified and fixed once this process is complete.

You will notice if you look at my code that I have changed the post-increments and post-decrements to pre-increments and pre-decrements. There are cases when the code produced can be more efficient if pre- rather than post- operations are used. It is true that in C these cases are rather rare and insignificant, but in C++ when the objects being pre- or post- operated are not native types but large objects then the saving of the creation of one temporary can be significant, and it is therefore a good habit to get into. One final comment. The original problem stated that the application behaved differently when built with MSVC than with gcc. It is good practice to build your applications with more than one compiler whenever you can. Some interesting differences can come up.

When I was refactoring the code for FindLargestFactor. I decided that the original algorithm was marginally wasteful as the largest factor must be less than or equal to half the candidate. Because of this the candidate must be greater than 2 in order to have any factors. I put a conditional in right at the beginning of the function, but being lazy I only compiled with the gcc compiler. There was no problem. It was some time later when I compiled with MSVC again and the compiler complained because it didn't like declarations that follow code. Although the C99 standard allows this, for maximum portability it is not good practice. If you really MUST have new declarations after code then they should be put in a new block by the appropriate use of curly braces. In this case there is very little overhead to just moving the conditional to just after the declarations, but that isn't always the case.

Another incident was when I began using time to seed the random sequence. Again gcc was happy, but MSVC complained that I hadn't #included time.h

One final final comment. It is also a good idea to pass your code through a static code checker such as Lint (or PcLint), this can also pick up some problems. Probably the problem with implicit casting that was the first bug we cured in this project would have been caught by a static checker.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int GetRandomInRange(int from, int to)
{
  double range = (double)rand() /
    (double)RAND_MAX;
  return from + (int)((double)( to - from )
    * range );
}
int FindLargestFactor(int candidate)
{
  // The largest factor of a candidate cannot
  // be bigger than half of the candidate,
  // so lets start there.
  int potentialFactor = (candidate >> 1);
  // The concept of factors only applies to
  // numbers greater than or equal to 2.
  if (candidate < 2)
  {
    return candidate;
  }
  // If there is no remainder when candidate
  // is divided by potentialFactor
  // then potentialFactor is a real factor.
  while(candidate % potentialFactor != 0)
  {
     // There was a remainder so lets try
     // another potentialFactor.
     --potentialFactor;
  }
  // We will always end up with a factor even
  // if it is one.
  return potentialFactor;
}
int FindRandomPrime()
{
  int p;
  do
  {
     p = GetRandomInRange( 100, 200 );
  // Find out whether p is prime.
  // If the largest factor of p is one then p
  // is prime, so we have finished.
  } while ( FindLargestFactor( p ) != 1 );
  return p;
}
int Euclid( int larger, int smaller )
{
    // Find whether the greatest common
    // denominator (gcd) of the two
    // parameters is one or not.
    int divisor = smaller;
    int dividend = larger;
    int remainder;
    while (( remainder = dividend % divisor )
           > 1 )
    {
        dividend = divisor;
        divisor = remainder;
    }
    //If the result is one then it is the gcd,
    //otherwise the result is zero. We are not
    //interested what it is, but if we were it
    //would be dividend at this point.
    return remainder;
}
int FindRandomRelativePrime( int phi )
{
  int e;
  do
  {
    e = GetRandomInRange( 1, phi );
  } while ( Euclid( phi, e ) != 1 );
  return e;
}
void function4( int phi )
{
  // phi is the number of primes less than
  // n!.  Where n was the product of p and q.
  int e = FindRandomRelativePrime( phi );
  int m = 1;
  int d = 1 / e;
  //the loop will find the value of m such
  //that e*d mod phi = 1
  while(( e * d ) % phi != 1 )
  {
     d = ( m * phi + 1 ) / e;
     ++m;
  }
  printf("(m,d) is (%i,%i)\n", m,d );
}
int main()
{
  srand((int)time(NULL));
  {
    int p = FindRandomPrime();
    int q = FindRandomPrime();
    function4(( p - 1 ) * ( q - 1 ));
  }
  return 0;
}

From Jim Hyslop `<jhyslop@dreampossible.ca>`

Let's deal with gcc's behaviour first. In debugging, printf can be your best friend, especially if you don't have an IDE to step through the code. So, after putting in various printfs to find the checkpoints (i.e. "About to find p", "About to find q", and so on), we find that it's never exiting the first loop:

while(i1!=1)
{
  p = 100 + 99*rand()/((double)RAND_MAX+1);
  //p is random number between 100 and 200.
  i1=p-1;
  rem1 = p%i1;
  //find out whether the number is prime
  while(rem1!=0)
  {
    i1--;
    rem1 = p%i1;
  }
}

Oops - what's this: 99 * rand(). If rand() is greater than approx. 21 million (which it will be about 1 time out of 100) then it will overflow. We don't want that, since overflows are undefined behaviour (which, by the way, is probably why MSVC "works"). So, to fix it, we can just force the numerator to a double by using the constant 99.0 instead of 99:

p = 100 + 99.0*rand() / ((double)RAND_MAX+1);

We don't need to worry about the denominator overflowing, since RAND_MAX is first cast to a double, and then incremented.

Looking further down the code, we see that the random number generation is repeated. This needs to be refactored into a function which generates a random number in a given range:

int GenRand( int min, int max )
{
  return min + (double)max *
    rand()/((double)RAND_MAX+1);
}

Doing this by the way, also eliminates the need for the comments //[varname] is a random value between x and y.

Now, when you run it, gcc and MSVC behave the same: you always get the same results. That's because rand() is not a true random number generator: it generates pseudo-random numbers in a sequence determined by the algorithm used. Unless you provide it with a unique seed each time you start the program, it will generate the same sequence of numbers. So the first thing to do is seed the random number generator. For purposes of this exercise, srand( time(NULL) ); before the first call to rand() will suffice. Now, the program will generate different numbers each time you run it (unless you run it twice within one clock second).

Style Comments

Even C has the concept of local block variables. Take advantage of them, and declare variables in the most restrictive scope possible. The variables i? (e.g. i1, i2, etc.) and rem? are each only used within one while loop, so they can be declared there.

None of the variables are initialized. Uninitialized variables account for a large number of programming errors, so variables should be initialized whenever possible. It also makes the meat of your functions more compact, since you don't occupy space within the logic simply initializing variables.

The variable names are not very useful. i1, i2, i3 are meaningless. I seem to recall that p, q, m, d, n, e, and phi are specific terms used in the cryptographic equations so that is reasonable - but there should be a comment to that effect.

Comment placement is inconsistent. Some comments are placed before the line in question, some after, e.g.:

p = 100 + 99*rand()/((double)RAND_MAX+1);
//p is random number between 100 and 200.
[...]
//find out whether the number is prime
while(rem1!=0)

Consistency is important in allowing other programmers to easily read and understand your program. The comment for p should be placed before the statement.

Avoid repetition. There are two loops that perform identical work - they should be refactored (yes, you can refactor even in C) into a function. Applying meaningful names for i1 and rem1, we get:

int FindRandomPrime( void )
{
  int primeCandidate, factor=0;
  while(factor!=1)
  {
    int remainder;
    primeCandidate = GenRand( 100, 200 );
    factor=primeCandidate-1;
    remainder = primeCandidate%factor;
    //find out whether the number is prime
    while(remainder!=0)
    {
      factor--;
      remainder = primeCandidate%factor;
    }
  }
  return primeCandidate;
}

This is a horribly inefficient way of finding primes. It tries all numbers from 1..primeCandidate. You don't need to try any even numbers except 2, and you don't need to try any numbers below sqrt( primeCandidate ). A much more efficient function might be:

int IsPrime( int value )
{
  int factor1 = 3,
      factor2 = sqrt( value ) + 1,
      divisorFound= value%2 == 0;
  while ( !divisorFound && factor1<factor2 )
  {
    divisorFound = value % factor1 == 0;
    factor1 += 2;
  }
  return divisorFound;
}
int FindRandomPrime()
{
  int primeNumber;
  do {
    primeNumber = GenRand( 100, 200 );
  } while ( !IsPrime( primeNumber ) );
  return primeNumber;
}

Note, by the way, that I've split IsPrime into a separate function: this allows you to hook in unit tests to validate that your IsPrime function works correctly.

I don't have my copy of C99 handy, but I'd double check to see if // comments are allowed in C99. I know they are not allowed in C90.

A little more liberal use of whitespace would help readability (although that could be a restriction imposed by the printing requirements in the magazine).

Commentary

With three such varied entrants I don't think much additional commentary is needed - the main items I wanted addressing are all covered: the amount of code duplication, the oddly predictable behaviour of rand(), and the problems caused by the mix of int and double variables.

However no one correctly identified the reason for the difference in runtime behaviour between MSVC and gcc. It is in fact caused by the different values for the (implementation defined) value RAND_MAX. This is defined as 0x7fff with msvc and 0x7fffffff with gcc. With msvc the expression 99*rand()/((double)RAND_MAX+1) will evaluate to a number from 0 to 98 but with gcc on the platform being used it can only be 0 or -1 (caused by signed integer overflow).

The Winner of SCC 36

The editor's choice is:

Jim Hyslop with a strong commendation to Ken Duffill

Please email <francis@robinton.demon.co.uk> to arrange for your prize.

Student Code Critique 37

(Submissions to <scc@accu.org> by Jan 3rd 2006)

For a change I thought I'd set a Java problem. Here is a student's attempt at a simple class to provide access to a single database table, ADDRESS. Please critique the code and suggest what problems there may be with this class when using it in a larger application, and any issues with the simple test harness.

import java.sql.*;
class Scc37 {
  String[] drivers = {
    "sun.jdbc.odbc.JdbcOdbcDriver" };
  String database = "jdbc:odbc:ADDRESS";
  void setDrivers( String[] drivers ) {
    drivers = drivers;
  }
  void setDatabase( String database ) {
    database = database;
  }
  String selectAddress( String query ) {
    try {
      for ( int idx = 0;
            idx != drivers.length; ++idx )
        Class.forName( drivers[ idx ] );
    } catch (Exception e) {
      System.out.println(e);
    }
    String userName="";
    String password="";
    // Get connection
    Connection con = null;
    try {
      con = DriverManager.getConnection(
        database,userName,password);
    } catch(Exception e) {
      System.out.println(e);
    }

    // Execute query
    ResultSet results = null;
    try {
      results =  
        con.createStatement().executeQuery(
        "SELECT * FROM ADDRESS WHERE "
        + query );
    } catch (Exception e) {
      System.out.println(e);
    }
    // Get all results
    String retVal = "";
    try {
      ResultSetMetaData rsmd = 
        results.getMetaData();
      int numCols = rsmd.getColumnCount();
      int i, rowcount = 0;
      // break it off at 50 rows max
      while (results.next() && rowcount<40) {
        // Loop through each column, getting
        // the data and displaying
        for (i=1; i <= numCols; i++) {
          if (i > 1) retVal = retVal + ",";
          retVal = retVal +
            results.getString(i);
        }
        retVal = retVal + "\n";
        rowcount++;
      }
    } catch (Exception e) {
      System.out.println(e);
    }
    return retVal;
  }

  /** Test harness - 'args' list drivers */
  public static void main( String[] args ) {
    Scc37 scc37 = new Scc37();
    scc37.setDrivers( args );
    System.out.println( "Found:" +
      scc37.selectAddress(
        "name LIKE '%Hardy'" ) );
  }
}

^[1] Charles Simonyi, 'Hungarian Notation', http://msdn.microsoft.com/library/en-us/dnvsgen/html/hunganotat.asp

Notes:

More fields may be available via dynamicdata ..