Journal Articles

CVu Journal Vol 16, #3 - Jun 2004 + Student Code Critiques from CVu journal.

Browse in :

All > Journals > CVu > 163 (11)
All > Journal Columns > Code Critique (70)
Any of these categories - All of these categories

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: Student Code Critique Competition 28

Author: Administrator

Date: 03 June 2004 13:16:06 +01:00 or Thu, 03 June 2004 13:16:06 +01:00

Summary:

Body:

Prizes provided by Blackwells Bookshops & Addison-Wesley

Please note that participation in this competition is open to all members. The title reflects the fact that the code used is normally provided by a student as part of their course work.

This item is part of the Dialogue section of C Vu, which is intended to designate it as an item where reader interaction is particularly important. Readers' comments and criticisms of published entries are always welcome.

Before We Start

Given the amount of entries received so far (I hope this attitude doesn't decay as time goes by), I'll be as concise as possible. Please remember that you can get the current problem on the ACCU website (http://www.accu.org/journals/). This should be enough motivation for people who don't participate due to the short time span between the date they receive C Vu and the contest deadline. As Francis stated in the last column, other languages such as Java, C# and Python are also very welcomed, so if you have some snippet you consider suitable for publishing, drop me a line at <caabeiro@vodafone.es>

Late Submission For SCC 26

From Ian (surname not provided)

The simple answer to the student's problem is arrays in C++ being indexed from 0 he does not need the + 1 in his loop which would then read:

for (int i = 0; i <= len; i++)
cout << str[i];

However this leaves a few problems. Addressing the most obvious first main should return an int rather than a void (most obvious because my version of GCC will not compile the code until this is fixed). [gcc 3.3.3 will allow void main() to be compiled. A warning is given rather than an error - Ed]

A lot of the remaining problems stem from the use of C-style strings and changing these for STL strings will address a number of issues. In terms of the current issues related to the C strings the arraySize constant is unnecessary and dangerous, if "Need help with C++." becomes "Need no help with C++." in a moment of hubris and the arraySize value is not increased then there are potential problems, char str[] = "Need ..." will suffice instead. The loop given is printing the final null character which whilst not harmful is not necessary.

Initially changing to use C++ strings tidies these up a bit.

#include <iostream>
#include <string>
using namespace std;
int main() {
  string str = "Need help with C++.";
  int len = str.size();
  cout << "The sentence \"Need help with
       C++.\" has " << len << " characters."
       << endl << endl;
  for (int i = 0; i < len; i++)
    cout << str[i];
}

Even this is correct but could still be improved in a few ways. The "Need help with C++." string is hard coded in two places and the len variable is could be replaced with the str.size() call where it is used. In a program of this size it also not necessary (though not harmful) to flush the output buffers with endl repeatedly, "\n" will suffice. (Incidentally the original loop version dropped the carriage return after each character, which behaviour is required is unclear.)

#include <iostream>
#include <string>
using namespace std;
int main() {
  string str = "Need help with C++.";
  cout << "The sentence \"" << str << "\" has "
       << str.size() << " characters.\n\n";
  for (int i = 0; i < str.size(); i++)
    cout << str[i];
}

Finally the loop could be made more idiomatic by using string iterators rather than indexing (which would also handle the case if the string size happened to overflow the int).

for (string::const_iterator i = str.begin();
     i != str.end(); ++i)
  cout << *i;

But even more so by using an STL algorithm rather than an explicit loop to perform the job, the final version becoming

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
using namespace std;
int main() {
  string str = "Need help with C++.";
  cout << "The sentence \"" << str << "\" has "
       << str.size() << " characters.\n\n";
  std::copy(str.begin(), str.end(),
            ostream_iterator<char>(cout, ""));
}

Student Code Critique 27 Entries

There are at least two issues with this issue's problem code. The first is finding the immediate logic error in the code that results in false positives. The second, to my mind more important issue, is the student's approach to solving the problem. It is relatively easy to learn to write syntactically correct code but that is not programming any more than writing grammatically correct English is writing poetry or a novel.

Please submit a critique of the code as it is and then append a critique of the student's approach to the problem s/he was set.

In this exercise an ugly number is defined as a compound number whose only prime factors are 2, 3 or 5. Note that a factor can repeat so 20 (2*2*5) is an ugly number. The following program outputs ugly numbers correctly, but also outputs prime numbers. I can't understand why this is happening because the pointer returned by pf() points to a zeroed 1st element of the array flist when the input to pf() is prime. It should fail the while() test and the next number should be factored. I can kludge this with a separate test for primes, but I would like to understand why it's not working as is.

/* find "ugly numbers": their prime factors
are all 2, 3, or 5 */
#include <stdio.h>
#include <stdlib.h>
#define SIZE 20
/* SIZE is the max number of prime factors */
#define TWO 2
#define THREE 3
#define FIVE 5
int *pf(int number);

int main(int argc, char *argv[]) {
  int *flist,idx, n, ugly = 0;
  int start, stop;
  if(argc != 3)exit(EXIT_FAILURE);
  start = atoi(argv[1]); /*enter a range */
  stop = atoi(argv[2]);
  for(n = start; n < stop +1; ++n) {
    idx = 0;
    flist = pf(n);
    while(flist[idx]) {
      if(flist[idx]==TWO ||flist[idx]==THREE
         ||flist[idx]==FIVE) {
        ugly = 1;
        ++idx;
      }
      else {
        ugly = 0;
        ++idx;
      }
    }
    if(ugly == 1)
    printf("%d\n", n);
    free(flist);
  }
  return 0;
}

/* find the prime factors of number */
int *pf(int number) {
  int *flist, quotient=0, divisor=2, idx=0;
  flist = calloc(SIZE, sizeof(int));
  while(divisor < number + 1) {
    if(divisor == number){
      flist[idx] = quotient;
      /* add last factor to list */
      break;
    }
    if(number % divisor == 0) {
      flist[idx++] = divisor;
      quotient = number/divisor;
      number = quotient;
    }
    else ++divisor;
  }
  return flist;
}

From Roger Orr `<rogero@howzatt.demon.co.uk>`

The initial problem which the student reports is the false positives for some primes. For example:

C:>ugly 30 32
30
31
32

where '31' should be missing - but:

C:>ugly 70 72
72

correctly doesn't display '71'. How can we find the bug?

I can think of four main ways to resolve down the problem.

explore the problem further
review the code
use a debugger
change the algorithm

These options are not exclusive - and for more complicated problems several of these might be used together.

Let's look at how the problem might be resolved in each case:

Try more numbers and see if a pattern appears.
```
C:>ugly 10 40
10
11 **
12
13 **
15
16
17 **
18
19 **
20
24
25
27
30
31 **
32
36
37 **
40
```
The ones with asterisks ought not to be displayed. However 23 and 29 are correctly not shown.

We may eventually realise that a prime is printed if and only if the previous number printed was 'ugly'. It is then a small step to realise that the ugly variable - which maintains the state of being ugly - is not being reset between iterations of the main loop.
Review the code

It is possible to perform a walk through of the code and discover the fault. Even for only a few lines of code however it might take a long time to find the error in failing to reset ugly.
Use a debugger

If the program is compiled for debugging and the first example is single stepped through the debugger it should be fairly easy to spot that the variable ugly starts the loop with the value 1 for the second number (31).

Then the while(flist[idx]) evaluates to false since the first element is zero, and so the test if(ugly == 1) is successful.
Change the algorithm

We could simply cover over the problem by changing the algorithm. Many bugs are caused by boundary conditions and special cases.

In this example, pf returns an array with the first number zero for prime numbers which is a special case. This is because, when the number is prime, no number divides it and so the initial value if quotient is returned.

The first approach is to add extra code for this special case - we could change the test
```
if (ugly == 1)
```
to
```
if (ugly == 1 && flist[0])
```
and if we try this out we'll find that the bug appears to have gone.

The danger with this approach - as pointed out by the student - is that this may not have fixed the bug but simply changed the symptoms. The other danger is that the new algorithm may not be correct!

A second approach, for example, is to change the initialisation of quotient from
```
quotient = 0
```
to
```
quotient = number
```
then pf will return the input number as the first item for primes and we remove the special case for primes.

If we try this we will find that both the examples above now work correctly. However we've introduced a different bug - 2, 3 and 5 are now displayed although they are not compound numbers.

What have we done? We've tried to remove a special case - prime numbers - but they are a special case in the problem statement and not just an implementation detail. Albert Einstein is credited with the statement: "Make things as simple as possible, but no simpler", and we've hit the second part of this!

How can we fix the bug? We need to ensure that ugly is reset at the beginning of the loop. The simple, naive approach just adds a line ugly=0, after the for statement.

While this is adequate a better approach moves the declaration of the variable ugly inside the for loop.

Remove:
```
, ugly = 0
```
from the line after main, and put the line:
```
int ugly = 0;
```
directly after the for statement.

It is in general good to declare variables with the smallest scope possible, and close to their use. The variable ugly is required only within the loop and it should not persist its value between iterations. We can make the change and now the program runs as expected.

What else is wrong with the code? The corrected code works for all numbers up to 1048576 when it misbehaves. The problem is the hard-wired limit of SIZE. This limits the program to 19 factors (the last entry can't be used since the flist array is always terminated with a zero.) One easy solution is to check stop is less than 1048576 and report an error if not. More complicated solutions are obviously possible to remove the fixed size limit - but only if necessary.

Using #define to define names for numbers seems strange, to put it mildly. What is going on? The numbers 2, 3 and 5 are special for this program, and I think the programmer wanted to make that explicit.

However the names are not helpful - they literally add nothing to the program (try reading it aloud to see what I mean!)

The principle is a good one though - beware magic numbers. I would like to keep the principle but not the implementation.

For example:
```
const int factors[] = { 2, 3, 5 };
```
This defines an array containing all the factors making 'ugly' numbers, and the rest of the code does not need to refer to them again directly, so the test
```
if (flist[idx] == TWO ||
    flist[idx] == THREE ||
    flist[idx] == FIVE) {
```
can be re-written as
```
if (flist[idx] == factors[0] ||
    flist[idx] == factors[1] ||
    flist[idx] == factors[2]) {
```
Of course, once the magic numbers are in an array it might be worth generalising the check so the code could be modified very easily if you decided that, for example, '7' was also an ugly factor.

The complete check could be re-written using a nested loop as
```
ugly = 0;
int j;
for(j = 0; j < sizeof(ugly_factors) /
  sizeof(ugly_factors[0]); ++j)
  if(flist[idx] == ugly_factors[j])
    ugly = 1;
++idx;
```
While on the subject of names, I don't find the function name pf very informative. It might mean many things, it might be 'pointer to function' for example! Perhaps calling it 'factorize' would be better.

There is also a problem with memory allocation. The code allocates a fixed size buffer in 'pf' which it is the caller's responsibility to free. There are two potential problems with this, firstly it would be easy to leak memory should the caller forget to free the returned pointer and secondly memory allocation can be relatively slow. If the caller passed in the buffer instead then no memory allocation would be required since a local variable in the caller could be used for the array.

And what about performance? The code works, but is not very efficient. The 'pf' function checks each possible divisor of the number in turn, incrementing the divisor by one each time. Try running it over the range 524288 1048575 to see how long it takes!

Does this matter? We don't know - we do not have the information about the proposed use of the program. If performance matters then a different approach to the problem will improve the speed.

One easy way to do this with the current program structure is to exit the main loop in pf prematurely (with flist[0] set to zero) as soon as the divisor exceeds the largest ugly factor, since the number cannot be ugly in this case and further factoring is not necessary for this program. This of course makes 'pf' no longer a general purpose function to obtain all the prime factors of a number.

However, when I tried it, the time taken to calculate the entire range above changed to 1.3 seconds - the original algorithm was still running after 30 minutes (when I got bored and killed it off).

From Nevin Liber `<nevin@eviloverlord.com>`

Logic Error:

As stated, when number is prime, pf(number) returns an array whose first element is 0.

Now, let us look at the code which tests this value:

for(n = start; n < stop+1; ++n) {
  idx = 0;
  flist = pf(n);
  while(flist[idx]) {
    /* ... */
    /* modify ugly */
    /* ... */
  }
  if(ugly == 1)
    printf("%d\n", n);
  /* ... */
}

So, when n is prime, flist[0] == 0, the entire while loop is skipped, and ugly erroneously retains its value from the previous iteration of the for loop. Q.E.D.

Note: it actually doesn't print all the prime numbers. For instance, 23 is skipped because 22 == 2 * 11 is not ugly.

A way to fix this particular bug is to clear ugly at the beginning of each iteration; i.e.,

for(n = start; n < stop+1; ++n) {
  ugly = 0;
  idx = 0;
  flist = pf(n);
  /* ... */
}

Code critique:

#define SIZE 20 /* max number of prime factors*/ This is a latent bug. Besides the comment being wrong (the maximum number of prime factors is 19), if a given number has 20 or more prime factors, the code doesn't work, which can manifest itself either by incorrect results or by a memory corruption.

#define TWO   2
#define THREE 3
#define FIVE  5

These are still "magic numbers". It is clearer just to use the numbers directly.

int *flist, idx, n, ugly = 0;
int start, stop;
if(argc != 3) exit(EXIT_FAILURE);

Always use braces; it makes things far clearer.

Declare one variable per line. It is far more readable and maintainable. Unless there is a good reason not to, initialize the variables as you declare them.

start = atoi(argv[1]);
stop = atoi(argv[2]);

What happens if argv[1] or argv[2] is negative? Does the algorithm still work? What happens if the parameters aren't numbers?

for(n = start; n < stop + 1; ++n)

n <= stop is more readable; the +1 tends to throw people off. What happens when stop is the maximum value an int can hold?

flist = pf(n);

Need some comments that the ownership of the array that pf() returns needs to be managed by the caller. This technique is highly error prone in non-trivial projects.

while(flist[idx]) {
  if(flist[idx] == TWO ||
     flist[idx] == THREE ||
     flist[idx] == FIVE) {
    ugly = 1;
    ++idx;
  }
  else {
    ugly = 0;
    ++idx;
  }
}

Note: Kudos for using pre-increment over post-increment. This is a good habit to get into, especially if you ever move to C++, since pre-increment performs at least as well as post-increment.

Since ++idx is in both clauses of the if statement, factor it out. I might write this as:

while(flist[idx]) {
  ugly = 2 == flist[idx] ||
         3 == flist[idx] ||
         5 == flist[idx];
  ++idx;
}

What this while loop does is test the last non-zero value of flist to see if it is 2, 3, or 5; if it isn't, then the number isn't ugly (since it has other prime factors). This works for the problem as stated because 2, 3, and 5 are the lowest sequential primes. If, for instance, 11 were added to the ugly prime set, it isn't obvious how to extend this algorithm, as just adding || 11 == flist[idx] doesn't work.

if(ugly == 1)
printf("%d ", n);
free(flist);

Good: no memory leak.

Again, use braces. Since the indentation is wrong (and something that is not enforced by the language), to the casual observer it looks like printf() is always called.

/* find the prime factors of number */
int *pf(int number)

Need a better interface. What happens when number is prime? Unless the implementation is studied, it is unexpected that no prime factors are returned for prime numbers. While you can convey this in comments (as well as that the caller has to free() the returned array, don't pass in a number with SIZE or more prime factors, etc.), it is better to make an interface that does what is expected.

while(divisor < number + 1) {
  if (divisor == number) {
    flist[idx] = quotient;
    break;
  }
}

The implementation just isn't obvious. Sometimes quotient means the previous quotient (when the initial value of number isn't prime); sometimes it means 0 (when the initial value of number is prime). Make these kinds of things explicit. Also, while (divisor <= number) is more readable.

Approach Critique:

There is a lot of messy, error-prone bookkeeping involved with calculating the prime factors. Yet the problem isn't asking for the prime factors; it is just asking for compound numbers that aren't solely made up of ugly primes.

Plus, the code which checks for ugliness is tightly coupled with the return value from pf(); i.e., it assumes pf() returns prime factors in ascending order, returns 0 as the first element in the array when the input is prime, the caller is responsible for free() ing the array, etc. That behaviour should all be encapsulated in a function, not spread across main() and pf().

It also isn't data driven; if the definition of ugly primes changes, more code needs to be added. Right now, the algorithm is dependent on the ugly primes being the lowest valued and sequential. If that constraint is violated, the entire algorithm has to be rewritten. We can be more resilient.

Alternate Implementation:

We don't need to know all the prime factors; we just need to check that:

The only prime factors are 2, 3, and 5
The number has at least 2 prime factors.

If both of those hold, the number is ugly.

If I were to implement it, I would do so as follows:

int IsUgly(unsigned long number) {
  unsigned long uglyFactorsCount = 0;
  if(number) {
    static const unsigned long uglyPrimes[] = { 2, 3, 5 };
    size_t p = 0;
    for(; sizeof(uglyPrimes) /
        sizeof(uglyPrimes[0]) != p; ++p) {
      unsigned long prime = uglyPrimes[p];
      while(0 == number % prime) {
        number /= prime;
        ++uglyFactorsCount;
      }
    }
  }
  return 1 == number &&
         1 < uglyFactorsCount;
}

The algorithm iterates over all the primes in uglyPrimes and factors them out of number, counting (in uglyFactorsCount) the number of factors that are divided out along the way. If the result of number is 1, then all of its prime factors are in uglyPrimes, and all that is left is to make sure that there were at least two factors divided out.

Some details:

int IsUgly(unsigned long number)

I pass in an unsigned long so I don't have to deal with negative numbers.

if(number)

If I didn't check for 0 != number, while(0 == number % prime) would always be true, and there would be an infinite loop bug. This check is needed for correctness.

At this point, I started to micro-optimize. Since I have to check number anyway, and I know that 1 cannot be ugly (no matter what numbers are in uglyPrimes), there is no reason to go through the factoring code, and I could say if (1 < number). Since no prime can be ugly, I could have used if (3 < number), but then the value of uglyFactorsCount isn't technically correct in those circumstances, and even though I only care that it has a value less than 2 (which would still hold) in order for the algorithm to give the correct answer, it started feeling messy. I remembered the old adage "measure, then optimize", and abandoned this path.

if(number) {
  static const unsigned long uglyPrimes[]
            = { 2, 3, 5 };

I declare variables as close to their use as possible, and try and limit their scope as much as possible. Also, I declared it const, as I want the compiler to enforce that I don't modify the values in the array.

size_t p = 0;
for(; sizeof(uglyPrimes) /
  sizeof(uglyPrimes[0]) != p; ++p)

By using sizeof in this fashion, the code is always right no matter how many elements are in uglyPrimes.

Note: this works because uglyPrimes is the whole array. If it were passed in to the function instead, it would decay into a pointer, and the result would not be what was expected.

Note: I prefer sizeof(uglyPrimes[0]) to sizeof(*uglyPrimes), as there can be unexpected results with the latter when programming in C++ (then again, there are better ways of doing this kind of thing in C++). I used != p instead of > p , since the former is more general when using with iterators in C++. But I digress...

Note: I tend to put l-values on the right side of expressions. In case I accidentally write something like if (0 = a) instead of if (0 == a), I get a compile time error.

For completeness, main() would look something like:

int main(int argc, char * argv[]) {
  if(3 == argc) {
    unsigned long number
      = strtoul(argv[1], NULL, 0);
    unsigned long stop
      = strtoul(argv[2], NULL, 0);
    for(; number < stop; ++number) {
      if(IsUgly(number)) {
        printf("%lu\n", number);
      }
    }
    if(number == stop && IsUgly(number)) {
      printf("%lu\n", number);
    }
    return 0;
  }
  else {
    return 1;
  }
}

Note: The reason there is an extra if (number == stop /* ... */) is to avoid the infinite loop bug when stop contains the largest possible unsigned long value.

From Duncan Booth `<duncan.booth@suttoncourtenay.org.uk>`

The first thing that strikes me about this code is that the structure does not obviously relate to the question. The question asks for a list of 'ugly numbers', so I would expect to see at least one function with the word 'ugly' somewhere in its name. Instead there is a function which sort of produces a list of prime factors, and the rest of the code is all bundled into main.

The student wonders why some additional numbers are output. This is because for any prime number the while loop never executes, so the variable ugly is never reset. This is a direct result of putting too much logic into the main function - extracting the test for ugliness to a separate function would have prevented this particular error ever happening.

The function to extract prime factors has several problems of its own: it uses a fixed length array for the factors with no check for overflow and the length chosen is too short even for 32 bit integers; if it is given a large prime number to factorise it will test every possible divisor up to the number (which could take a while) when it only needs to test divisors up to the square root of the number to determine the prime factors, or up to five to determine whether the number is ugly. The function also doesn't do what the comment at the top says: if it is given a prime number to factorise then instead of returning the number itself it returns an empty list as a special case. This is likely to surprise anyone trying to use the function, so it would be good if the behaviour were documented, but better still if the special behaviour for primes was removed altogether.

Rather than just showing a working program, I will show the steps I use to get to a solution to the problem. I am going to use Python rather than C since that lets me ignore issues like memory management and overflows and also lets me try a variety of algorithms more quickly than I could in C.

When I have a working solution, then if it needs speeding up, or if there is another reason for requiring a C solution the code can be translated to C fairly easily.

Here is my first effort in Python. It follows the student's logic fairly closely in so far as it counts through all the numbers in the range and tests each one for ugliness. The test is a bit different though, instead of extracting all the prime factors, we just look for the factors 2, 3 and 5 and then see if there is any unfactorised residue.

The main code converts its arguments to integers using the eval functions. This makes it easy for me to test comparatively large numbers as I can type an expression as the argument (say 2**29 instead of having to enter 536870912).

# File ugly.py

import sys

BADFACTORS = 2, 3, 5

def isUgly(n):
  if n in BADFACTORS or n in (0, 1):
    return False
  for factor in BADFACTORS:
    while n%factor == 0:
      n = n/factor
  return n == 1

def printUglyRange(start, end):
  print "Ugly numbers from",start,"to",end
  for i in xrange(start, end):
    if isUgly(i):
      print i
  if name__=='__main__':
    if len(sys.argv) != 3:
      sys.exit("Usage: %s start end" % sys.argv[0])
    start, end = eval(sys.argv[1]), eval(sys.argv[2])
    printUglyRange(start, end+1)

I don't feel this code quite expresses my intent yet, so the next step is to refactor printUglyRange:

def uglyRange(start, end):
  for i in xrange(start, end):
    if isUgly(i):
      yield i

def printUglyRange(start, end):
  print "Ugly numbers from",start,"to",end
  for i in uglyRange(start, end):
    print I

For consistency with how Python works generally, the uglyRange function includes its start value but excludes the end value, so 1 is added to the end value to get the same output as the student expected. The yield statement may be unfamiliar to C programmers: when executed it suspends execution of the generator function it is in and returns its argument to the for loop that invoked it; each time the for loop needs another value the generator is resumed exactly where it was suspended and another value calculated.

This separates the printing of the ugly numbers from the logic, and the program seems to run, but when we get up to large numbers (like 2**29) it takes a long time to output each number. The problem is that we are searching an increasingly sparse space for each ugly number instead of going directly to the next value. A better solution would be to generate only the ugly numbers and completely ignore other numbers. First though, I suggest writing some tests in another file testugly.py:

import unittest
class TestUgly(unittest.TestCase):
UGLYTO11 = [ 4, 6, 8, 9, 10, ]
UGLY100TO130 = [ 100, 108, 120, 125, 128, ]

  def setUp(self):
    import ugly
    self.uglyRange = ugly.uglyRange

  def test0to11(self):
    expected = self.UGLYTO11
    actual = list(self.uglyRange(0, 11))
    self.assertEquals(expected, actual)

  def test100to130(self):
    expected = self.UGLY100TO130
    actual = list(self.uglyRange(100, 130))
    self.assertEquals(expected, actual)

if name__=='__main__':
  unittest.main()

The tests allow us to modify the code with confidence that it will continue working, so we can move from an implementation which is easy to understand to one that is much more complex. I tried a few different algorithms, but the version below is my favourite. It generates ugly numbers in ascending order with the numbers partitioned into three lists according to the smallest factor present in that number:

import sys

BADFACTORS = 2, 3, 5

class UglyFifo:
  '''Stores a list of ugly numbers,
     first in first out. Fifos are
     chained so that pushes ripple
     down the chain.'''

  def __init__(self, factor, nextFifo):
    self.head, self.tail = [], []
    self.factor = factor
    if nextFifo:
      self.callNext = nextFifo.push
    else:
      self.callNext = None
    self.push(factor)

  def push(self, n):
    '''Append n multiplied by the
       factor for this fifo, then
       propagate n to the next
       queue.'''
    self.tail.append(n * self.factor)
    if self.callNext:
      self.callNext(n)

  def pop(self):
    '''Returns the oldest (i.e.
       smallest) ugly number from
       the fifo'''
    if not self.head:
      self.head, self.tail = self.tail,self.head
      self.head.reverse()
    return self.head.pop()

  def pushThenPop(self, n):
    self.push(n)
    return self.pop()

def generateUgly():
  '''Generate a potentially infinite
     sequence of ugly numbers, in
     increasing order.'''
  values = []
  fifo = None

def processLowest(lowest):
  '''Push a new value into a fifo,
     update the working values and
     return an ugly number'''
  ugly, fifo = lowest
  lowest[0] = fifo.pushThenPop(ugly)
  return ugly
  for factor in BADFACTORS:
    while values:
      lowest = min(values)
      if lowest[0] > factor:
        break
      yield processLowest(lowest)
    fifo = UglyFifo(factor, fifo)
    values.append([fifo.pop(), fifo])
  while 1:
    yield processLowest(min(values))

def uglyRange(lo, hi):
  '''Generate all ugly numbers from lo
     (inclusive) to hi exclusive.'''
  for i in generateUgly():
    if i >= hi:
      break
    if i >= lo:
      yield i

def main(start, end):
  print "Ugly numbers from",start,"to",end
  for i in uglyRange(start, end+1):
    print i
  if __name__=='__main__':
    if len(sys.argv) != 3:
      sys.exit("Usage: %s start end" % sys.argv[0])
    main(eval(sys.argv[1]), eval(sys.argv[2]))

Perhaps this seems like overkill for a simple problem, but this version of the program works up to quite large ugly numbers, albeit with a delay if the starting value is large. On my system it takes just over 10 seconds to calculate the 1.7 million values up to 10**100 which is probably fast enough for most purposes and therefore doesn't really need converting back to C.

If the student wants to convert this to C (or is compelled to do so by the teacher), there are several ways to do it, but the one I would recommend would be to use the Python program to write all ugly numbers up to some arbitrary limit into a file (there are 1844 up to 2**32, or 13278 up to 2**64) then write a program to print ugly numbers by reading them from the file and printing those in the required range. The code could handle the numbers as strings which would allow even a simple C program to work with arbitrarily large values, but I'll leave this as an exercise for the student.

From Margaret Wood `<margaretwood@pocketmail.com.au>`

The student is quite correct in stating that any prime number should fail the while() test. The problem is that the value of ugly is only set within the while loop. When n is a prime number, the value of ugly remains at what it was set to on the previous trip through the while loop. Thus any prime number which immediately follows an ugly number will also be printed. To fix the problem it is just necessary to set ugly to 0 at the start of each iteration through the for loop.

Now to look at improvements to the code in general. I'll start with the function to calculate prime factors. This could be a useful general purpose function if it did what the comment claims it does, namely to find the prime factors of a number. As it stands, however, it only finds the prime factors of compound numbers, returning 0 for prime numbers. The student has introduced a special behaviour for primes. I would prefer to modify the function to:

/* find the prime factors of number */
int *pf(int number) {
  int *flist, quotient=0, divisor=2, idx=0;
  flist = calloc(SIZE, sizeof(int));
  while(divisor < number + 1) {
    if(divisor == number) {
      flist[idx] = divisor;
      /* add last factor to list */
      break;
    }
    if(number % divisor == 0) {
      flist[idx++] = divisor;
      quotient = number/divisor;
      number = quotient;
    }
    else ++divisor;
  }
  return flist;
}

then add the special treatment in the main function

if(flist[1] == 0) {
  ugly = 0;
else {
  while flist[idx]) {
    ...
  }
}

Ideally the pf() function should also ensure that the factorisation does not go beyond the end of the array, either by checking that idx remains less than SIZE, or by choosing SIZE so that the array is large enough to hold the maximum possible number of factors of any number that will fit in an int. (Since MAXINT is always 2n-1, where n will depend on how many bytes make up an int, any number that can be stored in an int can have at most (n-1) prime factors).

However, we don't actually need to know all the prime factors of a compound number to determine whether it is ugly or not. We just need to know whether it has at least one prime factor that is different from 2, 3 and 5. For all numbers > 5 we can determine this without storing the factors as follows.

while(n % 2 == 0) {
  n /= 2;
}

while(n % 3 == 0) {
  n /= 3;
}

while(n % 5 == 0) {
  n /= 5;
}

if(n == 1) {
  ugly = 1;
}

The numbers < 6 can be treated as special cases, 4 is the only ugly one:

if(n == 4) {
  ugly = 1;
}

(Remember that ugly is initialised to 0 at the start of the for loop, so 1, 2, 3, 5 will not become ugly)

Thus all that is required (apart from checking that start and stop are valid numbers, with stop >= start, and printing some helpful message if the user has not supplied 2 correct arguments) is:

/* find "ugly numbers": their prime factors
   are all 2, 3, or 5 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  int n, ugly, number;
  int start, stop;
  if(argc != 3) {
    /* print helpful message */
    exit(EXIT_FAILURE);
  }
  start = atoi(argv[1]); /*enter a range */
  stop = atoi(argv[2]);
  if(start > stop) {
    n = stop;
    stop = start;
    start = n;
  }
  for(number = start; number < stop +1; ++number) {
    n = number;
    ugly = 0;
    if(n == 4) {
      ugly = 1;
    }
    else if(n > 5) {
      while(n % 2 == 0) {
        n /= 2;
      }
      while(n % 3 == 0) {
        n /= 3;
      }
      while(n % 5 == 0) {
        n /= 5;
      }
      if(n == 1) {
        ugly = 1;
      }
    }
    if(ugly == 1) {
      printf("%d\n", number);
    }
  }
  exit(0);
}

From Simon Sebright `<simonsebright@hotmail.com>`

Prologue

I always feel slightly awkward when writing these things. Ideally the student would be with the critiquer and the result would be a dialogue. Instead, they have to be monologues and as such are not tempered by little indicators of reasons why things were done or not done in a certain way.

Introduction

Regardless of what comes next (which I have at this stage already written), I have to ask why this student did not find the answer to his or her own problem. A simple mental run through the code, or use of a debugger should have left the matter high and dry. Again this is an issue about which I can only write from my immediate perspective, with no feedback. My four year old daughter has just spent the evening asking me why, why, why? I wouldn't expect a student to have the same naivety.

Part I

The immediate logic error. I haven't run the code, but an inspection reveals that the variable ugly will not be set for the current number if the statement while(flist[idx]) never lets flow into the while loop. This is the case if the number is prime, so prime numbers have their ugliness determined by the last number to be processed. I won't proceed any further with this section because I feel that corrections to the logic are pointless without some design changes.

Part II

It's true there are some aspects of the code I could criticise on a low level, but I fear that would negate what I think are more important high level concerns. For example, I don't think #define THREE 3 is adding anything, but my suggestions below don't use a direct replacement. I'd prefer a static const int, but as in the classic town's person giving directions "I wouldn't start from here". The code appears to be C, although the question did not stipulate. C++ would give us better idioms for things, as I've mentioned below, but I'll stick to a functional approach for the bulk of the critique.

I think the main reason for the occurrence of the logic error is the lack of clarity of the function pf(). First of all, let's not be lazy and call it GetPrimeFactors(). Well, what are prime factors? We have to decide whether or not they should include 1 and the number itself (if prime). I'm not familiar with the mathematical definition, so we'll err on the side of the student and say that these are not prime factors. What is pf() returning? Well, it's a list of prime factors. There's a lack of symmetry here. A C++ solution would have the function return a list containing the prime factors, here main() has to know how the list is allocated (if at all). Likewise, it has to know that the signal for the end of list is a zero (which can not be prime factor either way). list<> wins hands down.

Now, let's take a step back. What are we trying to achieve? The user wants to supply a numeric range. Within that range, we want to output all ugly numbers. Straight away, we could write some pseudo code here, not getting excited about the algorithm for prime factors, after all, I haven't needed to mention them yet:

main
get range to test with validation
for(each number in range)
if current number is ugly
output current number
end if
end for
end main

Well, now I think we need a function to see if a number is ugly:

bool IsUgly(int n);

Here, we get a chance to condense the ugliness of numbers into one place and to get rid of those silly #defines. A number is ugly if its prime factors are only 2, 3 or 5. Hmm, let's abstract that. A number is ugly if all its prime factors (here is where we find it handy that 1 and self are not prime factors) are to be found in the set 2, 3, 5. So, the IsUgly() function needs two things - a list of prime factors and a set of ugly prime factors to test against. Now, we could keep abstracting and have the IsUgly() function defer to a generalised list intersection algorithm passing in the ugly prime factors. For this exercise, that's probably going too far, but after all, we might change the definition of ugly (or generalise later to prettiness, etc.) How about this:

bool IsUgly(int n) {
  static const int UglyPrimeFactors[]
    = { 2, 3, 5, 0 };
  // again, note the need for a
  // termination condition, something not
  // necessary in a list<>
  // Get prime factors
  // See if all prime factors are one
  // of ugly prime factors. In C++,
  // this could be some sort of
  // intersection call on the list of
  // prime factors and the list of ugly
  // prime factors
}

Once and once only, and preferably without the need for comments, that's my motto.

I have to say that in my professional life, I rarely come across main() or command line arguments, so I won't go into too much detail here.

There does seem to be a rather user-unfriendly approach with neither help text available nor error messages in response to "invalid" input. I think I'd like to have main() process input, help text and the like and if all OK, call a function to do the meat of the program, that way we have a function which can be called from main() here, or some UI dialog class, etc. And, test harnesses. So, we can write a test harness which tests a large range of numbers against known ugliness. Ditto for the GetPrimeFactors() function, which we can sort of test through the testing of IsUgly(), but to be sure, we'd again test it with known input/output combinations.

I've now lost the code as my wife has tidied up (put her things on the desk), so nitty gritty comments are out of the window. From what I remember, I'd like to see the word ugly mentioned in code, not comments. My pseudo code above never had an issue with uninitialised variables, I like to declare them where needed. I know in C this was not always possible, but my pseudo code was not in C and didn't have the given logic problem. I still feel like the student ought to be here and contribute or sob. By now, I've totally lost the plot, beer goggles fogging the monitor, but I'll tender one last point - don't get too excited about implementation details. For me the exciting thing is planning what details are going to be needed. It seems as if the student was hell bent on writing a wicked prime factoring function. When you are a professional programmer, that's the last of your worries, or you really are locked away in a darkened room. Rather, you need to step back, look at the view and take it in (understand it).

The Winner of SCC 27

The editor's choice is: Nevin Liber

Please email <francis@robinton.demon.co.uk> to arrange for your prize.

Francis' Commentary

I chose the student code for this critique because there was a point that I wanted to make. The student is more competent than most in so far as coding in C is concerned. Magic numbers are avoided, all uppercase is used for pre-processor constants and uses lower case for other variables. EXIT_FAILURE is used after checking for the correct number of command line arguments fails. However there are a couple of points about the code:

I do not think naming 2, 3, 5 actually adds anything in this case.
replacing return 0 with return EXIT_SUCCESS would add even more polish.
the code does not validate that the command line arguments are actually suitable ones.

Once the logic error has been dealt with the program illustrates reasonable competence with C and from that respect I would give a good grade. However it still, in my opinion, is poor programming because it is a very heavy handed solution to the problem. That is the main point I wanted to make, there is a great deal more to programming than writing syntactically correct source code that produces the correct answer. So let me tackle the problem from the start.

The definition of an ugly number is one that has no prime factors other than 2, 3 and 5 [that is not how I would have defined an ugly number but that is irrelevant]. What the student has done is to reduce each number to its prime factors (not very efficiently as it happens, but that is another artefact of lack of clarity of thought). Then the code checks to see if any of the factors are other than the specified 2, 3 and 5.

Let us step back and think a little. What the problem requires is that we ignore all factors that are 2, 3 or 5 and see if there are any others. We can simply discard the factors in which we are not interested. That idea leads me to write the following little function:

int remove_factor(int number, int factor) {
  while((number % factor) == 0)
    number /= factor;
  return number;
}

In other words as long as the number is exactly divisible by factor (not a perfect name, can you suggest a better one?) reduce number by dividing by factor.

Now look at that little bit of code and decide if there are any ways in which it could fail. OK, it assumes that neither number nor factor is zero. The reason for failure will be different in the two cases. If number is zero the function is well defined, it simply goes into an infinite loop. If factor is zero we have a divide by zero error. Should remove_factor() handle either of those problems? Doing so is not cost free. It is a design decision, but one that can now be clearly identified and handled as the programmer chooses.

We can use this little function to determine if a given number is an ugly one:

bool is_ugly(int number) {
  number = remove_factor(number, 2);
  number = remove_factor(number, 3);
  number = remove_factor(number, 5);
  return (number == 1);
}

You might not want to use bool as the return type but that is your choice. Again there is the issue of whether number is a suitable candidate. We have two issues here: it must not be negative and it must be a compound number (i.e. it must have more than one prime factor). As the second is a defining property of being 'ugly' I think it should be dealt with here. So our function gets modified to:

bool is_ugly(int number) {
  if(number == 4) return TRUE;
  // direct test for smallest composite
  if(number < 6) return FALSE;
  number = remove_factor(number, 2);
  number = remove_factor(number, 3);
  number = remove_factor(number, 5);
  return (number == 1);
  // only 2, 3 and 5 were factors
}

Note that the second test deals with all the other cases (number being zero or negative) and also deals with one of the possible problems with remove_factor(). The other problem with remove_factor() is eliminated because we haven't actually tried to remove zero as a factor. Deal with problems at an appropriate level and assume that the caller of a function knows the pre-conditions (because they have been documented in the manual.)

Now I am ready to write my program:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(int argc, char *argv[]){
  if(argc != 3) {
    bad_command_line_args("Wrong number");
    return EXIT_FAILURE;
  }
  int first = atoi(argv[1]);
  int last = atoi(argv[2]);
  if((first < 1) || (first > last)) {
    bad_command_line_args("Bad range");
    return EXIT_FAILURE;
  }
  for(int i = first; i <= last; ++i) {
    if(is_ugly(i)) printf("%d\n", i);
  }
  return EXIT_SUCCESS
}

I haven't supplied the code for bad_command_line_args() but writing it should be trivial, certainly for a student who wrote the code we are critiquing. Note that this program is considerably shorter than the original even though it does a great deal to trap errors. All the code conforms to the C99 Standard though some, such as late declarations does not conform to the older standard.

The interesting feature of the above code is that it uses less C technology than the original yet, I hope you will agree, it is a much better program. One of the arts of programming is to use simple things well.

Student Code Critique 28

(Submissions to <caabeiro@vodafone.es> by July 10th)

Program 1

I'm newbie to C++ and I would like to know which would be the best (elegant and correct) solution for the following small (string) read problem with gets.

#include <iostream>
using namespace std;
#include <cstdio>
main() {
  int n;
  int waste; // needs this to work
             // (read) properly!!
  char name[51];
  cout << "Enter any integer number...\n";
  cin >> n;
  cout << "Enter your name...\n";
  cin >> waste; // 'gets' does not read the
                // name without this line!!
  gets(name);
}

Mixing C and C++ functions doesn't seem the best way to write this program. Please provide alternatives, taking also into account its security implications.

Program 2

If I enter 23 and 5 the answer should be 23 * 5 = 115 my answer is off by five or whatever number 2 is. I was told I am not including the first two numbers in the loop. I thought when I cin the numbers it is including them. Does anyone have any suggestions on how I can fix this?

#include<iostream>
#include<iomanip>
#include<string>

using namespace std;

int main()
{
  while (1) {
int num1, num2, total = 0;
cout<<"Enter two numbers: ";
cin>>num1>>num2;
while (num1 != 1)
cout<<setw(5)<<num1<<setw(5)<<num2<<endl;
  num1 /= 2;
  num2 *= 2;
  if (num1 % 2 != 0)
    total += num2;
}
  cout<<"the total is "<<total <<endl;
} //End While 1
return 0;
}

There are numerous errors in both the code and the student's understanding. Please address these comprehensively.

Notes:

More fields may be available via dynamicdata ..