Journal Articles
Browse in : |
All
> Journals
> CVu
> 153
(14)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Student Code Critique Competition
Author: Administrator
Date: 06 June 2003 13:15:58 +01:00 or Fri, 06 June 2003 13:15:58 +01:00
Summary:
Body:
Please note that participation in this competition is open to all members. The title reflects the fact that the code used is normally provided by a student as part of their course work.
This item is part of the Dialogue section of C Vu, which is intended to designate it as an item where reader interaction is particularly important. Readers' comments and criticisms of published entries are always welcome.
I'm creating a program that inputs three integers, and returns the sum, product, average, largest and smallest. Very simple, but I'm getting a bad return on one of my variables. Except for my "largest" variable, the others are returning the correct numbers. I'm hoping that someone can tell me where I'm going wrong.
#include <stdio.h> int main(){ int num1, num2, num3, sum, average int product, smallest, largest; puts("Input three different integers: "); scanf("%d, %d, %d",&num1,&num2,&num3); // read three integers sum = num1 + num2 + num3; average = ((num1 + num2 + num3)/ 3 ); product = num1 * num2 * num3; if(num1 < num2 ) smallest = num1; if(num2 < num1 ) smallest = num2; if(num3 < smallest ) smallest = num3; if(num1 > num2 ) largest = num1; if(num2 > num1 ) largest = num2; if(num3 > largest ) largest = num3; printf("\nSum is %d",sum); printf("\nAverage is %d",average); printf("\nProduct is %d",product); printf("\nSmallest is %d",smallest); printf("\nLargest is %d\n",largest); return 0; }
Please note that I am only doing this as a favour to Francis and not as a competition entry.
I am going to assume that this student has just started and is attempting one of those ridiculous exercises that thoughtless authors and instructors set. I assume that the student has not yet learnt about arrays.
Clearly the student has been taught that the structure of a program is:
Input
Compute
Output
This is the kind of brain deadening guideline that results in bad code. For example, it is because of this that the program declares three similar variables whose only difference is the order in which they acquire values from input. In fact almost all the student's problems are directly attributable to this 'design' decision. Experience teaches me that the instructor will probably be entirely happy with this aspect of the program and waste time on the implementation faults that arise because of it.
Yes, the student should have used the largest available integer type for input (assuming that the question specified input). The student should have considered that an average would generally be of a floating-point type. There is also the issue that when computing the product, we might get overflow. I am going to ignore these issues because it is likely that the student simply does not have the tools to cope with such issues. For example how do you detect overflow when dealing with integer multiplication? Note that actually overflowing an integer type results in undefined behaviour. (The best solution I can come up with is to do all the computation in a long double, check that the result is within the range of a long int - expressed as a long double - and finally convert the result to a long int - assuming that an integer result is required. Such programming techniques are way beyond what we should expect from a student who does not yet know about arrays and loops.)
Anyway here is my solution with commentary:
int main() { long int number = 0; long int largest = LONG_MIN; long int smallest = LONG_MAX; // The above is an idiom for use when you are // going to compute a largest/smallest value, // initialise to the smallest/largest possible long int sum = 0; long int product = 1; puts("WARNING: If you give inappropriate \n" "values to this program you\n" "will get silly results\n"); // Note the use of puts to supply pure text // output and the use of quotes to deal with a // multiline text statement. puts("Type in a whole number: "); scanf("%d", &number); // scanf is a poor function for general use // but with the Student's likely state of // knowledge it is appropriate. However // warnings should be given that this is not // a desirable long term mechanism. if(number > largest) largest = number; if(number < smallest) smallest = number; sum += number; product *= number; // The above give an opportunity to discuss // C's compound assignments and the need to // use initilaised variables // The following is simply a copy and paste of // the above input and processing code puts("Type in a whole number: "); scanf("%d", &number); if(number > largest) largest = number; if(number < smallest) smallest = number; sum += number; product *= number; puts("Type in a whole number: "); scanf("%d", &number); if(number > largest) largest = number; if(number < smallest) smallest = number; sum += number; product *= number; // Now output the results: printf("\nThe sum is %d",sum); printf("\nThe mean is is %d",sum/3); printf("\nThe product is %d",product); printf("\nThe smallest is %d",smallest); printf("\nThe largest is %d\n",largest); return 0; }
Please notice that this code is a very good way to introduce the concept of a for-loop as a way to avoid repetitious source code. Once the code has been rewritten with a for-loop it can be developed to handle more input, both by hardcoding the number of items and by obtaining the number of items interactively.
Unless we are going to re-use the input (for example by sorting it) there is no need for the student to know about arrays. Indeed I would contend that using an array for a question such as this one is overkill, not least because it requires the introduction of dynamic memory management if the question is generalised to user determined amounts of input.
However, were I mentoring students in C++ I would give any student who submitted such a pure C solution a very low grade. Why? Well I believe that students who head off in this direction are actually making their C++ progress much harder by being pre-occupied with low-level detail (such as the use of the address-of operator with scanf()). Yes, it is important that a student who presented the original code should be helped to understand why the code fails to work as expected. But the most important lesson is that bad design leads to logic errors. In this case there is no inherent difference between the three inputs and yet the submitted program handles them as if they were somehow different.
What saddens me is the frequency with which such opportunities to develop a student's design insights are ignored by tutors. It should not be four times more difficult to write a program that handles four numbers rather than three, and yet that, basically, is the consequence of the original design. Even without knowing about a for-loop, my design can be extended with a single copy/paste operation. That is not to encourage such, but to demonstrate that the solution of the problem for three inputs should generalise to the solution for n inputs. If it does not, it is a poor solution.
Critique by Catriona O'Connell <catriona38@hotmail.com>
The problem domain is not well-defined. The mathematical meaning of integer and the C language definition of integer are different. Problems of overflow can be minimised by using the largest possible representation (long) instead of int. However the GNU C compiler defines INT_MAX and LONG_MAX to be the same value (2147483647), so this doesn't necessarily help. The likelihood of problems might be further reduced by using compiler extensions. For example the GNU C compiler supports "long long int" with a maximum value of 9223372036854775807. The only way to avoid problems would be to read the input as a string. It would then need to be converted and processed using multiple-precision (MP) arithmetic routines. This is not likely to be within the student's ability. Without using MP routines there is no portable way to determine if arithmetic overflow or underflow occurs. If this happens then the results are undefined.
The user is asked to enter three different integers, but no check is made for uniqueness. This is important for the comparisons that are made to determine the largest and smallest numbers. If num1==num2 then smallest/largest will have initial value (0), so depending on the sign of num3, the returned value could be wrong.
In the scanf() format string it is probably a mistake to include the commas. This forces the user to separate the input integers with commas. Since scanf() is already smart enough to bypass whitespace it would be sufficient to specify that the numbers be space-separated. If commas are required, then the user should be told of this.
The only reason I can see to read in all three numbers at the same time is that it avoids the problem of excess characters being left in the input buffer. This would cause problems if scanf() were called multiple times. The student would either have to program in assignment to a string for the remainder of the buffer or be introduced to the delights of nonassigning scanf() calls such as
scanf("%*[^\n]"); // skip to end of line. scanf("%*1[\n]"); // skip one newline character.
to clear the input buffer.
The student does not check the return code from scanf() to ensure that three integers have been assigned before using what he/she thinks is valid input. This is a major error. It is almost impossible to recover gracefully from bad input using scanf() but it is even worse to plough on regardless.
scanf() should not really be used for interactive user input. If you are going to use it then it should be restricted to reading well-defined structure input - preferably computer-generated output. When you have enough experience to know how to use scanf() safely, then you probably know enough to avoid using it. There are very few cases where fgets()/sscanf() cannot replace scanf().
A better solution avoids the use of scanf() and instead uses a combination of fgets() to read a buffer of input and sscanf() to parse that buffer. fgets() has the advantage of gets() in that the buffer will not overflow, but input may be truncated if it is longer than the buffer. sscanf() works like scanf() except that it reads formatted data from a string instead of stdin.
The student's method for finding the largest and smallest values does not scale well. The sum, product, largest and smallest values can be calculated as running values, avoiding the need to have all three integers entered at the same time. The average is calculated as the sum divided by the number of integers entered. There is no need to recompute the sum for this calculation. The problem domain does not define if an integer average is acceptable or whether a double/float would be more appropriate. The variable average could be defined as double and the average calculated as sum/3.0.
The accompanying code is my attempt at a re-write using fgets()/sscanf() with return code checking.
#include <stdio.h> #include <limits.h> #define LOOP_COUNT 3 #define BUF_SIZE 80 int main() { long num, sum, average; long product, smallest, largest; char buffer[BUF_SIZE]; int i; int gotint; sum = 0; product = 1; average = 0; largest = LONG_MIN; smallest = LONG_MAX; for(i=0; i<LOOP_COUNT; ++i) { gotint = 0; while(!gotint) { printf("Input integer %d :", i+1); if(fgets(buffer, BUF_SIZE, stdin) != NULL) { if(sscanf(buffer, "%ld", &num) == 1) { gotint = 1; } else { printf("Error in sscanf()\n"); } } else { printf("Error in fgets()\n"); } } sum += num; product *= num; if(num < smallest) smallest = num; if(num > largest) largest = num; } average = sum/LOOP_COUNT; printf("\nSum: %ld", sum); printf("\nPrd: %ld", product); printf("\nAvg: %ld", average); printf("\nSml: %ld", smallest); printf("\nLge: %ld", largest); return 0; }
The first thing I noticed was the amount of code duplication. If the number of integers was changed from three then a comparatively large amount of change would be required to update the code.
The student appears to have split the code into distinct parts, one for input, a second for processing and a third for output, which is laudable. To keep this structure means all the inputs must be held until the input is complete; the obvious choice would be to replace the individual num1, num2, num3 variables with an array, but we have been told that arrays are not appropriate.
This suggests that the design should be changed to combine the input and processing stages so that each input is consumed and discarded in a loop, with the variables being reused each iteration and the results generated incrementally.
As for the results being wrong I initially thought of overflow in the addition and multiplication but we are told these are correct, it is in fact the largest value that is wrong. Looking again at the three lines calculating the largest value it appears it could go wrong if num1==num2, in which case neither of the first two lines will fire and largest will be uninitialised, the result of the comparison with num3 is therefore random. The student may have covered this by prompting the user for different integers but this is a very weak check. The same fault should be true of the calculation of the smallest value as the code is very similar, but we are told this is ok. A possible explanation is that the test numbers are likely to be small, and that the random values in uninitialised variables are unlikely to be zero in all the most significant bytes.
I wonder if the programmer is used to else clauses in the statements. Rather than reverse the operands as in the first two lines of the smallest and largest calculations, they could instead write:
if(num1>num2); largest=num2; else largest=num2;
This would also overcome the problem with not coping with the num1==num2 case.
[Look again, this has another bug, and one that will not always be detectable by the compiler. Francis]
As I have already decided this should be compressed into a loop, there will only be a single condition updated each loop.
if(newvalue>largest) largest=newvalue;
This presents the problem of how to ensure the first input is always assigned to largest. Here are two possible suggestions, either add special case code for the first loop:
if(loop count == 0) largest=newvalue; else { if(newvalue>largest) largest=newvalue; }
Alternatively, and my preference, is to initialise largest with INT_MIN from <limits.h> This is guaranteed to be small as the smallest input so the condition will work as expected. INT_MAX can be used for smallest.
Similarly the sum can be initialised to zero and product to one and both updated on each loop.
Finally there is the input method, which dictates how the program is used. Currently the input is separate from the command line so the user types:
name of executable (return) 3 2 1 (return) (or some other 3 numbers).
Is this is what is desired? Or should it run
name of executable 3 2 1 (return)
in which case the numbers should be extracted from argc and argv rather than a scanf. The numbers could even be input one per line:
name of executable (return) 3 (return) 2 (return) 1 (return)
This obviously depends on the user requirements. Seeing as this is a student assignment the exact method is probably less important than understanding the alternatives - so I will leave it as it is.
Not having programmed in C for a few years I vaguely remember something about preferring atoi over scanf for ints, but I cannot remember the rationale for this so I will stick with scanf.
The code definitely does need some input validation, the code expects three inputs but there is no check that scanf received them. The concept of never trusting user input is fundamental. At a minimum the scanf return should be checked, this will be equal to the number of variables successfully assigned.
Putting all these changes together with a few smaller items such as separating the constant 3, we get
#include <stdio.h> #include <limits.h> #define NUM_INPUTS 3 int main(void) { int product = 1; int smallest = INT_MAX; int largest = INT_MIN; int sum =0; int count; printf( "Input %d integers", NUM_INPUTS ); for(count = 0; count<NUM_INPUTS; ++count) { int num; if(scanf("%d", &num) != 1) { puts("error reading input"); return 1; } product *= num; sum += num; if(num < smallest) smallest = num; if(num > largest) largest = num; } printf("\nSum is %d", sum); printf("\nProduct is %d", product); printf("\nSmallest is %d", smallest); printf("\nLargest is %d", largest); printf("\nAverage is %d\n", sum/NUM_INPUTS); return 0; }
From Walter Milner <w.w.milner@bham.ac.uk>
This is a good attempt at a solution for which you deserve credit. However there are several details of your code which require comment, and the overall design could be improved.
Firstly the details. Your declaration:
int num1, num2, num3, sum, average
omits a trailing semi-colon, but since you are implying your code compiles, I assume it was there and has somehow got lost.
There are several issues raised by
puts("Input 3 different integers: "); scanf("%d, %d, %d", &num1, &num2, &num3);
If you remember, we discussed in class the limitations of scanf for input in real code. The first point is that the commas in the format string "%d, %d, %d" will need to match up with commas separating data values in what the user types in - in other words we are hoping the user enters something like:
1, 2, 3 <return>
If they omit the commas you get a result like:
Input 3 different integers: 1 2 3 Sum is -1717986919 Average is -572662306 Product is 687194768 Smallest is -858993460 Largest is 1
The use of a comma is made worse by the possibility that the user may separate thousand by using a comma, such as
1,000 2,000 3,000 <return>
These problems would be partly solved by changing the prompt to something like 'Enter three different integers separated by commas'. However users do not always obey instructions, and good code should cope with this. In other words, data input should be validated. For example suppose the user enters just 2 numbers, or non-integers, or numbers which are equal? Real code would check this, for example by inputting the data as a string, and then parsing it to check that it did consist of three different integers. We will look at this in detail in a later class.
The line
average=((num1+num2+num3)/3);
also contains several points. Firstly, the outermost pair of brackets is superfluous. Secondly you have just calculated
num1 + num2 + num3
so why do it again - in other words say
average = sum/3;
Thirdly we have a major point about data type. Suppose the user entered 1, 2 and 5. The average of these is 2.6667, which is not an integer. In other words average needs to be a floating point type, single or double. But even this is not enough, since
double average; ... average=sum/3; ... printf("\nAverage is %lf", average);
still gives 2.0000 for input of 1, 2 and 5. The problem is that sum/3; is the division of an integer by another integer, and the compiler uses integer division, which is 'guzinter', as in 5 guzinter 21 4 times remainder 1. If we want the compiler to generate code for floating point division, one way to do this is average = sum/3.0;
If we divide by 3.0 (a floating point constant) rather than 3 (integer) the compiler converts sum to floating point, then uses floating point division as we want. The difference between 3 and 3.0 is a small point (as it were) but an important one.
Another issue is the question of overflow. For example:
Input 3 different integers: 2000, 2001, 2002 Sum is 6003 Average is 2001.000000 Product is -577930592 Smallest is 2000 Largest is 2002
because on a system with 32 bit ints the product is larger than the largest integer which can be represented. This cannot be fixed by say using long instead of int - we could still input a value that was too large. We will see in a later class that <limits.h> contains a value for INT_MAX that we can use to deal with this.
Then we turn to the overall design - which in this case is not related to language-specific details. The overall plan of your code is
-
input the 3 values
-
process them to calculate the required values
-
output the results
The key question is about the storage of the input values - do we need to store all three values at the same time? Or could we deal with them one at a time, using this kind of plan:
for each value input it process it
Now some kinds of process require all values to be available. For example if we needed to find the standard deviation of some numbers, we need to calculate how far each is from the mean, so we need to first find the mean, then calculate the distance from the mean. Unless we input them all twice, this requires us to hold all values in memory at the same time. Similarly sorting them requires all values to be available at the same time.
But in this case by using an idea like 'the biggest one so far' we can do all the processing with only one value in memory at any one time. For example -
input first value biggest so far = first value for the rest if new value > biggest so far biggest so far = new value
We can do the equivalent for the smallest, the product, the total, and hence the average. Why bother? Because with this new design
-
It is trivially extensible beyond three values to any number
-
The storage requirement is small (one integer) and constant, independent of the number of values.
The first of these is probably the most important. Finding the largest of 3 values with the original approach required 3 ifs. Finding the largest of 4 requires a re-writing of the logic of the program. With the re-design, finding the maximum of 4, or 4000, is a trivial change.
A coding with this design follows. We change how many numbers to deal with by changing howmany, which could be done at run-time:
#include <stdio.h> int main() { int num; const int howmany = 3; double average; int product, smallest, largest, sum; int index; printf("Enter first number: "); scanf("%d",&num); smallest = largest = sum = product = num; for(index = 1; index < howmany; index++) { printf("Enter next number: "); scanf("%d",&num); if(num>largest) largest = num; if(num<smallest) smallest = num; sum += num; product *= num; } average = sum/(double)howmany; printf("\nSum is %d",sum); printf("\nAverage is %lf", average); printf("\nProduct is %d", product); printf("\nSmallest is %d", smallest); printf("\nLargest is %d\n",largest); return 0; }
The editor's choice is: Catriona O'Connell. Please email <francis@robinton.demon.co.uk> to arrange for your prize.
(Submissions to <francis@robinton.demon.co.uk> by July 6th)
When critiquing this code please do not restrict yourself to explaining the writer's perceived problem but deal with all aspects of the code. There are some nasty problems as well as issues of good coding practice.
Please look at the following code that is just playing with container of objects.
#include<iostream> #include<fstream> #include<vector> using namespace std; class A { public: A(int p1= 0) : x(p1) { cout<<"CTOR"<<endl; } A(const A& a) { x = a.x; cout<<"COPY CTOR"<<endl; } ~A() { cout << "DTOR: address = " << (long)this <<endl; } friend ostream& operator<<(ostream& out, const A& obj); private: int x; }; ostream& operator<< (ostream& out, const A& obj) { out << "x = "<<(obj.x)<<endl; return out; } int main() { vector<A> v; // Be inefficient, not production code for(int i=0; i<3; i++) v.push_back(A(i)); // Lo behold, container of objects copy(v.begin(), v.end(), ostream_iterator<A>(cout, "")); cout << endl; A * pa = &(v.back()); // pa points to address of reference // to the last element cout << (*pa); // info cout << "pa = " << (long)pa << endl; // address cout << "call pop_back()" <<endl; v.pop_back(); // Line A // destructor WILL destroy last element // Now pa points to deleted memory, should crash cout << (*pa); // Line B cout << "pa = " << (long)pa << endl; // address delete pa; // Line C return 0; }
Lines A and C delete the same memory, which shouldn't be allowed. But program runs fine allowing dereferencing at line B. Why?
Notes:
More fields may be available via dynamicdata ..