Journal Articles
Browse in : |
All
> Journals
> CVu
> 144
(17)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Student Code Critique Competition
Author: Administrator
Date: 06 August 2002 13:15:54 +01:00 or Tue, 06 August 2002 13:15:54 +01:00
Summary:
Body:
Prizes provided by Blackwells Bookshops & Addison-Wesley
Please note that participation in this competition is open to all members. The title reflects the fact that the code used is normally provided by a student as part of their course work.
Note that this item is part of the Dialogue section of C Vu which is intended to designate it as an item where reader interaction is particularly important. Readers' comments and criticisms of published entries are always welcome.
You were asked to write a response to the author of the code below in which you try to identify the causes of his/her difficulties. Of course straightforward bugs had to be dealt with as well.
#include<stdio.h> void count_up(int start, int end, int step) { printf("enter the upper limit"); scanf(%d; &start) printf("enter your lower limit"); scanf(%d; end); printf("enter step size"); scanf(%d; step) int x; for(x=start; x<=finish;x=x+step) { printf("\n %d",x); } } int main() { count_up(1, 10, 1); count_up(-10, 10, 2); return 0; }
I only had two entries. Well let me rephrase that; I can only find two entries though I have a nasty feeling that I have lost one. In addition I have a late contribution for the previous competition. That one raises the problem of the time it sometimes takes for C Vu and Overload to reach some members. Of course it mostly does not matter. However this is one column where timeliness is important. I will see if we can persuade our webmaster to place each competition item on our website so that everyone can fetch it from there even if their issue of C Vu is late. Which reminds me, it might help if our website listed the current issue numbers for both C Vu and Overload. Francis
From David Caabeiro <dac@globalmente.com>
[Please note that David is not a native English speaker. I have done some copy editing but only to help his text flow more smoothly. Francis]
There are several things to point out. First I'll discuss the ones I consider most important.
The student seems to have some misunderstanding regarding function parameters, so I'll try to explain some vocabulary.
When calling a function, you pass it a certain number of arguments, which are values the function will work with while executing. When a function is finished, you typically get back a return value (for those with some mathematical background, this is somewhat equivalent to a scalar field).
How do you create a function? First you have to introduce an identifier to the compiler. This is called a declaration, and its purpose is telling the compiler what the function looks like. This way, it will be able to check it is used correctly [0]. But this is not enough, you still need to code the function itself and get the storage necessary to hold it. This is called a definition.
So how do you declare a function? Simple, you just specify the function name, the parameter list type, and the return type. For example, suppose we want a function that given the (x, y, z) coordinates in space it returns the distance from the origin (yes, this is a scalar field). This function could have three parameters (for each coordinate) and return a value (the distance). So we could declare it:
double dist(double x, double y, double z);
The first keyword is the return value: a double. The parameters are enclosed in parentheses after the function name, separated by a comma. The semicolon indicates the end of the statement. The parameter names are optional, so a valid declaration could also be:
double dist(double, double, double);
But as you may note, sometimes it's not easy to guess what the parameters are used for.
A function definition is similar to a declaration, except that it has a group of statements enclosed in braces. So in our previous example, a possible definition could be something like this:
double dist(double x, double y, double z) { return sqrt(x * x + y * y + z * z); }
Note that this time you do have to specify an identifier for each parameter, which will have automatic storage duration [1]. Also note that since the braces replace a statement or group of statements, there's no need for a semicolon. [a semi-colon before the opening brace would be wrong because the compiler would treat that as ending a declaration. One after the closing brace would also be a technical error because it does not meet the requirements of the grammar of either C or C++.]
Now we can take a look at the code and point out the most evident mistakes.
First of all, there's a missing closing brace at the end of main() (don't know if this was a real mistake or a printing failure). Inside count_up(), there's a missing ';' at the first and last call to scanf(), to indicate the end of a statement. Also, the calls to scanf() are incorrect, as its prototype is
int scanf(const char * restrict format, ...); //[2]
where the variable argument list represents the addresses of the objects to receive the converted input. So a call should have the form:
scanf("%d", &var);
Note the use of the comma to separate the parameters, and the & operator to pass the address of the object.
There's also some inconsistency between the messages shown and where these values are stored (for example, the upper limit is stored in a variable called start). Finally, there's an identifier that was not declared: finish, probably meant to be end.
Now let's see the basic misconception the student is making. He correctly declared a function with three parameters. But then he reassigns all of them with the values gotten from the user. This way, the arguments passed in the function call are lost.
So how can we correct this? One way is creating a function that receives three parameters, as the student already did. But our function will just iterate and print out the results:
void count_up(int start, int end, int step) { int x; for(x = start; x <= end; x += step) { printf("%d\n", x); } }
That was easy. Now comes the tough part. We have to read the input from the user, and make sure it makes sense. Note that count_up() doesn't make any validity checking. It relies on external code for this (of course this is a personal choice). This is one possible way to get a value from the standard input:
printf("Enter the lower limit: "); fflush(stdout); if(scanf("%d", &start) < 1) return EXIT_FAILURE;
Some things to note: first of all, because of buffering, we should make sure the message is flushed for the user to see it. This is the reason the explicit call to fflush() is made. Also note that the return value of scanf() is checked against invalid input. As we're reading just one value at a time, we check for exactly one value read correctly. In case of failure, why do we return instead of asking the user for another try? One of the problems with scanf() is that it is not a simple task to know the status of the stream in case of failure [3].
At last, we should check the step size is a positive number, otherwise, we'd get strange results (e.g. infinite loops in case of zero, etc.)
I also check that the lower limit is at most equal to the upper limit (anyway, if this checking were not present, the loop wouldn't execute)
So how would our program look like?
#include <stdio.h> #include <stdlib.h> /* Step up and print within the given limits * Makes no validity checking */ void count_up(int start, int end, int step) { int x; for(x = start; x <= end; x += step) { printf("%d\n", x); } } int main(void) { int lower, upper, step_size; /* Now we'll try to get the three values * needed In case of error the program * terminates */ printf("Enter lower limit: "); fflush(stdout); if (scanf("%d", &lower) < 1) return EXIT_FAILURE; printf("Enter upper limit: "); fflush(stdout); if (scanf("%d", &upper) < 1) return EXIT_FAILURE; printf("Enter step size: "); fflush(stdout); if(scanf("%d", &step_size) < 1) return EXIT_FAILURE; /* Before calling count_up(), we have to * make sure the values are valid */ if(lower > upper) { fprintf(stderr, "Invalid limits\n"); return EXIT_FAILURE; } if(step_size <= 0) { fprintf(stderr, "Invalid step size: %d\n", step_size); return EXIT_FAILURE; } count_up(lower, upper, step_size); return 0; }
End notes:
[0] See subclause 6.7.5.3 of the ISO-IEC 9899:1999 for more details.
[1] See subclause 6.9.1 of the ISO-IEC 9899:1999 for more details.
[2] See subclause 7.19.6.4 of the ISO-IEC 9899:1999 for more details.
[3] It is generally recommended to use fgets() and strtol() in these cases.
From: Stuart Golodetz <sgolodetz@dial.pipex.com>
Dear Student,
There are a number of issues with your code that need attention, so rather than writing a lengthy introduction, I'll launch straight in and try and explain the problems. The most urgent problems are syntax errors, since these will stop your program compiling: until you fix them, you can't even use your debugger to see where you're going wrong, since you can't even build the program, let alone run it.
The first problem to note is that you have missed a closing brace at the end of your main function. Since the function body must be enclosed in braces, omitting the ending brace will cause problems, due to the fact that the compiler thinks everything up to the closing brace is part of the function definition. In this case, it will reach the end of the file without finding a closing brace, which will cause it to choke.
The next problems we need to look at are in the count_up function. Your calls to the scanf function are rather haphazard and seem to indicate that this is something that you have been taught, but have forgotten. The scanf function is declared as follows:
int scanf(const char *format, ...);
It takes a variable number of arguments, but the first argument tells it how many to expect. Because it needs to read into the variables you specify, you must pass pointers to the variables rather than the variables themselves (unlike C++, C does not have references). This is because parameters are passed by value, meaning that a local copy of them is made on the stack, hence if the scanf function took variables as its parameters rather than pointers to them, it would be modifying local copies rather than the variables themselves. Looking at the code you've written, several points can immediately be made:
scanf(%d; &start)
-
The first parameter is of type const char *. The usual method of passing a format parameter is by using a string literal, which needs to be enclosed in speech marks. Hence you need to write "%d" rather than simply %d.
-
When calling a function, you must separate arguments to it with a comma, not a semi-colon. If you use a semi-colon, the compiler thinks it's the end of the statement and spits out an error. The most likely reason behind this is that you were trying out an example from a book and misread the character (as the visual difference between , and ; in some fonts is negligible). If this is the case, then I can only recommend you look more carefully at the code. Accuracy is absolutely vital when programming.
-
You are missing a semi-colon at the end of the statement. This is probably a simple typographical error, in which case the only solution is to pay more attention when you're typing code.
When all these corrections have been applied, the final version of the statement is:
scanf("%d", &start);
Your other scanf calls suffer from similar problems, but also an additional one relating to the information about scanf I presented above. There is a marked difference (one works, the other causes undefined behaviour: anything from a simple crash to evil demons descending on your house and frying your computer - the standard doesn't specify) between scanf("%d", &end); and scanf("%d", end);. The first one works as intended, but the second one is a common example of how to crash your program. The point is that if (for example), end contains 0, the function will merrily go right ahead and try to dereference it, as if it were a pointer. Needless to say, the results are undefined.
We're entering the home straight as regards syntax errors! Examine the following (whilst bearing in mind that this does not apply if you're using C99):
int x;
A simple declaration of an integer, you think. Unfortunately, you seem to have forgotten the famous estate agent's mantra: "Location, location, location." In this case, remembering it would have led you directly to the solution to your problem, namely that it needs to go at the start of the scope.
The final syntax error can now be ironed out. In the line:
for(x=start; x<=finish;x=x+step) {
finish is an undeclared identifier. It's fairly clear that you've made this error by forgetting that you called your variable end. The best way to avoid these errors in the future is to adopt a consistent naming style so that you know, for example, that given a choice between end and finish, you always prefer the former. Whilst we're on this line, it's also worth pointing out that it is generally better style to write x=x+step; as x+=step;. With all these syntax errors mercifully out of the way, the (now compilable) code looks like this:
void count_up(int start, int end, int step) { int x; printf("enter the upper limit"); scanf("%d", &start); printf("enter your lower limit"); scanf("%d", &end); printf("enter step size"); scanf("%d", &step); for(x=start; x<=end; x+=step) { printf("\n %d",x); } } int main() { count_up(1, 10, 1); count_up(-10, 10, 2); return 0; }
Now would be a good time to rejoice, were it not for the fact that the code still doesn't work as originally intended. To understand why this is so, examine the function calls in main. You are passing parameters to the function, yet they are not being used! Clearly this is less than cunning (something technically known as a logic error).
The solution, as it turns out, is fairly simple and has the handy sideeffect of fixing the logic error of incorrect prompting (unless step < 0, start is the lower limit, not the upper limit). The solution is, in fact, to simply delete the prompt code. After all, we have already passed the two limits and the step value into the function; we don't need to query a poor, overworked user for them! The revised code now looks as follows:
void count_up(int start, int end, int step) { int x; for(x=start; x<=end; x+=step) { printf("\n %d",x); } }
This is pretty close to what we want, as well as being much simpler than the original. However, there are a couple more problems we need to fix. First of all, consider the following call:
count_up(1,10,0);
This seemingly innocuous call will put our function into an infinite loop. (Or as one wise programmer once put it: "Oops!") There are two possible solutions to this:
-
We can define a sensible behaviour for our function if step is 0. Perhaps it could output the start value once and then return. Perhaps it could just return without outputting a thing. However, we need to consider whether it actually makes sense to pass a step value of 0. Rapidly coming to the conclusion that it doesn't, we proceed to:
-
Abnormally terminate the program if step is 0. The easiest way to do this is using the old workhorse, assert:
#include <assert.h> void count_up(int start, int end, int step){ int x; assert(step!=0); for(x=start; x<=end; x+=step){ printf("\n %d",x); } }
The function is now both correct and safe. However, the name count_up is misleading, since the function outputs the numbers. In fact, it is difficult to find a sensible name for it, which suggests that perhaps it isn't an altogether sensible function at all. It would probably be better to simply write the function body (which is essentially just a for loop) into the calling function, especially since C89 does not support inline. A further point is that the output might look better if you had written printf("%d\n",x);, although this is an aesthetic point which may or may not be helpful.
Hope this helps,
I wonder what Francis is after. There are quite a few syntax errors scattered through the code. And I suspect that one of those (the lost closing brace of main) was an artifact of the process of laying out the material for printing (It was. Francis). However one thing sits up and shouts at me, the curious case of a function that ignores its arguments and promptly asks the user for new values. That is where I am going to focus most of my attention. Syntax is easy to fix, but that mistake suggests that the student does not yet understand what a function is and how it works.
Most programming languages are built round the idea that programmers can invent new things and give them names. We need to distinguish three fundamentally different places where names will occur in a program. These are declaration, definition and use.
A declaration is when we introduce a name into our code. We use names for lots of different things. The most important to start with are names of functions and names of variables. Actually we usually combine the declaration and definition of a variable and write things like:
int start;
which both tells the computer that start is the name of some int storage (the declaration) and instructs the compiler to provide that storage (the definition). If we do not want to define (provide storage) for a variable but just want to tell the compiler that you will provide that storage elsewhere (i.e. we just want a declaration) we have to write:
extern int start;
Do not worry about that now, because separation of declaration and definition of variables is unusual in C until you get to things like struct. But that is another story that we will leave for another day.
Function names are rather different because as we get to writing larger programs we want to be able to split them into many files (which can be reused in other programs). More than that, we want to separate uses of functions from definitions. A function can only be defined once (well think about it, how would you cope with two definitions, if they were different you would have a mess, and if they were the same you are wasting time writing them twice.) It can be used many times - that is a basic reason for writing functions. Everywhere it is used the compiler needs to know what we are talking about even though it only needs to know how to do it once. The former is what we call a declaration. For historical reasons there are two types of function definition in C, the original one, often called a K&R style function definition and the newer (and safer one) that is called a prototype. Most of us only use prototypes these days and so I am not going to say anything about the other form here.
A prototype has 3 parts. The first part is a type (int, double, char* etc.) that tells the compiler what kind of result it is going to get when the function is used (called). The second part is the name that identifies the function. The third part is a comma-separated list of types that is placed in parentheses. That third part tells the compiler what information the function will expect to be included when the function is used. That last part is called a parameter list. We can, but we do not have to, add a name after each type in the parameter list. The compiler does not care, but those names can help the programmer understand what information is needed when the function is used (called).
A pure declaration ends with a semi-colon. If we end it with an opening brace it becomes a definition. Actually it is impossible to write a definition that is not also a declaration. You cannot tell a compiler how to do something without at the same time telling it what you are doing. So now let me relate this to your code:
Inside your main you use count_up twice. Assuming that the user understands what the name of the function means s/he would expect the program to output something like:
1 2 3 4 5 6 7 8 9 10 -10 -8 -6 -4 -2 0 2 4 6 8 10
and is likely to be more than a little surprised to receive:
Enter upper limit:
Even more surprising is that when s/he responds to the three questions with 10, 1, 1 the program exits without any further output. Well that assumes the syntax errors and use of an undeclared variable are corrected. Now let me give you a reworked version:
#include<stdio.h>
Note that stdio.h is largely made up of declarations of things that are provided by a library file that will have been compiled from the definitions of those declarations.
void count_up(int start, int end, int step);
That is a prototype (function declaration) that tells the compiler that count_up when used will need to provide values for three ints. It tells the programmer that these three ints will be used as a start, end and step size. The declaration also states that there will be no usable return value (the effect of having a void return type).
int main() { count_up(1, 10, 1);
And the compiler can prepare code that will connect those three values with space it assumes the definition of count_up will provide. The job of making the connection is done by something called a linker (it links uses with definitions, hence the name).
count_up(-10, 10, 2); return 0; }
Now either here or in some other file that we compile and give to the linker we must define count_up(). Here is a reasonable implementation:
#include<stdio.h> void count_up(int start, int end, int step) { int i; if(step <= 0) { puts("zero step size, output suppressed"); } else { for(i=start; i<=end;i += step) { printf("\n %d",i); } } }
Note how I handle inappropriate values for step. Later I would want to modify that strategy, but for now that seems the best answer. Trap bad values, issue a message and then ignore them. I do not bother to trap start>end because that simply does nothing anyway.
Now, I can hear muttering about why separate the declaration from the definition. The simple answer is that I can easily change my mind about the definition, indeed I can have several different definitions in different files and make my choice when I come to link the code together. Of course there are other reasons for separating out function definitions from declarations but you will learn those from experience.
From Brett Fishburne <william.fishburne@verizon.net>
Let me say, first, that on all the systems to which I had access the code compiled and ran flawlessly. This puts me at a dramatic disadvantage.
Unfortunately, lint also failed to find any problems that could cause the segmentation fault on any of these systems.
Thus, my analysis may be completely wrong as I cannot test a "corrected" solution and get different results. Given that, please bear with me and do not laugh too hard if I am all wrong.
My first line of thought was to address what the meaning of the "segmentation fault" could be. A segmentation fault is almost always associated with improper use of a pointer. In my case, it is almost always an inadvertently uninitialized pointer.
Since tmp=*a did not cause an error, it seems that the bad pointer must be "b." I rigorously checked the logic of the reverse subroutine insuring that "b" could never be out of bounds:
Minimum possible value of b:
-
b is smallest when i is largest, since b is reduced in size by i.
-
i is largest when i = (strlen(str)/2)-1.
-
by 2. the minimum of b = str[strlen(str) - ((strlen(str)/2) - 1) - 1] = str[strlen(str) - (strlen(str)/2)]
-
Since the strlen(str)/2 is always less than strlen(str) for any strlen > 1, the minimum value of b is always within the string.
Maximum possible value of b:
-
b is largest when i is smallest, since b is reduced in size by i.
-
i is smallest when i=0.
-
therefore, the largest value of b = str[strlen(str)-1] which is the last character of the string.
-
Since b cannot exceed the length of a string for any strlen > 1, the maximum value of b is always within the string.
Indeed, b is never out of bounds. That doesn't mean, however, that swap could not be passed NULL pointers, so my next thought was that swap should check for NULL pointers simply for completeness. It is clear from the analysis, however, that, as written, swap will not be called with NULL pointers and the problem is not that b has been assigned a NULL or rogue value. b is always within the range of str.
This had me stumped for several days. Finally it occurred to me that I was looking in the wrong place. If str was out of range, then b would be out of range! Of course, then, so, too, would a and we know from the error message that a was not out of range (otherwise the assignment to tmp should have caused a segmentation fault. Of course, there is the possibility that a just happened to point to a valid value, but that kind of "luck" doesn't go well with such a structured code critique. This did point out to me, however, that reverse should also check for a NULL string.
Again I was stumped for several days. At long last the breakthrough came. "cheese" is a string literal! How is this important? Well, for those of you who have the ANSI C standard handy, flip to 6.1.4 which tells you that string literals should NOT be modified. When a string literal is modified, the behaviour is "undefined" and that means that it should not be done. This is the explanation for why some compilers don't have a problem with this code. The "undefined" behaviour is to allow this particular modification. In real life, however, a string literal should be treated as "const char *" not "char *".
It seems only fair that in the case of this student's code, the student should have received a compiler error that warned of this (hang about, how could the compiler do that? It needs to do quite a bit of analysis to spot the problem. If reverse and swap were in different files the problem gets real nasty real quick. - Francis), not just a run-time error. Fairness, however, is rarely the order of the day. Now we know the cause of the segmentation fault and how to resolve it (don't use a string literal!) The question, however, is whether there is a way to prevent someone from calling reverse with a string literal. The answer is no.
Since the assignment is to reverse the string "in place" it doesn't make sense to rewrite reverse so that it passes back a reversed string. A hefty comment would be nice, but that is about the best that can be done in this regard.
Let us now address the structure of the program itself, having resolved the "dirty little error" and with resolution in hand. The first, and most obvious error, is a failure to include <stdio.h>. Since printf is used, this inclusion is appropriate, thus the headers should be:
#include <stdio.h> #include <string.h>
I, personally, hate the concept of "side-effects." swap is nothing but one big "side-effect." On the other hand, swap does exactly what you would expect it to do, so the "side-effect" isn't so bad. I still prefer for a subroutine to return an error when an error is possible, so I would rewrite swap to account for NULL pointers as follows:
/* Swaps a and b returning 0 when successful. * Does nothing and returns -1 or -2 if a or b * is a NULL pointer,respectfully */ int swap(char *a, char *b) { char tmp; if(a == (char *)NULL) return(-1); if(b == (char *)NULL) return(-2); tmp = *a; *a = *b; *b = tmp; return(0); }
Along the same lines, reverse is similar to swap. I would, thus, make similar changes:
/* Reverses a string in place - don't call * with a string literal! Returns 0 when * successful, -1 when called with a NULL * pointer, and positive on a print error */ int reverse(char *str) { char *a, *b; int i, err=0; if(str == (char *)NULL) return(-1); for(i=0; i < strlen(str)/2; ++i) { a = &(str[i]); b = &(str[strlen(str)-i-1]); (void)printf("swapping %c & %c\n", *a, *b); if(err=swap(a,b)) { if(err == -1) { (void)fprintf(stderr, "The first character is NULL\n"); } else { (void)fprintf(stderr, "The second character is NULL\n"); } return(-1); } } return(printf( "The reversed string is: %s\n", str)); }
The main program must be declared as type int according to the most recent standards, so that should be added, the string literal needs to be fixed, and a value returned:
/* Creates a string and then reverses it in * place. Returns 0 on success, negative on a * NULL pointer error, and positive on a * printing error */ int main() { char notliteral[7]; (void)strncpy(notliteral, "cheese", 6); notliteral[6] = '\0'; return(reverse((char *)notliteral)); }
[Well I would have written:
char notliteral[] = "cheese"
instead of those three lines. Francis]
The return value from strncpy is ignored because it is a pointer to notliteral. On the other hand, it is always a good idea to be sure that any string is properly terminated, so the last possible character of notliteral gets a NULL (I disagree, you should know what your common tools do - Francis). In this case, the NULL character is necessary, because strncpy would only copy up to the 6th character (which did not include the implied NULL character on the string literal). [But it would have copied the null terminator had you used seven instead of six. I think the mechanism you used is far too error prone, being packed with magic numbers. Francis]
I ignore the return codes from printf and fprintf (hence the void cast) most of the time because there is nothing I can do if those commands fail in this case. I can't warn the user (printing already failed) and the whole function of the program is to print the reversed string. If that fails, then the main program sends the error on to the user as its return code.
[void casts are not often used these days because they actually add nothing other than tell maintenance programmers that you know you ignored the return value. But how often is ignoring a return value an error? And, by the way, note that the entire printf and scanf family of functions are ones where the side-effects are what we want. - Francis]
It is time for a little C++ teaser. Identifying the problem should not be too hard but I want rather more than that. I want suggestions on coding idioms that will make the student's life easier. And I will give you a big clue, making a function template a friend of a class template is not a brilliant idea. Remember that all elements of code are up for criticism. Critiques by September 16 to: <francis.glassborow@ntlworld.com> please.
What's wrong with the program below?
template <int N> class T { public: friend T operator+ (const T&, const T&); private data[N + 1]; }; template <int N> T<N> operator+ (const T<N>& S1, const T<N>& S2) { return S1; } int main() { T<64> a, b, c = a + b; }
I can compile it, but I can't link it- the operator isn't found.
The editor's choice is: David Caabeiro. Congratulations David! Please email Francis to arrange for your prize.
Notes:
More fields may be available via dynamicdata ..