Journal Articles
Browse in : |
All
> Journals
> CVu
> 131
(16)
All > Journal Columns > Code Critique (70) Any of these categories - All of these categories |
Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.
Title: Student Code Critique Competition
Author: Administrator
Date: 05 February 2001 13:15:43 +00:00 or Mon, 05 February 2001 13:15:43 +00:00
Summary:
Body:
Prizes for this competition are provided by Blackwells Bookshops in co-operation with Addison Wesley.
This piece of code was posted to comp.lang.c.moderated with a request for help. The author claims two weeks experience of C. The intention of the program is to remove whitespace from the start and end of lines.
#include <stdio.h> #define MAX 1000 int rtrim(char s[]); int ltrim(char s[]); char s[MAX]; main() { int j; while((j = getline(s, MAX)) > 0) rtrim(s); ltrim(s); } int rtrim(char s[]) { int i; int j = getline(s , MAX); /*j-2 because the last two array values are null and \n*/ for (i = j - 2; s[i] == ' ' || s[i] == '\t'; i) { s[i] = '\0'; } return 0; } int ltrim(char s[]) { int i; for(i = 0; s[i] != '\0' ; i++) { if(s[i] == ' ' || s[i] == 't') s[i+1] = s[i]; } return 0; } /* This method will read next line into array s[] */ int getline (char s[], int lim) { int c, i; static char s[MAX]; for (i = 0; i<lim-1 && (c = getchar()) !=EOF && c != '\n'; ++i) { s[i] = c; } if (c == '\n') { s[i] = c; ++i; } s[i] = '\0'; return i; }
It was nice to see some new names in the latest submissions. While not all can win prizes, I hope that all get benefit from contributing. Once again I have a file without an included form of identification of the author. I have failed to resolve the issue even by searching my email archives. So whose is it? Because it deserves acknowledgement. I am delighted with the way some of you are experimenting with writing techniques, clearly ACCU members are way above the average.
A C++ Solution from James Holland <jamie_holland@lineone.net>
I did not have enough time to look at the student code in any depth but you might be interested in a C++ program that performs the required task, namely to remove leading and trailing white spaces from a string.
#include <iostream> int main (void){ std::string line; getline(std::cin, line); const char white_space[] = "\t "; // array of char counted as white space std::string::size_type start = line.find_first_not_of(white_space); std::string::size_type finish = line.find_last_not_of(white_space); if (start == std::string::npos) line = ""; else line = line.substr(start, finish-start+1); std::cout<< '*' << line << '*' << std::endl; }
The program uses the STL string class. This is a huge improvement over that provided by C. There is no need to worry about buffers overflowing, the user can enter a string as long as he or she likes. The buffer (named line in this case) will grow to accommodate any input. A whole host of member functions is provided for searching, inserting, replacing and extracting sub strings, see Stroustrup C++ Programming Language, 3rd Edition, chapter 20 - Strings.
Most of the work is done by three member functions. The first, find_first_not_of(), returns an index to the first character of the string after any leading white space. The second function, find_last_not_of(), returns an index to the last character of the string, before any trailing white space. The third function, substr(), extracts the desired text. The only slightly messy code is the if statement. This deals with the case where the string contains no non white space characters (it is either a null string or consists entirely of white space characters. Finally, the desired text is displayed flanked by asterisks (to confirm that there are no leading or trailing white spaces).
I think you will agree that the C++ version is shorter and more understandable than any C program and that the Standard Template Library is well worth investigating.
While we are on the subject of the Standard Template Library, I have another example of the use of the STL. I was interested in writing a program that would determine if a file name complied with a wildcard description, the one where '*' means a sequence of zero or more characters and where '?' stands for any single character. Devising an algorithm to perform this operation can be quite a tricky business. It was not until I came across a book by K. John Gough entitled Syntax Analysis and Software Tools, 1988, ISBN 0-201-18048-0, that I discovered a clear explanation of a method to perform this task. I have converted the original Pascal program to C++, making use of the STL. Section 4.4 of the book describes the algorithm but I shall briefly describe how in works.
The program works in two stages. The first stage is to take the wildcard description and convert it into a table representing a non-deterministic finite state automaton (NFSA). The form of the NFSA is quite simple; each state has a single advance character and possibly a self-loop on any character.
The second stage is to interpret the table. As the state machine is non-deterministic, we cannot tell whether to take the self-loop path or the advance path. The solution to this problem is to take both paths and keep track of all possible machine states as the filename is being analysed. If, after the entire filename has been scanned, one of the recorded states is the terminal state, then the filename must comply with the wildcard description.
I include the code for wildcard parser as it may be of some interest. I would welcome any comments.
#include <vector> #include <set> #include <iostream> class wildcard { private: struct record { char advance_character; bool self_loop; } r; std::vector<record> NFSA; public: wildcard(std::string = "*"); void expression(std::string); bool analyse(std::string); }; wildcard::wildcard(std::string s) { // Generate the NFSA table. expression(s); } void wildcard::expression(std::string s) { // Construct table. NFSA.clear(); r.self_loop = false; for(std::string::iterator i = s.begin(); i != s.end(); ++i) { if(*i == '*') r.self_loop = true; else { r.advance_character = *i; NFSA.push_back(r); r.self_loop = false; } } r.advance_character = '-'; NFSA.push_back(r); } bool wildcard::analyse(std::string s){ std::set<int> possible_states; std::set<int> successor_states; possible_states.insert(0); for(std::string::iterator i = s.begin(); i != s.end(); ++i) { successor_states.clear(); for(std::set<int>::iterator j = possible_states.begin(); j !== possible_states.end(); ++j) { if(NFSA[*j].self_loop) successor_states.insert(*j); if(NFSA[*j].advance_character == *i || NFSA[*j].advance_character == '?') successor_states.insert((*j)+1); } possible_states = successor_states; } // Return true if filename complies with // wildcard description. return (possible_states.find(NFSA.size()-1) != possible_states.end()); } int main(void){ wildcard w("*AB?Z*"); std::cout << w.analyse("AAKABZZ") << std::endl; // Complies. std::cout << w.analyse("AAKABZA") << std::endl; // Does not comply. w.expression("*A*B*C"); std::cout << w.analyse("AAKABZC") << std::endl; // Complies. std::cout << w.analyse("AAKABZZ") << std::endl; // Does not comply. }
My first thought is that this is an ambitious project for someone with two weeks' C experience, however it is not said whether this is two weeks' programming experience, or merely C. I will work from the assumption that the author has little prior programming experience, as this seems most likely.
Overall the style is good and consistent, but there are a few points I feel should be mentioned. The conventionally global variables have longer and more descriptive names than local ones. The reason is that globals can appear anywhere in the program, and so the name should make it clear what the variable is. Because of this, I would suggest calling the global array s something more descriptive, such as string. For the same reason I would change MAX to MAX_LEN, indicating that it is a maximum length.
Braces are used to group lines of code together, which are then treated as a block. A construct such as an if statement executes the block of code following it if the condition is met. A single line is also a block, so if the body of an if statement only has one line, braces are not required. For example, in the function getline the if statement needs braces around the next two lines, because together they form the block that is executed if the condition is met. However, the for statement before it does not need braces because there is only one line in the block. It then looks like
for (i=0; i<lim-1 && (c = getchar()) != EOF && c != '\n'; ++i) s[i] = c;
The program produces no output, so it is impossible to tell whether it is working or not. A printf at the end of main will indicate what the global string s contains.
There are function prototypes for functions rtrim and ltrim, but not for getline. There should be one for getline also.
Now I have covered the few stylistic issues, I will move on to the actual program itself. The program compiles with one warning, but when it is run it does not work.
The function main is defined with no explicit return type. In this situation the return type is assumed to be int, however there is no return at the end of main. Some people make main return void so a return value is not required, however this is the wrong thing to do. main should always return an int, and if the program completes with no errors it is usual to return 0.
The function getline has serious problems. There are three arrays of 1000 chars called s that could conceivably be in scope; which is the one being used? The three arrays are the global s, the parameter s, and the static s local to getline. In fact, it is the local s that is in scope; within the function getline the global s cannot be seen, and neither can the parameter s. The warning generated by the compiler is "declaration of 's' shadows a parameter," meaning the local s stops the parameter s being seen. I think the author intends all these arrays to be one single array, and does not realise that a global variable can be used anywhere in a program. There is no need to pass s to getline, as it can access it anyway. If the global s had been declared as local to main instead of global, then it would have to be passed to getline. There is also no need to declare another s local to getline.
Whether to have one global array, or to have an array local to main which is then passed to functions, is a question requiring some thought. It is often best to avoid using global variables, and to use only local variables. Doing this is called information hiding, and it helps to protect data. If a variable is global and any function can access it, it is more likely that it will become corrupted than if the variable is local and functions can only access a copy of it. Making an array local brings other issues, however. An array is a big structure to pass to a function, and my instinct is to pass a pointer to the array rather than the array itself. But the author only has two weeks' of C experience, and is unlikely to have met a pointer yet, never mind understand them. Leaving the array global will avoid dealing with pointers, and this is what I advise for a novice C programmer. Then getline only requires one parameter, and ltrim and rtrim take no parameters. Indeed, getline does not even require one parameter, as it is simply MAX which can be used directly in getline.
Looking back at main, there is a problem with the call to getline. The function getline gets a line of characters from stdin and stores them in the global array s. It also returns the number of chars in the string discounting the null byte. In main, a variable j is used to store the return value, which is tested to see if it is greater than zero. While it is, rtrim is called. Each time the while condition is evaluated, a new line is read from stdin. It is only after the while loop exits that ltrim is called to trim the other end of the string. I am not sure why the while loop is used in this way. If the author wants the program to continue reading and trimming lines until there are no more lines, then the call to ltrim should also be in the body of the while. Also, j should be compared against 1, not zero, as there will always be a newline character by itself on the last line if lines are being read from the keyboard.
rtrim also has a problem with a call to getline. Again a variable j is used to store the return value of getline, which is the number of chars in the string s excluding the null byte. But this call to getline reads a new line from stdin, not the line that was read in main. The length of the string can easily be passed to rtrim from main, as it has already been calculated. This saves repeating the call to getline, or even a call to a library function such as strlen. The comment following the call to getline is misleading, as is says the null byte is included in the count of characters, when really it is not. Therefore I would change the comment and also the following code, as
i = j - 2;
should really be
i = j - 1;
in the for loop. This function would be better if it scanned through the string from the end until it found the first non-whitespace character, when it would then write a newline and a null byte. I suggest using the library function isspace to determine if a character is whitespace, as this includes all whitespace characters, not just space and tab. This could be done by
void rtrim(int len) { int i; for (i = len-1; isspace(s[i]); --i) ; /* i is now the position of the first non-whitespace char */ s[i+1] = '\n'; s[i+2] = '\0'; }
The functions rtrim and ltrim are defined as returning an int, but they always return 0 and the return value is never checked. It would be better to return void, or to return an integer that had some meaning, such as the number of characters removed. In this case I do not see any benefit in returning an int, and so returning void is best.
ltrim is a strange function, and I am unsure of exactly what the author's intention was. It steps through the array from s[0] up, and if it finds a space or a 't' at s[i] it copies it to s[i+1]. I assume the t is a typing mistake, and '\t' is what is really meant. What is really required is a function that scans through the string until it finds the first non-whitespace character, and then moves all characters along the array to overwrite the whitespace. The new ltrim is then
void ltrim(void) { int i, count = 0; for (i = 0; isspace(s[i]); ++i) ++count; if (count > 0){ for (i = count; s[i] != '\0'; ++i) s[i-count] = s[i]; s[i-count] = '\0'; } }
As a final point, the comment before the getline function describes getline as a method. Methods are applicable in object-oriented languages such as Java; in a language like C they are called functions.
My version of the program is as follows.
#include <stdio.h> #include <ctype.h> #define MAX_LEN 1000 void rtrim(int); void ltrim(void); int getline(void); char string[MAX_LEN]; int main(void){ int j; while ((j = getline()) > 1) { rtrim(j); ltrim(); printf("%s", string); } return 0; } void rtrim(int len) { int i; /* len-1 because string[len] is a '\n' */ for(i = len-1; isspace(string[i]); --i) ; /* i now indexes the first non-whitespace */ string[i+1] = '\n'; string[i+2] = '\0'; } void ltrim(void) { int i, count = 0; for(i = 0; isspace(string[i]); ++i) ++count; /* count is now the number of spaces at beginning of string */ if (count > 0) { /* if count == 0 there's nothing to move */ for (i = count; string[i] != '\0'; ++i) string[i-count] = string[i]; string[i-count] = '\0'; } } int getline(void){ int i, c; for(i = 0; (i < MAX_LEN-1) && ((c = getchar())!=EOF)&&(c != '\n'); ++i) string[i] = c; if(c == '\n') { string[i] = c; ++i; } string[i] = '\0'; return i; }Note that the choice of string as a variable name is unwise. Quite apart from the problem that might arise in C++, all variables starting with 'str' are reserved to implementers by the C Standard. FG
Hot through the post and I cannot resist so here goes...
During the article I will refer to the line number of the code, as laid out in CVu12:6, they range from 1 to 47. (may not exactly match layout in this issue. FG)
The current design breaks the problem down into four logical blocks.
-
A top level loop in main()
-
A function for collecting a line of raw input, getline().
-
A function for stripping out leading white space, ltrim().
-
A function for stripping out trailing white space, rtrim().
This arrangement makes the intent of the code clear and Francis' comments unnecessary. The code is self-documenting but you have wisely added comments (lines 16,32) where your intention may not be clear.
You have included the library stdio, so that you will not need to write your own function for getting characters.
You have moved the 'magic' value MAX into a #define so that it may be easily altered. It also has the added benefit of making your code easier to read. Being in uppercase it is clearly a pre-processor value.
However the code does not work as you intended it to. I list the problems not as a means of correcting the code but as an evaluation of how to improve the code:
-
In main() ltrim() (line 8) will not be called during the while-loop. Only on receiving a zero length line will the while-loop terminate and ltrim() be called. This could be solved by moving ltrim() up inside the while(){...} braces.
-
I mention for completion that once you have the trimmed line you do nothing with it. If you wished to check the result of trimming you could add a printf(), along with ltrim() and rtrim() inside the while-loop braces.
-
During the rtrim() you call getline() (line 15). This will overwrite your previously fetched line. If you wished rtrim() to use the line and count fetched in main() you will need to pass them both as parameter. You currently only pass s[] as a parameter and then overwrite it by calling getline() a second time, thus you will miss alternate lines of input.
-
There is a typo in rtrim()'s for-loop (line 19). The i- will not decrement the value of i (nor compile). Errors like these can be very hard to find, although compilers can spot some of them and it is worth learning and using the idiomatic forms and constructs of 'C' viz. for(i=0;i<iterations; i++)
-
If the line is traversed by rtrim()'s for-loop (line 18) it will fail under certain conditions, such as lines of length one or lines of all ' '. If the loop conditions start to become complex it is time to re think the algorithm. In this instance you might use for(i=j;i>=0;i--) (making sure i can take negative numbers!) and then test for values of s[i] inside the body of the for-loop. You can always use break if you no longer need to iterate down the line. The extra cost of starting at the end, even though you know the first two characters are whitespace, will make the function robust and easier to code. If speed is an issue optimise then and only then.
-
The function ltrim() uses a simple for-loop and there is a clear conditional test within the loop (line 27). Apart from the typo testing for 't' instead of '\t', there is a serious problem. The conditional statement that is carried out (line 28) does not do as you had hoped. On discovery of a ' ' or '\t' the character will be copied into the next character position. On the next iteration of the loop this 'next character' will also be copied forward and so on infinitum or at least until the operating system complains! If the for-loop had chosen a clear range of values to iterate over the problem would not have been so severe, the algorithm would have been equally wrong but we would have an output and 'hopefully' would spot the mistake and fix it.
-
In function getline() you have found yourself with two variable called s. The first one is passed as a parameter (line 34) and the second is defined locally (line 36) [in fact there is a third global s defined on line 5]. As the code stands it will not compile but if braces were added to make the new definition of s reside in an inner scope, characters would be read into your local s storage but this is hidden from other functions. When a variable is defined with the same name as another it hides the previous definition. By removing (line 36) static char s[MAX]; the code will use the parameter s passed to it and not its own local storage with the same name.
-
The getline()'s for-loop seems to work but will over run the memory set aside for s if a MAX length line is reached before '\n'. Using the idiom for(i=0;i<MAX;i++) loop will help guard against over run and the value of i can be checked outside of the loop to see if extra termination is needed. In testing the code it is always important to test boundary conditions. By using #define MAX 4 in a test build, boundary testing of lines of length 3, 4 and 5 becomes feasible, however in this case it may not bring to light the over run and careful coding is necessary.
-
The compiler will pass you warnings and errors for several reasons including that main() does not return an int and that the compiler is guessing the function interface for getline(). Treat the compiler as your friend in finding typos and mistakes!
The overall splitting up of the tasks into functions is correct. There are currently three levels of hierarchy, main() at the top, which uses ltrim(), rtrim() and getline(), which in turn uses getchar().
To make the structure clearer it may be worth moving the relationships between the functions around. main() at the top, which will use gettrimmedline(), which will use getrawline(), rtrim() and ltrim(), which in turn use other low level functions. I have an eye on wanting to keep the getrawline() function separate, it will give me flexibility in the future and it seems a clean separation of specialisation.
I also need to consider who is responsible for creating data elements and how they should be passed up and down the hierarchy of functions. Presently s is defined in full view of all functions and then is passed as a parameter s, which hides its global name. A more flexible approach is to tell a called function what it needs to know and no more. For example I may wish to fetch several trimmed lines into an array of buffers.
The use of getchar() makes coding much easier, bugs fewer and the code portable. We can get leverage by using other library functions such as isalnum() or iscntrl() found in ctype.h for none whitespace characters and fgets() in stdio.h for fetching lines of input.
If the functionality we require is not available in a library, such as testing for either ' ' or '\t', it is worth writing our own function similar to those available. This will allow us to easily make changes latter.
I have used array manipulations, as in the original instead of using pointers, which would be preferred. There are two places where pointers have been used, in printf() and strlen() where the values passed are the starting addresses of the string to be printed and counted respectively.
I also use the library function fgets() to read in a line. This does everything required and adds the flexibility that allows us to input data from a file should we need to. Currently stdin (the keyboard) is passed as a parameter.
I have placed the testing of whether a character is to be trimmed or not into validCharacter(). If I need to make a change to the characters to be trimmed, the change will now only be in one place.
I have used a do-while-loop in main(). I tend to shy away from do-while-loops as the conditional execution of the loop is only tested for after the loop has been executed at least once. This however it is necessary here and the use of any other loop will make the code less clear.
All loops have been constructed with clear start and end points to prevent over run. During the loop, if all necessary work is completed, a break statement is used to terminate the loop.
#include <stdio.h> #include <ctype.h> #include <string.h> #define BUFFERSIZE (1000) int getTrimmedLine(char buf[], int buflen, FILE* inputstream); void rightTrim(char line[]); int leftTrim(char line[]); int validCharacter(char c); int main(){ char buffer[BUFFERSIZE]; int offset = 0; do{ offset = getTrimmedLine(buffer, BUFFERSIZE, stdin); printf("Trimmed line is:'%s'\n", &(buffer[offset])); } while(buffer[offset] != '\0'); /* loop until an empty line is found */ return 0; } int getTrimmedLine(char buf[], int buflen, FILE* inputstream){ int leftoffset = 0; /* input lines must end in newline */ if( fgets(buf,buflen,inputstream) ){ /* buf will be 0 terminated */ rightTrim(buf); leftoffset = leftTrim(buf); } else buf[leftoffset] = '\0'; /* failed, terminate at 1st character */ } return leftoffset; } void rightTrim(char line[]){ int linelength = strlen(line); /* line is also of type char* */ int lastchar = linelength-1; int i; for(i=0;i<linelength;i++){ /* search restricted to the line */ if(validCharacter(line[lastchar-i])){ /* this is the last valid char */ break; } else line[lastchar-i] = '\0'; } } } int leftTrim(char line[]){ int i = 0; while(line[i] != '\0'){ /* line is always 0 terminated */ if(validCharacter(line[i])) break; /* this is the new first char */ else i++; /* skip this character */ } } return i; } int validCharacter(char c){ int valid = 1; /* default valid */ if((c==' ')||(c=='\n')||(c=='\t'))valid = 0; return valid; }
As I was having some trouble completing my assignment, I asked my instructor if I could get help elsewhere. She kindly provided me with an address, and a letter of introduction to Master Dicken, whom she said was an expert in the field. So I travelled to his offices and found his door. I knocked, and as I heard him say enter, I did.
I offered him the letter, but he waved me away, saying "Go on up and see the Old Fellow." He gave me some directions and then went back to work, calling after me to make sure that I didn't actually call him 'Old Fellow'.
The directions took me up two levels and down a long corridor where I saw an open door. I entered and cleared my throat, for there sitting in a chair was an elderly gentleman who appeared to be sleeping. At the sound, he gave a sudden start and seemed to come awake. "Ah, you're bound to be the youngster Master Dicken called me about."
I replied that I had just come from Master Dicken's office. I tried to take a quick look around the office to get a sense of the person I was about to talk to, and he seemed to regard me with equal curiosity. In his bookcases, sitting on a shelf separate from the books that seemed mostly related to programming, I noticed some books with titles like The Historian's Guide to Celtic Coinage and A Numismatic History of the Pacific Island Kingdoms. I decided to try to break the ice, "Master, do you collect coins?"
He smiled at me, rubbed his chin, and told me, firstly, that he was not a master, but rather an apprentice. Then he said, "No, I don't collect coins. Collectors categorize and classify things and I collect categories and classifications."
I was somewhat shocked to find that someone of his years was still an apprentice. As I thought about that, I realized that the second part of his answer seemed to beg the question of how he would classify and categorize his own collection of categories and classifications. I didn't know how to respond, and so I remained silent.
He reached out his hand and said, "Well, let's see what kind of program you've brought for us to look at today." I handed him a listing and a disk. He stood up to take them from me, and muttering something to himself about 'at least there aren't any gotos,' he motioned for me to take a seat at his table.
"This is C code? Yes I can see it is. I'll need to locate a compiler. I don't really program in C anymore and haven't for several years. But I suppose that we can give it a look."
He sat down next to me in front of his terminal and began a search for a compiler that he could use. As he was looking, he asked me questions.
How long had I been programming? For about two weeks.
What was the code supposed to do? Remove leading and trailing white space from each line of input.
Was I supposed to output the results anywhere or just do the processing? Well... I suppose that I was going to have to print the results.
What did I want him to do? This question stopped me. I didn't really want him to rewrite the code for me. That wouldn't be right. I told him that I wanted some pointers on what I was doing wrong and how to write this kind of program.
He smiled and said that was just about right, and that we should talk just a little bit about programming. "Lots of programs follow a simple form,"
while(input) begin do something with the input. output results end.
"Your program follows this form. A lot of programming assignments that are given to beginners follow this form. But to begin with, while writing this kind of program, its best to start with a second form, like this,"
while(input) begin output what was input end.
I suggested that this didn't look like C code to me, and he agreed that this wasn't C and asked if I could follow the logic of what he was asking. I answered in the affirmative and told him that I wasn't sure why this second form was better than the first.
"It's best to start off slow when you're coding. Of course, it's good to know what you want to accomplish, but better to try to do only a part of it at first, then make a little more of it work, then a little more, until you are done. It's a mistake to try to get everything to all work at once. Sometimes the input itself can be daunting. Just reading in a line and outputting it can be difficult for a beginner like yourself. Then also, it will give you an opportunity to use printf. And you can learn something, a really simple technique for testing such code. Surround the output, just for testing with some character that will show you where the input line really begins and ends. So that if your input is:"
This is a line.\n
"Your output might look like:"
*This is a line.*
"Wait just a moment, what happened to the \n?"
"Ah, the \n, well, as I'm sure you know, the \n is the escape sequence for a new line. And a new line is considered to be white space. If your goal is to remove white space from both ends of your lines, then new lines will have to be removed as well. But I guess that before you do the 'do something with the input' step, your output will look more like:"
*This is a line. *
He paused to let me consider this for a moment, and then I asked, "Do you think that my getline method will work as I've written it?"
"Under the conditions that you'll use it, most of the time, but you ought to test it, using the second form. There are a few things that you might want to consider. What you want the function to do if it encounters a line that is longer than MAX? And what if it finds the last line of the file you are reading has no new line at the end of it, only an EOF? But, go ahead and write the program using the second form and run it using a test file. We'll discuss what the test file should look like a little bit. There should be at least a few empty lines, that is a line that has only a new line, a few lines with no white space at beginning but white space at the end, some with no white space at the end, but some at the beginning and some at both ends. At least one line should be one character long. And what is MAX? Hmmm.... 1000, that's pretty reasonable for now. But for testing you may want to set it to 10 or 20 and have at least one line that is MAX-1 characters long, one that is MAX characters and one that is MAX+1 characters. Just so you get an idea as to how the program will behave for those cases. It's a real advantage to have defined MAX. That way you can easily change it. You may also want to check how your program works with an empty file."
I took notes furiously as he spoke, but he seemed to know when he had gotten too far ahead of me and either paused or slowed down.
"Do you know how to use printf? I see that you've had at least some experience with the C library, because you've used getchar. Well, I've plenty of books and the compiler we're using has, lucky for us, an integrated environment with online help. Although, I don't know what compiler you're using when you're not here. You'll certainly want to always have some kind of reference material available when you're coding. It's very important."
He sighed and for a moment seemed caught up in concentrating on something else. Then he started talking again.
"Best for you to type up, try to compile and run what we've just discussed. I'll watch, but I won't interfere. Ask questions if you've any."
And then he was silent and just watching me. I began to type first my test file, which he looked at and seemed to give a nod of approval to, and then my program.
main() { int j; while((j = getline(s,MAX)) > 0) { printf("*%s*\n",s); } }
So far so good, but the Old Fellow cleared his throat and made a suggestion. "You may be better off putting the getline() function, and it is a function, not a method, above the main function, that way you won't have to put in a prototype of the function, but learning about prototypes is important. Knowing prototypes will help you when you've larger projects and header files. But for now it may be easier to put the function above main. I guess also, that now is the time to learn how to write a main function. Like so:"
int main() { /* this is where the second form goes in this example */ return 0; }
"main is a function," he continued, "and always has a return type of int. And you should always return a value. I guess that you should #include <stdlib.h> and return EXIT_SUCCESS;, but lots of people are happy with return 0; and for now you can be too. But then... Maybe this isn't really the time to discuss what that value should be.
So I rewrote the code like this:
#include <stdio.h> #define MAX 20 static char s[MAX]; int getline(char s[], int lim) { int c, i; static char s[MAX]; for(i=0; i<lim-1 && (c = getchar())!= EOF && c != '\n'; ++i) s[i] =c; if(c == '\n') { s[i] = c; i++; } s[i] = '\0'; return i; } int main() { int j; while((j = getline(s,MAX)) > 0) { printf("*%s*\n",s); } return 0; }
Then he made another suggestion. "There is a third, and even simpler form, maybe you should use this form the first time you run your program,"
input one line. output the line that you just got from input.
"Just to make sure that you can actually read a single line. And you may want to print the value of 'j', just to make sure it's something that you expect. You might also want to make sure that you use a consistent method of indenting your code. I see that you haven't done that in the copy that you gave me."
So, after consulting the documentation for printf again, I rewrote the main function to look like this:
int main() { int j; j = getline(s,MAX); printf("d *%s*\n"j, s); return 0; }
Then I tried to compile the code. Of course I got several errors and warnings. I always have a problem with these, but the Old Fellow had some advice.
"Not all the warnings and messages are interesting. You'll have to practice with your compiler to find ones that you need to fix. Sometimes fixing one will make a dozen other ones go away. Or sometimes, fixing one will make a hundred more appear. If you can, turn the warning or error level up as high as is practical. The stricter you are about writing code that has no compilation errors or warnings, the better off you'll be. Also," and he turned to fix his gaze on me, "it's useful to get your code to at least compile before you ask someone to look at it." He sighed, and then, "Well, I suppose that you will just have to fix it now. It'll be good practice for you. I'll just sit here and watch. I won't interfere."
Then he leaned back and pulled a book out from somewhere under the table and opened it and started to read. I could just make out the title, An Horologists Reference to Collectors and Collections of Old World Movements of the Nineteenth Century. He whispered something to himself about primitive computing devices as I got to work. When I was able to compile without errors, my code looked like this:
#include <stdio.h> #define MAX 20 static char s[MAX]; int getline(char s[], int lim) { int c, i; /* this line redefined the parameter named s * static char s[MAX] * and the next line had an extra semicolon * after the lim-1. that was illegal */ for(i=0; i<lim-1 && (c = getchar()) != EOF && c != '\n'; ++i) { s[i] = c; } if(c == '\n') { s[i] = c; i++; } s[i] = '\0'; return i; } int main() { int j; j = getline(s,MAX); printf("%d *%s*\n",j,s); return 0; }
He made me create a test file with one line in it, and after I ran the program, my output looked like this:
7 *a line *
"Very well," he said reviewing my code, "I think that there are a few things that you could do to improve readability, but for now this is ok. You could have, for example, written s[i++] = '\n'; but we'll leave that for now. And there are other things, but I suppose that this isn't the time for a discussion of the aesthetics of coding. For now just try to make your main function follow the second form that we discussed."
I did as he requested, and tested the program using my test file. One of my lines was longer than MAX characters and this line was produced as several lines of output. The Old Fellow saw this and held one hand up to rub his chin, letting the book that he had been reading fall away. "Problem?" He asked. "Well, maybe not, we can set MAX back to 1000 and that will go away. Although 1000 is a fairly small number when we get right down to it. I believe that the C standard may require a minimum of 4095 characters per line of code. So 4096 would be a good minimum value for MAX. It's safe enough for what you're trying to do. Lines can be longer than that, and of course we wouldn't want to have these kinds of problems in production code. If that were the case we'd want to handle this whole thing differently. But..." He trailed off for a moment. "But, well, if this was production code we'd have a whole other set of things to worry about; like disk space and memory budgets." He fixed a stern gaze on me. "But this is just a learning experience. Not for production. Make sure that you can always tell the difference."
He sat and stared off into space for a few seconds, and then he turned to me. "You've tested that part of your code, and I guess that you're pretty convinced that it works. Now we can safely move on to trying your left trim function. I guess that's what ltrim stands for? I didn't see you document that.
I added my ltrim function to the code that I had already written and asked him if I should use the first or third form for testing it. His response was only to look at me and to nod and say "Yes, one of those." So it was up to me to choose. But before I could start typing he told me he had another slight variant of the third form.
get a line of input. print the line. change the line in some way. print the line with the changes.
"That way," he told me, "you can see that you've read in the line correctly and see if you've changed it the way you want." He stopped, was silent for a moment. and then he continued. "Of course, if you're going to keep using that printf statement you might want to consider a little function that you can use to print out the line. Maybe something like this:"
void p(int i, const char s[]) { printf("%d *%s*\n", i, s); }
He just wrote this out for me in long hand and said, "I haven't tested this. It may save you some typing. Also, anytime you notice that you are duplicating code you'll want to think about some kind of function that will allow you to write that part of the code once. After all, how many times do you want to write or debug the same code? And if this was code that you were writing for a customer who might want to read it, think of how much easier it would be to read something like p(j,s); and know that you were calling that function." He looked at the expression on my face and laughed. "I know that you don't see the point of it for something so simple as this printf call. And of course 'p' is a simply ridiculous name for a function. That's ok, we'll return to this in a little while. Now let's take a look at your ltrim function."
int ltrim(char s[]){ int i; for(i = 0; s[i] != '\0' ; i++) { if(s[i] == ' ' || s[i] == 't') s[i+1] = s[i]; } return 0; }
"It seems pointless to always return a zero from a function. main is an exception, because sometimes we won't want it to return a zero. Maybe you should make the return type for ltrim void? 't' is just a 't', not a tab character, '\t' is the tab character. And s[i+1]=s[i]; will just move a character that's a space or a 't' to the character right after it..." He trailed off. "I think that you should run this and see what happens."
I did run it, and the program aborted. "Good thing we have a debugger," he said. "But you'd be better off printing out some values from the loop. Before you do that, can you write for me what you want ltrim to do? Not in C, but in English"
As I started to write I heard him say, "Sometimes better to cut your losses and start all over." After thinking about it, this is what I wrote:
find the first non space character in the line. move that character to the first position in the line. move all the following characters in the line to following positions. when you've done the last character of the line, return.
He asked me some questions.
So you're not returning a value, like some integer? No.
Then what kind of return type should this function have? void.
Good. How will you find the first character in the line that isn't white space? for loop.
How will you move the characters as you've described? Another for loop.
Good again. Ok, maybe you should write just the part for finding the first non space character. Try it.
void ltrim(char s[]){ int i; int from; for(i = 0; s[i] != '\t' && s[i] != '\n' && s[i] != ' ' && s[i] != '\0' ; i++) { from = i; } return; }
"Not bad," he said, and he nodded at something. "I think that you ought to take a look at isspace. Remember what I said about not duplicating code. You might have to duplicate some of the test for what is white space when you write rtrim. And there are a couple of other issues. For one thing, what is white space? You haven't included all the white space characters in your condition. isspace will answer that question for you in a consistent way, and for another, in a portable way. That way everyone reading your code will know what you meant to do when you use isspace to test for white space. You could write your own function, if you wanted to, but there's no need to duplicate what's already been done."
I rewrote the code again to look like this:
void ltrim(char s[]) { int i; int from; for(i=0; isspace(s[i]) && s[i]!='\0'; i++){ from = i; } return; }
"Yes, and it's a little more readable that way too. Long conditions can be very difficult. I'd put the s[i] before the isspace(s[i]) in the condition. You also don't have to write s[i] != '\0', just s[i] will do. Don't forget to #include <ctype.h>", he said. "It's taking shape. You should initialize the from variable to zero, and in the loop set it to i+1. You'll see why as you test the code and print out debugging information. What's next?"
"Another loop to copy the data?"
"Yes, you can write a loop to do that, but you may want to read about the strcpy or memcpy functions. And find out how they're implemented. But don't use them in this case, since the source and destination of the copy may have some overlap. For now, just write the loop the way you think it should be written."
I wrote the code as he suggested:
void ltrim(char s[]){ int i,j; int from = 0; for(i = 0; s[i] && isspace(s[i]); i++) { /* from will be the index of the first non * white space character */ from = i+1; } /* start j at the next char could be \0 */ i = 0; for(j = from; s[j]; j++) { s[i] = s[j]; i++; } s[i] = '\0'; return; }
It did take me a few tries before I got it right.
"A good start," said the Old Fellow. "You should add a little more documentation about how this works. Why is the from variable initialized to zero. What the expected inputs are. How will s have been changed when we return. And from might not really be a great name for a variable. Maybe firstns would be better? What do you think, better or worse? Besides which, you might want to replace that last loop with this:"
int to=0; while(s[to++] = s[from++]) ;
"Oh, yes of course, the int to=0; will certainly go up at the top of your function. And the loop, just before the return, replacing the second for loop, and that return isn't really needed. If we do this, then from would probably be an ok name for the variable. The loop condition may look confusing. But remember that the loop will finish when the value of s[from] is zero. That is, at the end of the string. And you don't really need that s[i] in the condition of your for loop.
I had to stop for a moment and look at this. Then I changed my code:
void ltrim(char s[]) { /* ltrim, left trim. On entry s contains a * string that may have leading white * space.On exit the leading white space is * removed and the string is left justified * in s. * from will get the value of the index of * the first non white space character. * I.e. where we will copy from.to will always * start off as the index to the first * character in the array. I.e., where * we will copy to. */ int from = 0; int to = 0; int i; /* get the index for first non whitespace */ for(i=0; isspace(s[i]); i++) from = i+1; /* copy it to the start of the array*/ while(s[to++] = s[from++]) ; }
I reran my test case, it worked. "What is the single semicolon for at the end of the while loop?"
"A null statement. Sometimes it's called an empty statement. And now I suppose that you'll want to write rtrim? Well then, I think that would be a fairly simple thing that you'll be able to do on your own. But before you go, I want you to look at another small trace function similar to the one I showed you, like so:"
void p(const char *msg, int j, const char *s) { int length; int i; char c; length = strlen(s); printf("%s %d ",msg,j); for(i=0; i<length; i++) { c = s[i]; if(isspace(c)) printf("\\w"); else printf("%c",c); } printf("\n"); }
"I'd also like you to look at a way of writing rtrim:"
void rtrim(char s[]) { reverse(s); ltrim(s); reverse(s); }
" I caution you that this probably isn't a good way to write rtrim, because... Hmm... Well, next time you're here you can tell me what you think of my second version of function p, and what's wrong with the version of rtrim that I wrote. I'd also like you to review the library functions that manipulate strings. Pay particular care to the functions that do searches, like strspn. And in particular, I'd like you to tell me why we didn't use strspn in this exercise. You can tell me why we didn't use pointers too... No, wait, leave the pointers for another time. For now, I'm sure Master Dicken would be pleased if you would see him before you leave. He'll want to know if you've spent your time here profitably. Don't you think?"
I could only nod, and ask what I should've asked when I had entered his room, "How should I address you?"
He smiled and rubbed his chin. "Yes, a good question. It's important to ask good questions. Many of the youngsters who come here never do learn to do that." He paused and smiled again, "'Old Fellow' doesn't seem quite right, does it? And you seem to have hit on an important lesson to learn. First things should always come first. Well, make sure you close the door on your way out."
It seemed that an answer wouldn't be forthcoming, and as the Old Fellow seemed about to doze again, I closed the door as quietly as I could and went down to report to Master Dicken.
The Waite Group's Essential Guide to ANSI C. 1988, Howard W. Sams & Company.
Software Tools. 1976, Addison-Wesley. Kernighan, & Plauger
The C Programming Language Second Edition. 1988, Prentice Hall. Brian W. Kernighan,, Dennis M. Ritchie
The Standard C Library. 1992, Prentice Hall. P. J. Plauger
Choosing a winner has been very difficult. The C++ submission was intended as a competition entry, but I included it in this column because it has a number of interesting features. Victoria's is an excellent entry from a newcomer to these pages. I was sorely tempted to award her the prize to encourage her. The anonymous entry is a lot of fun, and I hope that we will see much more creative writing from his keyboard and (from others). In the end I decided that Robert deserved to win for a succinct and comprehensive entry. If he contacts me we can discuss his prize.
Set by Francis Glassborow
Entries by March 7th
This program is intended to be a simple calculator in C. It compiles but when run it does not collect a second number after the operator symbol has been entered. Again, this is a raw novices code in C. Apart from identifying the cause of the immediate problem comment on the approach and suggests better mechanisms to achieve the objective. However keep it simple, so no function pointers please.
#include <stdio.h> int main() { double num1, num2, total; char operation; printf("Enter the first number\n"); scanf("%e", &num1); printf("enter operation\n"); scanf("%c", &operation); printf("enter second number\n"); scanf("%e", &num2); switch(operation) { case '+': total = num1 + num2; printf("%e\n",total); getch(); break; case '-': total = num1 - num2; printf("%e\n",total); getch(); break; case '*': total = num1*num2; printf("%e\n",total); getch(); break; case '/': if (num2 == 0); printf("ERROR: DIVIDE BY ZERO!!\n"); getch(); if(num2 != 0); total = num1 / num2; printf("%e\n",total); getch(); break; default: printf("ERROR\n"); getch(); break; } }Apologies for the multistatemnt lines, it is a matter of space. FG
Notes:
More fields may be available via dynamicdata ..