Title: Anonymous Code Exercise Rewrite

Author:

Date: 06 October 1999 13:15:33 +01:00 or Wed, 06 October 1999 13:15:33 +01:00

Summary:

Body:

(This is based on the code presented on page page 36 of C Vu 11.5.

The program should take a file whose lines are terminated with carriage returns ('\r') and display/print it to stdout with the carriage return replaced by a newline ('\n'). These files occur on Unix platforms and also on Apple MacIntosh.

I have attempted to include as much error checking as possible while retaining the original program structure. My code is written in traditional C, not C++.

The principle has been to check return codes when they are available. In real life I have seen problems where a failure to check all return codes has caused much greater problems. One example was a Korn Shell script which performed a backup on a Unix box. Every return code was checked, except for the one coming back from the "mount" instruction which mounted the destination file system. On one occasion the destination file system was not available and the temporary area was used by default. The temporary area filled and a number of critical functions failed to run overnight as they also required temporary file space. The moral of the story is that return codes are put there for a purpose.

I have identified one case where the program will fail. If the file name contains spaces, and it is not enclosed in quotation marks, then the program will take the argument to be the text up to the first white space character. I have hit this problem in real life with command line utilities on Windows NT.

In performing the error checking I have tried to cater for worst-case failures. I have considered that both stdout and stderr may have been redirected and so might have problems with output. In the event of a failure to write a character to stdout we try to write a message to stderr. If we cannot write a message to stderr, then we are in trouble and we call the abort() function to clean up. The possibility of more general character translation is allowed for with the structure as presented. If we had to translate more than a few characters I would have used a select/case construct.

I haven't been able to identify any areas where the program is non-conformant - but then I don't have a copy of the C standard. I have checked the library functions in the documentation and it passes through lint with the exception of a very strange comment about the enum and variable types.

Catriona

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
        /*  Re-write of Exercise on Page 36 of CVu 11.5 (July 1999).
          * Program will read a file and put the output to stdout replacing  line returns with newlines.
          * Catriona Siobhan O'Connell 10 July 1999   */
        int ch;
        int newch;
        FILE *infile;
        
        enum ERROR_CODE {ERROR_SUCCESS,
                                        ERROR_MISSING_ARGUMENT,
                                        ERROR_OPEN_INPUT,
                                        ERROR_OUTPUT,
                                        ERROR_CLOSE_INPUT};

I have three comments to make on the above. (I have adopted italic arial narrow to identify me. The Harpist) First, I generally think it is a poor idea to declare an enum at block scope. I guess that is only a personal preference, but I think it indicates a style of programming where too much is placed in a single function.

Second, I think there is very little purpose served by naming this kind of enum. You do not intend to have variables of this type, indeed it is probably a mistake to have more than a single variable of this kind. Consider the effect of having a global enum and using that to provide error returns to various functions that can fail fatally.

Thirdly, and by far the most important, never use identifiers that lack any lower case letters unless they are pre-processor ones. And never use lower case letters in pre-processor identifiers. That way you can reduce the chances of an identifier being replaced by the pre-processor. Consider having the following defined at file scope:

enum { success, missingargument, inputfailure, outputfailure, fileclosefailure};

Note that the enumerated values are more descriptive and I have avoided that very ugly ERROR_SUCCESS. I am flirting with the idea of not using uppercase and underscores in my identifiers. If you really like this upper case style use the traditional C #defines.

 /*  Check if exactly one argument has been passed to the program else raise an error and exit */
         
        if (argc != 2) {
                if (EOF == fprintf(stderr,"No file name given.\n")) abort();

Did you check the spec for the return value of fprintf? EOF does not come into it. You must write:

                if (fprintf(stderr,"No file name given.\n")<0) abort();

because fprintf is only required to return an unspecified negative value if it fails. There are several places where you made this mistake. If you want to say that the Standard C Library is insistent, I agree but we have to work with what we have.

            if (EOF == fprintf(stderr,"Syntax: %s filename\n",argv[0])) abort();
                exit(ERROR_MISSING_ARGUMENT);
        }

Why did you not detect too many parameters? That would help with the whitespace problem.

 /*  Try to open the argument as a file.  If this fails then raise an error and exit.  */
         
        if ((infile = fopen(argv[1], "rb")) == NULL) {

I am coming to prefer this line written this way:

                     if (!(infile = fopen(argv[1], "rb"))) {

because it clearly say what it means 'if not infile...'

          if (EOF == fprintf(stderr,"Cannot open file %s\n",argv[1])) abort();
                exit(ERROR_OPEN_INPUT);
        }
        
        /*  Read each character from the file.
          * Perform any required character transformations and  then output to stdout.
          * If the output to stdout fails then an error will be raised and the  program will exit.  */
         
        while (EOF != (ch = getc(infile))) {
                if (ch == '\r') newch = '\n';
                else  newch = ch;

I am not sure why you did not write:

       while(EOF !=(newch = getc(infile)))     if (newch == '\r') newch = '\n';

why the apparently purposeless intermediate variable?

           if (EOF == putchar(newch)) {
                        if (EOF == fprintf(stderr,"Problem writing to stdout.\n")) abort();
                        exit(ERROR_OUTPUT);
                }
        }
        
        /*  Close the input file. 
           * If this fails, we exit the program and allow the system to attempt  the clean-up process. */
         

        if (EOF == fclose(infile)) {
                if (EOF == fprintf(stderr,"Problem closing the input file.\n")) abort();
                exit(ERROR_CLOSE_INPUT);
        }
        
        /* Despite everything which could have gone wrong we have reached the  end of the file successfully.  */
        return(ERROR_SUCCESS);

Don't you agree that the above is a particularly ugly line of code? And while we are considering it, why return this time and exit everywhere else? Let me offer a little challenge: rewrite this function so that it has a single exit point (via a final return statement. And using goto is not an acceptable solution.

The Harpist's Code Review Reviewed by the Tuner

Of course most of the article (C Vu Sept 98 (page 18) is excellent, but I have some niggles. I'll simply take them in order within the article, so these comments will be rather disjointed.

"Every time you declare some identifier at global scope you introduce one more place where the linker can become confused"

I feel this anthropomorphic comment is unhelpful, especially for an inexperienced reader. Linkers don't become confused, any more than kettles or televisions become confused. The linker does what it's told to do. If the programmer breaks type safety then if anyone is confused, it's not the linker, it's the programmer.

"Why has it [the look-up table 'crc_table'] been laboriously initialised in the file? Suppose that some other table of values was wanted, this mechanism requires recompilation."

A reasonable question to ask, but the Harpist seems to conclude that there is only one answer. Taking this point of view to the limit, no data within a program should be initialised, everything should be read from data files. I hope that is obviously ridiculous.

Whether to initialise within a program, or use external data, is a design decision. I assume the given code is calculating a "well known" CRC (and of course comments referring to such an algorithm description would have improved the code). Putting the data in an external file might make the code more flexible, but at the risk of the data file being lost during distribution of the program binary, or 'well-intentioned' or accidental modification of that data by some user. Moreover, by introducing generality in this way, the reader must consider not only the code, but also look elsewhere for the look-up table - so in a sense the data has become "more global" (a downside that the Harpist is rightly conscious of in other contexts).

And what is "laborious" about its initialisation?

"I do not understand why the function returns the complement."

This is fairly common in checksum and CRC algorithms. I haven't taken time to look at the detail of this one, but consider a trivial example. Form a checksum by adding values together. A sequence of all zeros will have a zero checksum. Imagine this is being used on some electronic board, a wire comes loose, the data is corrupted to all zeros. The problem is not detected because the checksum is also corrupted to zero.

Inverting the checksum is a way of solving this, but checksums are not as good as CRCs in detecting some other errors (e.g. data reversals). A well-designed CRC algorithm takes into account many different failure modes. The comment "postconditioning" might well be a sufficient comment if only the author had made reference at the start of the function to the source of the algorithm.

".. buried in the implementation is the implicit requirement that the look-up table contains 256 elements."

[Harpist's new code] static unsigned long crc32_table[256]; /* this algorithm requires a look-up table of 256 elements */

Oh, come on! This is a little too close to the commenting style that gives us

        count = 0;  /* Set count to zero */

If you're calculating a CRC for an array of bytes, then unless you're doing something pretty strange (something that would indeed need a comment), you're stepping through the array a byte at a time and using each byte to index into the look-up table. A byte (unsigned) ranges from 0 to 255, hence 256 values in the table.

If any of that comes as a bolt from the blue, then such a reader has no business altering code that calculates CRCs. The Harpist himself says "Comments are supposed to help you understand how the code solves the problem and to highlight dangerous or arcane lines of source code". Adding trivial comments does not add value, it adds clutter.

[Harpist's new code] if ((chr_ptr = strrchr(ini_name, '\\'))) chr_ptr++; else chr_ptr = ini_name;

Bearing in mind that the article is tutorial in nature, I would prefer to see something like:

chr_ptr = strrchr(ini_name, '\\');
if (chr_ptr != NULL) chr_ptr++;

Confusing '==' with '=' is too easy to do, though modern compilers issue warnings. Such a style in a tutorial article may encourage people to turn off those useful warnings. When you take a compiler out of its box, you should flip all its warning switches to on, I (strongly) believe. And if you ever need to turn any off, you should have a very good reason.

[existing code] if (chr_ptr = strtok(inifilename, ".")) {   chr_ptr = chr_ptr + (strlen(chr_ptr) + 1);

".. but the next two lines baffle me .. set chr_ptr .. and then make no use of it because the next time chr_ptr occurs it is on the left of an assignment"

Huh? And how about the occurrences on the RIGHT of the same assignment? Of course it's made use of. "strncpy(inifilename, chr_ptr, 9);"

I'm not certain that chr_ptr invariably points at a string less than 9 char. Will the destination be terminated?

"Provide a local static array large enough for .."

On page 24 the Harpist suggests using a local static array for returning an error message. Yet on page 21 he is concerned about mechanisms that don't work in multi-threaded environments.

".. worried by the intent to read an entire file into RAM."

If the .ini file is generated by the same (or a companion) program, perhaps the designer has ensured that such files are less than 64K. Sometimes it's very convenient to hold a small file as an array in memory. But, as the Harpist says, a file size check is desirable.

Even if the files are externally generated, assuming they're small might be a practical restriction. "Dir \windows\*.ini /os" on my system shows the largest file is about 10K. Microsoft provides the Notepad text editor, which can't handle files over 64K. If Microsoft does it, it must be right. Ooops, I think I can feel a bullet hole in my foot!

Still, before anyone runs away with the idea that I thought badly of the article, quite the contrary. It's not much to take issue with, in an article of 8 densely packed pages of text.

As for Spencer, I hope he's encouraged to persevere, it's clear he's on an upward spiral. Many of us have seen far worse code than his, in production code that has been paid for. And, most serious of all, those concerned are often uninterested in improving.

Notes:

More fields may be available via dynamicdata ..

Journal Articles

Title: Anonymous Code Exercise Rewrite

The Harpist's Code Review Reviewed by the Tuner