Journal Articles

CVu Journal Vol 12, #4 - Jul 2000 + Programming Topics

Title: Reading Integers Revisited

Author: Administrator

Date: 03 July 2000 13:15:38 +01:00

Like your Editor, I too rush to my contribution in a magazine that falls through my letter box: I want to see how the Editor has corrected my English and made the item comply with House style so that next time I can send him something better. All right, it's vanity and I like to see my name in print.

But unlike your Editor, I find that a week spent writing, testing, checking and cross-checking my work has failed to detect a fault that leaps out at me from the printed page. So please let me try to correct one now.

The Problem

This all concerns my piece entitled Reading Integers in the March 2000 issue of C Vu.

It is to do with the call to strtol, which returns a long int, and what happens when that type is cast to an int. It also depends on what your compiler does, and, to some extent, on what your computer (or the microprocessor in it) does as well.

Let me assume, as happens on my computer, that the compiler generates code for 16-bit ints and 32-bit longs. So an int can hold all values between -32768 and +32767 while a long int can hold values between -2147483648 and +2147483647.

You can see how your compiler implements ints and longs by looking in the standard header file <limits.h>. This should contain definitions of INT_MIN, INT_MAX, LONG_MIN and LONG_MAX, which are the most negative and most positive legal ints and the most negative and most positive legal long ints respectively. You will probably find the values I gave above for these symbols, but you do need to know what your compiler does.
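As a rough sketch of that check, the fragment below prints the four limits for whatever compiler builds it. The function names print_integer_limits and long_is_wider_than_int are illustrative only and do not appear in the original listing.

```c
#include <limits.h>
#include <stdio.h>

/* Print the limits that <limits.h> reports for this compiler. */
void print_integer_limits (void) {
  printf ("INT_MIN  = %d\n",  INT_MIN);
  printf ("INT_MAX  = %d\n",  INT_MAX);
  printf ("LONG_MIN = %ld\n", LONG_MIN);
  printf ("LONG_MAX = %ld\n", LONG_MAX);
}

/* Non-zero when a long can hold values that an int cannot,     */
/* which is the situation in which the fault discussed here bites. */
int long_is_wider_than_int (void) {
  return (LONG_MAX > INT_MAX);
}
```

Calling print_integer_limits from a small main shows at once whether your compiler matches the 16-bit int and 32-bit long assumed in this piece.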

Keeping the above in mind, let us see what my original code does.

After the message prompting for user input is written, strtol is called. This reads in a number, which let us say is 1000, and returns that value as a long int.

Somehow, this is cast (the type of the number is changed) to an int, and that value is stored in the variable i. Now 1000 is a legal value for an int, so all is well, and my code can then safely check that something was read from the line, and that it is in the intended range.

The key to the fault I have created is that 'somehow' at the beginning of the previous paragraph.

Let us see what occurs if a number like 67000 is entered. This is a perfectly legal value for a long int, so that is what strtol returns.

But this is most certainly not a legal value for an int, so what happens when the value is cast to an int and an attempt is made to save it in i? My compiler merely takes the bottom part of the long int value, calls it an int and stores that. And so the value 1464 is written to i. (To see why, work out 67000 as a 32-bit binary number, take just the 16 least significant bits, ignoring the high order bits, and convert back to a decimal number. You should get 1464.)
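The arithmetic can be checked without a 16-bit compiler. The sketch below emulates the truncation by masking. The helper names low_16_bits and as_16_bit_int are hypothetical, and the second one assumes a two's complement 16-bit int, which is an assumption about the target machine rather than anything the Standard promises.

```c
/* Keep only the 16 least significant bits of 'value'. */
unsigned long low_16_bits (unsigned long value) {
  return value & 0xFFFFUL;
}

/* Emulate storing a long in a 16-bit two's complement int    */
/* (assumed behaviour, not guaranteed by the Standard).       */
long as_16_bit_int (long value) {
  unsigned long low = low_16_bits ((unsigned long) value);
  return (low >= 0x8000UL) ? (long) low - 0x10000L : (long) low;
}
```

Here 67000 is 0x105B8, so the low 16 bits are 0x05B8, which is 1464. A value such as 40000 would come back negative under the same emulation, because bit 15 is then treated as the sign bit.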

So i has had a value stored in it that is nothing like the intended value, and my subsequent range checking fails to spot this.

I cannot say what your compiler does (though you should find out). If your int is the same size as a long int (as sometimes happens), nothing will go wrong.

But the Standard says that when one converts from an integral type (which includes all the integer types, of whatever length) to a signed integer of shorter length (which is what the Standard calls 'demotion' and is what I am doing here), the result is 'implementation defined'. That is, the compiler can do what it likes (but the implementor must document it. FG).

I suspect most compilers will do as mine does, and merely store the lower part of the long in the int variable. But some compilers might notice the problem, refuse to complete the cast, and stop the program with a run-time error. Indeed some languages, such as Ada, specify that this is exactly what must happen (though Ada also allows the programmer to recover control and attempt to repair the problem).

More interestingly (read 'worryingly'), the Standard does say what to do when an integral type is demoted to an unsigned integer. It says you get the least significant bits. At least, that is what it implies in the sort of long sentence that gives Standards Documents their well deserved reputation for being unreadable.
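For the unsigned case the result can be predicted on any conforming compiler: the value is reduced modulo one more than the largest value of the destination type. The following is a small sketch of that rule, with the illustrative names demote_to_ushort and expected_after_demotion. It assumes unsigned long is wider than unsigned short, so that USHRT_MAX + 1 does not wrap.

```c
#include <limits.h>

/* Demote by cast: the Standard defines the result for unsigned types. */
unsigned short demote_to_ushort (unsigned long value) {
  return (unsigned short) value;
}

/* The same result worked out by hand: value modulo (USHRT_MAX + 1). */
/* Assumes unsigned long is wider than unsigned short.               */
unsigned long expected_after_demotion (unsigned long value) {
  return value % ((unsigned long) USHRT_MAX + 1UL);
}
```

With a 16-bit unsigned short, both functions give 1464 for an argument of 67000.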

The Solution

This is not a very useful state of affairs for my original function. How can I correct this?

Well, the first step is that the range checking will have to be carried out with long ints. And that means that the type of i must be changed to a long. Now, I do not like longs called i as a matter of principle, so I have also changed the name of the variable to l. And of course, the cast of the value returned by strtol has to come out.

The range check should be changed to account for the fact that I am now comparing a long (the number entered) with an int (the user's limiting values). The compiler will do this implicitly, though I prefer to do it explicitly.

But one thing I do not want to do is to change the interface to my function and make the limits long ints. That would annoy all the people who are using it even more because not only have I got the function wrong, but they would have to do some work to fix it. And fortunately that is not necessary.

The upper and lower bounds can remain as ints in the function prototype: they need only be cast to long ints when the time comes to perform the range checking. And anyway, it is reasonable for the user to expect the limits to be specified in the same type as he is trying to read in.

There is one other necessary change. The format, or second, parameter in the call to fprintf which is used when the number read in is out of range must be changed to print a long int rather than merely an int.

Finally, not necessary, but advisable, is to cast the type of l to an int in the return statement. The compiler will do the necessary casting for you implicitly, but by doing it explicitly you tell the reader that whilst you have calculated the function result as one type, you know a different type is to be returned.
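Taken in isolation, the corrected check might be sketched as follows. The function long_to_int_in_range is a hypothetical helper written for illustration; the real function performs the same comparison inline.

```c
/* Compare as longs, and only then narrow to int.               */
/* Returns 1 and stores the value in '*result' when             */
/* 'lower' <= value <= 'upper'; returns 0 and leaves '*result'  */
/* untouched otherwise.                                         */
int long_to_int_in_range (long value, int lower, int upper, int *result) {
  if ((value < (long) lower) || (value > (long) upper)) return 0;
  *result = (int) value;     /* Safe: value lies within int limits. */
  return 1;
}
```

Because the limits are themselves ints, any value that passes the comparison is known to fit in an int, so the explicit cast can no longer lose information.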

And how did I come to miss this bug in the first place?

Well, I have been doing a lot of work over the past year in a language other than 'C' on a machine where int is the longest available integer type and there is no speed penalty to be paid for using that instead of one of the several available shorter integer types. So because all the system calls use the int type, and to avoid having to worry about casting between different integer types, all my integers are ints. And in writing the piece for you, I forgot about the trap and fell into it.

Writing for one machine in the day and another in the evening is not as easy as it sounds.

Some Other Points

The Editor raised a couple of points at the end of my piece that I should also address.

On checking the Standard, I now see that strtol accepts all consecutive characters which might form a legal integer in the base given by its final argument, without regard for the largest number that can be saved in a long int. So your Editor is right and I was wrong. Having accepted the characters, strtol then attempts to form that integer and sets the error flag errno if it does not fit a long.

The reason for my using strcpy to overwrite the comment character is that in addition to the '\n' character, strcpy also writes a '\0'. So after I have deleted the comment, the string in line continues to look as though it could have been returned by fgets.
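The effect is easy to see in a small sketch. The helper name mask_comment is illustrative and is not part of the listing.

```c
#include <string.h>

/* Replace everything from the first 'comment' character onwards */
/* with a newline, so the line still ends in "\n" followed by    */
/* '\0', just as a line returned by 'fgets' would.               */
void mask_comment (char *line, char comment) {
  char *posn = strchr (line, comment);
  if (posn != NULL) strcpy (posn, "\n");
}
```

Given the line "42 # answer\n", the buffer afterwards holds "42 \n". The two characters written by strcpy always fit, because the comment character and the terminating '\0' that followed it already occupied at least that much room.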

The point is not really important in this example, but I suggest that in large programs, keeping the style of an object (like a string) the same when one changes a bit in the middle can avoid causing problems elsewhere in the program.

If you do not believe this, do you think that gets and fgets reading from stdin should do different things? Quickly now, which one does what?

More Error Handling

I said above that strtol indicates in errno if the number it tried to read would not fit in a long. It does this by setting errno to the value of the macro ERANGE.

Now the use of errno is probably worth an article in its own right (definitely. FG), if only because the Standard says that if a function is not documented as using errno, then that function can set errno to any value it likes, even if there is no error. And if a function that is documented by the Standard as using errno does not find an error, it must not clear any previously recorded error.

So how should errno be used? Well, firstly, include the standard header <errno.h>. Then, immediately before you want to check the operation of a function that is documented as potentially setting errno, set errno to zero. Call the function and then check if errno is still zero, which implies that no error has been detected. If it is not, then something has gone wrong.

How can one determine what has gone wrong, and so decide how to fix it? If errno has been set to the value the Standard says might be returned, then the Standard will also say what has gone wrong.

But the Standard does not preclude some other non-zero value from being returned. And to find what that means you must rely on either the compiler reference manual, or the perror function which prints a diagnostic message based on the value of errno.

This all means that I need two steps to ensure that strtol worked as anticipated. First, I must check for errno being non-zero, and not equal to ERANGE, which means that something unexpected has gone wrong. In that case, I use perror to print a diagnostic message and stop the program.

Secondly, when I check the range of the returned number in l, I must also check for errno being zero. If errno is non-zero, the previous step means it must have the value of ERANGE, and I proceed as if the returned number is out of range.
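The two steps can be sketched on their own, outside the full function. The enumeration parse_status and the function parse_long are hypothetical names chosen for this illustration. The real function calls perror and exits in the unexpected case rather than returning a status.

```c
#include <errno.h>
#include <stdlib.h>

enum parse_status { PARSE_OK, PARSE_OUT_OF_RANGE, PARSE_UNEXPECTED };

/* Clear errno, call 'strtol', then classify what errno now holds. */
enum parse_status parse_long (const char *text, long *value, char **next) {
  errno = 0;                          /* Force errno to no error.    */
  *value = strtol (text, next, 10);
  if (errno == 0)      return PARSE_OK;
  if (errno == ERANGE) return PARSE_OUT_OF_RANGE;  /* Did not fit a long. */
  return PARSE_UNEXPECTED;            /* Anything else: see 'perror'. */
}
```

Note that PARSE_OK says only that strtol reported no error. The caller must still check that some digits were actually consumed and that the value lies within the requested limits.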

All in all, there are now a number of changes to my original function, and so I reproduce the new listing of my get_int function below.

Which brings me back to my original point. Starting from the point of implementing the apparently simple task of safely reading in an integer, we have come quite a way and visited several complex areas of the 'C' language and its run time library. No wonder that original, fictional, tutor preferred to leave his students with scanf and hope nothing would go wrong.

/*: get_int.c   Prompt for, read and check, an integer from 'stdin'.  */
/* Version 2.0  Corrected use of result from 'strtol'.      */
#include  <stdio.h>           /* Standard I-O header.    */
#include  <stdlib.h>          /* Standard library header.  */
#include  <string.h>          /* Standard string header.  */
#include  <errno.h>           /* Error number header.    */
#include  <ctype.h>           /* Character class header ('isspace').  */
#define    TRY_LIMIT  5       /* Retry limit for reading int.  */
#define    LINE_SIZE  256     /* Maximum line length allowed.  */
#define    COMMENT    '#'      /* Comment character.    */

/*: get_int Writes the 'prompt' message to the terminal, reads a line of text as the reply  */
/* and attempts to find an integer from it. Any integer found is range checked such that    */
/* 'lower' <= i <= 'upper' is true. This is returned as the function value. In the event of */
/* problems, a message is printed, and the number is re-requested. After the retry limit is */
/* exhausted, the function prints another message and aborts the program.       */

int get_int (char * prompt, int lower, int upper) {
  char  line[LINE_SIZE];      /* Input line buffer.    */
  char  *com_posn;            /* Position of any comment.  */
  char  *next;                /* Where 'strtol' stopped.    */
  long  l;                    /* Number read from the user.  */
  int  try;                   /* Retry counter.      */
/* Loop per attempt to read the number from the user.    */
  for (try = 0; try < TRY_LIMIT; ++try) {
    fputs (prompt, stdout);
    if (fgets (line, LINE_SIZE, stdin) == NULL) {
/* Something has gone badly wrong, possibly  End of File.    */
      fputs ("'get_int' found End-of-File.\n", stdout); exit (EXIT_FAILURE);
    }
    if ((com_posn = strchr (line, COMMENT)) != NULL) /* Delete any trailing comment */
      strcpy (com_posn, "\n");          /* A comment is present.  Mask it out.*/
/* Attempt to read an integer from the start of the line.  */
    errno = 0;    /* force errno to no error */
    l = strtol (line, &next, 10);
    if (errno && (errno != ERANGE)) {   /* check for reported error */
      perror ("Unexpected error from 'strtol'."); exit (EXIT_FAILURE);
    }
    if ((next == line) || ! isspace ((unsigned char) *next))  /* No digits, or junk after them. */
      fputs ("Failed to read anything from line.", stdout);
    else {
      if ((errno == 0) && (l >= (long) lower) && (l <= (long) upper)) return (int) l; 
      else   fprintf (stdout, "%s %ld %s%d%s %d%s", "The number", l, "is not in the range ["
                            , lower, ",", upper, "]." );
    }
/* If this is not the last try, ask for another line.  */
    if (try < (TRY_LIMIT - 1)) fputs ("  Please try again.\n", stdout);
  }
/* When here, we have exhausted the retry limit without obtaining a suitable response. */
  fputs ("\n'get_int' aborting program: retry limit reached.\n", stdout);
  exit (EXIT_FAILURE);
}
=======================================================================
/* A test program for 'get_int' function.    */
#include  <stdio.h>
int  get_int (char *, int, int);
int  i;        /* Number read from user. Global OK in test harness   */
int main (void) {
  fputs ("Use [* Break *] to terminate run.\n", stdout);
  while (1) {      /* Loop forever.    */
    i = get_int ("Integer: ", -1000, 2345);
    fprintf (stdout, "Read %d.\n", i);
  }
}

I have changed Posul's K&R style function declarations to prototypes, and reduced or eliminated many comments to reduce line count. FG
