Title: Buffer Overflows and the Standard C Library

Date: Fri, 03 March 2000 13:15:35 +00:00

In C Vu 12.1 (page 19), Edward Collier reasonably asked what was wrong with scanf() and fscanf(). The short answer is that they are two of several Standard C Library functions which fail to honour buffer bounds. The following functions have this failing: gets(), strcpy(), strcat(), sprintf(), fscanf(), scanf(), vsprintf() and strxfrm().

Buffer overflows are caused by programs allocating blocks of memory and then trying to put too much data into them.

At the very least, a buffer overflow may result in garbage being written over other data structures and code, causing corruption and/or program failure. In the worst case, deliberately created overflows can be used to acquire administrative privileges if the program itself runs with elevated privileges. Buffer overflows are a major source of problems with software today and are responsible for a significant percentage of the exploitable security problems. A great deal of sysadmin time is spent applying fixes to these problems (assuming that the author or vendor of the code will support it).

This article was intended to be part of a series on writing secure code. Anything that makes code more secure will in general make the code better and more reliable. I certainly wouldn't put myself forward as a C programming expert, but I am paid to know about computer security issues and I know enough about the coding to know what causes problems.

There are fundamentally two approaches to the problem.

1. Allocate buffers that are larger than necessary and hope that they never overflow. This is not always a defence. For example, hackers found a way of exploiting a buffer overflow in IIS by using an 8000 character URL, even though you would not expect anyone to type in a URL of anything approaching that length. This approach is wasteful of resources and only masks the problem; as the IIS example shows, it does not solve it.
2. Use alternative functions, or use the functions in such a way as to reduce the risk.

Sometimes there are alternative ways of using the Standard C Library to achieve the same functionality given by the more dangerous functions. However, the programmer has to be fully aware of the subtle differences between a function and its replacement:

gets() vs fgets()

char *gets(char *s)
char *fgets(char *s, int maxsize, FILE *fp)

gets() copies the input line into the array and replaces the terminating '\n' with '\0', with no check on the size of the array. The normally recommended replacement is fgets(), which reads at most maxsize-1 characters from a file into the array. Note, however, that the newline character '\n' is included in the array, which is then terminated with '\0'. While fgets() will prevent a buffer overflow, the programmer now has to decide what will happen if the input data has been truncated. Whether it is important depends on the program. For example, a user may intend to delete the directory C:\winnt\system32\backup, but if this is truncated to C:\winnt\system32, the result will be very different. This is a somewhat extreme example, but something equally catastrophic could happen. A simple example of a problem with gets() is shown by The Harpist in C Vu 12.1 page 23 (leaving to one side the issue of whether the storage pointed to by reply should be modifiable).
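One way of dealing with both the retained newline and the truncation question is to wrap fgets() in a small helper that reports which case occurred. The following is only a minimal sketch: read_line() and its return codes are names I have made up for illustration, not part of the Standard C Library, and a real program would have to decide for itself what to do with a truncated line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a library function): reads one line from fp
 * into buf, which holds size bytes.
 * Returns  1 if the whole line fitted (any '\n' is removed),
 *          0 if the line was too long; the excess is discarded,
 *         -1 on end of file or read error with nothing read. */
int read_line(char *buf, int size, FILE *fp)
{
    size_t len;
    size_t discarded = 0;
    int ch;

    if (size < 2 || fgets(buf, size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* complete line: drop the newline */
        return 1;
    }

    /* No newline in buf: either the input ended here or the line was
     * cut short. Skip to the end of the line so that the next call
     * starts on a fresh line, counting what is thrown away. */
    while ((ch = getc(fp)) != EOF && ch != '\n')
        discarded++;

    return discarded == 0;
}
```

A caller would typically refuse to act on the buffer when read_line() returns 0, rather than carry on with a shortened path or command.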

strcpy() vs strncpy()

char *strcpy(char *s, char const *c)
char *strncpy(char *s, char const *c, size_t n)

strcpy() copies string c into string s, including the terminating '\0'. Note that no check is made on the size of the buffer pointed to by s. The normally recommended replacement is strncpy(), which copies at most n characters from c into s. If c has fewer than n characters, s is padded with '\0' up to n characters. There is a subtle feature here waiting to trap the unwary. If c holds n or more characters, so that the copied characters exactly fill s, no terminating '\0' is appended. If the contents of s are then passed to any function or routine expecting to see a standard nul-terminated C string, the routine will look beyond the end of s (and into an arbitrary location in memory) until a nul character is found. A commonly adopted solution is to force a nul character into the final location.

#define BUFSIZE 80
char s[BUFSIZE];
strncpy(s,c,BUFSIZE);
s[BUFSIZE-1] = '\0';

This is OK as far as it goes, but it does have the same problem with truncation described above. People have pointed out that the action of strncpy() to fill an overlarge buffer with '\0' may be time consuming. Unless the buffer is huge, this is unlikely to be a problem on most modern systems.
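If truncation matters, the copy can be wrapped so that the caller is told when it has happened. This is a sketch only; copy_string() is a hypothetical name of my own and not a library function, and it assumes that src is a properly nul-terminated string.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, which holds dstsize bytes,
 * and always leaves dst nul-terminated.
 * Returns 1 if the whole of src fitted, 0 if it had to be truncated. */
int copy_string(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 0;                   /* nowhere to put even the '\0' */

    strncpy(dst, src, dstsize - 1); /* leave the last byte for the nul */
    dst[dstsize - 1] = '\0';

    return strlen(src) < dstsize;   /* did src plus its '\0' fit? */
}
```

Passing sizeof s as dstsize for an array s keeps the size in one place, and the return value lets the program reject input that did not fit instead of silently using the shortened copy.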

strcat() vs strncat()

char *strcat(char *s, const char *c)
char *strncat(char *s, const char *c, size_t n)

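strcat() appends the string c to string s without checking whether there is space, so the problems seen with strcpy() are also seen with strcat(). strncat() appends at most n characters from c to the end of s and then terminates the result with '\0'. Note that n limits the number of characters appended, not the total size of s, so s needs room for its existing contents, up to n further characters and the terminating '\0'. As with strncpy(), the programmer has to decide what to do when the data is truncated.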
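The usual way of choosing n is to work out how much room is left in the destination. A minimal sketch follows; append_string() is a hypothetical helper of my own, and it assumes that dst already holds a nul-terminated string.

```c
#include <string.h>

/* Hypothetical helper: appends src to the string already in dst,
 * where dst holds dstsize bytes in total.
 * Returns 1 if the whole of src was appended, 0 if it was truncated
 * or dst was already full. */
int append_string(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t room;

    if (used >= dstsize)
        return 0;                   /* dst is not a valid string of this size */

    room = dstsize - used - 1;      /* space left, excluding the '\0' */
    strncat(dst, src, room);        /* strncat() adds the '\0' itself */

    return strlen(src) <= room;
}
```

The strlen() calls are the price of the check; for strings of modest length that cost is small compared with the cost of an overflow.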

scanf() and fscanf()

scanf() is really just fscanf() using stdin as the source file. Bjarne Stroustrup explains what the problems with these functions are and how to code around them in C Vu 12.1 page 20, so there is little point in repeating that information here. The point to note about the expert-level solution presented in that article is the use of a fixed length buffer with the sprintf() function. It has to be oversized to allow for the fact that a very large number may be used here (e.g. INT_MAX). sprintf() does the same formatting of arguments as printf(), but stores the output in a string rather than writing it to stdout. No checks are done on the size of the output string buffer by sprintf(), so once more there is an opportunity for a buffer overflow for the unwary. sprintf() is problematic because, until the snprintf() function added in the 1999 revision of the C Standard is widely available, there is no portable way of replicating its functionality with buffer bound checking.
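Where the library does provide the C99 snprintf() function (an assumption; older compilers may not have it, or may supply a variant with different return values), bounded formatting can be sketched as below. format_int() is a hypothetical name used only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes value in decimal into buf, which holds
 * size bytes. Assumes a C99-conforming snprintf(), which returns the
 * number of characters the full output needs, excluding the '\0'.
 * Returns 1 if the output fitted, 0 if it was truncated or failed. */
int format_int(char *buf, size_t size, int value)
{
    int needed = snprintf(buf, size, "%d", value);

    return needed >= 0 && (size_t)needed < size;
}
```

Because snprintf() reports the length it wanted to write, the caller can detect truncation directly instead of guessing how large an "oversized" buffer needs to be.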

It is worth remembering that other "standard" code might have problems. Consider for example the getopt() function, which although not part of the Standard C Library is available in most installations as a standard method of parsing command line arguments. This may have been written to be safe, but have you checked it?

Conclusions

Unless you have taken action to check the input into a fixed size buffer, you will have problems. The problems may not emerge for some time.

Generous use of strlen() to check the length of strings before copying or concatenating them will reduce the number of problems experienced. realloc() is a much underused function, and can be used where the amount of input data is not known in advance. An example of its use can be found in checkISBN2 and checkISBN3 on the code disk for C Vu 11.6 (or in the FTP archive).
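To illustrate the realloc() approach, here is a minimal sketch of reading a line of any length by growing the buffer as the data arrives. It is not the code from the C Vu 11.6 disk; read_long_line() and the starting size of 16 are my own assumptions for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from fp into
 * storage obtained from malloc() and grown with realloc().
 * Returns the line without its '\n' (the caller must free() it),
 * or NULL at end of file or if memory runs out. */
char *read_long_line(FILE *fp)
{
    size_t size = 16;               /* starting size, doubled as needed */
    size_t len = 0;
    char *line = malloc(size);
    int ch;

    if (line == NULL)
        return NULL;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= size) {      /* keep one byte free for the '\0' */
            char *bigger = realloc(line, size * 2);
            if (bigger == NULL) {
                free(line);         /* the old block is still ours to free */
                return NULL;
            }
            line = bigger;
            size *= 2;
        }
        line[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {    /* nothing left to read */
        free(line);
        return NULL;
    }

    line[len] = '\0';
    return line;
}
```

Note that the result of realloc() is assigned to a temporary first; assigning it straight back to line would leak the original block if the call failed. A production version would probably also impose an upper limit, so that hostile input cannot make the program consume all available memory.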

The "dangerous" functions discussed above can be used. The programmer needs to be aware of the risks and how to minimise them and then make a considered judgement. Because they do few checks they tend to be swift and efficient - but then so is a cutthroat razor.
