Title: Professionalism in Programming #29

Author:

Date: Tue, 07 December 2004 13:16:09 +00:00

Summary:

Body:

The more you seek security, the less of it you have. (Brian Tracy)

Last time we opened an ugly can of worms by investigating the seedy world of software security. We learnt the nature of security problems and discovered why it's depressingly hard to secure our code. This article concludes our tour by investigating specific code vulnerabilities and working out how to prevent them in the programs we write.

Feeling Vulnerable

To learn how to write secure code and defeat our adversaries, let's look at the nuts and bolts of security: some specific types of code vulnerability. Each is a hole that an attacker can exploit.

Insecure Design

This is the most fundamental flaw, and consequently the hardest to fix. If you don't consider security at the architectural level then you will be committing security sins everywhere: sending unencrypted data over public networks, storing it on easily accessible media, and running software services that have known security flaws.

You could write a simple system and rely on your host environment for security, but then your application will only be as secure as that environment. For example, a Java program can be no more secure than the JVM it's running on.

Absolutely every system component must be considered for security concerns. A computer system is only as safe as its least secure part.

Buffer Overrun

Many applications are public-facing, listening on an open network port or handling input from a web browser or GUI. All of this input must be parsed and acted on, and if you're not careful these are prime sites for security failure.

Parsing is often done using the standard C library function sscanf (although this exploit is far from a C-only problem). You might see code like this:

#include <stdio.h>

void parse_user_input(const char *input) {
  /* first parse the input string */
  int  my_number;
  char my_string[100];
  sscanf(input, "%d %s", &my_number, my_string); /* %s has no length limit! */
  /* ... now use my_number and my_string ... */
}

The problem is simple (and obvious): a badly formed input string can cause mayhem. If the word following the number is 100 or more characters long, sscanf will write past the end of the my_string buffer (remember that it also appends a terminating NUL), smearing arbitrary data across adjacent memory.

The results vary in severity. Sometimes the program carries on unaffected; you've been very, very lucky. Sometimes it continues, but its behaviour is subtly altered, which can be hard to spot and confusing to debug. Sometimes the program crashes, perhaps taking other critical system components down with it. But the worst case is when the spilt data overwrites something that steers the CPU's execution path. This isn't actually hard to arrange, and it allows an attacker to execute arbitrary code on your machine, potentially gaining complete access to it.

Overrun is easiest to exploit when the buffer is located on the execution stack, as in the example above. Here it's possible to direct CPU behaviour by overwriting the stack-stored return address of a function call. However, buffer overrun exploits can abuse heap-based buffers too.

Embedded Query Strings

This breed of attack can be used to crash systems, execute arbitrary code, or fish for unauthorised data. Like buffer overrun, it relies on a failure to parse input properly, but rather than bursting buffer boundaries, these attacks exploit what the program subsequently does with the unfiltered input.

In C programs, format string attacks are a common example of the problem. A frequent culprit is the printf function (and its variants), used like this:

#include <stdio.h>

void parse_user_input(const char *input) {
  printf(input); /* input is treated as a format string! */
}

The input string is used as printf's format string parameter, so a malicious user can supply input containing format tokens (like %s and %x). These can print data from the stack, or even from arbitrary memory locations, depending on the exact form of the printf call. Using a similar ploy, an attacker can also write arbitrary data to memory (exploiting the %n format token).

Solutions to this problem aren't hard to find. Simply writing printf("%s", input) instead of printf(input) will avoid the problem, by ensuring that input is not interpreted as a format string.

There are many other contexts where an embedded query can be maliciously inserted into program input. In SQL injection, an attacker smuggles fragments of SQL into input that a database application pastes into its query statements, forcing it to perform arbitrary database lookups on the attacker's behalf.

Another variant is commonly exhibited by lax web-based applications. Consider an online bulletin board system whose forums let users post messages for anyone else to read in their web browser. If an attacker posts a comment containing hidden JavaScript code, it will be executed by every browser that renders the page, without its user realising. This is known as a cross-site scripting exploit, due to the way the attack works 'across' the system: from an attacker's input, through the web application, finally manifesting in a victim's browser.

Race Conditions

It is possible to exploit systems that rely on the subtle ordering of input events, provoking unintended behaviour or crashing the code. This is generally exhibited by systems with complex threading models, or that consist of many collaborating processes.

A threaded program might share a buffer between a writer thread and a reader thread. Without adequate guarding, the reader might see information in the buffer that the writer did not intend to release yet.

This problem isn't restricted to threaded applications, though. Consider the following fragment of Unix C code. It intends to dump some output to a file, and then change file permissions on it.

fd = open("filename", O_WRONLY | O_CREAT | O_TRUNC, 0600);
/* point A */
write(fd, some_data, some_data_size);
close(fd);
chmod("filename", 0777);

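There is a race here that an attacker can exploit. Because chmod refers to the file by name, an attacker who deletes the file at point A and replaces it with a link to some other file (say, a sensitive system file) tricks the program into applying the 0777 permissions to that file instead. If the program runs with special privileges, the attacker has made a file they should never have been able to touch world-writable, and can use it to further exploit the system.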

Integer Overflow

Careless use of arithmetic can cause a program to cede control in unusual ways. Integer overflow occurs when a variable's type is too small to represent the result of an arithmetic operation. With an unsigned 8-bit data type, this C calculation goes wrong:

uint8_t a = 254 + 2;

The contents of a will be 0, not the 256 you'd expect; 8 bits can only count up to 255. An attacker can supply very large numeric input values to provoke overflow and generate unintended program results. It's not hard to see this causing significant problems; the following C code contains a heap overrun waiting to happen thanks to integer overflow:

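#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void parse_user_input(const char* input) {
  uint8_t length = strlen(input) + 11; /* wraps for inputs of 245+ chars */
  char *copy = malloc(length);
  if(copy) {
    sprintf(copy, "Input is: %s", input); /* overruns a too-small copy */
    /* ... do something with copy ... */
    free(copy);
  }
}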

It's true that uint8_t is an unlikely candidate for the string length variable, but the exact same problem manifests itself with larger data types.

This kind of problem is just as likely with subtraction (where it's called integer underflow). Nor is it caused only by such simple operations: it can also stem from mixed signed/unsigned assignments, bad type casts, and multiplication or division.

Protection Racket

So what techniques will protect us from this mayhem? We'll start to answer this with a simple analogy from the Real World. If you were securing a building, there are a number of things you'd do:

Close all the unnecessary entrances, brick up the back door, and board over the windows.
Obscure the remaining windows so people can't easily see what's inside.
Secure the entry points. Lock all doors, hide the keys, and make sure you use very good locks.
Employ a guard to patrol inside and out.
Add security mechanisms, like a burglar alarm, electronic pass cards, identity badges, etc. There's no point in installing these if they're not used properly, though. A door can be left ajar regardless of any fancy lock devices. A burglar alarm can be left unset.
Put all your valuables in a safe.

In summary, you would cut down on the possible attack points and employ technology that deters, blocks, identifies, and repels attackers. These have many software-writing analogues which we'll investigate below. They can each be applied at a number of different development levels, including:

On a particular system installation. The exact OS configuration, network infrastructure, and the version number of all running applications each have radical security implications.
The software system design. We need to address design issues like: can the user remain 'logged in' for indefinite periods, how does each subsystem communicate, and what protocols are used?
The actual program implementation; it must be flaw-free. Buggy code leads to security vulnerabilities.
The system's usage procedure. If it's routinely used incorrectly, any software system can be compromised. We should design to prevent this as much as possible, but users must be taught not to cause problems. How many people write down their username/password on paper beside their terminals?

Creating a secure system is never easy. It will always require a security/functionality compromise. The more secure a system is, the less useful it becomes. The safest system has no inputs and no outputs; there's nowhere for anyone to attack. It won't do much, though. The easiest system to use has no authentication, and allows everyone full access to everything; it's just terribly insecure. We need to pick a balance. This depends on the nature of the application, its sensitivity, and the perceived threat of attack. To write appropriately secure code we must be very clear about such security requirements.

Just as you would take steps to secure a building, the following techniques will protect your software from malicious attackers.

System Installation Techniques

First we'll look at practices that will protect your software once it's been installed. Perhaps this is backwards, but it will highlight what holes remain to be plugged at a lower level. No matter how good your application, if the target system is insecure then your program is unprotected.

Don't run any untrusted, potentially insecure software on your computer system.

This raises the question: what makes you trust any piece of software? You can audit open source software to satisfy yourself that it's correct (if you have the inclination). You can opt for the same software that everyone else uses, thinking that there's safety in numbers; however, if a vulnerability is found in that software, you and many other people must all update. Or you can pick a supplier based on their reputation, hoping that it's a worthwhile indicator.
Employ security technologies, like firewalls and spam/virus filters. Don't let crackers in through a back door.
Prepare for malicious authorised users by logging every operation, recording who did what and when. Back up all data stores periodically so that bogus modifications don't destroy all of your good work.
Minimise the access routes into the system, give each user a minimal set of permissions, and reduce the pool of users if you can.
Set up the system correctly. Certain OSes default to very lax security, just inviting a cracker to walk straight in. If you're setting up such a system then it's vital to learn how to protect it fully.
Install a honeypot: a decoy machine that attackers will find more easily than your real systems. If it looks plausible enough then they'll waste their energy breaking into it, whilst your critical machines continue unaffected. Hopefully you'll notice a compromise of the honeypot and repel the attacker long before they get near your valuable data.

Software Design Techniques

As programmers this is the essential place to get our security story straight. You can try to shoehorn it into code at the end of a development cycle, and you'll fail. Security must be a fundamental part of your system's architecture and design.

So what design techniques will improve our software security? The simplest software design is the easiest to secure. So don't run any software at all. Failing that, run your program in a sealed box in an underground bunker in an undisclosed location in the middle of a desert. That way, crackers can't get anywhere near it. Otherwise you'll have to think about how your software will be used, and how to actively prevent anyone from abusing it. Here are the winning strategies:

Limit access to the system as much as possible. The hardest kind of access to guard against is physical access to the computer itself; how can you stop an attacker switching it off, or installing their own evil software? Physical access notwithstanding, design your software to block as many entry points as possible.
Limit inputs in your design so that all communication goes through only one portion of the system. This way an attacker can't get all over your code. Their influence is limited to a secluded corner, and you can focus your security efforts there^[1].
Run every program at the most restrictive privilege level possible. Don't run a program as the system superuser unless it's absolutely necessary, and then take even more care than usual. This is especially important for Unix programs that run setuid: whoever launches them, they run with the privileges of the file's owner (often the superuser).
Avoid any features that you don't really need. Not only will it save you development time, it will reduce the chance of bugs getting into the program - there's less software for them to inhabit. In general, the less complicated your code, the less likely it is to be insecure.
Don't rely on insecure libraries. An insecure library is anything you don't know to be secure. For example, most GUI libraries aren't designed for security, so don't use them in a program run as the superuser.
Avoid storing sensitive data. If you must, obscure or encrypt it. When you handle secrets, be very wary of where you put them; lock the memory pages containing sensitive information so that your OS's virtual memory manager can't swap them out to the hard disk, where an attacker could read them.
Obtain secrets from the user carefully. Don't display passwords.
Specify good locks. That is, use tightly controlled password access and employ strong encryption to store data.

The least impressive security strategy is known as security through obscurity, yet this is really the most prevalent. It merely hides all software design and implementation behind a wall, so that no one can see how the code works and figure out how to abuse it. 'Obscurity' means that you don't advertise your critical computer systems in the hope that no attacker will find them.

It's a flawed plan. Your system will one day be found, and will one day be attacked.

It's not always a conscious decision, and this technique works very conveniently when you forget to consider security in the system design at all. That is, it's convenient until someone does compromise your system. Then it's a different matter.

Code Implementation Techniques

With a bullet-proof system design your software is unbreakable, right? Sadly not. We've already seen how security exploits can capitalise on flaws in code to wreak their particular brand of chaos.

Our code is the front line: the most common route an attacker will try to enter through, and the place our battles are fought. Without a good system design even the best code is unprotectable, but under the shadow of a well thought out architecture we must build strong walls of defence with robust code. And remember: correct code is not necessarily secure code.

Defensive programming is the main technique to achieve sound code. Its central tenet - assume nothing - is exactly what secure programming is about. Paranoia is a virtue, and you can never assume that the user will employ your program as you expect or intend.

Simple defensive rules, like 'check every input' (including user input, startup commands, and environment variables) and 'validate every calculation', will remove countless security vulnerabilities from your code.
Perform security audits. These are careful reviews of the source code by security experts. Normal testing won't find many security flaws; they are generally caused by bizarre combinations of use that ordinary testers wouldn't think of, for example very long input sequences which provoke buffer overrun.
Spawn child processes very carefully. If an attacker can redirect the sub-task then they can gain control of arbitrary facilities. Don't use C's system function unless there's no other solution.
Test and debug mercilessly. Squash bugs as rigorously as you can. Don't write code that can crash; a crash could bring down a running system instantly.
Wrap all operations in atomic transactions so an attacker can't exploit race conditions to their advantage. You could fix the earlier chmod example by using fchmod on the open file handle, rather than chmoding the file by name - it doesn't matter if the attacker replaces the file, you know exactly what file is being altered.

Procedural Techniques

This is largely a matter of training and education, although it helps to select users who aren't totally inept, if you have that luxury.

Users must be taught safe working practices: to not tell anyone their password, to not install random software on a critical PC, and to use their systems only as prescribed. However, even the most diligent people will make mistakes. We design to minimise the risk of these mistakes, and hope that the consequences aren't ever too severe.

Conclusion

Programming is war.

Security is a real issue in modern software development; you can't stick your head in the sand and hide from it. Ostriches write poor code. We can prevent most security breaches by better design, better system architecture, and greater awareness of the problems. The benefits of a secure system are compelling, since the risks are so serious.

^[1] Of course, it's never quite that simple. A buffer overrun could occur anywhere in your code, and you must be constantly vigilant. However, most security vulnerabilities exist at, or near, the sites of program input.
