Title: Security Implications of Running a Web Gateway

Author:

Date: Tue, 03 November 1998 13:15:28 +00:00

Summary:

Body:

While running a gateway does not directly bear on C/C++ programming, many of us have to turn our hand to such things. Silas is well known to many of you, so I am delighted to let him share his thoughts on this subject. If you do not already know, Silas is not only ACCU's Disabilities Officer but is also sight-impaired himself.

By "Web gateway" I mean a CGI program that can be told to get any web page, do something to it, and return it to the requester. The following lists some of the possible attacks that may be made on such a program, and attacks that can be made on any CGI program, and the measures that I have taken against them in my own. If anybody has anything to add, then please do send it in.

Firstly, something that has nothing to do with security, but is nevertheless perhaps the most important thing when writing any CGI program: make sure that you get the "Content-Length" right! The Content-Length header can be omitted, but doing so will slow down some browsers, and in any event Content-Length is usually required if your server supports the "keep alive" protocol (which is a good thing, especially if your pages load lots of small images, because the browser can get the lot in a single HTTP negotiation). But if you do specify a Content-Length, IT MUST BE CORRECT! In other words, you must not print ANYTHING without it being accounted for in the Content-Length. Don't forget to check that when you print a newline (\n) your system really does pass that single character to the server; if it passes \r\n then each newline counts as two bytes. Getting the Content-Length too big will cause the connection to stall, and getting it too small will, in most cases, crash the Web server (and you won't be too popular if it is someone else's server). Get it right.
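As a minimal sketch of the point about Content-Length (in Python purely for illustration; the function names are hypothetical and not taken from any real gateway), the safest habit is to build the whole body first, encode it to bytes, and only then compute the length from those bytes. Writing through a binary stream avoids any platform translation of \n into \r\n.

```python
import sys


def build_cgi_response(body_text, content_type="text/html; charset=utf-8"):
    """Return a complete CGI response whose Content-Length matches the body."""
    # Encode first: Content-Length counts bytes, not characters.
    body = body_text.encode("utf-8")
    headers = (
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body


def send(response):
    """Write the response without any newline translation."""
    # The binary buffer passes bytes through untouched, so the length
    # computed above is exactly what the server receives.
    sys.stdout.buffer.write(response)
    sys.stdout.buffer.flush()
```

A script would call send(build_cgi_response(page)) exactly once and print nothing else, so that every byte is accounted for.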

And now for the security considerations:

Sending rubbish CGI input. This can happen to any CGI program. If it gets rubbish (or something that nearly means something but is not quite correctly formatted), it's all right to give back a rubbish response (because it should not happen during normal use), but it's not all right to hang the program, cause an access violation, corrupt a database, or otherwise mess things up on the host computer. I was very careful with my exception system and tried to make no assumptions at all about the input, so if anything is wrong then it throws an exception before anything serious happens. The exception handling is very simple: it just says "Something has gone wrong with my program", the nature of the exception, contact details, and so on.
Getting rubbish Web pages. An attacker could easily serve a Web page with some almost-but-not-quite-valid HTML in it and then ask the gateway to get that page, hoping that the errors in it would crash the gateway program. Again you need to be on your guard, although this time it is best not to say "Something has gone wrong with my program" at every mistake, because many real Web pages are not quite right anyway.
Obtaining the executable. If you want to make sure that nobody can do this, then you need to check the access permissions very carefully. Some Web servers have bugs that let you obtain the executable - for example, some NT web servers will return the script if you request its name followed by ::$DATA, because of NTFS's stream system. Other servers have bugs that let you read files from the parent directory (by typing ../) or add extra commands on the command line. Usually, if you are running a binary executable as the CGI, the risks are not so great. I did lots of testing by trying to break into my own system, just to make sure.
Accessing internal data files, especially writing to them. This is the same sort of thing. In my case I don't really mind who gets the data files as long as they don't modify them - they're only public domain mapping tables and a help file - but there are other applications where this would be more serious. Sometimes it is easy to get a CGI data file if that file is placed in the scripts directory, because some servers will return the file itself if they don't know how to execute it. Keep your data files in a different directory if possible (and not a subdirectory of the scripts directory), and it is a good idea to test your system by simply trying to retrieve a data file with a browser - you may be surprised. Some web servers insist that the FTP users, Web users and CGI scripts all have the same set of rights, and if you have one of these then you need to change to a different server. Small, fast and reliable Web servers are freely available for many platforms. When I found that a bug in Internet Information Services had permanently allowed universal access to one of my directories, I uninstalled it, found a nice little program called Xitami, and was up and running within ten minutes (although there are still some things to watch out for).
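The catch-everything exception handling described under "Sending rubbish CGI input" might look something like the following sketch. It is in Python purely for illustration; the handler, the contact address and the width parameter are all hypothetical. The point is only that nothing reaches the serious work until the input has been checked, and that any failure turns into a plain error page.

```python
import html

CONTACT = "webmaster@example.org"  # hypothetical contact address


def run_safely(handler, query_string):
    """Run a request handler; turn any exception into a simple error page."""
    try:
        return handler(query_string)
    except Exception as exc:
        # Escape everything so the error page cannot itself carry markup.
        return (
            "<html><body><p>Something has gone wrong with my program: "
            f"{html.escape(type(exc).__name__)}: {html.escape(str(exc))}</p>"
            f"<p>Contact: {html.escape(CONTACT)}</p></body></html>"
        )


def parse_width(query_string):
    """Example handler that assumes nothing about its input."""
    name, sep, value = query_string.partition("=")
    if name != "width" or not sep or not value.isdigit():
        raise ValueError("expected width=<number>")
    return f"<html><body><p>Width is {int(value)}</p></body></html>"
```

Here run_safely(parse_width, "width=80") returns the normal page, while any malformed query string produces the error page instead of a crash.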

A variation on this theme is asking the gateway to retrieve a local file that would not normally be accessible from the Web. You need to write such precautions into the gateway, along with blocking out URLs like Telnet and email (if you are using a web-getting system that is not your own).
Some security sites have a "Click here to launch a test attack against your system" link. Anyone who wants to attack the gateway computer need only ask it to get that URL. You need to have already done the test attacks yourself and corrected any problems.
Using your system as a puppet. A Web gateway can be asked to get virtually any URL, and if another system can be attacked by sending it a funny URL then the attacker can ask the gateway to send it in an attempt to cover tracks (or hoping to get you into trouble). This will mostly involve sending a rogue CGI query to the other system (via your gateway), and there is no code you can write to reliably detect such things. You can at least explain yourself if anybody takes any action, and offer to use your logs to help them track down the real attacker, but it seems rather futile to try to stop this from happening in the first place, especially given that they can always use a public terminal or whatever rather than using you as a puppet, and the only real reason for using you is to get you into trouble. Some of the silliest things you can do with CGI (such as setting a variable to an empty string) will already be covered by your exception handlers if they are like mine, so in the general case it is quite difficult to do one of these attacks anyway.
Creating sharing violations. Web servers like Xitami are multi-threaded, which means that several people can call your script simultaneously. If your program was not written on this assumption, then an attacker could use it to create problems. One particular nasty is when the operating system or compiler routines are untrustworthy. For example, if two calls to tmpnam() execute simultaneously in independent processes, then not every compiler guarantees that they will generate different names. I found it helpful to get the program's process ID, which is unique at that time, and use it in the names of things like temporary files.
Overloading the service. It doesn't matter if it takes a long time before a remote server responds to a page-getting request, because this does not take processor resources on the gateway computer (the getting process is blocked until data is available). However, if somebody asked the gateway to get a very large web page and do extensive processing on it, and then sent many such requests per second, they could easily slow things down for everybody else. They could also possibly slow things down for the person actually using the gateway computer, especially with servers like Xitami where the default priority of the Web service is set to high to ensure maximum response. Normally a high priority is acceptable, but it can cause problems with CGI scripts. In my case, I left it set to high, because I often run legacy DOS applications in Normal priority, and NT doesn't always know when they are idling. If the web server were set to low priority then it would not service requests while I am running DOS applications.
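The process-ID idea mentioned under "Creating sharing violations" can be sketched as follows. This is an illustration in Python rather than a description of any particular gateway; the function name and file prefix are hypothetical. The name includes the process ID, and the file is created with O_EXCL so that creation fails rather than silently sharing a file that already exists.

```python
import os
import tempfile


def create_private_temp_file(prefix="gateway"):
    """Create a temporary file named after this process; return (fd, path)."""
    directory = tempfile.gettempdir()
    counter = 0
    while True:
        path = os.path.join(directory, f"{prefix}-{os.getpid()}-{counter}.tmp")
        try:
            # O_EXCL: fail if the file exists instead of opening it.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
            return fd, path
        except FileExistsError:
            # Either this process already used the name, or an earlier
            # process with the same ID left a stale file behind.
            counter += 1
```

Python's own tempfile.mkstemp() does much the same job and would normally be the first choice; the hand-written version is shown only to make the process-ID technique visible.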
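The precaution of refusing local files and non-Web URLs such as Telnet and email, mentioned in the list above, could be sketched like this. It is a minimal illustration in Python; the allowed set and the function name are assumptions, and a real gateway may well need further checks (for instance on which hosts may be fetched).

```python
from urllib.parse import urlsplit

# Assumption: the gateway only ever needs to fetch ordinary web pages.
ALLOWED_SCHEMES = {"http", "https"}


def is_fetchable(url):
    """Return True only for web URLs that name a remote host."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are simply refused.
        return False
    if parts.scheme not in ALLOWED_SCHEMES:
        # Rejects file:, telnet:, mailto:, bare paths and drive letters.
        return False
    return bool(host)
```

An allow-list is used rather than a block-list so that any scheme nobody thought of is refused by default.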

In my case, I do not like the idea of setting arbitrary limits (such as size limits) on the gateway's use (and if you do set a size limit, then you should not trust the HTTP header's "Content-Length", because it could be false if the attacker has asked for a page from their own suitably-reprogrammed phoney Web server). People may sometimes legitimately request large pages. My application is a gateway that sorts Web pages out for visually impaired and international users (one needs things like frames and tables re-arranged, the other needs conversions of non-Roman characters), and to impose artificial limitations for security would be against its principles (it's an "access" gateway, not a "let's limit your use" gateway). However, most browsers will set a time limit when retrieving pages, and it is reasonable to set a generous time limit on CGI scripts (say five minutes) beyond which the server will stop the script if it is still executing (because the browser has probably already given up anyway). This will at least mean that if somebody sent an infinite CGI request, or a request for an infinitely long Web page (by perpetually writing data down a TCP/IP stream), the gateway will not be blocked forever, but it is not an ideal solution.
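For anyone who does decide to impose limits, the following sketch shows the idea of counting what actually arrives rather than trusting a length declared by the remote server, together with an overall time limit. It is in Python for illustration only; the function, its parameters and the exception name are hypothetical.

```python
import time


class LimitExceeded(Exception):
    """Raised when a page is too large or takes too long to arrive."""


def read_with_limits(stream, max_bytes, max_seconds, chunk_size=8192):
    """Read a whole stream, enforcing limits on the bytes actually received."""
    deadline = time.monotonic() + max_seconds
    chunks = []
    total = 0
    while True:
        if time.monotonic() > deadline:
            raise LimitExceeded("time limit reached")
        # Note: a blocking read can itself stall; a real socket would also
        # need its own timeout for the deadline to be reliable.
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise LimitExceeded("size limit reached")
        chunks.append(chunk)
    return b"".join(chunks)
```

Because the count is taken from the data itself, a phoney server that declares a small length and then sends data forever is still cut off.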

For one thing, if you are relying on the server "killing" your process, then, if you can't catch the "kill" signal, it will leave behind its temporary files. This might be the least of your worries - it is possible that on some systems the processes owned by yours (e.g. web getting) will not stop; you need to check for this if possible, and if so make sure that those processes will stop by themselves if necessary. Also, attackers can still overload your system by sending large numbers of simultaneous requests of moderate size. Assuming that an attacker is not doing "IP spoofing", you can keep track of the requesting IPs and cut off any that make more than a certain number of requests in a given period (bearing in mind that a single multiframe document can generate several requests), although this does introduce some overhead for legitimate users. Blocking would usually need manual intervention, since there will be cases where numerous people can appear to be at the same IP address (e.g. a Unix server and/or a dial-up ISP), but automatic blocking can still be programmed if it is known that attacks are most likely to come from a particular location. For example, I could quite easily write code to recognise when a request is coming from within Cambridge University, and this is the most likely source of attack on my program (if a bunch of drunken students can send me anonymous derogatory messages about blindness through a Web-based remailer that required re-writing the HTML if you wanted to send to arbitrary addresses - except they didn't think its postmaster would be on my side - then they can probably launch an attack or two on my CGI next time). There is still the trouble that IP monitoring adds overhead for normal users, especially if you are using the CGI paradigm (for portability) and have to do all that file access. Also, it would not be good to permanently block a shared or assignable IP address (as in the BOOTSERVs used in Cambridge).
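The per-IP request counting described above can be sketched as a sliding window. This is a Python illustration with hypothetical names, and it keeps its history in memory, which only suits a long-running process; a true CGI script starts afresh on every request and would have to keep the same history in a file, with the overhead already mentioned.

```python
import time
from collections import defaultdict, deque


class RequestThrottle:
    """Allow at most max_requests per IP address within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # IP -> times of recent requests

    def allow(self, ip, now=None):
        """Record a request from ip and say whether it should be served."""
        now = time.monotonic() if now is None else now
        recent = self._history[ip]
        # Forget requests that have dropped out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

The limit should be generous enough for a multiframe document, and because refusals expire with the window, a shared or reassigned address is never blocked permanently.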

I do make a point of checking the logs. This will not stop attacks as they happen, but it will at least make sure that they come to my notice within 24 hours (earlier if I notice them while using the computer or the gateway myself), and if they are sufficiently infrequent then this would be adequate (although not ideal). I usually glance casually through the logs to check that everything's all right, i.e. no obvious malfunctions or attacks, but I don't use the logs to pry into what people are doing - it's an access gateway, not a surveillance gateway. I always delete old logs, I make a point of doing my looking-through while half-thinking about something else and just before an interesting task that makes me forget anything I might have noticed, and I don't bother to look at all the details if everything seems fine anyway. The logs are useful if somebody reports a problem because I don't have to ask them for dozens of details, and I have not yet had anybody complain about privacy - if they are concerned then they may be surprised at just how many logs there are around the Internet anyway. The other problem with checking the logs is that it takes so long, especially when you add up all the time I would spend on it over a year. I've been toying with the idea of automatic checking (I already have batch files to filter out my own access and so on), but developing a good enough fully automatic program would be too time-consuming (at least in the short and medium term) and I might as well just read the stuff. Unattended speech synthesis doesn't help either - it might seem a good idea to have the thing babble away while I eat breakfast (or whatever), but all too often I need to skip a few lines when it's obvious what's going on, otherwise the process takes far too long. Also the synthesiser isn't very configurable - I usually work with my residual sight, and I wouldn't like to do serious stuff with my ancient bundled copy of an early version of TextAssist that I've only just managed to install.

I am not sure if there is a catch-all solution to overloading. Doing the logs is a bit of a chore, but any potential attackers can be assured that I will continue to be alert to them until I find a better method (and I know where the managerial offices are). If you have a limited number of users then you could set up a password system (and then the attacker would first have to sniff the connection to get the password), but in mine I ruled out that possibility by introducing the character conversion options, with potentially thousands of local students who might need them. I write this before returning to Cambridge and putting the internationalised version online, and I'm beginning to think to myself "you're really going to put your foot in it this time". I wish I could stick my cane into this course of action and see if it's clear. Then I think of all those students struggling to get the computers to display their native characters - no, I'm not backing out that easily. Let's have some interesting times. I do hope this thing actually works.
