Understanding and avoiding SQL injection

First published: September 03, 2007
Last updated: March 17, 2013

Introduction

SQL injection attacks are one of, if not the most, common security vulnerabilities affecting modern web applications. They have been responsible for some high profile exploits, e.g. the defacement of the United Nation's website in August 2007 (discussed here), or of Microsoft UK's website in June 2007 (discussed here). However, they are also fairly simple to understand and to prevent. Any responsible web application developer needs to be familiar with the risk of SQL injection.

The example code in this document will be written in PHP, because it's the most widely used programming language for writing web applications. Bear in mind, though, that SQL injection vulerabilities are completely language independent. They can also appear in web applications written in Perl, Python, Ruby, Java, C or any of the Microsoft languages that I don't know the names of.

Understanding

SQL injection attacks, like most attacks on web applications, succeed when a developer blindly trusts input provided by a user to be the sort of input the developer expects and not something else.

The setting

Any situation where an SQL query is constructed using data which has been input by a web application user is a possible setting for an SQL injection attack.

As a common and motivating example, consider a simple log in system, where a user provides two strings as input: a username and a password. The web application compares the username and password to those stored in an SQL database of user's details and lets the user log in if it finds a match, turning the user away if it doesn't.

To make our example concrete, let's imagine we have a PostgreSQL database with a table named USER_INFO which has two columns: a column username of type varchar(20) and a column password of type varchar(16). When populated, the table may look like this:


 username |  password
----------+-------------
 Alice    | abracadabra
 Bob      | alakazam
 Eve      | opensesame

Our login system may be realised in PHP with code equivalent to the following (assuming database connections have already been made, etc.):


$username = $_POST["username"];
$password = $_POST["password"];
$sql_query = "SELECT * FROM USER_INFO WHERE username='$username' AND password='$password'";
$result = pg_exec($db_connection, $sql_query);
if(pg_num_rows($result) > 0 ) {
        let_user_in();
} else {
        turn_user_away();
}

Before reading any further, it is important to realise that what's done above is a terrible way to store and validate passwords, from a security perspective. I have used it here because it is also the simplest way to store and validate passwords, and I don't want to distract readers from the subject of SQL injection with any uncessary detail on things like salting and hashing passwords. If you don't understand the problems associated with the method used above, I recommend you read my discussion of techniques for validating and storing passwords.

Suppose the user Alice fills out the login form so that the value of $username is Alice and $password is abracadabra. Then the SQL query which is executed by the above code is:


SELECT * FROM USER_INFO WHERE username='Alice' AND password='abracadabra'";

The attack

So, where's the vulnerability? Suppose that a user who is not Alice fills out the login form so that $username is Alice and $password is foo' OR 'bar'=bar. That is, the user's submitted password contains apostrophes and an equals sign. The SQL query which is exected by the above code is now:


SELECT * FROM USER_INFO WHERE username='Alice' AND password='foo' OR 'bar'='bar'";

On account of the fact that 'bar'='bar' is evaluated to be true, this query will return a result even though no valid password has been supplied. A user has thus illegitimately logged in as Alice, simply by knowing her username.

It might get worse. Depending on the database system used, an attacker may be able to execute entirely arbitrary queries. For instance, if an attacker supplies a password of foo'; DELETE FROM USER_INFO WHERE username='Alice, then the query executed becomes:


SELECT * FROM USER_INFO WHERE username='Alice' AND password='foo'; DELETE FROM USER_INFO WHERE username='Alice'

If the database backend supports executing multiple queries in this way, our attacker has just deleted Alice's entry from the database. Now the legitimate Alice cannot log in with her correct password!

Many more examples are possible at this point, but the message is clear: if you blindly trust strings supplied by your users when constructing SQL queries, you are asking for a whole lot of trouble.

Preventing

Preventing SQL injection attacks is straightforward, at least in theory: don't trust any user input which will be used in the construction of an SQL query. Consider input strings to be ``tainted'' until you have taken steps to make absolutely sure they are safe. In this case, transforming tainted input into safe input is a matter of escaping all characters which have some meaning in SQL. If you do this vigilantly, your applications will be safe.

Don't panic: you don't have to write complicated code yourself to go through strings and figure out how to escape what and where. There are a few options to take care of the escaping fairly easily.

Escaping functions

Most programming languages, particularly those commonly used for web development, supply functions for most major database platforms to correctly escape characters which affect the meaning of SQL statements. For example, the PHP language provides the following functions for widely used database platforms: pg_escape_string, for PostgreSQL and mysql_escape_string for MySQL. The functions both take an SQL query string as an input and return that same query with any necessary escaping performed.

Prepared statements

All major database platforms today include support for so called ``prepared statements''. Very quickly, the basic idea behind prepared statements is that the database prepares in advance an SQL query with a fixed number of placeholders for unknown values in fixed positions. You can then supply values to replace these placeholders at a later time to do queries. Here's an example in PHP, using the PDO syntax (assuming $dbconn is an already initialised PDO object):


// Prepare statment
$statement = $dbconn->prepare("SELECT name FROM users WHERE gender=? AND country=?");

// Get all the male Australian users
$statement->execute(array("male","Australia"));
$results = $statement->fetchAll();

// Get all the female German users
$statement->execute(array("female","Germany"));
$results = $statement->fetchAll();

// Get all the male Japanese users
$statement->execute(array("male","Japan"));
$results = $statement->fetchAll();

// etc...

The main reason for using prepared statments is efficiency. If you need to repeat what is essentially the same query many times over with a few different values each time, it is much quicker for the database to prepare the unchanging part of the query in advance and just be fed values to plug in than it is to accept a completely new query string each time. However, the databases will also take care of escaping whichever values you plug into a prepared statment so as not to change the meaning of the query, thus guarding against SQL injection. Prepared statements thus provide not only increased efficiency, but increased security. Use them.

ORM software

A common, and generally advisable, software engineering principle used in large, complicated projects which have to deal with databases is to use some sort of ORM - Object Relational Mapping - technology to free developers from the task of having to write any SQL by hand. Very quickly, the basic idea behind ORM is to to provide a framework for automatically mapping between an instance of an object in an object oriented programming language and a row of a database table. Each table of the database represents a particular class in the application being developed. Each row of such a table is an instance of this class, with the columns of the table holding the values of that instance's attributes (or properties, or fields, or whatever your OO language of choice calls them). Developers can simply tell the ORM to ``save'' an object, and the ORM will figure out the required SQL to update the database on the fly. In doing so, the ORM will take care of any required escaping, thus guarding against SQL injection.

There is a wide variety of ORM choices for most programming languages. Some of the most popular are Hibernate, for Java, Propel, for PHP and SQLAlchemy, for Python.

Learning more

There is a lot more to learn about SQL injection than one might expect. This brief article has only slightly scratched the surface of a veritable security iceberg. The following links are useful starting points for those who wish to learn more: