What is an SQL Injection
SQL injection is an attack against an application that uses an SQL database. The attack occurs if the attacker is able to insert (or inject) additional SQL commands into a running application. These additional commands can then change the application logic, retrieve hidden data, reveal the database structure, or even enable the attacker to gain control over the server running the database.
Any application that uses client-provided data to construct SQL queries is potentially vulnerable – basically every non-trivial application.
A few examples of SQL injection attack
Consider a web application that stores usernames and passwords to log-in users.
To validate a log-in attempt, all we need to do is to look-up an entry with given username and password. If we get a hit, the log-in attempt is valid, if the result set is empty, the attempt is invalid. (The example assumes the passwords are stored in plain-text which is bad practice, but this is a topic for another day. Moreover, even when passwords are stored correctly, the SQL injection can still be possible.)
Consider the following PHP (and SQL) code snippet that validates the login attempt.
$dbh = new PDO("mysql:host=localhost;dbname=mydb", "user", "password");
// read log-in credentials from the HTTP POST request
$username = $_POST["username"];
$password = $_POST["password"];
// INSECURE CODE: NEVER CONSTRUCT QUERIES THIS WAY
$stmt = $dbh->prepare("SELECT * FROM user WHERE username = '$username' AND password = '$password'");
$stmt->execute();
$user = $stmt->fetch();
The code above looks reasonable and sound: if $user
is null
, the log-in attempt was invalid, but if it is an actual object, the attempt must have been valid, since it returned an entry from the database. However, the code contains a fatal flaw. Since we constructed a dynamic SQL query by concatenating strings, we are vulnerable to SQL injection.
Suppose the client sets the username to john' OR '
and the password to an arbitrary value like a
. Assuming there is no such user (Mr. Boby tables aside), we expect the application to deny the login attempt. But let’s closely examine the SQL command that gets executed. Once the strings are concatenated to the query, we get the following SQL command:
SELECT * FROM user WHERE username = 'john' OR '' AND password = 'a'
Consider the WHERE
clause which contains three terms and two operators. The first term, username = 'john'
may be either true
or false
; it depends whether the database contains an entry where the username is john
. The second term, an empty string ''
in SQL always evaluates to false
. The third, password = 'a'
is like the first term, either true
or false
, it depends whether the database contains an entry where password is set to a
.
To combine the three terms we use the two operators, conjunction AND
and disjunction OR
. Since conjunction has precedence, we first combine the second and the third term. Because the second term is always false
, the entire conjunction will also be false
, regardless of the third term: false
is an annihilator for conjunction.
We are left with two terms, username = 'john'
and the constant false
, which we have to combine with disjunction. The end result now depends only on the the first term, since the constant false
is an identity for disjunction, i.e. it does not influence the result.
So in effect, the actual SQL query that gets executed against the database is the following.
SELECT * FROM user WHERE username = 'john'
This is quite horrifying: by appending string ' OR '
to a valid username, an attacker can completely circumvent password verification. This is one of many ways malicious SQL code can be injected into running applications. Here are a few more examples.
An SQL injection can be used to retrieve data that should otherwise be restricted. Imagine the following query where the user, a potential attacker, provides the $genre
parameter, for instance by manipulating a URL. The value of that parameter is then directly concatenated into the original SQL query. The additional attribute available
indicates whether a movie should be shown to users or not; movies that have it set to 1
should be shown to users, the ones that have it set to 0
, should not.
SELECT * FROM movie WHERE genre = '$genre' AND available = 1
A clever attacker might set the parameter to comedy'--
to obtain the list of all movies regardless of their availability. The resulting SQL query looks as follows.
SELECT * FROM movie WHERE genre = 'comedy'--' AND available = 1
By appending --
to the end of the parameter, the attacker renders the rest of original the SQL query as a comment.
Similarly, the attacker could set the parameter genre
to comedy' OR 1=1
to obtain movies from all genres regardless of their availability.
SELECT * FROM movie WHERE genre = 'comedy' OR 1=1--' AND available = 1
On pages where the application simply reflects the returned database rows to the user, the attack can be even more damaging. If the attacker sets the genre
parameter to ' UNION SELECT * from user--
your application just compromised personal data of all of your customers.
SELECT * FROM movie WHERE genre = '' UNION SELECT * from user--' AND available = 1
The last example is rather optimistic—in a typical SQL system both SELECT
commands must return the same number of columns and of the same type—an attack like that is nonetheless possible. It might require more work, since the attacker needs to know a bit about the database structure. But fear not, in a poorly written application, SQL injection can reveal that too. On most database systems the query SELECT * from information_schema.tables
returns information about tables and their columns. So a UNION
query like the following might help.
SELECT * FROM movie WHERE genre = '' UNION SELECT * from information_schema.tables--' AND available = 1
What went wrong and how to do it right
The thing that enables all such SQL injection attacks is string concatenation: because we concatenated untrusted user-provided data into a prepared SQL query, the nature of SQL queries could (and did) change.
Fortunately, a solution is straightforward. Instead of concatenating user-provided data into SQL queries directly, we have to use parameterized queries. These are offered by a dedicated API where we typically first create prepared statements and then bind in user-provided data.
So do not write code snippets like the following.
$dbh = new PDO("mysql:host=localhost;dbname=mydb", "user", "password");
// INSECURE CODE: NEVER CONSTRUCT QUERIES THIS WAY
$stmt = $dbh->prepare("SELECT * FROM user WHERE username = '$_POST[username]' AND password = '$_POST[password]'");
$stmt->execute();
$user = $stmt->fetch();
Instead, use appropriate API: in PHP we have PDO and MySQLi and there are similar APIs in other programming languages – every modern programming language has an API that allows creating prepared SQL statements and binding variables to those statements.
$dbh = new PDO("mysql:host=localhost;dbname=mydb", "user", "password");
$stmt = $dbh->prepare("SELECT * FROM user WHERE username = ? AND password = ?");
$stmt->bindValue(1, $_POST["username"]);
$stmt->bindValue(2, $_POST["password"]);
$stmt->execute();
$user = $stmt->fetch();
While this example is only 2 lines longer, it is completely safe from SQL injection. All the hard work is done by the API; the actual mechanics are programming language and database specific – the API could simply sanitize the untrusted inputs or it might compile the prepared statements so that potential injected commands are treated as data. In either case, injecting commands into SQL query becomes no longer possible.
Conclusion
SQL injection is an attack where an application takes an untrusted user input and concatenates it with predefined SQL query. If the user input contains additional SQL commands, the nature of the original SQL query may change and allow the attacker to execute actions that would otherwise be impossible.
The best defense against SQL injection is using a dedicated API to interact with the database; we listed PDO and MySQLi for PHP, but every modern programming language offers a similar API.
SQL injection is a rather common in web applications. On the list of 10 vulnerabilities that pose the most risk to web applications published by OWASP, it typically ranks at the top: first in 2010, 2013, 2017; third in 2021.
However, SQL is not limited to web applications and can happen anywhere an SQL database is used, as the following xkcd example demonstrates.