Sunday, June 12, 2011

Watch Your Step!

(Note: if you are familiar with SQL injection vulnerabilities, you may want to skip to the end of this post, which mentions some new research that could eliminate such errors before they are made.)


This classic comic from xkcd demonstrates one of the most common and dangerous type of security vulnerabilities in web applications, called SQL injection.  This is a way in which an attacker can cause the application to execute unintended and unanticipated (by the application operator) actions on the database.  In this example, the action is to remove the database table called "Students," thus removing all student information.

The application server executes some code in response to receiving information submitted from a web form.  I will show the example in Java, but similar code is written in other languages, including PHP, Perl, and .NET.  The application sends a query to the database; in this example, it tries to get student information.  The query is typically coded in SQL, and may have a form similar to this:

SELECT * FROM Students WHERE NAME = 'name'
The name itself, shown above in italics, is of course different for each student; the value of this field is taken from the web form.  The code that generates this query in Java looks like this:
String query = "SELECT * FROM Students WHERE NAME = '" + request.getParameter("name") + "'";
This builds the query string from three parts: two constant strings, and the call request.getParameter("name"), which returns the "name" field of the web form.  Unfortunately, if you substitute little Bobby Tables' name into this computation, you will end up with the following SQL statement:
SELECT * FROM Students WHERE NAME = 'Robert';
DROP TABLE Students;
--'
(Bobby's full name is shown in red; note that the final single quote comes from the second constant string in the original computation.)  When this string is sent to the database for execution, it will be interpreted as three different commands, shown above on three separate lines.  The first will indeed return the information about Bobby, but the second will remove the Students table.  The third consists only of the three characters --' ; the first two dashes indicate that this is a comment and the rest of the line (the trailing quote) is to be ignored.

This is not funny!  All too many web applications are written in this way, and are therefore vulnerable to SQL injection attacks.  In OWASP's list of top 10 vulnerabilities, injection vulnerabilities appear as the first item, tagged as being common, easy to exploit, and having severe impact.

How can such problems be avoided?  Obviously, any user-supplied information, including the contents of web forms, must be validated or sanitized.  Validation means blocking any potentially-dangerous input (such as anything that includes quotes), while sanitation means modifying such inputs to a non-harmful form.  This is easy enough to do; any reasonable web-development framework will supply appropriate validation and sanitization functions.  However, the developer must remember to call these functions on each and every user input; any omission creates a security vulnerability, and a systematic attacker will discover it eventually.

An alternative solution is to use prepared statements to communicate with the database.  Java, like other server-side programming languages, offers more than one interface to the database.  A prepared statement uses a fixed SQL statement, which includes question marks instead of actual parameter values.  Bobby's example would look like this:
SELECT * FROM Students WHERE NAME = ?
When a prepared statement is used to invoke a database operation, all parameters identified by question marks are given their actual values by invoking type-specific methods (such as setString or setInt) on the prepared statement object.  These take care of sanitization internally, so that the developer doesn't need to worry about it.  A prepared statement object can therefore be used multiple times with the same SQL query but with different arguments.  Another advantage of prepared statements it that the query string they receive must obey SQL syntax rules (with the addition of the question marks), and this can be checked when the prepared statement is initialized with the query, and the query can also be optimized once instead of having to be optimized for each call.

However, there is a serious problem in the approach taken by all these languages that require the construction of SQL queries as strings.  This gives too much freedom to the developer, allowing the creation of security and other errors that will not be discovered until the application is actually run.  Certainly, if you use a statically-typed language like Java, you should also be able to express the syntax of your database queries in a way that can be checked for correctness at compile time rather than at runtime.  This calls for a syntactic extension of the programming language, which will support SQL statements with parameter indicators such as the question marks of prepared statements.  This will have the advantage of catching syntactic erros at compile time, as well as the possibility of adding sanitation automatically, as is done for prepared statements.

Current web-development languages make it difficult for developers to do the right thing, necessitating special tools for discovering security vulnerabilities and fixing them.  Why not provide the ability to prevent them in the first place?

This is actually possible.  In their paper "Simple and safe SQL queries with C++ templates" (Science of Computer Programming 75 (2010)), Joseph Gil and Keren Lenz describe a system that uses C++ template metaprobramming techniques to represent relational algebra expressions inside the C++ language.  (Relational algebra is an alternative to SQL with a more mathematical syntax.)  The compiler can check the expressions written this way for syntactic correctness, and when the SQL query is actually built for sending to the database, all string inputs are sanitized so that they can't modify the syntax and create injection attacks.  Bobby's example can be written in this system this way:
(Students / (NAME == name)).asSQL()
The parenthesized expression is the relational-algebra form of the query, and the asSQL() method converts it to a string that can be sent to the database.  This syntax uses C++ operator overloading, and is therefore inapplicable to Java and similar languages.  It is possible to extend these with similar capabilities using more cumbersome syntax, but the right thing to do would be to make SQL (better yet, relational algebra) an integral part of the language.  Anyone who has ambitions to develop the next server-side programming language should be aware of this research!