Preventing SQL Injection in JDBC without using Prepared Statements

I am aware that using Prepared Statements is the best way to protect against SQL Injection (and syntax errors due to unescaped characters in unchecked input).
My current situation is that I am writing some Java code to move data from one third-party application to another. The destination application uses a proprietary version of Sybase, so although I do have the jTDS JDBC driver, PreparedStatement fails because the driver uses temporary stored procedures, which aren't supported in this particular flavour of the database. So I only have Statement to work with, and I have no control over the user input, as it is coming from another application.
There is this similar question but that is focused on fixing the problem where you have a parameter such as a table which cannot be handled via a Prepared Statement. My case is different and hopefully simpler, since I have straightforward SQL statements. I would like to know if there is a best practice for replicating something like the following without using PreparedStatement:
PreparedStatement statement = connection.prepareStatement("UPDATE mytable SET value=? WHERE id=?");
// Parameter 1 is the value column, parameter 2 is the id
statement.setString(1, userInput);
statement.setInt(2, getID());
statement.executeUpdate();
So I guess the problem is how can I sanitise the user input reliably? I can try to do that myself from scratch but this seems like a bad idea as there is likely to be at least one edge case I'd miss, so I was hoping there was a library out there that would do that for me, but I haven't been able to find one so far.

The ESAPI library has procedures for escaping input for SQL and for developing your own db specific encoders if necessary.
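For the plain-Statement case in the question, a hand-rolled fallback might look roughly like the sketch below. This is only an illustration and not ESAPI's API: the class name SqlLiteral and its methods are made up here. It assumes the target database treats a doubled single quote as the only escape inside a single-quoted string literal, which is standard SQL but should be verified against the proprietary Sybase variant (including its character-set handling). The numeric id is never escaped; it is passed as a Java int so it cannot carry SQL text.
```java
final class SqlLiteral {
    private SqlLiteral() {}

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        if (value == null) {
            return "NULL";
        }
        // Assumption: NUL characters are never legitimate input, so reject them outright.
        if (value.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("NUL character not allowed in SQL literal");
        }
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds the UPDATE from the question with the values inlined as literals. */
    static String updateSql(int id, String userInput) {
        return "UPDATE mytable SET value=" + quote(userInput) + " WHERE id=" + id;
    }
}
```
The resulting string would then go to statement.executeUpdate(...). An input such as O'Brien becomes the literal 'O''Brien', so the quote stays inside the string instead of terminating it.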

Check out the jTDS FAQ - I'm pretty confident that with a combination of the prepareSQL and maxStatements properties you could get there (or "could have got there", as you probably completed that task years ago :-) )
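As a rough sketch of what that could look like: jTDS accepts driver properties appended to the JDBC URL after semicolons. The host, port and database name below are placeholders, and the property values are an assumption based on my reading of the FAQ (prepareSQL=0 is described there as sending the SQL without server-side preparation, with the driver inlining parameters as literals, and maxStatements=0 as disabling the driver's statement cache). Check both against the jTDS version in use before relying on this.
```java
final class JtdsUrl {
    private JtdsUrl() {}

    /** Builds a jTDS URL for Sybase that asks the driver not to create temporary stored procedures. */
    static String sybaseUrlWithoutServerPrepare(String host, int port, String database) {
        return "jdbc:jtds:sybase://" + host + ":" + port + "/" + database
                + ";prepareSQL=0"     // assumed meaning: no server-side prepare
                + ";maxStatements=0"; // assumed meaning: no statement caching
    }

    public static void main(String[] args) {
        // Placeholder host, port and database name
        System.out.println(sybaseUrlWithoutServerPrepare("dbhost", 5000, "mydb"));
    }
}
```
The string would be passed to DriverManager.getConnection(url, user, password). If the driver honours the setting, connection.prepareStatement(...) can then be used as normal and the driver does the escaping.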

Related

Can you put multiple statements in one query-string in Oracle jdbc?

I have a JDBC connection to an Oracle database. I create a Statement. The SQL query String contains multiple statements separated by a semicolon, and is provided by a different system.
Example:
connection.prepareStatement("SELECT * FROM A; SELECT * FROM B");
According to ddimitrov it isn't possible.
But all other databases I've tried support it. And JDBC even has support to retrieve multiple results.
Does anyone have either pointers to Oracle documentation explicitly stating that it is not supported or have a way to make it work (without using of stored procedures)?

For executing multiple statements:
JDBC 2.0 lets you submit multiple statements at one time with the addBatch method. Note that batches are meant for updates; a batched statement that tries to return a result set fails.

No, this is not possible with the Oracle JDBC driver.
You will have to parse and split the string into their individual statements.
Btw: I think the only databases that allow this are Microsoft SQL Server and MySQL, which also makes them vulnerable to certain kinds of SQL injection attacks that would not work against Oracle or PostgreSQL.
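A minimal sketch of the splitting approach, assuming the incoming string only uses single-quoted string literals and contains no comments, PL/SQL blocks or quoted identifiers (all of which can legitimately contain semicolons and would need a real parser). The class name SqlSplitter is made up for illustration.
```java
import java.util.ArrayList;
import java.util.List;

final class SqlSplitter {
    private SqlSplitter() {}

    /** Splits on semicolons that are not inside a single-quoted literal. */
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inLiteral = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '\'') {
                // A doubled quote ('') toggles twice, so the state ends up unchanged.
                inLiteral = !inLiteral;
            }
            if (c == ';' && !inLiteral) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```
Each element of the returned list can then be executed with its own Statement.execute call.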

AFAIK most databases only allow you to execute / prepare one statement per execute or prepare call. Although not very explicitly expressed, the intent of the JDBC methods is to execute a single SQL statement:
sql - **an** SQL statement that may [...]
The retrieval of multiple resultsets is for (very rare) single(!) statements or stored procedures which return multiple resultsets (as explained in the javadoc of Statement#execute).

What does pre-compiling a JDBC PreparedStatement do?

What does "precompiling" a statement actually do? I have seen that if I write a prepared statement with bad SQL syntax, the compilation does not report any problem!
So if precompiling a prepared statement doesn't check for syntax validity, what does it really do?

Creating a PreparedStatement may or may not involve SQL syntax validation or even DB server roundtrips; that depends entirely on the JDBC driver used. Some drivers will do a roundtrip or validate, others will not.
So on some JDBC drivers a PreparedStatement is no more "prepared" than a normal Statement. (In other words: with some JDBC drivers a PreparedStatement represents a server-side resource (similar to Connection), while on others it's a pure client-side construct).
An important difference, however, is that a PreparedStatement will help you handle dynamic parameter values in a way that is guaranteed to avoid any escaping or formatting issues that you would have if you tried to insert the values into the SQL statement string manually and execute it using a normal Statement.
That feature is independent of the choice of "preparing" the statement beforehand or not, so it's provided by every JDBC driver, even if it doesn't do any other preparation steps.

'Statement compilation' is something that happens on the database, which helps it return results faster and more efficiently. If you use a PreparedStatement, it allows the database to reuse a statement that was already compiled and saves having to compile it again. Bad SQL will likely result in a badly compiled database statement, but not always.

Preventing SQL injection without prepared statements (JDBC)

I have a database log appender that inserts a variable number of log lines into the database every once in a while.
I'd like to create an SQL statement in a way that prevents SQL injection, but not using server-side prepared statements (because I have a variable number of rows in every select, caching them won't help but might hurt performance here).
I also like the convenience of prepared statements, and prefer them to string concatenation. Is there something like a 'client-side prepared statement'?

It sounds like you haven't benchmarked the simplest solution - prepared statements. You say that they "might hurt performance" but until you've tested it, you really won't know.
I would definitely test prepared statements first. Even if they do hamper performance slightly, until you've tested them you won't know whether you can still achieve the performance you require.
Why spend time trying to find alternative solutions when you haven't tried the most obvious one?
If you find that prepared statement execution plan caching is costly, you may well find there are DB-specific ways of tuning or disabling it.

Not sure if I understand your question correctly. Is there something in PreparedStatement that isn't fitting your needs?
I think that whether or not the statement is cached on the server side is an implementation detail of the database driver and the specific database you're using; if your query/statement changes over time then this should have no impact - the cached/compiled statements simply won't be used.
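If the statement text really does vary with the number of rows, one option is to vary only the number of placeholders and still bind every value through PreparedStatement. The sketch below is hypothetical (MultiRowInsert, the table and the column names are invented) and assumes the database accepts multi-row VALUES lists, which not every product does. The table and column names must come from your own code, never from input, because they are concatenated as-is.
```java
final class MultiRowInsert {
    private MultiRowInsert() {}

    /** Builds e.g. "INSERT INTO log_lines (message) VALUES (?), (?), (?)" for rows = 3. */
    static String sql(String table, String column, int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        StringBuilder sb = new StringBuilder("INSERT INTO ")
                .append(table).append(" (").append(column).append(") VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("(?)"); // one placeholder per log line, bound later with setString
        }
        return sb.toString();
    }
}
```
The generated text is passed to connection.prepareStatement, and each log line is then bound with setString(i + 1, line), so no value is ever concatenated into the SQL.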

First, Jon's answer that you should go with the most obvious solution until performance is measured to be a problem is certainly the right approach in general.
I don't think your performance concerns are misplaced. I have certainly seen precompiled complex statements fail dramatically on the performance scale (on MS-SQL 2000). The reason is the statement was so complex that it had several potential execution paths depending on the parameters, but the compilation locked one in for one set of parameters, and the next set of parameters were too slow, whereas a recompile would force a recalculation of the execution plan more appropriate for the different set of parameters.
But that concern is very far fetched until you see it in practice.
The underlying problem here is that parameter escaping is database specific, so unless the JDBC driver for your database is giving you something non-standard to do this (highly unlikely), you are going to have to have a different library or different escaping mechanism that is very specific to this one database.
From the wording of your question, it doesn't sound like your performance concerns have yet come to the point of meriting finding (or developing) such a solution.
It should also be noted that although JDBC drivers may not all behave this way, technically according to the spec the precompilation is supposed to be cached in the PreparedStatement object, and if you throw that away and get a new PreparedStatement every time, it should not actually be caching anything, so the whole issue may be moot and would need to be investigated for your specific JDBC driver.
From the spec:
A SQL statement with or without IN parameters can be pre-compiled and stored in a PreparedStatement object. This object can then be used to efficiently execute this statement multiple times.

What's wrong with using a regular prepared statement, e.g. in the following pseudocode:
DatabaseConnection connection;
PreparedStatement insertStatement = ...;
...
connection.beginTransaction();
for (Item item : items)
{
    insertStatement.setParameter(1, item);
    insertStatement.execute();
}
connection.commitTransaction();
A smart database implementation will batch up several inserts into one communications exchange with the database server.

I can't think of a reason why you shouldn't use prepared statements. If you're running this on a J2EE server using connection pooling the server keeps your connections open, and the server caches your access/execution plans. It's not the data it caches!
If you're closing your connection every time, then you're probably not gaining any performance. But you still get the SQL injection prevention.
Most Java performance tuning books will tell you the same:
Java performance tuning

Prepared Statements don't care about client or server side.
Use them and drop any SQL string concatenation. There is not a single reason to not use Prepared Statements.

JDBC generation of SQL in PreparedStatement

I had a really big problem recently which took me a lot of time to debug. I have an update statement which updates 32 columns in a table. I did that with a PreparedStatement. Accidentally I deleted one setParameter() call, so the update could not complete successfully.
I got an exception from JDBC (Apache Derby) saying "At least one parameter is not initialized" and was not able to figure out which parameter was not set, since the driver would not tell me anything about the name or ordinal number of even the first unset parameter...
I googled unsuccessfully for some utility which would produce plain old SQL out of a (nearly finished) prepared statement. It would help a lot in situations like this one, since I would be able to see what is not set.
Has anyone faced this problem? Got any solution?

Have a look at P6Spy. It can intercept all your JDBC calls and log them before forwarding them onto your database.
Alternatively, think about using Spring's JdbcTemplate, which can take out a lot of your boilerplate JDBC coding and help avoid these kinds of mistakes. You don't need the rest of the Spring framework to use this bit.

Since the parameters in a prepared statement are just a List or Map in the PreparedStatement Object you should be able to inspect the values.
Also, you could write a very simple wrapper around your JDBC driver that creates wrapped PreparedStatements and logs all parameters and their settings before actually executing the statement.
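The wrapper idea can be reduced to something quite small if all you need is "which index did I forget?". The sketch below is not a JDBC wrapper, just a hypothetical helper (ParameterTracker is a made-up name) that counts the ? markers in the SQL and records which indices were set. It assumes no ? appears inside a string literal or comment.
```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class ParameterTracker {
    private final int parameterCount;
    private final BitSet assigned = new BitSet();

    ParameterTracker(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++; // naive: every ? is treated as a parameter marker
            }
        }
        this.parameterCount = count;
    }

    /** Call alongside every ps.setXxx(index, value). */
    void markSet(int index) {
        if (index < 1 || index > parameterCount) {
            throw new IllegalArgumentException("No parameter " + index + " in statement");
        }
        assigned.set(index);
    }

    /** Returns the 1-based indices that have not been set yet. */
    List<Integer> missing() {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= parameterCount; i++) {
            if (!assigned.get(i)) {
                result.add(i);
            }
        }
        return result;
    }
}
```
Logging missing() just before execute() would have pointed straight at the deleted call in the 32-column update.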

PreparedStatements and performance

So I keep hearing that PreparedStatements are good for performance.
We have a Java application in which we use the regular 'Statement' more than we use the 'PreparedStatement'. While trying to move towards using more PreparedStatements, I am trying to get a more thorough understanding of how PreparedStatements work - on the client side and the server side.
So if we have some typical CRUD operations and update an object repeatedly in the application, does it help to use a PS? I understand that we will have to close the PS every time otherwise it will result in a cursor leak.
So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?
I understand the argument about the security benefits of PreparedStatements and I appreciate the answers below which emphasize it. However I really want to keep this discussion focused on the performance benefits of PreparedStatements.
Update: When I say update data, I really mean more in terms of that method randomly being called several times. I understand the advantage in the answer offered below which asks to re-use the statement inside a loop.
// some code blah blah
update();
// some more code blah blah
update();
....
public void update() throws SQLException {
    // try-with-resources closes ps even if execute() throws
    try (PreparedStatement ps = connection.prepareStatement("some sql")) {
        ps.setString(1, "foobar1");
        ps.setString(2, "foobar2");
        ps.execute();
    }
}
There is no way to actually reuse the 'ps' java object and I understand that the actual connection.prepareStatement call is quite expensive.
Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers in some way that I don't know about?
I should also mention that we support several databases.
Thanks in advance.

The notion that prepared statements are primarily about performance is something of a misconception, although it's quite a common one.
Another poster mentioned that he noted a speed improvement of about 20% in Oracle and SQL Server. I've noted a similar figure with MySQL. It turns out that parsing the query just isn't such a significant part of the work involved. On a very busy database system, it's also not clear that query parsing will affect overall throughput: overall, it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk.
So as a reason for using prepared statements, the protection against SQL injection attacks far outweighs the performance improvement. And if you're not worried about SQL injection attacks, you probably should be...

Prepared statements can improve performance when re-using the same statement that you prepared:
PreparedStatement ps = connection.prepareStatement("SOME SQL");
for (Data data : dataList) {
    ps.setInt(1, data.getId());
    ps.setString(2, data.getValue());
    ps.executeUpdate();
}
ps.close();
This is much faster than creating the statement in the loop.
Some platforms also cache prepared statements so that even if you close them they can be reconstructed more quickly.
However even if the performance were identical you should still use prepared statements to prevent SQL Injection. At my company this is an interview question; get it wrong and we might not hire you.

Prepared statements are indeed cached after their first use, which is what gives them their performance advantage over standard statements. If your statement doesn't change then it's advised to use this method. They are generally stored within a statement cache for later use.
More info can be found here:
http://www.theserverside.com/tt/articles/article.tss?l=Prepared-Statments
and you might want to look at Spring JDBCTemplate as an alternative to using JDBC directly.
http://static.springframework.org/spring/docs/2.0.x/reference/jdbc.html

Parsing the SQL isn't the only thing that's going on. There's validating that the tables and columns do indeed exist, creating a query plan, etc. You pay that once with a PreparedStatement.
Binding to guard against SQL injection is a very good thing, indeed. Not sufficient, IMO. You still should validate input prior to getting to the persistence layer.

So how does it help with performance? Does the driver cache the
precompiled statement and give me a copy the next time I do
connection.prepareStatement? Or does the DB server help?
I will answer in terms of performance. Others here have already stipulated that PreparedStatements are resilient to SQL injection (blessed advantage).
The application (JDBC driver) creates the PreparedStatement and passes it to the RDBMS with placeholders (the ?). The RDBMS precompiles the received PreparedStatement, applying query optimization if needed, and (in some cases) caches it. During execution, the precompiled statement is used, with each placeholder replaced by its relevant value. This is in contrast to a Statement, which is compiled and executed directly each time; a PreparedStatement compiles and optimizes the query only once. This scenario does not hold for ALL JDBC vendors, but in essence that's how PreparedStatements are used and operated on.

Anecdotally: I did some experiments with prepared vs. dynamic statements using ODBC in Java 1.4 some years ago, with both Oracle and SQL Server back-ends. I found that prepared statements could be as much as 20% faster for certain queries, but there were vendor-specific differences regarding which queries were improved to what extent. (This should not be surprising, really.)
The bottom line is that if you will be re-using the same query repeatedly, prepared statements may help improve performance; but if your performance is bad enough that you need to do something about it immediately, don't count on the use of prepared statements to give you a radical boost. (20% is usually nothing to write home about.)
Your mileage may vary, of course.

Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I don't know about?
Yes at least with Oracle. Per Oracle® Database JDBC Developer's Guide Implicit Statement Caching (emphasis added),
When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.
Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string. You cannot use implicit Statement caching with OracleStatement. When you create an OraclePreparedStatement or OracleCallableStatement, the JDBC driver automatically searches the cache for a matching statement.

1. PreparedStatement allows you to write dynamic and parametric queries
By using PreparedStatement in Java you can write parametrized SQL queries and send different parameters using the same SQL query, which is a lot better than creating different queries.
2. PreparedStatement is faster than Statement in Java
One of the major benefits of using PreparedStatement is better performance. A PreparedStatement gets precompiled in the database and its access plan is also cached there, which allows the database to execute a parametric query written using a prepared statement much faster than a normal query because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce load on the database. To get the performance benefit, note that you must use only the parametrized version of the SQL query and not string concatenation.
3. PreparedStatement prevents SQL Injection attacks in Java
Read more: http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html#ixzz3LejuMnVL

Short answer:
PreparedStatement helps performance because typically DB clients perform the same query repetitively, and this makes it possible to do some pre-processing for the initial query to speed up the following repetitive queries.
Long answer:
According to Wikipedia, the typical workflow of using a prepared statement is as follows:
Prepare: The statement template is created by the application and sent
to the database management system (DBMS). Certain values are left
unspecified, called parameters, placeholders or bind variables
(labelled "?" below): INSERT INTO PRODUCT (name, price) VALUES (?, ?)
(Pre-compilation): The DBMS parses, compiles, and performs query optimization on the
statement template, and stores the result without executing it.
Execute: At a later time, the application supplies (or binds) values
for the parameters, and the DBMS executes the statement (possibly
returning a result). The application may execute the statement as many
times as it wants with different values. In this example, it might
supply 'Bread' for the first parameter and '1.00' for the second
parameter.
Prepare:
In JDBC, the "Prepare" step is done by calling java.sql.Connection.prepareStatement(String sql) API. According to its Javadoc:
This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.
Since calling this API may send the SQL statement to the database, it is typically an expensive call. Depending on the JDBC driver's implementation, if you have the same SQL statement template, you may need to avoid calling this API multiple times on the client side for better performance.
Precompilation:
The sent statement template will be pre-compiled on database and cached in db server. The database will probably use the connection and sql statement template as the key, and the pre-compiled query and the computed query plan as value in the cache. Parsing query may need to validate table, columns to be queried, so it could be an expensive operation, and computation of query plan is an expensive operation too.
Execute:
For subsequent queries from the same connection with the same SQL statement template, the pre-compiled query and query plan will be looked up directly from the cache by the database server without being recomputed.
Conclusion:
From a performance perspective, using a prepared statement is a two-phase process:
Phase 1, prepare and precompile: this phase is expected to be done once and adds some overhead.
Phase 2, repeated executions of the same query: since phase 1 has done some pre-processing for the query, if the number of repeated executions is large enough, this can save a lot of pre-processing effort for the same query.
And if you want to know more details, there are some articles explaining the benefits of PreparedStatement:
http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html

Prepared statements have some advantages in terms of performance with respect to normal statements, depending on how you use them. As someone stated before, if you need to execute the same query multiple times with different parameters, you can reuse the prepared statement and pass only the new parameter set. The performance improvement depends on the specific driver and database you are using.
For instance, in terms of database performance, Oracle database caches the execution plan of some queries after each computation (this is not true for all versions and all configurations of Oracle). You can find improvements even if you close a statement and open a new one, because this is done at the RDBMS level. This kind of caching is activated only if the two subsequent queries are (char-by-char) the same. This does not hold for normal statements, because the parameters are part of the query and produce different SQL strings.
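To make the char-by-char point concrete, here is a toy model of a plan cache keyed by the exact SQL text. It is only an illustration of the keying idea and not how Oracle or any other RDBMS is implemented; ToyPlanCache and the product table are invented for the example.
```java
import java.util.HashMap;
import java.util.Map;

final class ToyPlanCache {
    private final Map<String, String> plans = new HashMap<>();
    private int compilations = 0;

    /** Returns a cached "plan" for the exact SQL text, compiling it only on a miss. */
    String planFor(String sql) {
        return plans.computeIfAbsent(sql, text -> {
            compilations++; // stands in for the expensive parse/optimise step
            return "plan#" + compilations;
        });
    }

    int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        ToyPlanCache parameterised = new ToyPlanCache();
        ToyPlanCache concatenated = new ToyPlanCache();
        for (int id = 1; id <= 100; id++) {
            parameterised.planFor("SELECT * FROM product WHERE id = ?");
            concatenated.planFor("SELECT * FROM product WHERE id = " + id);
        }
        System.out.println(parameterised.compilations()); // prints 1
        System.out.println(concatenated.compilations());  // prints 100
    }
}
```
With the placeholder the text is identical on every call, so it compiles once. With the id concatenated in, every call produces a new key and a new compilation.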
Some other RDBMSs can be more "intelligent", but I don't expect they will use complex pattern matching algorithms for caching the execution plans because it would lower performance. You may argue that the computation of the execution plan is only a small part of the query execution. For the general case, I agree, but it depends. Keep in mind that computing an execution plan can be an expensive task, because the RDBMS needs to consult off-memory data like statistics (not only Oracle).
However, the argument about caching ranges from execution plans to other parts of the extraction process. Giving the RDBMS the same query multiple times (without going into depth for a particular implementation) helps identify already computed structures at the JDBC (driver) or RDBMS level. If you don't find any particular performance advantage now, you can't exclude that a performance improvement will be implemented in future or alternative versions of the driver/RDBMS.
Performance improvements for updates can be obtained by using prepared statements in batch-mode but this is another story.

OK, finally there is a paper that tests this, and the conclusion is that it doesn't improve performance, and in some cases it's slower:
https://ieeexplore.ieee.org/document/9854303
PDF: https://www.bib.irb.hr/1205158/download/1205158.Performance_analysis_of_SQL_Prepared_Statements_in_CRUD_operations_final.pdf
