I have a requirement to perform a scheduled dump of a SQL query from a web application. Initially it was an entire table (only the table name was configurable), but then the addition of a configurable WHERE clause was raised, along with a subset of columns.
The configurable options now required are:
columns
table name
where clause
At this point, it might as well just be the entire query, right?!
I know that SQLi can be mitigated somewhat by java.sql.PreparedStatement, but as far as I can tell, that relies on knowing the columns and datatypes at compile time.
The configurable items will not be exposed to end users. They will sit in a properties file within WEB-INF/classes, so the users I am defending against here are sysadmins who are not as good as they think they are.
Am I being over cautious here?
If nothing else, can java.sql.PreparedStatement prevent multiple queries from being executed if, say, the WHERE clause was Robert'); DROP TABLE students;--?
A prepared statement will not handle this for you. With a prepared statement you can only safely add parameters to your query, not table names, column names or entire where clauses.
The latter in particular makes it virtually impossible to prevent injection if there are no constraints whatsoever. Column and table name parameters could be checked against a list of valid values, either statically defined or derived dynamically from your database structure. You could do some basic regex checking on the WHERE parameter, but that will only really help against obvious SQL injection.
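As a rough illustration of the allow-list check for table and column names, here is a minimal sketch. The class name, the tables and the columns are all made up for the example; in a real application the allow-list would come from your own configuration or schema metadata.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Builds "SELECT ... FROM ..." only from identifiers found on a fixed allow-list. */
final class AllowListedSelect {

    // Hypothetical schema: table name -> columns that may be dumped.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "students", Set.of("id", "name", "enrolled_on"),
            "courses", Set.of("id", "title"));

    /** Returns the SELECT text, or throws if any identifier is not on the allow-list. */
    static String build(String table, List<String> columns) {
        if (table == null || !ALLOWED.containsKey(table)) {
            throw new IllegalArgumentException("Table not allowed: " + table);
        }
        if (columns == null || columns.isEmpty()) {
            throw new IllegalArgumentException("No columns requested");
        }
        Set<String> allowedColumns = ALLOWED.get(table);
        for (String column : columns) {
            if (column == null || !allowedColumns.contains(column)) {
                throw new IllegalArgumentException("Column not allowed: " + column);
            }
        }
        // Safe to concatenate: every identifier matched a known-good constant.
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }
}
```

Something like AllowListedSelect.build("students", List.of("id", "name")) yields a plain SELECT, while a configured table name such as "students; DROP TABLE students" is rejected before any SQL is built. Note that this does nothing for a free-form WHERE clause.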
With the flexibility you intend to offer in the form of SELECT FROM WHERE, you could have queries like this:
SELECT mycolumn FROM mytable WHERE id = 1 AND 'username' in (SELECT username FROM users)
You could look at something like JOOQ to offer safe dynamic query building while still being able to constrain the things your users are allowed to query for.
Constraining your users in one way or another is key here. Not doing that means you have to worry not just about SQL injection but also about performance issues. You could, for instance, provide them with a visual (drag-and-drop) query builder.
"It all depends".
If you have an application where users can type in the where clause as free text, then yes, they can construct SQL Injection attacks. They can also grind your server to a halt by selecting huge cartesian joins.
You could create a visual query builder - use the schema metadata to show a list of tables, and once the table is selected the columns, and for each column the valid comparisons. You can then construct the query as a parameterized query, and limit the human input to the comparison values, which you can in turn use as parameters.
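The back end of such a query builder could look roughly like the sketch below. The column and operator lists are hypothetical stand-ins for what you would read from the schema metadata; only the comparison values come from the user, and they are bound as parameters.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Collects "column operator ?" conditions; values are kept aside for binding. */
final class WhereBuilder {
    // Hypothetical allow-lists; a real builder would derive the columns from metadata.
    private static final Set<String> COLUMNS = Set.of("id", "name", "enrolled_on");
    private static final Set<String> OPERATORS = Set.of("=", "<>", "<", "<=", ">", ">=", "LIKE");

    private final List<String> conditions = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    WhereBuilder and(String column, String operator, Object value) {
        if (!COLUMNS.contains(column)) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        if (!OPERATORS.contains(operator)) {
            throw new IllegalArgumentException("Unknown operator: " + operator);
        }
        conditions.add(column + " " + operator + " ?"); // the value never enters the SQL text
        values.add(value);
        return this;
    }

    /** Appends the WHERE clause (if any) to an already trusted SELECT. */
    String appendTo(String select) {
        return conditions.isEmpty() ? select : select + " WHERE " + String.join(" AND ", conditions);
    }

    List<Object> values() {
        return Collections.unmodifiableList(values);
    }

    /** Binds the collected values in order; JDBC parameter indexes are 1-based. */
    void bind(PreparedStatement ps) throws SQLException {
        for (int i = 0; i < values.size(); i++) {
            ps.setObject(i + 1, values.get(i));
        }
    }
}
```

You would pass builder.appendTo("SELECT id, name FROM students") to connection.prepareStatement and then call builder.bind(ps) before executing. A value such as Robert'); DROP TABLE students;-- is then just a string compared against the name column.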
It's a lot of work, though, and in most production systems of any scale, letting users run this kind of query is usually not particularly useful...
It's insecure to allow users to execute arbitrary queries. This is the kind of thing you'd see at Equifax. You don't want to allow it.
Prepared statements don't help make SQL expressions safe. Using parameters in prepared statements help make values safe. You can use a parameter only in the place where you would normally put a constant value, like a number, a quoted string, or a quoted date.
The easiest solution would be to NOT allow arbitrary queries or expressions on demand.
Instead, allow users to submit their custom query for review.
The query is reviewed by a human being, who may authorize the stored query to be run by the user (or other users). If you think you can develop some kind of automatic validator, be my guest, but IMHO that's bound to be a lot more work than just having a qualified database administrator review it.
Subsequently, the user is allowed to run the stored query on demand, but only by its id.
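To make the submit, review and run-by-id flow concrete, here is a small in-memory sketch. The class and method names are invented for illustration, and a real system would persist the queries and record who approved them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory sketch of "submit, review, then run by id". */
final class ReviewedQueries {
    private final Map<Long, String> pending = new ConcurrentHashMap<>();
    private final Map<Long, String> approved = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** A user submits SQL text for review and receives an id. */
    long submit(String sql) {
        long id = nextId.getAndIncrement();
        pending.put(id, sql);
        return id;
    }

    /** Called only by the human reviewer after reading the query. */
    void approve(long id) {
        String sql = pending.remove(id);
        if (sql == null) {
            throw new IllegalArgumentException("No pending query with id " + id);
        }
        approved.put(id, sql);
    }

    /** Users ask for a query by id only; they never pass SQL text at run time. */
    String approvedSql(long id) {
        String sql = approved.get(id);
        if (sql == null) {
            throw new IllegalStateException("Query " + id + " is not approved");
        }
        return sql;
    }
}
```

The point of the design is that the only run-time input from the user is a number (plus, optionally, bound parameter values), so there is no SQL text left to inject into.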
Here's another alternative idea: users who want to run custom queries can apply to get a replica of the database, to host on their own computer. They will get a dump of the subset of data they are authorized to view. Then if they run queries that trash the data, or melt their computer, that's their business.
First of all, I know this is bad practice but regardless I'm still looking for an answer.
In our web application we have a textarea where the user can write SQL to bring in custom data sets and view them in a chart. The way this works is essentially taking the written string and executing it as a query. What I'm looking for is everything I need to implement in our application server back end security wise as to disallow the execution of queries that produce results other than SELECT type queries.
The user won't be able to execute just any SELECT query he wants, since the app server back end expects the returned result set to have 2 columns named X_FIELD and Y_FIELD. So we're not so much worried about the user being able to view data as about him executing SQL that will break the database.
What we thought of doing is parsing the string for keywords such as DROP, ALTER, CREATE etc. Are there specific things that we have to look out for? Is there a tool/library that automates this? We're using java for our back end code.
Filtering queries can be done at the application level but it requires much more database-specific expertise than creating separate security systems for each database.
As an example, I created an open source program that can do this for Oracle. It won't solve your problem but the code can at least help explain why this is a bad idea.
First, it's important to understand that Oracle SQL syntax is much more complicated than most programming languages, such as Java.
Oracle has 2175 keywords and almost none of them are reserved. Forget about parsing SQL - none of the existing 3rd party parsers are accurate enough to do this securely.
Luckily a full parser is not needed for this task. Oracle syntax is structured in such a way that any statement can be classified with only 8 tokens, excluding whitespace and comments.
But building a tokenizer and a statement classifier is still difficult. That solution will handle unusual kinds of selects, such as (select * from dual) or with asdf as (select 1 a from dual) select a from asdf;. But even a SELECT statement can cause changes to the database, either through PL/SQL hidden in a function or type, or by locking rows through a for update.
And don't forget to remove the (sometimes optional) terminator. Terminators work fine in most IDEs, but they are not allowed in dynamic SQL. Don't just remove the last characters, or the last token, because some SELECT statements allow semicolons in the middle.
That's a lot of work for just one database! If you want to use this method to implement security policies you need almost 100% accuracy. Very few people are fanatical enough about any database to build this. There's no chance you can do this for multiple databases.
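To show why the simple keyword scan from the question falls short, here is a deliberately naive filter. It is not the tokenizer described above, and the table, column and package names in the examples are invented.

```java
import java.util.List;
import java.util.Locale;

/** Deliberately naive: rejects SQL text that contains a forbidden word anywhere. */
final class NaiveKeywordFilter {
    private static final List<String> FORBIDDEN =
            List.of("DROP", "ALTER", "CREATE", "TRUNCATE", "DELETE", "INSERT", "UPDATE");

    static boolean looksSafe(String sql) {
        String upper = sql.toUpperCase(Locale.ROOT);
        for (String keyword : FORBIDDEN) {
            if (upper.contains(keyword)) {
                return false; // substring match, so column names can trigger it
            }
        }
        return true;
    }
}
```

With this filter, the harmless SELECT created_at FROM orders is rejected because the column name contains CREATE, while SELECT audit_pkg.purge_all() FROM dual is accepted even though the (hypothetical) function could modify data. Fixing the first problem needs a tokenizer; fixing the second needs knowledge of what the database will actually execute.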
I am writing a DAO layer in Java for my Tomcat server application.
I wish to wrap my queries in prepared statements (1. to parse each query only once, 2. to defend against SQL injection).
My DB design contains one MyISAM table per data source system, and most of the queries through the DAO are selects that take different table names as arguments.
Some of these tables may be created on the fly.
I have already gone through many posts explaining that I cannot use a table name as an argument to a prepared statement.
I have found solutions that suggest using some type of function (e.g. mysql_real_escape_string) to process this argument and append the result as a string to the query.
Is there any built-in Java library function that does this in an optimized way, or can you suggest doing something else in the DAO layer (I would prefer not to add any routines to the DB itself)?
Are you able to apply restrictions to the table names? That may well be easier than quoting. For example, if you could say that all table names had to match a regex of [0-9A-Za-z_]+ then I don't think you'd need any quoting. If you need spaces, you could probably get away with always using `table name` - but again, without worrying about "full" quoting.
Restricting what's available is often a lot simpler than handling all the possibilities :)
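A minimal sketch of that restriction, assuming the [0-9A-Za-z_]+ rule above is acceptable for your table names. The class and method names are just for illustration.

```java
import java.util.regex.Pattern;

/** Accepts only table names made of ASCII letters, digits and underscores. */
final class TableNames {
    private static final Pattern SAFE = Pattern.compile("[0-9A-Za-z_]+");

    /** Returns the name unchanged if it matches the pattern, otherwise throws. */
    static String requireSafe(String name) {
        if (name == null || !SAFE.matcher(name).matches()) {
            throw new IllegalArgumentException("Illegal table name: " + name);
        }
        return name;
    }

    /** Builds the select; backticks are harmless here because the name cannot contain one. */
    static String selectAllFrom(String tableName) {
        return "SELECT * FROM `" + requireSafe(tableName) + "`";
    }
}
```

The rest of the query (the WHERE values and so on) can still go through a PreparedStatement with ? placeholders; only the table name is spliced in, and only after the check.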
If you want to be extra safe, then you can prepare a query and call it with the supplied table name to check that it really exists:
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT 1 FROM information_schema.tables WHERE table_schema = DATABASE() AND table_name = ?")) {
    ps.setString(1, nameToCheck);
    try (ResultSet rs = ps.executeQuery()) {
        if (!rs.next())
            throw new RuntimeException("Illegal table name: " + nameToCheck);
    }
}
(This relies on information_schema, which MySQL provides from version 5.0 onwards. I don't have MySQL at hand at the moment, so the query may still need adjusting for your setup.)
I have an old MySQL 4.1 database with a table that has a few million rows, and an old Java application that connects to this database and frequently returns several thousand rows from this table via a simple SQL query (i.e. SELECT * FROM people WHERE first_name = 'Bob'). I think the Java application uses client-side prepared statements, but I was looking at switching this to the server. In the example mentioned, the value for first_name will vary depending on what the user enters.
I would like to speed up the select query and was wondering whether I should switch to prepared statements or stored procedures. Is there a general rule of thumb for which is quicker or less resource intensive (or whether a combination of both is better)?
You do have an index on first_name, right? That will speed up your query a lot more than choosing between prepared statements and stored procedures.
If you have just one query to worry about, you should be able to implement the two alternatives (on your test platform of course!) and see which one gives you the best performance.
(My guess is that there won't be much difference though ...)
Looks like the best way is just to make the change and test it out in a test environment.
Thanks for the help.
So I keep hearing that PreparedStatements are good for performance.
We have a Java application in which we use the regular 'Statement' more than we use the 'PreparedStatement'. While trying to move towards using more PreparedStatements, I am trying to get a more thorough understanding of how PreparedStatements work - on the client side and the server side.
So if we have some typical CRUD operations and update an object repeatedly in the application, does it help to use a PS? I understand that we will have to close the PS every time otherwise it will result in a cursor leak.
So how does it help with performance? Does the driver cache the precompiled statement and give me a copy the next time I do connection.prepareStatement? Or does the DB server help?
I understand the argument about the security benefits of PreparedStatements and I appreciate the answers below which emphasize it. However I really want to keep this discussion focused on the performance benefits of PreparedStatements.
Update: When I say update data, I really mean more in terms of that method randomly being called several times. I understand the advantage in the answer offered below which asks to re-use the statement inside a loop.
// some code blah blah
update();
// some more code blah blah
update();
....
public void update() throws SQLException {
    // try-with-resources closes the statement even if execute() throws
    try (PreparedStatement ps = connection.prepareStatement("some sql")) {
        ps.setString(1, "foobar1");
        ps.setString(2, "foobar2");
        ps.execute();
    }
}
There is no way to actually reuse the 'ps' java object and I understand that the actual connection.prepareStatement call is quite expensive.
Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I don't know about?
I should also mention that we support several databases.
Thanks in advance.
The notion that prepared statements are primarily about performance is something of a misconception, although it's quite a common one.
Another poster mentioned that he noted a speed improvement of about 20% in Oracle and SQL Server. I've noted a similar figure with MySQL. It turns out that parsing the query just isn't such a significant part of the work involved. On a very busy database system, it's also not clear that query parsing will affect overall throughput: overall, it'll probably just be using up CPU time that would otherwise be idle while data was coming back from the disk.
So as a reason for using prepared statements, the protection against SQL injection attacks far outweighs the performance improvement. And if you're not worried about SQL injection attacks, you probably should be...
Prepared statements can improve performance when re-using the same statement that you prepared:
PreparedStatement ps = connection.prepareStatement("SOME SQL");
for (Data data : dataList) {
    ps.setInt(1, data.getId());
    ps.setString(2, data.getValue());
    ps.executeUpdate();
}
ps.close();
This is much faster than creating the statement in the loop.
Some platforms also cache prepared statements so that even if you close them they can be reconstructed more quickly.
However even if the performance were identical you should still use prepared statements to prevent SQL Injection. At my company this is an interview question; get it wrong and we might not hire you.
Prepared statements are indeed cached after their first use, which is where their performance advantage over standard statements comes from. If your statement doesn't change, then it's advisable to use this method. They are generally stored in a statement cache for later use.
More info can be found here:
http://www.theserverside.com/tt/articles/article.tss?l=Prepared-Statments
and you might want to look at Spring JDBCTemplate as an alternative to using JDBC directly.
http://static.springframework.org/spring/docs/2.0.x/reference/jdbc.html
Parsing the SQL isn't the only thing that's going on. There's validating that the tables and columns do indeed exist, creating a query plan, etc. You pay that once with a PreparedStatement.
Binding to guard against SQL injection is a very good thing, indeed. Not sufficient, IMO. You still should validate input prior to getting to the persistence layer.
So how does it help with performance? Does the driver cache the
precompiled statement and give me a copy the next time I do
connection.prepareStatement? Or does the DB server help?
I will answer in terms of performance. Others here have already stipulated that PreparedStatements are resilient to SQL injection (blessed advantage).
The application (JDBC driver) creates the PreparedStatement and passes it to the RDBMS with placeholders (the ?). The RDBMS precompiles the received PreparedStatement, applying query optimization if needed, and (in some cases) caches it. During execution, the precompiled PreparedStatement is used: each placeholder is replaced with its value and the result is computed. In contrast to a Statement, which is compiled and executed directly each time, the PreparedStatement is compiled and optimized only once. This scenario does not hold exactly for ALL JDBC vendors, but in essence that's how PreparedStatements are used and operated on.
Anecdotally: I did some experiments with prepared vs. dynamic statements using ODBC in Java 1.4 some years ago, with both Oracle and SQL Server back-ends. I found that prepared statements could be as much as 20% faster for certain queries, but there were vendor-specific differences regarding which queries were improved to what extent. (This should not be surprising, really.)
The bottom line is that if you will be re-using the same query repeatedly, prepared statements may help improve performance; but if your performance is bad enough that you need to do something about it immediately, don't count on the use of prepared statements to give you a radical boost. (20% is usually nothing to write home about.)
Your mileage may vary, of course.
Which is what brings me back to the original question. Is this "some sql" PreparedStatement still being cached and reused under the covers that I don't know about?
Yes at least with Oracle. Per Oracle® Database JDBC Developer's Guide Implicit Statement Caching (emphasis added),
When you enable implicit Statement caching, JDBC automatically caches the prepared or callable statement when you call the close method of this statement object. The prepared and callable statements are cached and retrieved using standard connection object and statement object methods.
Plain statements are not implicitly cached, because implicit Statement caching uses a SQL string as a key and plain statements are created without a SQL string. Therefore, implicit Statement caching applies only to the OraclePreparedStatement and OracleCallableStatement objects, which are created with a SQL string. You cannot use implicit Statement caching with OracleStatement. When you create an OraclePreparedStatement or OracleCallableStatement, the JDBC driver automatically searches the cache for a matching statement.
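If your driver does not offer this, the same idea (a cache keyed by the SQL string) can be approximated in application code. The sketch below is my own illustration, not Oracle's implementation: it keeps one PreparedStatement per SQL text for a single connection, is not thread-safe, and assumes the caller closes the cache instead of the individual statements.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/** Keeps one PreparedStatement per distinct SQL string for a single connection. */
final class StatementCache implements AutoCloseable {
    private final Connection connection;
    private final Map<String, PreparedStatement> bySql = new HashMap<>();

    StatementCache(Connection connection) {
        this.connection = connection;
    }

    /** Returns the cached statement for this SQL text, preparing it on first use. */
    PreparedStatement get(String sql) throws SQLException {
        PreparedStatement ps = bySql.get(sql);
        if (ps == null || ps.isClosed()) {
            ps = connection.prepareStatement(sql);
            bySql.put(sql, ps);
        } else {
            ps.clearParameters(); // do not carry values over from the previous use
        }
        return ps;
    }

    /** Closes every cached statement; call this before closing the connection. */
    @Override
    public void close() throws SQLException {
        for (PreparedStatement ps : bySql.values()) {
            ps.close();
        }
        bySql.clear();
    }
}
```

With this, an update() method that is called at random points can ask cache.get("some sql") each time and pay for prepareStatement only once per connection. Whether that is a measurable win depends on the driver and database, so it is worth benchmarking before adopting it.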
1. PreparedStatement allows you to write dynamic and parametric query
By using PreparedStatement in Java you can write parametrized SQL queries and send different parameters with the same SQL query, which is a lot better than creating different queries.
2. PreparedStatement is faster than Statement in Java
One of the major benefits of using PreparedStatement is better performance. A PreparedStatement gets precompiled in the database and its access plan is also cached there, which allows the database to execute a parametric query written as a prepared statement much faster than a normal query because it has less work to do. You should always try to use PreparedStatement in production JDBC code to reduce load on the database. To get the performance benefit, note that you must use the parametrized version of the SQL query and not string concatenation.
3. PreparedStatement prevents SQL Injection attacks in Java
Read more: http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html#ixzz3LejuMnVL
Short answer:
PreparedStatement helps performance because typically DB clients perform the same query repetitively, and this makes it possible to do some pre-processing for the initial query to speed up the following repetitive queries.
Long answer:
According to Wikipedia, the typical workflow of using a prepared statement is as follows:
Prepare: The statement template is created by the application and sent
to the database management system (DBMS). Certain values are left
unspecified, called parameters, placeholders or bind variables
(labelled "?" below): INSERT INTO PRODUCT (name, price) VALUES (?, ?)
(Pre-compilation): The DBMS parses, compiles, and performs query optimization on the
statement template, and stores the result without executing it.
Execute: At a later time, the application supplies (or binds) values
for the parameters, and the DBMS executes the statement (possibly
returning a result). The application may execute the statement as many
times as it wants with different values. In this example, it might
supply 'Bread' for the first parameter and '1.00' for the second
parameter.
Prepare:
In JDBC, the "Prepare" step is done by calling java.sql.Connection.prepareStatement(String sql) API. According to its Javadoc:
This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation. Some drivers may not support precompilation. In this case, the statement may not be sent to the database until the PreparedStatement object is executed. This has no direct effect on users; however, it does affect which methods throw certain SQLException objects.
Since calling this API may send the SQL statement to the database, it is typically an expensive call. Depending on the JDBC driver's implementation, for better performance you may want to avoid calling this API multiple times on the client side for the same SQL statement template.
Precompilation:
The sent statement template will be precompiled by the database and cached on the DB server. The database will probably use the connection and SQL statement template as the key, and the precompiled query and the computed query plan as the value in the cache. Parsing the query may require validating the tables and columns to be queried, so it can be an expensive operation, and computing the query plan is expensive too.
Execute:
For subsequent queries from the same connection with the same SQL statement template, the database server looks up the precompiled query and query plan directly from the cache without recomputing them.
Conclusion:
From a performance perspective, using a prepared statement is a two-phase process:
Phase 1, prepare and precompile: this phase is expected to be done once and adds some overhead.
Phase 2, repeated executions of the same query: since phase 1 has already done some pre-processing for the query, a large enough number of repetitions can save a lot of pre-processing effort for the same query.
And if you want to know more details, there are some articles explaining the benefits of PreparedStatement:
http://javarevisited.blogspot.com/2012/03/why-use-preparedstatement-in-java-jdbc.html
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Prepared statements have some advantages in terms of performance with respect to normal statements, depending on how you use them. As someone stated before, if you need to execute the same query multiple times with different parameters, you can reuse the prepared statement and pass only the new parameter set. The performance improvement depends on the specific driver and database you are using.
For instance, in terms of database performance, Oracle caches the execution plan of some queries after each computation (this is not true for all versions and all configurations of Oracle). You can see improvements even if you close a statement and open a new one, because this is done at the RDBMS level. This kind of caching is activated only if the two subsequent queries are the same, character by character. This does not hold for normal statements, because the parameters are part of the query and produce different SQL strings.
Some other RDBMS can be more "intelligent", but I don't expect they will use complex pattern matching algorithms for caching the execution plans because it would lower performance. You may argue that the computation of the execution plan is only a small part of the query execution. For the general case, I agree, but.. it depends. Keep in mind that, usually, computing an execution plan can be an expensive task, because the rdbms needs to consult off-memory data like statistics (not only Oracle).
However, the caching argument extends from execution plans to other parts of the extraction process. Giving the RDBMS the same query multiple times (without going into depth on a particular implementation) helps it identify already-computed structures at the JDBC (driver) or RDBMS level. Even if you don't find any particular performance advantage now, you can't rule out that performance improvements will be implemented in future or alternative versions of the driver or RDBMS.
Performance improvements for updates can be obtained by using prepared statements in batch-mode but this is another story.
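For completeness, batch mode with a prepared statement looks roughly like this. The table and column names (items, label, id) are placeholders I made up; addBatch and executeBatch are the standard JDBC calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

final class BatchUpdater {
    /**
     * Prepares the UPDATE once, queues one parameter set per entry and sends
     * them together. Returns the per-row update counts reported by the driver.
     */
    static int[] updateLabels(Connection connection, Map<Integer, String> labelsById)
            throws SQLException {
        String sql = "UPDATE items SET label = ? WHERE id = ?"; // hypothetical table
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (Map.Entry<Integer, String> entry : labelsById.entrySet()) {
                ps.setString(1, entry.getValue());
                ps.setInt(2, entry.getKey());
                ps.addBatch(); // queue this parameter set instead of executing it now
            }
            return ps.executeBatch();
        }
    }
}
```

How much this helps depends on the driver: some send the whole batch in one round trip, others still execute the entries one by one, so measure it against your own database.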
OK, finally there is a paper that tests this, and its conclusion is that it doesn't improve performance, and in some cases it's slower:
https://ieeexplore.ieee.org/document/9854303
PDF: https://www.bib.irb.hr/1205158/download/1205158.Performance_analysis_of_SQL_Prepared_Statements_in_CRUD_operations_final.pdf