I am writing a DAO layer in Java for my Tomcat server application.
I wish to use prepared statements to wrap my queries (1. parsing queries once, 2. defending against SQL injection).
My DB design contains a MyISAM table per data source system, and most of the queries through the DAO are selects that take different table names as arguments.
Some of these tables may be created on the fly.
I have already gone through many posts explaining that I cannot use a table name as a parameter of a prepared statement.
I have found solutions that suggest using some type of function (e.g. mysql_real_escape_string) to process this argument and then appending the result as a string to the query.
Is there any built-in Java library function that does this in an optimized way, or can you suggest something else to do in the DAO layer (I would prefer not to add any routines to the DB itself)?
Are you able to apply restrictions to the table names? That may well be easier than quoting. For example, if you could say that all table names had to match a regex of [0-9A-Za-z_]+ then I don't think you'd need any quoting. If you need spaces, you could probably get away with always using `table name` - but again, without worrying about "full" quoting.
Restricting what's available is often a lot simpler than handling all the possibilities :)
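A minimal sketch of that restriction, using the same character class as the regex above. The class and method names (TableNames, requireSafe) are made up for illustration and are not from any library:

```java
import java.util.regex.Pattern;

// Hypothetical helper: rejects any table name outside [0-9A-Za-z_]+.
final class TableNames {
    private static final Pattern SAFE_NAME = Pattern.compile("[0-9A-Za-z_]+");

    private TableNames() {}

    /** Returns the name unchanged if it matches the pattern, otherwise throws. */
    static String requireSafe(String name) {
        if (name == null || !SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Illegal table name: " + name);
        }
        return name;
    }
}
```

The DAO could then build its query with something like "SELECT * FROM " + TableNames.requireSafe(tableName) + " WHERE id = ?" and still bind the values through the prepared statement as usual.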
If you want to be extra safe, then you can prepare a query and call it with the supplied table name to check that it really exists:
PreparedStatement ps = conn.prepareStatement("SHOW TABLES WHERE tables = ?");
ps.setString(1, nameToCheck);
if(!ps.executeQuery().next())
throw new RuntimeException("Illegal table name: " + nameToCheck);
(The WHERE condition might need some correction because I don't have mysql under my fingers at the moment).
I have a requirement to perform a scheduled dump of a SQL query from a web application. Initially it was an entire table (only the table name was configurable), but then the addition of a configurable WHERE clause was raised, along with a subset of columns.
The configurable options now required are:
columns
table name
where clause
At this point, it might as well just be the entire query, right?!
I know that SQLi can be mitigated somewhat by java.sql.PreparedStatement, but as far as I can tell, that relies on knowing the columns and datatypes at compile time.
The configurable items will not be exposed to end users. They will sit in a properties file within WEB-INF/classes, so the users I am defending against here are sysadmins who are not as good as they think they are.
Am I being over cautious here?
If nothing else, can java.sql.PreparedStatement prevent multiple queries from being executed if, say, the WHERE clause was Robert'); DROP TABLE students;--?
A prepared statement will not handle this for you. With a prepared statement you can only safely add parameters to your query, not table names, column names or entire where clauses.
Especially the latter makes it virtually impossible to prevent injection if there are no constraints whatsoever. Column and table name parameters could be checked against a list of valid values, either statically defined or derived dynamically from your database structure. You could do some basic regex checking on the where parameter, but that will only really help against obvious SQL injection.
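As a rough sketch of the allow-list check for columns and table names: the class name DumpQueryBuilder and the idea of passing the valid names in as sets are assumptions made for illustration. The sets could equally be filled from database metadata.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical builder: only names present in the allow-lists reach the SQL text.
final class DumpQueryBuilder {
    private final Set<String> allowedTables;
    private final Set<String> allowedColumns;

    DumpQueryBuilder(Set<String> allowedTables, Set<String> allowedColumns) {
        this.allowedTables = new HashSet<>(allowedTables);
        this.allowedColumns = new HashSet<>(allowedColumns);
    }

    /** Builds "SELECT c1, c2 FROM t" or throws if any name is not allowed. */
    String select(String table, List<String> columns) {
        if (!allowedTables.contains(table)) {
            throw new IllegalArgumentException("Unknown table: " + table);
        }
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        for (String column : columns) {
            if (!allowedColumns.contains(column)) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }
}
```

This covers only the column and table parts. A free-form where clause cannot be validated this way.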
With the flexibility you intend to offer in the form of SELECT FROM WHERE, you could have queries like this:
SELECT mycolumn FROM mytable WHERE id = 1 AND 'username' in (SELECT username FROM users)
You could look at something like JOOQ to offer safe dynamic query building while still being able to constrain the things your users are allowed to query for.
Constraining your users in one way or another is key here. Not doing that means you have to worry not just about SQL injection, but also about performance issues for instance. Provide them with a visual (drag-and-drop) query builder for instance.
"It all depends".
If you have an application where users can type in the where clause as free text, then yes, they can construct SQL Injection attacks. They can also grind your server to a halt by selecting huge cartesian joins.
You could create a visual query builder - use the schema metadata to show a list of tables, and once the table is selected the columns, and for each column the valid comparisons. You can then construct the query as a parameterized query, and limit the human input to the comparison values, which you can in turn use as parameters.
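The back end of such a builder might look roughly like the sketch below. FilterQuery, its fixed operator list and the "SELECT *" shape are all assumptions for illustration. The point is only that column names and operators come from fixed sets, while the human-entered comparison values end up as parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical query model behind a visual builder.
final class FilterQuery {
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("=", "<>", "<", "<=", ">", ">="));

    private final String table;          // assumed to come from schema metadata, not from the user
    private final Set<String> columns;   // valid columns of that table
    private final StringBuilder where = new StringBuilder();
    private final List<Object> parameters = new ArrayList<>();

    FilterQuery(String table, Set<String> columns) {
        this.table = table;
        this.columns = new HashSet<>(columns);
    }

    /** Adds "column operator ?" and remembers the value for later binding. */
    FilterQuery and(String column, String operator, Object value) {
        if (!columns.contains(column)) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        if (!OPERATORS.contains(operator)) {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        where.append(where.length() == 0 ? " WHERE " : " AND ")
             .append(column).append(' ').append(operator).append(" ?");
        parameters.add(value);
        return this;
    }

    String sql() {
        return "SELECT * FROM " + table + where;
    }

    List<Object> parameters() {
        return Collections.unmodifiableList(parameters);
    }
}
```

The resulting sql() string can be passed to prepareStatement, and each entry of parameters() bound with setObject(i + 1, value).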
It's a lot of work, though, and in most production systems of any scale, letting users run this kind of query is usually not particularly useful...
It's insecure to allow users to execute arbitrary queries. This is the kind of thing you'd see at Equifax. You don't want to allow it.
Prepared statements don't help make SQL expressions safe. Using parameters in prepared statements helps make values safe. You can use a parameter only in the place where you would normally put a constant value, like a number, a quoted string, or a quoted date.
The easiest solution would be to NOT allow arbitrary queries or expressions on demand.
Instead, allow users to submit their custom query for review.
The query is reviewed by a human being, who may authorize the stored query to be run by the user (or other users). If you think you can develop some kind of automatic validator, be my guest, but IMHO that's bound to be a lot more work than just having a qualified database administrator review it.
Subsequently, the user is allowed to run the stored query on demand, but only by its id.
Here's another alternative idea: users who want to run custom queries can apply to get a replica of the database, to host on their own computer. They will get a dump of the subset of data they are authorized to view. Then if they run queries that trash the data, or melt their computer, that's their business.
What is a safe way to put a table name as a parameter into an SQL query? You cannot pass a table name as a parameter using PreparedStatement. Concatenating strings to execute a query with a dynamic table name using Statement is possible, but it is not recommended because of the risk of SQL injection. What is the best approach to do this?
The best way would be:
To put your table name between the characters used to delimit the name of the table which change from one database to another
And escape the provided table name accordingly such that SQL injection won't be possible anymore.
So for example in case of MySQL, the table name's delimiter is the backquote character and we escape it by simply doubling it.
If your query is SELECT foo from bar, you could rewrite it as follows:
String query = String.format("SELECT foo from `%s`", tableName.replace("`", "``"));
This way you inject the name of your table without taking the risk of seeing some malicious code being injected.
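The same idea wrapped in a small helper, as a sketch for MySQL only. The class name MySqlIdentifiers is made up, and other databases use different delimiters:

```java
// Hypothetical helper: quotes an identifier for MySQL by wrapping it in
// backquotes and doubling any backquote inside the name.
final class MySqlIdentifiers {
    private MySqlIdentifiers() {}

    static String quote(String identifier) {
        if (identifier == null || identifier.isEmpty()) {
            throw new IllegalArgumentException("Identifier must not be empty");
        }
        return "`" + identifier.replace("`", "``") + "`";
    }
}
```

With this, the query becomes "SELECT foo from " + MySqlIdentifiers.quote(tableName). A name that contains a backquote can no longer close the delimiter early.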
I would try to solve the design problem, so you don't have to set the table name dynamically. If this is not possible, I would go for a design where you manage a list of available tables and users pick one from there, BY ID, so you can retrieve the real table name from the chosen id and replace the table name placeholder with it, avoiding any chance of sql injection in the table name replacement.
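A sketch of the lookup-by-id approach. TableRegistry and the {table} placeholder syntax are assumptions for illustration. The map of ids to table names would be maintained by the application, never filled from user input.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: users only ever send an id, never a table name.
final class TableRegistry {
    private final Map<Integer, String> tablesById;

    TableRegistry(Map<Integer, String> tablesById) {
        this.tablesById = Collections.unmodifiableMap(new HashMap<>(tablesById));
    }

    /** Replaces the {table} placeholder in the template with the table registered for the id. */
    String queryFor(int tableId, String template) {
        String table = tablesById.get(tableId);
        if (table == null) {
            throw new IllegalArgumentException("Unknown table id: " + tableId);
        }
        return template.replace("{table}", table);
    }
}
```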
There is a rationale behind allowing only actual parameters in dynamic JDBC queries: the parameters can come from the outside and could take any value, whereas the table and column names are static.
There can be use cases for parameterizing a table or a column name, mainly when different tables have almost the same structure and, due to the DRY principle, you do not want to repeat the same query several times changing only the table (or column) name. But in that use case, the programmer has full control over the names that will be substituted, and should carefully test that there is no typo in any of them => there is no possibility of SQL injection here, and it is safe to replace the table name in the query string.
That is quite different for a web application exposed on the internet, where a query will use what has been entered in a form field, because here anything could occur, including a semicolon to terminate the original harmless query and forge a new harmful one => SQL injection if you just concatenate strings instead of correctly building a parameterized query.
I cannot imagine a use case where the table name or a column name could be a string typed in a form field by a user, which would be the only reason to allow to parameterize them.
First of all, a preface: I'm writing a Java class that creates temporary tables on a database using JDBC. I'm using JSE6 and Oracle 11XE as a test DB, but the class also needs to be DB2 compliant.
The temporary tables I'm trying to create come from a bigger one, and I do some filtering and aggregations on the data. The parameters I base my filtering on are decided by the user at runtime. One simplified example of what I'm trying to do is this:
CREATE TABLE temp_table AS (
SELECT
table1.department_id,
SUM(CASE WHEN table1.number_1 < &1 THEN table1.number_1 ELSE 0 END) AS column1
FROM
table1
GROUP BY table1.department_id
)
My problem is that I need to specify parameters to filter the data, and I need to be sure they're properly escaped/localized/typed. This would be easy using a prepared statement, but I cannot use bind variables with DDL.
The temporary solution I resorted to is to alter the query String myself, writing the parameters in the correct place, but this means I now have to implement all the checks instead of relying on a PreparedStatement object to do it for me, on top of losing all the other benefits.
I investigated other solutions, but none of them convinced me so far:
I could first create an empty temp_table and then fill it with INSERT INTO temp_table(id, column1) (SELECT ...), but it seems I might incur a performance loss, so I'd like to stick to CREATE temp_table AS
I thought about creating a temporary statement to hold the inner SELECT query, and have it generate a properly formatted/localized/etc. query string, but I haven't found any way to obtain the final query from it (and I read it's definitely not possible here). The only option I found for this case is to use DebuggableStatement, but I'm not sure I can include it in the project (also, it seems a quite inelegant way of solving my problem)
Another solution I'm thinking of is to simply put the queries that create the temporary tables (for each of them I'd put the whole CREATE AS (SELECT...)) on the database, inside a procedure, which I'll then be able to call using CallableStatement. This way I could avoid handling typing and still have good performance, at the price of a tighter coupling with the DB (I'd have to be sure the procedures are there, or manage their addition/removal from the DB in Java)
So, my question is: are there better alternatives than the ones I could think of?
Is this supposed to be database agnostic, or are you targeting for only Oracle? You don't have to store PL/SQL in a stored procedure to use it; just build an anonymous PL/SQL block that does what you need, and execute it. The anonymous PL/SQL block can be built dynamically so that strongly typed variables are declared in the PL/SQL to hold your parameters, and then your java code sticks the values in. The type safety wouldn't be handled by Java since you're just building a string; it would be handled by Oracle when you execute the anonymous PL/SQL block.
Let's presume that you are writing an application for a retail store chain. So, you would design your object model such that you would define 'Store' as the core business object and lots of supporting objects. Let's say 'Store' looks like follows:
class Store implements Validatable{
int storeNo;
String storeName;
... etc....
}
So, your client tells you that you have to import store schedule from a excel sheet into the application and you would have to run a series of validations on 'em. For instance, 'StoreIsInSameCountry';'StoreIsValid'... etc. So, you would design a Rule interface for checking all business conditions. Something like this:
interface Rule<T extends Validatable> {
public Error check(T value) throws Exception;
}
Now, here comes the question. I am uploading 2000 stores from this excel sheet. So, I would end up running each rule defined for a store that many times. If I were to have 4 rules = 8000 queries to the database, i.e, 16000 hits to the connection pool. For a simple check where I would just have to check whether the store exists or not, the query would be:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID = ?
That way I would obtain my 'Store' object. When I don't get anything from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
Alternatively, I could just do:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID in (1,2,3..... )
This query would actually return much faster than doing the one above it 2000 times.
However, it doesn't go well with the design that a Rule can be run for a single store only.
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, coz it gives better performance in this scenario? Or should I change my design?
What would you do if you were in my shoes, and what is the best practice?
That way I would obtain my 'Store' object from the database. When I don't get anything from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
This is what you should not do.
Create a temporary table, fill the table with your values and JOIN this table, like this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM temptable tt
JOIN STORE s
ON s.STORE_ID = tt.id
or this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM STORE s
WHERE s.STORE_ID IN
(
SELECT id
FROM temptable tt
)
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, coz it gives better performance in this scenario? Or should I change my design?
IN filters duplicates out.
If you want each eligible row to be selected for each duplicate value in the list, use JOIN.
IN is in no way a "not suggested methodology".
In fact, there was a time when some databases did not support IN queries efficiently, which is why folk wisdom still advises against using it.
But if your store_id is indexed properly (and it most probably is, if it's a PRIMARY KEY which it looks like), then all modern versions of major databases (that is Oracle, SQL Server, MySQL and PostgreSQL) will use an efficient plan to perform this query.
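If you go with IN from JDBC, the list can still be parameterized by generating one placeholder per value. A minimal sketch follows. The class name InClause and the selected column are assumptions, and very long lists may hit database-specific limits, so batching may be needed.

```java
import java.util.Collections;

// Hypothetical helper: builds an IN list of "?" placeholders so that the
// store ids themselves are bound as parameters rather than concatenated.
final class InClause {
    private InClause() {}

    /** Returns "?, ?, ?" for count = 3. */
    static String placeholders(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        return String.join(", ", Collections.nCopies(count, "?"));
    }

    static String storeLookupSql(int idCount) {
        return "SELECT STORE_ID FROM STORE WHERE STORE_ID IN (" + placeholders(idCount) + ")";
    }
}
```

Each id is then bound in order with setInt(i + 1, id) before executing the statement.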
See this article in my blog for performance details in SQL Server:
IN vs. JOIN vs. EXISTS
Note, that in a properly designed database, validation rules are also set-based.
I. e. you implement your validation rules as queries against the temptable.
However, to support legacy rules, you can select values from temptable row-by-agonizing-row, apply the rules, and delete values which did not pass validation.
SELECT store_id FROM store WHERE store_active = 1
or even
SELECT store_id FROM store
will tell you all the active stores in a single query. You can now conduct the other tests on stores you know to exist, and you've saved yourself 1,999 hits to the database.
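One way to keep a per-store rule while using that single query is to load the ids into a set once and let the rule check against it. This is only a sketch. StoreExistsRule is a made-up name, and it returns a plain message string instead of the Error type from the question.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical rule: the ids come from one "SELECT store_id FROM store" query,
// after which every check is an in-memory lookup.
final class StoreExistsRule {
    private final Set<Integer> knownStoreIds;

    StoreExistsRule(Set<Integer> knownStoreIds) {
        this.knownStoreIds = new HashSet<>(knownStoreIds);
    }

    /** Returns null when the store exists, otherwise an error message. */
    String check(int storeNo) {
        return knownStoreIds.contains(storeNo) ? null : "Store does not exist: " + storeNo;
    }
}
```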
If you've got relatively uncontested database access, and no time constraint on how long the whole thing is going to take then you've no real need to worry about hitting the connection pool over and over again. That's what it's designed for, after all!
I think it's more of a business question with parameter of how often does the client run the import, how long would it take for you to implement either of the solution, and how expensive is your time per hour.
If it's something that runs once in a while, a bit of bad performance is acceptable in my opinion, especially if you can get the job done quick using clean code.
...a Rule can be run for a single store only.
Managing business rules along with performance is a tricky task, so there is a library ("Persistence Layer") that does exactly that. You define rules, then execute a batch of commands, then the library fetches from the DB whatever the rules require in a single query (by using temp tables rather than 'IN') and passes it to the rules.
There is an example of a validator in here.
Any advice on how to read auto-incrementing identity field assigned to newly created record from call through java.sql.Statement.executeUpdate?
I know how to do this in SQL for several DB platforms, but would like to know what database independent interfaces exist in java.sql to do this, and any input on people's experience with this across DB platforms.
The following snibblet of code should do ya':
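PreparedStatement stmt = conn.prepareStatement(sql,
        Statement.RETURN_GENERATED_KEYS);
// ... bind parameters here ...
stmt.executeUpdate(); // the generated keys are only available after the insert has run
ResultSet res = stmt.getGeneratedKeys();
while (res.next())
    System.out.println("Generated key: " + res.getInt(1));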
This is known to work on the following databases
Derby
MySQL
SQL Server
For databases where it doesn't work (HSQLDB, Oracle, PostgreSQL, etc), you will need to futz with database-specific tricks. For example, on PostgreSQL you would make a call to SELECT NEXTVAL(...) for the sequence in question.
Note that the parameters for executeUpdate(...) are analogous.
ResultSet keys = statement.getGeneratedKeys();
Later, just iterate over ResultSet.
I've always had to make a second call using query after the insert.
You could use an ORM like hibernate. I think it does this stuff for you.
@ScArcher2: I agree, Hibernate needs to make a second call to get the newly generated identity UNLESS an advanced generator strategy is used (sequence, hilo...)
@ScArcher2
Making a second call is extremely dangerous. The process of INSERTing and selecting the resultant auto-generated keys must be atomic, otherwise you may receive inconsistent results on the key select. Consider two asynchronous INSERTs where they both complete before either has a chance to select the generated keys. Which process gets which list of keys? Most cross-database ORMs have to do annoying things like in-process thread locking in order to keep results deterministic. This is not something you want to do by hand, especially if you are using a database which does support atomic generated key retrieval (HSQLDB is the only one I know of which does not).