Cannot use bind variables in a DDL statement. Alternatives? - java

First of all, a preface: I'm writing a Java class that creates temporary tables on a database using JDBC. I'm using JSE6 and Oracle 11XE as a test DB, but the class also needs to be DB2 compliant.
The temporary tables I'm trying to create come from a bigger one, and I do some filtering and aggregations on the data. The parameters I base my filtering on are decided by the user at runtime. One simplified example of what I'm trying to do is this:
CREATE TABLE temp_table AS (
SELECT
table1.department_id,
SUM(CASE WHEN table1.number_1 < &1 THEN table1.number_1 ELSE 0 END) AS column1
FROM
table1
GROUP BY table1.department_id
)
My problem is that I need to specify parameters to filter the data, and I need to be sure they're properly escaped/localized/typed. This would be easy using a prepared statement, but I cannot use bind variables with DDL.
The temporary solution I resorted to is to alter the query String myself, writing the parameters in the correct place, but this means I now have to implement all the checks instead of relying on a PreparedStatement object to do it for me, on top of losing all the other benefits.
I investigated other solutions, but none of them convinced me so far:
I could first create an empty temp_table and then fill it with INSERT INTO temp_table(id, column1) (SELECT ...), but it seems I might incur a performance loss, so I'd like to stick to CREATE TABLE temp_table AS.
I thought about creating a temporary statement to hold the inner SELECT query and having it generate a properly formatted/localized/etc. query string, but I haven't found any way to obtain the final query from it (and I read it's definitely not possible here). The only option I found for this case is to use DebuggableStatement, but I'm not sure I can include it in the project (also, it seems quite an inelegant way of solving my problem).
Another solution I'm thinking of is to simply put the queries that create the temporary tables (for each of them I'd put the whole CREATE AS (SELECT...)) on the database, inside a procedure, which I'll then be able to call using CallableStatement. This way I could avoid handling typing myself and still have good performance, at the price of a tighter coupling with the DB (I'd have to be sure the procedures are there, or manage their addition/removal from the DB in Java).
So, my question is: are there better alternatives than the ones I could think of?

Is this supposed to be database agnostic, or are you targeting only Oracle? You don't have to store PL/SQL in a stored procedure to use it; just build an anonymous PL/SQL block that does what you need, and execute it. The anonymous PL/SQL block can be built dynamically so that strongly typed variables are declared in the PL/SQL to hold your parameters, and then your Java code sticks the values in. The type safety wouldn't be handled by Java since you're just building a string; it would be handled by Oracle when you execute the anonymous PL/SQL block.
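As a rough illustration of the anonymous-block idea (Oracle only, so the DB2 requirement is not covered), the block can receive the threshold as a typed bind and only then concatenate it into the DDL text. The class and method names, the NUMBER(19) variable and the assumption that the threshold is a whole number are mine, not something from the answer above:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

final class TempTableBlock {
    // Anonymous PL/SQL block. The single ? is bound by JDBC into a typed
    // NUMBER variable; Oracle rejects a non-numeric value at that point.
    // The variable is then concatenated into the CREATE TABLE text,
    // because the DDL statement itself cannot take bind variables.
    static final String BLOCK =
          "DECLARE\n"
        + "  v_threshold NUMBER(19);\n"
        + "BEGIN\n"
        + "  v_threshold := ?;\n"
        + "  EXECUTE IMMEDIATE\n"
        + "    'CREATE TABLE temp_table AS\n"
        + "     SELECT table1.department_id,\n"
        + "            SUM(CASE WHEN table1.number_1 < ' || TO_CHAR(v_threshold) || '\n"
        + "                     THEN table1.number_1 ELSE 0 END) AS column1\n"
        + "     FROM table1\n"
        + "     GROUP BY table1.department_id';\n"
        + "END;";

    // try/finally instead of try-with-resources, since the question targets JSE6.
    static void createTempTable(Connection conn, long threshold) throws SQLException {
        CallableStatement cs = conn.prepareCall(BLOCK);
        try {
            cs.setLong(1, threshold);
            cs.execute();
        } finally {
            cs.close();
        }
    }
}
```

A fractional threshold would need explicit number formatting inside the block, since TO_CHAR output for decimals depends on the session's NLS settings.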

Related

Limiting SQL Injection when query is almost entirely configurable

I have a requirement to perform a scheduled dump of a SQL query from a web application. Initially it was an entire table (only the table name was configurable), but then the addition of a configurable WHERE clause was raised, along with a subset of columns.
The configurable options now required are:
columns
table name
where clause
At this point, it might as well just be the entire query, right?!
I know that SQLi can be mitigated somewhat by java.sql.PreparedStatement, but as far as I can tell, that relies on knowing the columns and datatypes at compile time.
The configurable items will not be exposed to end users. They will sit in a properties file within WEB-INF/classes, so the users I am defending against here are sysadmins who are not as good as they think they are.
Am I being over cautious here?
If nothing else, can java.sql.PreparedStatement prevent multiple queries from being executed if, say, the WHERE clause was Robert'); DROP TABLE students;--?
A prepared statement will not handle this for you. With a prepared statement you can only safely add parameters to your query, not table names, column names or entire where clauses.
Especially the latter makes it virtually impossible to prevent injection if there are no constraints whatsoever. Column and table name parameters could be checked against a list of valid values, either statically defined or dynamically based on your database structure. You could do some basic regex checking on the where parameter, but that will only really help against obvious SQL injection.
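The check against a list of valid values could look roughly like this. It is only a sketch: DumpQueryBuilder and its method are hypothetical names, and the allowed sets are assumed to be filled either from a static list or from the database metadata.

```java
import java.util.List;
import java.util.Set;

final class DumpQueryBuilder {
    private final Set<String> allowedTables;
    private final Set<String> allowedColumns;

    DumpQueryBuilder(Set<String> allowedTables, Set<String> allowedColumns) {
        this.allowedTables = allowedTables;
        this.allowedColumns = allowedColumns;
    }

    // Builds "SELECT col, ... FROM table" only from names that appear in the
    // allow-lists; anything else (including injected SQL) is rejected.
    String select(String table, List<String> columns) {
        if (!allowedTables.contains(table)) {
            throw new IllegalArgumentException("Table not allowed: " + table);
        }
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        for (String column : columns) {
            if (!allowedColumns.contains(column)) {
                throw new IllegalArgumentException("Column not allowed: " + column);
            }
        }
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }
}
```

This covers the table and column parts only; a free-text where clause is still an open door.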
With the flexibility you intend to offer in the form of SELECT FROM WHERE you could have queries like this:
SELECT mycolumn FROM mytable WHERE id = 1 AND 'username' in (SELECT username FROM users)
You could look at something like JOOQ to offer safe dynamic query building while still being able to constrain the things your users are allowed to query for.
Constraining your users in one way or another is key here. Not doing that means you have to worry not just about SQL injection, but also about performance issues. Provide them with a visual (drag-and-drop) query builder, for instance.
"It all depends".
If you have an application where users can type in the where clause as free text, then yes, they can construct SQL Injection attacks. They can also grind your server to a halt by selecting huge cartesian joins.
You could create a visual query builder - use the schema metadata to show a list of tables, and once the table is selected the columns, and for each column the valid comparisons. You can then construct the query as a parameterized query, and limit the human input to the comparison values, which you can in turn use as parameters.
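A minimal sketch of what such a builder might produce, assuming table and column names come only from menus populated from schema metadata. FilterQuery, Op and the method names are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class FilterQuery {
    // The comparisons offered in the UI; the user can pick one but not type one.
    enum Op {
        EQ("="), LT("<"), GT(">"), LIKE("LIKE");
        final String sql;
        Op(String sql) { this.sql = sql; }
    }

    private final String table;
    private final StringBuilder where = new StringBuilder();
    private final List<Object> params = new ArrayList<>();

    // Assumption: table and column were selected from metadata-driven menus.
    FilterQuery(String table) { this.table = table; }

    // Human input only ever ends up in params, never in the SQL text.
    FilterQuery and(String column, Op op, Object value) {
        where.append(params.isEmpty() ? " WHERE " : " AND ")
             .append(column).append(' ').append(op.sql).append(" ?");
        params.add(value);
        return this;
    }

    String sql() { return "SELECT * FROM " + table + where; }

    List<Object> params() { return params; }

    // Binds the collected comparison values as ordinary parameters.
    PreparedStatement prepare(Connection conn) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(sql());
        for (int i = 0; i < params.size(); i++) {
            ps.setObject(i + 1, params.get(i));
        }
        return ps;
    }
}
```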
It's a lot of work, though, and in most production systems of any scale, letting users run this kind of query is usually not particularly useful...
It's insecure to allow users to execute arbitrary queries. This is the kind of thing you'd see at Equifax. You don't want to allow it.
Prepared statements don't help make SQL expressions safe. Using parameters in prepared statements helps make values safe. You can use a parameter only in the place where you would normally put a constant value, like a number, a quoted string, or a quoted date.
The easiest solution would be to NOT allow arbitrary queries or expressions on demand.
Instead, allow users to submit their custom query for review.
The query is reviewed by a human being, who may authorize the stored query to be run by the user (or other users). If you think you can develop some kind of automatic validator, be my guest, but IMHO that's bound to be a lot more work than just having a qualified database administrator review it.
Subsequently, the user is allowed to run the stored query on demand, but only by its id.
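In code, the "run by id" part can be as small as a lookup table. This is a sketch with invented names (ApprovedQueries, approve, lookup); where the reviewed SQL is actually stored is left open:

```java
import java.util.HashMap;
import java.util.Map;

final class ApprovedQueries {
    private final Map<Integer, String> byId = new HashMap<>();

    // Meant to be called only after a human reviewer has signed off on the SQL.
    void approve(int id, String reviewedSql) {
        byId.put(id, reviewedSql);
    }

    // Users submit an id, never SQL text; unknown ids are rejected.
    String lookup(int id) {
        String sql = byId.get(id);
        if (sql == null) {
            throw new IllegalArgumentException("No approved query with id " + id);
        }
        return sql;
    }
}
```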
Here's another alternative idea: users who want to run custom queries can apply to get a replica of the database, to host on their own computer. They will get a dump of the subset of data they are authorized to view. Then if they run queries that trash the data, or melt their computer, that's their business.

Is there any use for views, triggers and stored procedures for a Java GUI project?

I am making a Java GUI and a web application which will use the same MySQL database.
It's a DTh management system where all the information will be stored and retrieved dynamically depending on input.
I believe that views are static by nature and thus would be useless as all my queries will have a different where condition (userid).
Do I need to use triggers? I mean, I could code the Java to execute multiple statements instead of using a built-in trigger (e.g. an insert into customer names and family member names, both of which will hold a duplicate copy for the head of the family). Is there a performance hit? Am I wrong in some way?
Likewise, what is the use of stored procedures? Can't I use methods in Java to do everything?
So I am asking: is it possible to shift all the calculation-intensive stuff to Java and web script instead of SQL? If yes, does this mean I only have to create the backend structure of the database (i.e. all the different tables and FK, PK) and do the rest without using any SQL features in MySQL Workbench?
Thank you for helping.
There is (as always) one correct answer: It depends.
If you only want to show and query some data, you probably won't need triggers or stored procedures.
Views are a different thing: they are pretty helpful if you want a static view of a join table or something like that. If you don't need this, just don't use them.
Keys are really important. They make your data robust against wrong input.
What you should use is PreparedStatement instead of Statement. If you only use PreparedStatements, you are (nearly?) safe as far as SQL injection is concerned.
We use views because they are faster than a plain SELECT query; for just showing data (not editing or updating it) they are faster and preferable.
Triggers are fired on the database side, so they are faster: two or more queries run in a single execution.
The same goes for stored procedures, because we can execute more than one query over a single database connection. If we execute the queries separately, each execution takes more time for the database connection (find the database server, authenticate, find the database, etc.).

Can I pass table name as argument to a java prepared statement?

I am writing a DAO layer in Java for my Tomcat server application.
I wish to use prepared statements to wrap my queries (1. parsing queries once, 2. defending against SQL injection).
My DB design contains a MyISAM table per data source system, and most of the queries through the DAO are selects using different table names as arguments.
Some of these tables may be created on the fly.
I have already gone through many posts that explain that I may not use a table name as an argument for a prepared statement.
I have found solutions that suggest using some type of function (e.g. mysql_real_escape_string) to process this argument and append the result as a string to the query.
Is there any built-in Java library function that does this in an optimized way, or can you suggest doing something else in the DAO layer (I would prefer not to add any routines to the DB itself)?
Are you able to apply restrictions to the table names? That may well be easier than quoting. For example, if you could say that all table names had to match a regex of [0-9A-Za-z_]+ then I don't think you'd need any quoting. If you need spaces, you could probably get away with always using `table name` - but again, without worrying about "full" quoting.
Restricting what's available is often a lot simpler than handling all the possibilities :)
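A small sketch of that restriction in Java, using the [0-9A-Za-z_]+ pattern from above. TableNames and its methods are illustrative names, and the backtick quoting assumes MySQL:

```java
import java.util.regex.Pattern;

final class TableNames {
    // Only letters, digits and underscores, so no quoting tricks are possible.
    private static final Pattern SAFE = Pattern.compile("[0-9A-Za-z_]+");

    // Returns the name unchanged if it is safe, otherwise throws.
    static String requireSafe(String name) {
        if (name == null || !SAFE.matcher(name).matches()) {
            throw new IllegalArgumentException("Illegal table name: " + name);
        }
        return name;
    }

    // The validated name is concatenated; values would still go through ? parameters.
    static String selectAllFrom(String tableName) {
        return "SELECT * FROM `" + requireSafe(tableName) + "`";
    }
}
```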
If you want to be extra safe, then you can prepare a query and call it with the supplied table name to check that the table really exists:
PreparedStatement ps = conn.prepareStatement(
    "SELECT 1 FROM information_schema.tables WHERE table_schema = DATABASE() AND table_name = ?");
ps.setString(1, nameToCheck);
if (!ps.executeQuery().next())
    throw new RuntimeException("Illegal table name: " + nameToCheck);
(I don't have MySQL at hand at the moment, so this is untested. It queries information_schema.tables because the result column of SHOW TABLES is named after the current database.)

When to 'IN' and when not to?

Let's presume that you are writing an application for a retail store chain. So, you would design your object model such that you would define 'Store' as the core business object, plus lots of supporting objects. Let's say 'Store' looks as follows:
class Store implements Validatable{
int storeNo;
String storeName;
... etc....
}
So, your client tells you that you have to import the store schedule from an Excel sheet into the application, and you would have to run a series of validations on the stores. For instance, 'StoreIsInSameCountry', 'StoreIsValid', etc. So, you would design a Rule interface for checking all business conditions. Something like this:
interface Rule<T extends Validatable> {
public Error check(T value) throws Exception;
}
Now, here comes the question. I am uploading 2000 stores from this Excel sheet, so I would end up running each rule defined for a store that many times. If I were to have 4 rules, that would be 8000 queries to the database, i.e. 16000 hits to the connection pool. For a simple check where I would just have to check whether the store exists or not, the query would be:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID = ?
That way I would get my 'Store' object. When I don't get anything from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
Alternatively, I could just do:
SELECT STORE_ATTRIB1, STORE_ATTRIB2... from STORE where STORE_ID in (1,2,3..... )
This query would actually return much faster than doing the one above it 2000 times.
However, it doesn't go well with the design that a Rule can be run for a single store only.
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, coz it gives better performance in this scenario? Or should I change my design?
What would you do if you were in my shoes, and what is the best practice?
That way I would get my 'Store' object from the database. When I don't get anything from the database, then that store doesn't exist. So, for such a simple check, I would have to hit the database 2000 times for 2000 stores.
This is what you should not do.
Create a temporary table, fill the table with your values and JOIN this table, like this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM temptable tt
JOIN STORE s
ON s.STORE_ID = tt.id
or this:
SELECT STORE_ATTRIB1, STORE_ATTRIB2...
FROM STORE s
WHERE s.STORE_ID IN
(
SELECT id
FROM temptable tt
)
I know using IN is not a suggested methodology. So, what do you think I should be doing? Should I go ahead and use IN here, coz it gives better performance in this scenario? Or should I change my design?
IN filters duplicates out.
If you want each eligible row to be selected for each duplicate value in the list, use JOIN.
IN is in no way a "not suggested methodology".
In fact, there was a time when some databases did not support IN queries efficiently; that's why folk wisdom still advises against using it.
But if your store_id is indexed properly (and it most probably is, if it's a PRIMARY KEY which it looks like), then all modern versions of major databases (that is Oracle, SQL Server, MySQL and PostgreSQL) will use an efficient plan to perform this query.
See this article in my blog for performance details in SQL Server:
IN vs. JOIN vs. EXISTS
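If you go with a plain IN list from Java, the placeholders can be generated to match the number of ids, so the values are still bound rather than concatenated. A sketch under assumptions: StoreLookup is a made-up class, and only STORE_ID is selected because the real column list is elided in the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class StoreLookup {
    // Produces "... IN (?, ?, ?)" with one placeholder per id.
    static String inQuery(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        StringBuilder sb = new StringBuilder("SELECT STORE_ID FROM STORE WHERE STORE_ID IN (");
        for (int i = 0; i < count; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(')').toString();
    }

    // Binds each id to its placeholder; the caller executes and closes the statement.
    static PreparedStatement prepare(Connection conn, List<Integer> ids) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(inQuery(ids.size()));
        for (int i = 0; i < ids.size(); i++) {
            ps.setInt(i + 1, ids.get(i));
        }
        return ps;
    }
}
```

Some databases cap the list length (Oracle, for one, rejects more than 1000 expressions in a single IN list), so 2000 ids may have to be split into chunks, which is one more argument for the temporary table.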
Note that in a properly designed database, validation rules are also set-based.
I.e., you implement your validation rules as queries against the temptable.
However, to support legacy rules, you can select values from temptable row-by-agonizing-row, apply the rules, and delete values which did not pass validation.
SELECT store_id FROM store WHERE store_active = 1
or even
SELECT store_id FROM store
will tell you all the active stores in a single query. You can now conduct the other tests on stores you know to exist, and you've saved yourself 1,999 hits to the database.
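On the Java side that could be as simple as loading the ids into a Set once and checking the uploaded rows against it. A sketch with hypothetical names; existingIds is assumed to hold the result of the single query above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class StoreExistence {
    // Returns the uploaded ids that are not in the set fetched from the database,
    // keeping the order in which they appeared in the upload.
    static List<Integer> missing(List<Integer> uploadedIds, Set<Integer> existingIds) {
        List<Integer> result = new ArrayList<>();
        for (Integer id : uploadedIds) {
            if (!existingIds.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```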
If you've got relatively uncontested database access, and no time constraint on how long the whole thing is going to take then you've no real need to worry about hitting the connection pool over and over again. That's what it's designed for, after all!
I think it's more of a business question: how often does the client run the import, how long would it take you to implement either solution, and how expensive is your time per hour?
If it's something that runs once in a while, a bit of bad performance is acceptable in my opinion, especially if you can get the job done quick using clean code.
...a Rule can be run for a single store only.
Managing business rules along with performance is a tricky task, so there is a library ("Persistence Layer") that does exactly that. You define rules, then execute a bulk of commands; the library then fetches from the DB whatever the rules require in a single query (by using temp tables rather than 'IN') and passes it to the rules.
There is an example of a validator in here.

Add description to columns using Java code

I can create a table and its columns in Java by using the statement:
CREATE TABLE table_name(column1 int, column2 double, etc...)
What I would like to do is to add descriptions to each of these columns with an appropriate statement. I found a stored procedure, sp_addextendedproperty, that looks like it can be used to accomplish this; I just have no idea how to use it in Java with JDBC.
Are you creating the table dynamically at runtime (e.g. as part of your application) - perhaps that's even user-driven? If that's the case, you already have that "documentation" (column comments) somewhere and I doubt the utility of adding them to SQL Server.
But if you're just trying to automate your build, take a look at LiquiBase. It's a pretty decent DB change management system that uses XML as its backbone. It's written in Java and integrates well with Hibernate (useful if you ever decide to use ORM instead of straight JDBC).
Update: If you do decide to go forward with calling stored procedure via JDBC, I would strongly recommend using CallableStatement to invoke it. Dynamically building SQL queries in the application should be avoided if possible.
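A sketch of the CallableStatement route, assuming SQL Server's usual convention of storing column descriptions under the MS_Description property. The class and method names are invented; the eight positional arguments follow the procedure's parameter order (@name, @value, then type/name pairs for levels 0 to 2):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

final class ColumnDescriptions {
    // JDBC escape syntax for a stored procedure call with eight parameters.
    static final String CALL = "{call sp_addextendedproperty(?, ?, ?, ?, ?, ?, ?, ?)}";

    // Arguments in positional order: @name, @value, @level0type, @level0name,
    // @level1type, @level1name, @level2type, @level2name.
    static String[] args(String schema, String table, String column, String description) {
        return new String[] {
            "MS_Description", description,
            "SCHEMA", schema,
            "TABLE", table,
            "COLUMN", column
        };
    }

    // Every argument is bound as a parameter; nothing is concatenated into SQL.
    static void describe(Connection conn, String schema, String table,
                         String column, String description) throws SQLException {
        String[] a = args(schema, table, column, description);
        try (CallableStatement cs = conn.prepareCall(CALL)) {
            for (int i = 0; i < a.length; i++) {
                cs.setString(i + 1, a[i]);
            }
            cs.execute();
        }
    }
}
```

For the table from the question this would be called as describe(connection, "dbo", "table_name", "column1", "some description"), where "dbo" is only a guess at the schema.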
There are a number of ways to call a stored procedure (essentially, preparing the statement and binding the variables, or sending a string of SQL), but the simplest is to just send the SQL statement:
exec sp_addextendedproperty list, of, arguments, the, sp, needs;
Skipping your try/finally boilerplate, and assuming connection is a java.sql.Connection, that's:
connection
.createStatement()
.execute( "exec sp_addextendedproperty arguments;");
But ChssPly76 has a good point: doing this from Java isn't a good idea (unless you're developing some database manager in Java).
