As a newbie to Servlet programming, I think I may not have gotten something right here: I understand the concept of Java Beans and little ORM helper classes like org.apache.commons.dbutils.DbUtils. I can convert a ResultSet into an instance of my JavaBean-object with a ResultSetHandler and a BeanHandler. But isn't there any convenient way to do it the other way round, other than hardcoding the SQL string? Something like
QueryRunner run = new QueryRunner(datasource);
int result = run.update("UPDATE " + tableName + " SET " + [and now some Handler sets all the columns from the JavaBean]);
At least, I didn't find anything like that! Or did I get it wrong? Help appreciated.
You did not get it wrong: you will still need a hard-coded SQL string, as shown in this answer. Sql2o also requires a hard-coded SQL string, but it lets you bind a POJO, which gets you halfway there; see here (bottom of the page).
I think you will always need a hard-coded SQL string of some form, because these are JDBC helper libraries and not "object relational mappers". Before the insert is done, it is not known which properties are auto-generated, have a default value, are foreign keys, allow null values, etc. All this information is required to prepare a proper insert statement based on a POJO/JavaBean, and that goes beyond the scope of the helper libraries. On the plus side, specifying a SQL string is explicit (there is no magic behind the scenes) and keeps you in full control.
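If the goal is only to avoid typing the column list by hand, the SQL text can still be generated from the bean while every value stays a bind parameter. The following is a minimal sketch and not part of DbUtils: `BeanUpdateSketch`, `Update`, `buildUpdate` and the sample `Person` bean are hypothetical names. It assumes that property names equal column names and that the table and key names come from trusted code, never from user input. It also ignores exactly the metadata mentioned above (auto-generated columns, defaults, foreign keys).

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

final class BeanUpdateSketch {

    /** Generated SQL text plus the values to bind, in placeholder order. */
    static final class Update {
        final String sql;
        final Object[] params;

        Update(String sql, Object[] params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /** Sample bean, used for illustration only. */
    public static class Person {
        private int id;
        private String name;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** Builds "UPDATE table SET prop = ?, ... WHERE key = ?" from the bean's readable properties. */
    static Update buildUpdate(String table, String keyProperty, Object bean) {
        try {
            StringBuilder sql = new StringBuilder("UPDATE ").append(table).append(" SET ");
            List<Object> params = new ArrayList<>();
            Object keyValue = null;
            boolean keyFound = false;
            // Object.class as stop class keeps the synthetic "class" property out.
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() == null) {
                    continue; // write-only property, nothing to read
                }
                Object value = pd.getReadMethod().invoke(bean);
                if (pd.getName().equals(keyProperty)) {
                    keyValue = value; // bound last, in the WHERE clause
                    keyFound = true;
                    continue;
                }
                if (!params.isEmpty()) {
                    sql.append(", ");
                }
                sql.append(pd.getName()).append(" = ?");
                params.add(value);
            }
            if (!keyFound || params.isEmpty()) {
                throw new IllegalArgumentException("need a key property and at least one other property");
            }
            sql.append(" WHERE ").append(keyProperty).append(" = ?");
            params.add(keyValue);
            return new Update(sql.toString(), params.toArray());
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read bean " + bean.getClass(), e);
        }
    }
}
```

The result could then be handed to DbUtils as `run.update(u.sql, u.params)`, since `QueryRunner.update` accepts a SQL string followed by the values to bind.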
Related
I need to build a query in such a way as to prevent the possibility of an SQL injection attack.
I know of two ways to build a query.
String query = new StringBuilder("select * from tbl_names where name = '").append(name).append("'").toString();
String query = "select * from tbl_names where name = ? ";
In the first case, all I do is a connection.prepareStatement(query).
In the second case I do something like:
PreparedStatement ps = connection.prepareStatement(query);
ps.setString(1,name);
I want to know what is the industry standard? Do you use the string append way to build the query and then prepare the statement or prepare the statement already and pass parameters later?
Your first fragment of code is unsafe and vulnerable to SQL injection. You should not use that form.
To make your first fragment safe, you would need to manually escape the value to prevent SQL injection. That is hard to do correctly, and choosing the wrong way of handling values could reduce performance depending on the underlying database (e.g. some database systems will not use an index if you supply a string literal for an integer column).
The second fragment is the standard way. It protects you against SQL injection. Use this form.
Using a prepared statement with parameter placeholders is far simpler, and it also allows you to reuse the compiled statement with different sets of values. In addition, depending on the database, this can have additional performance advantages for reusing query plans across connections.
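To make the difference concrete, here is a small sketch. The class and method names are made up for illustration; only `tbl_names` and `name` come from the question. The first method reproduces the concatenated form so you can see what hostile input does to the query text; the second prepares the statement once and reuses it for several values.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

final class QueryForms {

    /** Unsafe form from the question: the value becomes part of the SQL text. */
    static String concatenated(String name) {
        return "select * from tbl_names where name = '" + name + "'";
    }

    /** Safe form: one prepared statement, executed once per value. */
    static int countMatches(Connection connection, List<String> names) throws SQLException {
        int rows = 0;
        try (PreparedStatement ps =
                connection.prepareStatement("select * from tbl_names where name = ?")) {
            for (String name : names) {
                ps.setString(1, name); // passed as data, never parsed as SQL
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                    }
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // The quote in the input closes the literal, and the rest is parsed as SQL.
        // prints: select * from tbl_names where name = 'x' OR '1'='1'
        System.out.println(concatenated("x' OR '1'='1"));
    }
}
```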
You could also use the OWASP ESAPI library. It includes validators, encoders and many other helpful things.
For example, you can do
ESAPI.encoder().encodeForSQL(Codec,input);
More codecs are under development. Currently, MySQL and Oracle are supported. One of those might be helpful in your case.
Background: I have started a project using JDBC and MySQL to simulate a bookstore, all local. To connect to the database, I started out using a Statement, but I began to read that when a query is used multiple times and only its parameters change, it can be more efficient to use a PreparedStatement for those queries. However, the advantage I read the most about was how PreparedStatements could prevent SQL injection much better.
Sources:
Answers on this thread here
Google
Professors
My Question:
How do PreparedStatements prevent SQL injection better, or even differently for that matter, than Statements when dealing with parametrized queries? I am confused because, if I understand correctly, the values still get passed into the SQL statement that gets executed; it's just up to the programmer to sanitize the inputs.
You're right that you could do all the sanitation yourself, and thus be safe from injection. But this is more error-prone, and thus less safe. In other words, doing it yourself introduces more chances for bugs that could lead to injection vulnerabilities.
One problem is that escaping rules could vary from DB to DB. For instance, standard SQL only allows string literals in single quotes ('foo'), so your sanitation might only escape those; but MySQL allows string literals in double quotes ("foo"), and if you don't sanitize those as well, you'll have an injection attack if you use MySQL.
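Another concrete case of rules that differ between databases: doubling single quotes is the standard SQL escape, but MySQL in its default SQL mode also treats a backslash as an escape character. The sketch below uses a hypothetical hand-rolled `naiveEscape` to show how input that stays inside the literal under the standard rule breaks out of it under MySQL's rule.

```java
final class NaiveEscaping {

    /** Hand-rolled escaping that only knows the standard rule: ' becomes ''. */
    static String naiveEscape(String value) {
        return value.replace("'", "''");
    }

    static String buildQuery(String name) {
        return "select * from tbl_names where name = '" + naiveEscape(name) + "'";
    }

    public static void main(String[] args) {
        // The input is: \' OR 1=1 --   (backslash, quote, then SQL)
        String hostile = "\\' OR 1=1 -- ";
        // Standard SQL reads \'' as a backslash plus an escaped quote, still inside the literal.
        // MySQL (default mode) reads \' as an escaped quote, so the second ' ends the
        // literal and OR 1=1 is parsed as SQL.
        // prints: select * from tbl_names where name = '\'' OR 1=1 -- '
        System.out.println(buildQuery(hostile));
    }
}
```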
If you use PreparedStatement, the implementation for that interface is provided by the appropriate JDBC Driver, and that implementation is responsible for escaping your input. This means that the sanitization code is written by the people who wrote the JDBC driver as a whole, and those people presumably know the ins and outs of the DB's specific escaping rules. They've also most likely tested those escaping rules more thoroughly than you'd test your hand-rolled escaping function.
So, if you write preparedStatement.setString(1, name), the implementation for that method (again, written by the JDBC driver folks for the DB you're using) could be roughly like:
public void setString(int idx, String value) {
String sanitized = ourPrivateSanitizeMethod(value);
internalSetString(idx, sanitized);
}
(Keep in mind that the above code is an extremely rough sketch; a lot of JDBC drivers actually handle it quite differently, but the principle is basically the same.)
Another problem is that it could be non-obvious whether myUserInputVar has been sanitized or not. Take the following snippet:
private void updateUser(String name, int id) throws SQLException {
myStat.executeUpdate("UPDATE user SET name=" + name + " WHERE id=" + id);
}
Is that safe? You don't know, because there's nothing in the code to indicate whether name is sanitized or not. And you can't just re-sanitize "to be on the safe side", because that would change the input (e.g., hello ' world would become hello '' world). On the other hand, a prepared statement of UPDATE user SET name=? WHERE id=? is always safe, because the PreparedStatement's implementation escapes the inputs before it plugs values into the ?.
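A sketch of the same update written with placeholders, assuming a `user` table with a string `name` and an integer `id`. The class name `UserUpdates` and the `Connection` parameter are mine, not from the snippet above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class UserUpdates {

    /** Returns the number of rows changed. The values never become part of the SQL text. */
    static int updateUser(Connection connection, String name, int id) throws SQLException {
        try (PreparedStatement ps =
                connection.prepareStatement("UPDATE user SET name=? WHERE id=?")) {
            ps.setString(1, name); // "hello ' world" is passed through unchanged
            ps.setInt(2, id);
            return ps.executeUpdate();
        }
    }
}
```

Whoever reads this method can see that it is safe without knowing where `name` came from, which is the point made above.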
When using a PreparedStatement the way it is meant to be used (a fixed query text with parameter placeholders and no concatenation of external values), you are protected against SQL injection.
There are roughly two ways this protection works:
The JDBC driver properly escapes the values and inserts them in the query at the placeholder positions, and sends the finished query to the server (AFAIK only MySQL Connector/J does this, and only with useServerPrepStmts=false which is the default).
The JDBC driver sends the query text (with placeholders) to the server, the server prepares the query and sends back a description of the parameters (eg type and length). The JDBC driver then collects the parameter values and sends these as a block of parameter values to the server. The server then executes the prepared query using those parameter values.
Given the way a query is prepared and executed by the server, SQL injection cannot occur at this point (unless of course you execute a stored procedure, and that stored procedure creates a query dynamically by concatenation).
The framework or SQL driver makes sure the input is escaped. If you use plain Statements and escape properly, you will achieve the same result. But that is not recommended: prepared statements may look like more lines of code, but they lead to more structured code instead of a soup of long SQL lines.
Plus, since we set each parameter separately and explicitly, the underlying driver class can escape them correctly for the database in use. That means you could change the database by configuration, and the driver still takes care of escaping. One database might need slashes escaped and another might want two single quotes...
This also leads to less code, as you do not need to bother about it. Simply put, you let the framework or common classes one level below the app code take care of it.
As we know, the best way to avoid SQL injection is to use a prepared statement with bind variables. But I have a question: what if I use just a prepared statement without bind variables, like below, where the customer id comes from the user interface?
String query ="select * from customer where customerId="+customerId;
PreparedStatement stmt = con.prepareStatement(query); //line1
Does line 1 take care of preventing SQL injection even when I have not used bind variables?
I agree the best way is the one below, but if the above approach also prevents SQL injection, then I would prefer it (as it's a legacy project).
String query ="select * from customer where customerId=?";
PreparedStatement stmt = con.prepareStatement(query);
stmt.setInt(1, 100);
Is a prepared statement without bind variables sufficient to make sure SQL injection is not possible?
One has to distinguish several matters.
Using a prepared statement won't help just by itself.
Likewise, there is no harm in using the non-prepared way in general.
It only matters when you need to insert a dynamic part into the query.
In that case, the dynamic part has to go into the query via a placeholder only, whose actual value is bound later (a placeholder is a ? or any other mark that represents the actual data in the query).
The very term "prepared statement" implies using placeholders for all the dynamic data that goes into the query. So:
if you have no dynamic parts in the query, there can obviously be no injection at all, even without using prepared statements.
if you're using a prepared statement but put values directly into the query instead of binding them, it is wide open to injection.
So, again: a prepared statement only works with placeholders for all dynamic data. And it works because:
every dynamic value has to be properly formatted.
a prepared statement makes proper formatting (or handling) inevitable.
a prepared statement does proper formatting (or handling) in the only proper place, right before query execution and not somewhere else, so our safety won't rely on unreliable sources such as:
some 'magic' feature that would rather spoil the data than make it safe.
the good will of one (or several) programmers, who can decide to format (or not to format) our variable somewhere in the program flow. That's a point of great importance.
a prepared statement affects only the value that goes into the query, not the source variable, which remains intact and can be used in further code (to be sent via email or shown on-screen).
a prepared statement can make application code dramatically shorter, doing all the formatting behind the scenes (*only if the driver permits).
Line 1 will not check whether the developer wants to drop a table or not. If you wrote the query, it is assumed to be OK.
The goal of SQL injection is to craft values that cause additional SQL to run without the will or knowledge of the developer, by querying your website with fake values in its parameters.
Example:
id = "'); DROP ALL TABLES; --";
query = "select * from customer where customerId="+id;
PreparedStatement ensures that special symbols (like ' or ") added to query using setInt/setString/etc will not interfere with sql query.
I know this is an older post; I just wanted to add that you avoid injection attacks if you can make sure that only integers are allowed into your query for line 1. String inputs are where injection attacks happen. In the sample above, it is unclear which type the variable 'customerId' is, although it looks like an int. Since the question is tagged as Java, and you can't do an injection attack with an int, you should be fine in that case.
If it is a string in line 1, you need to be confident that 'customerId' comes from a secure source that guarantees it is an integer. If it comes from a posted form or another user-generated field, then you can either try to escape it or convert it to an integer to be sure. If it is a string, parse it to an integer and you will not need to bind params.
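In Java terms, converting to an integer means parsing the string. A minimal sketch with hypothetical names (`CustomerIdParser`, `parseCustomerId`): anything that is not a plain integer is rejected before it gets anywhere near the query text.

```java
import java.util.OptionalInt;

final class CustomerIdParser {

    /** Returns the id if the input is a valid int, otherwise an empty result. */
    static OptionalInt parseCustomerId(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // e.g. "1 OR 1=1" is not an int
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCustomerId("100").isPresent());      // true
        System.out.println(parseCustomerId("1 OR 1=1").isPresent()); // false
    }
}
```

Binding the parsed value with `setInt` is still the simpler habit, because it keeps working when a string column is added to the query later.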
In Java, I want to print out the query that is going to be submitted to the database so that I can see what the error is when the query throws an exception.
It would be useful to locate the issue exactly, instead of trying to understand Oracle exception IDs and matching them to where exactly the code failed. Any help, please.
PreparedStatement ps = conn.prepareStatement("SELECT * FROM EMPLOYEES where EMPNAME=?");
ps.setString(1, "HULK");
ps.executeQuery();
Ideally I want to do a syso(ps) or syso(ps.getquery) and the output should be
SELECT * FROM EMPLOYEES WHERE EMPNAME='HULK'
or
SELECT * FROM EMPLOYEES WHERE EMPNAME=<HASHCODE OF THE OBJECT YOU ARE TRYING TO BIND>
Something interesting I ran across, Log4JDBC, which allows you to log SQL Calls. I haven't had a chance to use it yet, but I thought it was a great idea to be able to change the logging level and get the SQL calls into a log file.
This is more than you asked for, but I thought it might be worth throwing out there.
I think this has already been answered here.
Short answer: print the result of the PreparedStatement's toString() method to see the query with the bind variables substituted with values.
BUT: it all depends on the implementor. Not all JDBC drivers add this nicety.
If your particular driver doesn't comply with this, then the only workaround would be composing the SQL by concatenating the values instead of using bind variables (losing the performance advantages the RDBMS gives you when using bind variables).
But for this you have to convert things to strings, etc.
This would be paradoxical, since I have found that concatenated SQLs are the most error prone and are the ones that need most printing and checking.
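If the driver's `toString()` is not helpful, one driver-independent option is to keep the SQL text and the bound values together and render them yourself for the log, while still executing the statement with bind variables. The helper below is a rough sketch with hypothetical names (`SqlDebug`, `render`). Its output is for reading only and must never be executed, and it does not try to recognise `?` characters inside string literals or comments.

```java
final class SqlDebug {

    /** Replaces each ? with a printable form of the matching value, left to right. */
    static String render(String sql, Object... params) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '?' && next < params.length) {
                Object p = params[next++];
                if (p == null) {
                    out.append("NULL");
                } else if (p instanceof Number) {
                    out.append(p);
                } else {
                    // Quote doubling here only keeps the log line readable.
                    out.append('\'').append(p.toString().replace("'", "''")).append('\'');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT * FROM EMPLOYEES where EMPNAME='HULK'
        System.out.println(render("SELECT * FROM EMPLOYEES where EMPNAME=?", "HULK"));
    }
}
```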
So, I'm using jdbc to talk to a MySQL DB. For many a table and for many queries/views, I have created one class which encapsulates one row of the table or the query/table result. Accesses to the DB return one object of such a class (when I know exactly that there's only one matching row) or a Vector of such objects.
Each class features a factory method that builds an object from a row of a ResultSet. A lot of ResultSet.getXXX() methods are needed, as is finicky bookkeeping about which value is in which column, especially after changes to the table/query/view layout.
Creating and maintaining these objects is a boring, work-intensive and mind-numbing task. In other words, the sort of task that is done by a tool. It should read SQL (the MySQL variant, alas) and generate Java code. Or, at least, give me a representation (XML? DOM?) of the table/query/view, allowing me to do the java code generation myself.
Can you name this tool?
I'm a little confused about your questions. Why don't you use an Object-relational-mapping framework like Hibernate?
I used to have the same problem, having to read and write a lot of SQL directly. Eventually I started writing newer projects with Hibernate and haven't looked back. The system takes care of building the actual tables and running the SQL in the background for me, and I can mostly work with the Java objects.
If you are looking for a simple framework to help with the drudge work in writing sql, I would recommend ibatis sql maps. This framework basically does exactly what you want.
Hibernate is also a good option, but it seems a bit oversized for a simple problem like yours.
You might also have a look at the spring framework. This aims to create a simple environment for writing java application and has a very usable sql abstraction as well. But be careful with spring, you might start to like the framework and spend too many happy hours with it 8)
As to your concern with reflection. Java has no major problems anymore with performance overhead of reflection (at least since Version 1.4 and with O/R mapping tools).
In my experience, it is better to care about well written and easily understandable code, than caring about some performance overhead this might perhaps cost, that is only theoretical.
In most cases performance problems will not show up, where you expect them and can only be identified with measurement tools used on your code after it has been written. The most common problems with performance are I/O related or are based on some error in your own code (i.e. massively creating new instances of classes or loops with millions of runs, that are not necessary...) and not in the jdk itself.
I created a mini-framework like that years ago, but it was for prototyping and not for production.
The idea follows, and it is VERY very simple to do. The tradeoff is the cost of using reflection, although Hibernate and other ORM tools pay this cost too.
The idea is very simple.
You have a Dao class where you execute the query.
Read the ResultSet Metadata and there you can grab the table name, fields, types etc.
Find in the class path a Class that matches the table name and / or have the same number/types of fields.
Set the values using reflection.
Return this object and cast it in the other side and you're done.
It might seem absurd to find the class at runtime. And may look too risky too, because the query may change or the table structure may change. But think about it. When that happens, you have to update your mappings anyway to match the new structure. So instead you just update the matching class and live happy with that.
I'm not aware of how ORM tools reduce the cost of reflection calls (because the only thing the mapping does is help you find the matching class). In my version, the lookup among about 30,000 classes (I added jars from other places to test it) took only .30 ms or something like that. I saved that class in a cache, and the second time I didn't have to do the lookup.
If you're interested ( and still reading ) I'll try to find the library in my old PC.
At the end my code was something like this:
Employee e = ( Employee ) MagicDataSource.find( "select * from employee where id = 1 ");
or
Employee[] emps = ( Employee[] ) MagicDataSource.findAll("select * from employee ");
Inside it was like:
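Object[] findAll( String query ) throws Exception {
    ResultSet rs = getConnection().prepareStatement( query ).executeQuery();
    ResultSetMetaData md = rs.getMetaData();
    String tableName = md.getTableName( 1 ); // table of the first column
    String className = findClass( toCamelCase( tableName ) ); // search in a list where all the class names were loaded.
    Class<?> clazz = Class.forName( className );
    List<Object> result = new ArrayList<Object>();
    while( rs.next() ) {
        Object object = clazz.getDeclaredConstructor().newInstance();
        for ( int i = 1; i <= md.getColumnCount(); i++ ) {
            // for each attribute etc. etc.
            // setter...
        }
        result.add( object );
    }
    // return an array of the mapped class, so the caller's cast to Employee[] works
    return result.toArray( (Object[]) java.lang.reflect.Array.newInstance( clazz, result.size() ) );
}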
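The `setter...` step can be done with the standard `java.beans` introspection API. The sketch below is my own illustration rather than the original library: a `Map` of column name to value stands in for one `ResultSet` row, the names `RowMapperSketch`, `populate` and the sample `Employee` bean are hypothetical, and it assumes column names equal property names.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;

final class RowMapperSketch {

    /** Sample bean, used for illustration only. */
    public static class Employee {
        private int id;
        private String name;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** Creates a bean of the given type and calls the setter for every column present in the row. */
    static <T> T populate(Class<T> type, Map<String, Object> row) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                // Columns without a matching writable property are ignored.
                if (pd.getWriteMethod() != null && row.containsKey(pd.getName())) {
                    pd.getWriteMethod().invoke(bean, row.get(pd.getName()));
                }
            }
            return bean;
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot populate " + type.getName(), e);
        }
    }
}
```

A real implementation would probably look up the property descriptors once per class and keep them in a cache instead of introspecting for every row.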
If anyone knows how ORM tools deal with the reflection cost, please let me know. The code I have read from open source projects doesn't even attempt to do anything about it.
At the end it let me create quick small programs for system monitoring or stuff like that. I don't do that job anymore and that lib is now in oblivion.
Apart from the ORMs...
If you're using the rs.getString and rs.getInt routines, then you can certainly ease your maintenance burden if you rely on named columns rather than numbered columns.
Specifically rs.getInt("id") rather than rs.getInt(1), for example.
It's been rare that I've had an actual column change data type, so future SQL maintenance is little more than adding the new columns that were added to the table, and those can simply be tacked on to the end of your monster bind list in each of your little DAO objects.
Next, you take that idiom of using column names and extend it to a plan of using consistent and, at the same time, "unique" names. The intent is that each column in your database has a unique name associated with it. In theory it can be as simple (albeit verbose) as tablename_columnname; thus if you have a "member" table, the column name is "member_id" for the id column.
What does this buy you?
It buys you being able to use your generic DAOs on any "valid" result set.
A "valid" result set is a result set with the columns named using your unique naming spec.
So, you get "select id member_id, name member_name from member where id = 1".
Why would you want to do that? Why go to that bother?
Because then your joins become trivial.
PreparedStatement ps = con.prepareStatement("select m.id member_id, m.name member_name, p.id post_id, p.date post_date, p.subject post_subject from member m, post p where m.id = p.member_id and m.id = 123");
ResultSet rs = ps.executeQuery();
Member m = null;
Post p = null;
while(rs.next()) {
if (m == null) {
m = MemberDAO.createFromResultSet(rs);
}
p = PostDAO.createFromResultSet(rs);
m.addPost(p);
}
See, here the binding logic doesn't care about the result set contents, since it's only interested in columns it cares about.
In your DAOs, you make them slightly clever about the ResultSet. Turns out if you do 'rs.getInt("member_id")' and member_id doesn't happen to actually BE in the result set, you'll get a SQLException.
But with a little work, using ResultSetMetaData, you can do a quick pre-check (by fetching all of the column names up front), then rather than calling "rs.getInt" you can call "baseDAO.getInt" which handles those details for you so as not to get the exception.
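A minimal sketch of that pre-check. `BaseDaoSketch`, `columnLabels` and the nullable return value are my assumptions about how such a `baseDAO.getInt` could look; the actual implementation is not shown here.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class BaseDaoSketch {

    /** Collects the column labels (aliases) of a result set once, lower-cased. */
    static Set<String> columnLabels(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        Set<String> labels = new HashSet<>();
        for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
            labels.add(md.getColumnLabel(i).toLowerCase(Locale.ROOT));
        }
        return labels;
    }

    /** Returns the column value, or null when the column is absent or SQL NULL. */
    static Integer getInt(ResultSet rs, Set<String> labels, String column) throws SQLException {
        if (!labels.contains(column.toLowerCase(Locale.ROOT))) {
            return null; // column not selected by this query, so no SQLException
        }
        int value = rs.getInt(column);
        return rs.wasNull() ? null : value;
    }
}
```

A DAO's `createFromResultSet` would call `columnLabels` once per result set and then use `getInt` for each field, leaving the fields of unselected columns unset.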
The beauty here is that once you do that, you can fetch incomplete DAOs easily.
PreparedStatement ps = con.prepareStatement("select m.id member_id from member m where m.id = 123");
ResultSet rs = ps.executeQuery();
Member m = null;
if (rs.next()) {
m = MemberDAO.createFromResultSet(rs);
}
Finally, it's really (really) a trivial bit of scripting (using, say, AWK) that can take the properties of a bean and convert it into a proper blob of binding code for an initial DAO. A similar script can readily take a SQL table statement and convert it in to a Java Bean (at least the base members) that then your IDE converts in to a flurry of getters/setters.
By centralizing the binding code in to the DAO, maintenance is really hardly anything at all, since it's changed in one place. Using partial binding, you can abuse them mercilessly.
PreparedStatement ps = con.prepareStatement("select m.name member_name, max(p.date) post_date from member m, post p where p.member_id = m.id and m.id = 123");
ResultSet rs = ps.executeQuery();
Member m = null;
Post p = null;
if (rs.next()) {
m = MemberDAO.createFromResultSet(rs);
p = PostDAO.createFromResultSet(rs);
}
System.out.println(m.getName() + " latest post was on " + p.getDate());
Your burden moving forward is mostly writing the SQL, but even that's not horrible. There's not much difference between writing SQL and EQL. Mind, it does kind of suck having to write a select statement with a zillion columns in it, since you can't (and shouldn't anyway) use "select * from ..." (select * always (ALWAYS) leads to trouble, IME).
But that is just the reality. I have found, though, that (unless you're doing reporting) that problem simply doesn't happen a lot. It happens at least once for most every table, but it doesn't happen over and over and over. And, naturally, once you have it once, you can either "cut and paste" your way to glory, or refactor it (i.e. sql = "select " + MemberDAO.getAllColumns() + ", " + PostDAO.getAllColumns() + " from member m, post p").
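A sketch of that refactoring, under the tablename_columnname naming convention described earlier. The helper name `aliasedColumns` is hypothetical; a `MemberDAO.getAllColumns()` could simply return its result for the member table.

```java
import java.util.StringJoiner;

final class ColumnLists {

    /** Builds "m.id member_id, m.name member_name" for the given alias, table and columns. */
    static String aliasedColumns(String alias, String table, String... columns) {
        StringJoiner joined = new StringJoiner(", ");
        for (String column : columns) {
            joined.add(alias + "." + column + " " + table + "_" + column);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // The column lists are fixed text from the code; the id is still a bind parameter.
        String sql = "select " + aliasedColumns("m", "member", "id", "name")
                + ", " + aliasedColumns("p", "post", "id", "date", "subject")
                + " from member m, post p where m.id = p.member_id and m.id = ?";
        System.out.println(sql);
    }
}
```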
Now, I like JPA and ORMs, I find them useful, but I also find them a PITA. There is a definite love/hate relationship going on there. And when things are going smooth, boy, is it smooth. But when it gets rocky -- hoo boy. Then it can get ugly. As a whole, however, I do recommend them.
But if you're looking for a "lightweight" non-framework, this technique is useful, practical, low overhead, and gives you a lot of control over your queries. There's simply no black magic or dark matter between your queries and your DB, and when things don't work, it's not some arcane misunderstanding of the framework or edge case bug condition in someone else's 100K lines of code, but rather, odds are, a bug in your SQL -- where it belongs.
Edit: Nevermind. While searching for a solution to my own problem, I forgot to check the date on this thing. Sorry. You can ignore the following.
@millermj - Are you doing that for fun, or because there's a need? Just curious, because that sounds exactly like what Java IDEs like Eclipse and NetBeans already provide (using the Java Persistence API) with New->JPA->Entity Classes from Tables functionality.
I could be missing the point, but if someone just needs classes that match their tables and are persistable, the JPA plus some IDE "magic" might be just enough.