Escaping SQL Strings in Java

Background:
I am currently developing a Java front end for an Enterprise CMS database (Business Objects). At the moment, I am building a feature to allow the user to build a custom database query. I have already implemented measures to ensure the user can only select from a subset of the available columns and operators that have been approved for user access (e.g. SI_EMAIL_ADDRESS can be selected while more powerful fields like SI_CUID cannot be). So far things have been going swimmingly, but it is now time to secure this feature against potential SQL injection attacks.
The Question:
I am looking for a method to escape user input strings. I have already seen PreparedStatement; however, I am forced to use third-party APIs to access the database. I cannot change these APIs, and direct database access is out of the question. The individual methods take strings representing the queries to be run, which rules out PreparedStatement (which, to my knowledge, must be run against a direct database connection).
I have considered using String.replace(), but I do not want to reinvent the wheel if possible. In addition, I am a far cry from the security experts who developed PreparedStatement.
I had also looked at the Java API reference for PreparedStatement, hoping to find some sort of toString() method. Alas, I have been unable to find anything of the sort.
Any help is greatly appreciated. Thank you in advance.
References:
Java - escape string to prevent SQL injection
Java equivalent for PHP's mysql_real_escape_string()

Of course it would be easier and more secure to use PreparedStatement.
ANSI SQL requires a string literal to begin and end with a single quote, and the only escape mechanism for a single quote is to use two single quotes:
'Joe''s Caffee'
So in theory, you only need to replace each single quote with two single quotes. However, there are some problems. First, some databases (MySQL, for example) also (or only) support a backslash as an escape mechanism. In that case, you would need to double the backslashes as well.
For MySQL, I suggest using MySQLUtils. If you don't use MySQL, then you need to check exactly which escape mechanisms your database uses.
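As a minimal sketch of the quote-doubling rule described above, assuming a database that follows the ANSI rule only (no backslash escapes). The class and method names here are hypothetical, not part of any library:

```java
final class SqlLiteral {
    private SqlLiteral() {}

    /**
     * Wraps a value in single quotes and doubles every embedded single quote.
     * Assumption: the target database treats backslash as an ordinary character.
     */
    static String quote(String value) {
        if (value == null) {
            return "NULL";
        }
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // Prints the literal from the example above: 'Joe''s Caffee'
        System.out.println(quote("Joe's Caffee"));
    }
}
```

For a database that also honours backslash escapes, this sketch alone would not be enough; the backslashes would have to be doubled first.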

You may still be able to use a prepared statement. See this post: get query from java sql preparedstatement. Also, based on that post, you may be able to use Log4JDBC to handle this.
Either of these options should prevent you from needing to worry about escaping strings to prevent SQL injection, since the prepared statement does it for you.

Although there is no standard Java equivalent of PHP's mysql_real_escape_string(), what I did was chain replaceAll calls to handle every character that might otherwise cause an exception. Here is my sample code:
public void saveExtractedText(String group, String content) {
    try {
        // Escape the backslash first so the backslashes added below are not doubled again.
        // replaceAll takes a regex and a replacement pattern, hence the heavy escaping.
        content = content.replaceAll("\\\\", "\\\\\\\\")
                .replaceAll("\n", "\\\\n")
                .replaceAll("\r", "\\\\r")
                .replaceAll("\t", "\\\\t")
                .replaceAll("\00", "\\\\0")
                .replaceAll("'", "\\\\'")
                .replaceAll("\"", "\\\\\"");
        // state is an open java.sql.Statement; note that group is concatenated without escaping.
        state.execute("insert into extractiontext(extractedtext,extractedgroup) values('" + content + "','" + group + "')");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Related

How to retrieve all SQL-queries from Java source code?

We have many Java Spring projects which use Sybase database.
We want to migrate it to MSSQL.
One of the tasks is to develop a script to find all SQL queries used in the projects' source code. Moreover, stored procedures are used broadly in the projects.
What is an appropriate approach to do so?
@Override
public void update(int id, Entity entity) {
jdbcTemplate.update(
"UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id",
HashMapBuilder.<String, Object>builder()
.put("id", id)
.put("enabled", entity.enabled)
.build()
);
}
This is the easiest case.
As a first step, we want to run a regex over the source code to find SQL by a list of SQL keywords.

In essence, you want to find any (SQL) string being fed to a jdbc call.
This means your tool must know what the jdbc methods are (e.g., "jdbcTemplate.update"), and which argument of each method is a string intended to be SQL. That's sort of easy, since it is documented.
What is hard is to find the string, because you assemble it dynamically; there's no guarantee that the entire SQL string is actually sitting as a direct argument to the function call. It might be computed by combining SQL string fragments using "+" and arbitrary function calls.
This means you have to parse the Java in a compiler sense, know what the meaning of each symbol is, and trace values through the dataflows in the code.
There's no way on earth a regex can do this reliably. (You can do it badly, and maybe that's good enough for you; I suggest hunting for all jdbc method call names.)
There's a worse problem: once you've figured out what the SQL string is, you now need to know if it is MSSQL-compliant. That requires parsing the abstract string (remember, it is assembled from a bunch of fragments) using an MSSQL-compliant parser (again, no regex can do context-free parsing) and complaining about the ones that don't parse.
Even that may not be enough, if MSSQL has statements that look identical to Sybase statements but mean different things.
This is a really hard problem to solve well using automation. (There are research papers that describe all of the above activities.)
I think what you will have to do is find all SQL calls, and hand-inspect each for compatibility.
Next time, you should build your application with a database access layer. Then all the SQL calls are in one place.
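The "do it badly" option of hunting for jdbc method call names could be sketched roughly as follows. This is only an illustration under stated assumptions: the receiver variable is assumed to always be named jdbcTemplate, the method list is a guess at what the projects use, and the class name is hypothetical. It only reports candidate lines for hand inspection; it does not reconstruct the SQL string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class JdbcCallFinder {
    // Assumption: calls look like jdbcTemplate.update(...), jdbcTemplate.queryForObject(...), etc.
    private static final Pattern JDBC_CALL = Pattern.compile(
            "\\bjdbcTemplate\\s*\\.\\s*(update|batchUpdate|execute|call|query\\w*)\\s*\\(");

    private JdbcCallFinder() {}

    /** Returns the 1-based line numbers on which a jdbcTemplate call starts. */
    static List<Integer> findCallLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            if (JDBC_CALL.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

A scan like this misses calls made through differently named variables or wrapper methods, which is exactly the limitation described above.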

How to build a SQL string, with values filled in, from a statement with named parameters?

I'm looking for a utility to prepare a SQL statement with named parameters and values, but not execute it. I just want the resulting SQL statement, with values substituted for named params, as a java.lang.String object. I did not find anything in Spring or Apache Commons. [I know how to enable debug logging for java.sql.*] Because I'm querying a db instance on a mainframe, prepared statements are not allowed; the support has been disabled, for some strange reason. That decision is beyond my control or influence. Do you know of a utility that can help me? I guess I could roll my own utility if I had to, but I'd rather not.

This is like saying you'd like to see your Java code with the user's input hardcoded into the source.
That would be ridiculous, because you know that the user's input is never combined with the Java source. It's combined (so to speak) with the compiled Java app at runtime. There is never a time when input data becomes merged into the source.
It's the same way with prepared SQL statements.
During prepare(), the RDBMS receives the textual SQL string, and parses it internally and retains a sort of "bytecode" version of the query. In this internal representation, the parameter placeholders are noted, and the query cannot execute before values are provided.
During execute(), the RDBMS receives parameter values, and combines them with the bytecode. The parameters never see the original SQL text! That's the way it is supposed to work.

First, you should know that one of the reasons for the prepared statement is security. The natural way of simply replacing placeholders with textual representation of parameters and then sending a simple string had been the cause of many SQL injection attacks. A classical example is
SELECT * FROM tab WHERE tabid = ?
with a parameter of 1; DELETE FROM tab. With textual replacement of parameters, a simple query turns into a delete-everything statement. Of course, real attacks can be much more clever than that ...
It is really strange that, for a mainframe database, plain SQL statements are recommended over prepared statements. In my experience, security reasons led to the opposite rule. You should really ask for the reason for that, and what the recommended approach is. It could be the usage of a special library, or a framework, or ... but if you can, do avoid textual replacement.
Edit:
If you are really stuck with textual replacement, you will have to roll your own utility. As explained above, I cannot imagine that a framework does that. In a real-world application, it can be made reasonably secure if you can validate all the inputs to avoid any possibility of SQL injection (no special characters, or only at known places). But if I were in your place, I would not try to mimic SQL prepared statements, and I would simply use String.format, where the format string is the SQL query with placeholders in Formatter syntax.
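For what such a hand-rolled utility might look like, here is a hedged sketch that substitutes :name placeholders with SQL literals. The class and method names are hypothetical, it assumes ANSI quote doubling is the only escaping the database needs, and it does not skip placeholders that happen to appear inside string literals or comments in the template. It is an illustration of textual replacement, not a substitute for real prepared statements.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedSqlRenderer {
    private static final Pattern PARAM = Pattern.compile(":(\\w+)");

    private NamedSqlRenderer() {}

    /** Replaces every :name in sql with a literal built from params. */
    static String render(String sql, Map<String, ?> params) {
        Matcher m = PARAM.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            if (!params.containsKey(name)) {
                throw new IllegalArgumentException("No value for :" + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(toLiteral(params.get(name))));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Numbers are emitted as-is; everything else becomes a quoted string literal. */
    static String toLiteral(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "'" + value.toString().replace("'", "''") + "'";
    }
}
```

With this sketch, the attack value from the example above, passed as a string, ends up inside a quoted literal instead of being executed as a second statement.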

How can I support the SQL GO statement in a Java / jtds application?

I'm working on a Java based OSS app SqlHawk which as one of its features is to run upgrade sql scripts against a server.
Microsoft have made it a convention to split a script into batches with the GO statement, which is a good idea but just asking for false matches on the string.
At the moment I have a very rudimentary:
// split where GO on its own on a line
Pattern batchSplitter = Pattern.compile("^GO", Pattern.MULTILINE);
...
String[] splitSql = batchSplitter.split(definition);
...
which kind of works but is prone to being tripped up by things like quoted GO statements or indentation issues.
I think the only way to make this truly reliable is to have an SQL parser in the app, but I have no idea how to go about this, or whether that might actually end up being less reliable (especially given this tool supports multiple DBMSs).
What ways could I solve this problem? Code examples would be very helpful to me here.
Relevant sqlHawk code on github.
Currently using jtds to execute the batches found in the scripts.

GO is a client-side batch separator command. You can replace it with ;. It should not be sent in your EXEC dynamic SQL.
USE master
GO --<----- the client actually sends the first batch to SQL Server here and waits for a response
SELECT * from sys.databases
GO
should be translated into
Application.Exec("USE master");
Application.Exec("SELECT * from sys.databases");
or you can write it this way:
Application.Exec("USE master;SELECT * from sys.databases");
More about GO
http://msdn.microsoft.com/en-us/library/ms188037(v=sql.90).aspx

OK, so this isn't going to be exactly what you want, but you might find it a start. I released SchemaEngine (which forms the core of most of my products) as open source here. In there, you will find C# code that does what you want very reliably (i.e. not tripping up on strings, comments, etc.). It also supports the 'GO x' syntax to repeat a batch x times.
If you download that and have a look in /Atlantis.SchemaEngine/Helpers you'll find a class called BatchParser.cs which contains a method called ParseBatches - which does pretty much what it says on the tin.
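A Java sketch of the same idea, for illustration only: it is not a port of BatchParser.cs, and the class name is hypothetical. It treats a line as a separator only when GO stands alone on the line (any case, optional surrounding whitespace, optional trailing -- comment) and the scanner is not inside a single-quoted string or a /* */ comment. It assumes block comments do not nest and does not handle the 'GO x' repeat count, bracketed identifiers, or double-quoted identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class GoBatchSplitter {
    private static final Pattern GO_LINE =
            Pattern.compile("^\\s*GO\\s*(--.*)?$", Pattern.CASE_INSENSITIVE);

    private GoBatchSplitter() {}

    static List<String> split(String script) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        boolean inBlockComment = false;

        for (String line : script.split("\\R", -1)) {
            if (!inString && !inBlockComment && GO_LINE.matcher(line).matches()) {
                addIfNotBlank(batches, current);
                current.setLength(0);
                continue;
            }
            current.append(line).append('\n');
            // Update the string/comment state so a later GO line is judged correctly.
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                char next = i + 1 < line.length() ? line.charAt(i + 1) : '\0';
                if (inBlockComment) {
                    if (c == '*' && next == '/') { inBlockComment = false; i++; }
                } else if (inString) {
                    // A doubled quote ('') closes and immediately reopens the string.
                    if (c == '\'') { inString = false; }
                } else if (c == '\'') {
                    inString = true;
                } else if (c == '-' && next == '-') {
                    break; // rest of the line is a line comment
                } else if (c == '/' && next == '*') {
                    inBlockComment = true;
                    i++;
                }
            }
        }
        addIfNotBlank(batches, current);
        return batches;
    }

    private static void addIfNotBlank(List<String> batches, StringBuilder sql) {
        String text = sql.toString().trim();
        if (!text.isEmpty()) {
            batches.add(text);
        }
    }
}
```

Each returned batch can then be executed separately through the driver. Because the quoting and comment rules differ between DBMSs, a splitter like this would need per-dialect adjustments.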

How to include ' in the input form?

I just realized that in my forms I can't save a name like O'Brian (it is saved as O only, and 'Brian is truncated).
I'm using Grails 1.2.2 with MySQL.
Is there a simple way to allow ' to be inserted into the db, rather than modifying each form and putting in an HTML replacement for that character?

If inserting into the database is the problem, then you can use parameterized queries. This is strongly recommended anyway, since it avoids possible security risks.
Imagine if instead of entering just a quote character, the user enters "Brian'; DROP TABLE data" into your form!

use the escape character, \
e.g. O\'Brian
See http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html
That said, most DB abstraction layers will allow you to use parameterized queries that do this for you

Grails and its database abstraction GORM should handle that for you, unless you are saving it yourself using some lower-level APIs. See the documentation here.
You should not need to replace such characters yourself, so I suggest you have another look at your code and see if you can spot what might cause the problem. I hope you can find an easy solution, it shouldn't be hard with Grails :-)

How to handle HUGE SQL strings in the source code

I am working on a project currently where there are SQL strings in the code that are around 3000 lines.
The project is a java project, but this question could probably apply for any language.
Anyway, this is the first time I have ever seen something this bad.
The code base is legacy, so we can't suddenly migrate to Hibernate or something like that.
How do you handle very large SQL strings like that?
I know it's bad, but I don't know exactly what the best solution to suggest is.

It seems to me that making those hard-coded values into stored procedures and referencing the sprocs from code instead might be high yield and low effort.

Does the SQL have a lot of string concatenations for the variables?
If it doesn't, you can extract the queries and put them in resource files. But you'll have to remove the string concatenation at the line breaks.
The stored procedure approach you used is very good, but sometimes, when you need to understand what the SQL is doing, you have to switch from your workspace to your favorite SQL IDE. That's the only bad thing.
My suggestion would go from this:
String query = "select ......."+
3000 lines.
To
ResourceBundle bundle = ResourceBundle.getBundle("queries");
String query = bundle.getString( "customerQuery" );
Well that's the idea.

I guess the first question is, what are you supposed to do with it? If it's not broken, quietly close it up and pretend you've never seen it. Otherwise, refactor like mad - hopefully there's some exit condition somewhere that looks something like a contract.

The best thing I could come up with so far is to put the query into several stored procedures, the same way I would handle a method that's too long in Java.

I'm in the same spot you are... My plan was to pull the SQL into separate .sql files within the project and create a utility method to read the file in when I need the query.
string sql = "select asd,asdf,ads,asdf,asdf,"
+ "from asdfj asfj as fasdkfjl asf"
+ "..........................."
+ "where user = #user and ........";
The query gets dumped into a file called usageReportByUser.sql
and the code becomes something like this.
string sql = util.queries("usageReportByUser");
Make sure it's done in a way that the files are not publicly accessible.
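One way the utility method described above might look in Java, as a rough sketch: the class name and the convention of one query per <name>.sql file in a single directory are assumptions, not something the project necessarily uses. Each file is read once and then served from an in-memory cache.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SqlFileQueries {
    private final Path directory;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** directory should live outside any publicly served folder. */
    SqlFileQueries(Path directory) {
        this.directory = directory;
    }

    /** Returns the text of <name>.sql, reading the file only on first use. */
    String get(String name) {
        return cache.computeIfAbsent(name, this::load);
    }

    private String load(String name) {
        try {
            byte[] bytes = Files.readAllBytes(directory.resolve(name + ".sql"));
            return new String(bytes, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read query " + name, e);
        }
    }
}
```

A call such as queries.get("usageReportByUser") would then replace the concatenated string. The query name is expected to come from code, never from user input, since it is used to build a file path.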

Use the framework iBatis

I wrote a toolkit for this a while back and have used it in several projects. It allows you to put queries in mostly text files and generate bindings and documentation for them.
Check out this example and an example use (it's pretty much just a prepared statement compiled to a java class with early bound/typesafe query bindings).
It generates some nice javadocs, too, but I don't have any of those online at the moment.

I second the iBatis recommendation. At the least, you can pull the SQL out from Java code that most likely uses StringBuffer and appending or String concat into XML where it is just easier to maintain.
I did this for a legacy web app and I turned on debugging and ran unit tests for the DAOs and just copied the generated sql for each statement into the iBatis xml. Worked pretty smooth.

What I do in PHP is this:
$query = "SELECT * FROM table WHERE ";
$query .= "condition < 5 AND ";
$query .= "condition2 > 10 AND ";
and then, once you've finished layering on $query:
mysql_query($query);

One easy way would be to break them out into some kind of constants. That would at least make the code more readable.

I store them in files (or resources), and then read and cache them on app start (or on change if it's a service or something).
Or, just put them into a big old SqlQueries class as consts or readonlys.

I have had success converting large dynamic queries to linq queries. (1K lines+) This works very well for reporting scenarios where you have a lot of dynamic filtering and dynamic grouping on a relatively small number of tables. Create an edmx for those tables and you can write excellent strong typed composable queries.
I found that performance actually improved and the resulting sql was a lot simpler.
I'm sure you would get similar mileage with Hibernate, other than being able to use LINQ of course. But generally speaking, if the generated SQL needs to be highly dynamic, then it is not a good candidate for a stored proc. Writing dynamic SQL inside a stored proc is the worst of both worlds. SQL generation frameworks would be my preferred way to go here. If you dig Hibernate, I think it would be a good solution.
That being said, if the queries are just simple strings with parameters, then just throw them in a stored proc and be done with it, but then you miss out on the results being nice objects to work with.
