How to retrieve all SQL-queries from Java source code?

How to retrieve all SQL-queries from Java source code? - java

We have many Java Spring projects which use Sybase database.
We want to migrate it to MSSQL.
One of the tasks is to develop a script to find all SQL-queries used in the projects' source code. Moreover, there is a brought usage of stored procedures in the projects.
What is an appropriate approach to do so?
#Override
public void update(int id, Entity entity) {
jdbcTemplate.update(
"UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id",
HashMapBuilder.<String, Object>builder()
.put("id", id)
.put("enabled", entity.enabled)
.build()
);
}
It is the easiest case.
Firstly, we want to REGEX the source code in order to find SQL by a list of SQL keywords.

In essence, you want to find any (SQL) string being fed to a jdbc call.
This means your tool must know what the jdbc methods are (e.g., "jdbcTemplate.update"), and which argument of each method is string intended to be SQL. That's sort of easy since it is documented.
What is hard is to find the string, because you assemble it dynamically; there's no guarantee that the entire SQL string is actually sitting as a direct argument to the function call. It might be computed by combining SQL string fragments using "+" and arbitrary function calls.
This means you have to parse the Java in a compiler sense, know what the meaning of each symbol is, and trace values through the dataflows in the code.
There's no way on earth a regex can do this reliably. (You can do it badly and maybe that's good enough for you, I suggest hunting for all jdbc method call names).
There's a worse problem: once you've figured out what the SQL string is, you know need to know if it is MSSQL-compliant. That requires parsing the abstract string (remember, it is assembled from a bunch of fragments) using an MSSQL-compliant parser (again, no regex can do context-free parsing) and complain about the ones that don't parse.
Even that may not be enough, if MSSQL has statements that look identical to sybase statements, but mean different things.
THis is a really hard problem to solve well using automation. (There are research papers that describe all of the above activities).
I think what you will have to do is find all SQL calls, and hand-inspect each for compatibility.
Next time, you should build your application with a database access layer. Then all the SQL calls are in one place.

Related

Why parsing Gremlin query in Java isn't generic?

I'm parsing a Gremlin query in Java (well, actually I'm writing Scala, and using the Groovy compiled JARs like it was Java).
The query is a String variable that is given by user input. In other words - I cannot tell what the query will be, I'm only assuming it's a valid Gremlin query (syntactically and logically).
I started with a simple Gremlin.compile(query) that returns Pipe on which I'm iterating. However, according to the example, one must invoke .setStarts prior to iterating the Pipe. And I must know what the is the runtime type S in my Pipe<S,E>.
It feels like this API isn't generic enough, the following line from the example
pipe.setStarts(new SingleIterator<Vertex>(graph.getVertex(1)));
will work for some cases, but for Vertex Iteration for one example (g.V()) it will throw a CastException.
Is there a way to work-around it?
Perhaps using the underlying Script Engine (like the next examples in the link above) will help me to achieve more generic code?

I found a workaround. It feels a bit ugly but it does the job.
I'm using ScriptEngine with bindings of 'g' for the Graph, so the user can start his/her queries with g.. (not helps for generics, but makes it more user-friendly by not making the user use the Identity Pipe (_()) at the beginning of his/her queries).
(kind of ugly, I know) I'm extracting from the query string (using RegEx) the starting vertex (if exists), finding it programatically and (if found) invoking setStarts with it. If it's not found I'm giving the Graph itself as the parameter for setStarts, assuming its a Vertex Iteration query.

How to build a SQL string, with values filled in, from a statement with named parameters?

I'm looking for a utility to prepare a SQL statement with named parameters and values, but not execute it. I just want the resulting SQL statement, with values substituted for named params, as a java.lang.String object. I did not find anything in Spring or Apache Commons. [I know how to enable debug logging for java.sql.*] Because I'm querying a db instance on a mainframe, prepared statements are not allowed; the support has been disabled, for some strange reason. That decision is beyond my control or influence. Do you know of a utility that can help me? I guess I could roll my own utility if I had to, but I'd rather not.

This is like saying you'd like to see your Java code with the user's input hardcoded into the source.
That would be ridiculous, because you know that the user's input is never combined with the Java source. It's combined (so to speak) with the compiled Java app at runtime. There is never a time when input data becomes merged into the source.
It's the same way with prepared SQL statements.
During prepare(), the RDBMS receives the textual SQL string, and parses it internally and retains a sort of "bytecode" version of the query. In this internal representation, the parameter placeholders are noted, and the query cannot execute before values are provided.
During execute(), the RDBMS receives parameter values, and combines them with the bytecode. The parameters never see the original SQL text! That's the way it is supposed to work.

First, you should know that one of the reasons for the prepared statement is security. The natural way of simply replacing placeholders with textual representation of parameters and then sending a simple string had been the cause of many SQL injection attacks. A classical example is
SELECT * FROM tab WHERE tabid = ?
with a parameter of 1; DELETE FROM tab and textual replacement of parameters, you transform a simple query in a delete all statement. Of course real attacks could be much more clever than that ...
It is really strange that in a mainframe database, one recommends plain SQL statements over prepared statements. In my experiences, security reasons leaded to the opposite rule. You should really ask for the reason of that, and what is the recommended approach. It could be be the usage of a special library, or a framework or ... but if you can, do avoid textual replacement.
Edit:
If your are really stuck with textual replacement, you will have to roll your own utility. As explained above, I cannot imagine the a framework does that. In a real word application, it can be made reasonably sure if you can validate all the inputs to avoid any possibility of SQL injection (no special characters or only at know places). But if I were at your place, I would not try to mimic SQL prepared statements, and I would simply use String.format where the format string will be the SQL query with placeholders in Formatter syntax.

Escaping SQL Strings in Java

Background:
I am currently developing a Java front end for an Enterprise CMS database (Business Objects). At the moment, I am building a feature to allow the user to build a custom database query. I have already implemented measures to ensure the user is only able to select using a subset of the available columns and operators that have been approved for user access (eg. SI_EMAIL_ADDRESS can be selected while more powerful fields like SI_CUID cannot be). So far things have been going on swimmingly, but it is now time to secure this feature against potential SQL injection attacks.
The Question:
I am looking for a method to escape user input strings. I have already seen PerparedStatement, however I am forced to use third party APIs to access the database. These APIs are immutable to me and direct database access is out of the question. The individual methods take strings representing the queries to be run, thus invalidating PreparedStatement (which, to my knowledge, must be run against a direct database connection).
I have considered using String.replace(), but I do not want to reinvent the wheel if possible. In addition, I am a far cry from the security experts that developed PerparedStatement.
I had also looked at the Java API reference for PerparedStatement, hoping to find some sort of toString() method. Alas, I have been unable to find anything of the sort.
Any help is greatly appreciated. Thank you in advance.
References:
Java - escape string to prevent SQL injection
Java equivalent for PHP's mysql_real_escape_string()

Of course it would be easier and more secure to use PreparedStatement.
ANSI SQL requires a string literal to begin and end with a single quote, and the only escape mechanism for a single quote is to use two single quotes:
'Joe''s Caffee'
So in theory, you only need to replace a single quote with two single quotes. However, there are some problems. First, some databases (MySQL for example) also (or only) support a backslash as an escape mechanism. In that case, you would need to double the backslashes (as well).
For MySQL, I suggest to use the MySQLUtils. If you don't use MySQL, then you need to check what are the exact escape mechanisms to use.

You may still be able to used a prepared statement. See this post: get query from java sql preparedstatement. Also, based on that post, you may be able to use Log4JDBC to handle this.
Either of these options should prevent you from needing to worry about escaping strings to prevent SQL injection, since the prepared statement does it for you.

Although, there is no standard way to handle PHP's mysql_real_escape_string() in Java What I did was to chain replaceAll method to handle every aspect that may be necessary to avoid any exception. Here is my sample code:
public void saveExtractedText(String group,String content)
{
try {
content = content.replaceAll("\", "\\")
.replaceAll("\n","\n")
.replaceAll("\r", "\r")
.replaceAll("\t", "\t")
.replaceAll("\00", "\0")
.replaceAll("'", "\'")
.replaceAll("\"", "\\"");
state.execute("insert into extractiontext(extractedtext,extractedgroup) values('"+content+"','"+group+"')");
} catch (Exception e) {
e.printStackTrace();
}

Where to store expected output of a test?

Writing a test I expect the tested method to return certain outputs. Usually I'm checking that for a given database operation I get a certain output. My practice has usually been that of writing an array as a quick map/properties file in the test itself.
This solution is quick, and is not vulnerable to run-time changes of an external file to load the expected results from.
A solution is to place the data in a java source file, so I bloat less the test and still get a compile-time checked test. How about this?
Or is loading the exepected results as resources a better approach? A .properties file is not good enough since I can have only one value per key. Is commons-config the way to go?
I'd prefer a simple solution where I name the properties per key, so for each entry I might have a doc-length and numFound property value (sounds like the elements of an xml node)?
How do you go about this?

You must remember about maintaining such tests. After writing several web services tests with Spring-WS test support I must admit that storing requests (test setup) and expected responses in external XML files wasn't such a good idea. Each request-response pair had the same name prefix as test case so everything was automated and very clean. But still refactoring and diagnosing test failures becomes painful. After a while I realized that embedding XML in test case as String, although ugly, is much easier to maintain.
In your case, I assume you invoke some database query and you get a list of maps in response. What about writing some nice DSL to make assertions on these structures? Actually, FEST-Assert is quite good for that.
Let's say you test the following query (I know it's an oversimplification):
List<Map<String, Object>> rs = db.query("SELECT id, name FROM Users");
then you can simply write:
assertThat(rs).hasSize(1);
assertThat(rs.get(0))
.hasSize(2)
.includes(
entry("id", 7),
entry("name", "John")
)
);
Of course it can and should be further simplified to fit your needs better. Isn't it easier to have a full test scenario on one screen rather than jump from one file to another?
Or maybe you should try Fitnesse (looks like you are no longer doing unit testing, so acceptance testing framework should be fine), where tests are stored in wiki-like documents, including tables?

Yes, using resources for expected results (and also setup data) works well and is pretty common.
XML may well be a useful format for you - being hierarchical can certainly help (one element per test method). It depends on the exact situation, but it's definitely an option. Alternatively, JSON may be easier for you. What are you comfortable with, in terms of serialization APIs?

Given that you mention you are usually testing that a certain DB operation returns expected output, you may want to take a look at using DBUnit:
// Load expected data from an XML dataset
IDataSet expectedDataSet = new FlatXmlDataSetBuilder().build(new File("expectedDataSet.xml"));
ITable expectedTable = expectedDataSet.getTable("TABLE_NAME");
// Assert actual database table match expected table
Assertion.assertEquals(expectedTable, actualTable);
DBUnit handles comparing the state of a table after some operation has completed and asserting that the data in the table matches an expected DataSet. The most common format for the DataSet that you compare the actual table state with is probably using an XmlDataSet, where the expected data is loaded from an XML file, but there are other subclasses as well.
If you are already doing testing like this, then it sounds like you may have written most of the same logic already - but DBUnit may give you additional features you haven't implemented on your own yet for free.

How to handle HUGE SQL strings in the source code

I am working on a project currently where there are SQL strings in the code that are around 3000 lines.
The project is a java project, but this question could probably apply for any language.
Anyway, this is the first time I have ever seen something this bad.
The code base is legacy, so we can suddenly migrate to Hibernate or something like that.
How do you handle very large SQL strings like that?
I know its bad, but I don't know exactly what is the best thing to suggest for a solution.

It seems to me that making those hard-coded values into stored procedures and referencing the sprocs from code instead might be high yield and low effort.

Does the SQL has a lot of string concatenations for the variables?
If it doesn't you can extract them a put them in resources files. But you'll have to remove the string conatentation in the line breaks.
The stored procedure approach you used is very good, but sometimes when there's need to understand what the SQL is doing, you have to switch from workspace to your favorite SQL IDE. That's the only bad thing.
For my suggestion it would be like this:
String query = "select ......."+
3000 lines.
To
ResourceBundle bundle = ResourceBundle.getBundle("queries");
String query = bundle.getString( "customerQuery" );
Well that's the idea.

I guess the first question is, what are you supposed to do with it? If it's not broken, quietly close it up and pretend you've never seen it. Otherwise, refactor like mad - hopefully there's some exit condition somewhere that looks something like a contract.

The best thing I could come up with so far is to put the query into several stored procedures, the same way I would handle a method thats too long in Java.

I'm in the same spot you are... My plan was to pull the SQL into separate.sql files within the project and create a utility method to read the file in when I need the query.
string sql = "select asd,asdf,ads,asdf,asdf,"
+ "from asdfj asfj as fasdkfjl asf"
+ "..........................."
+ "where user = #user and ........";
The query gets dumped into a file called usageReportByUser.sql
and the becomes something like this.
string sql = util.queries("usageReportByUser");
Make sure it's done in a way that the files are not publicly accessible.

Use the framework iBatis

I wrote a toolkit for this a while back and have used it in several projects. It allows you to put queries in mostly text files and generate bindings and documentation for them.
Check out this example and an example use (it's pretty much just a prepared statement compiled to a java class with early bound/typesafe query bindings).
It generates some nice javadocs, too, but I don't have any of those online at the moment.

I second the iBatis recommendation. At the least, you can pull the SQL out from Java code that most likely uses StringBuffer and appending or String concat into XML where it is just easier to maintain.
I did this for a legacy web app and I turned on debugging and ran unit tests for the DAOs and just copied the generated sql for each statement into the iBatis xml. Worked pretty smooth.

What I do in PHP is this:
$query = "SELECT * FROM table WHERE ";
$query .= "condition < 5 AND ";
$query .= "condition2 > 10 AND ";
and then, once you've finished layering on $query:
mysql_query($query);

One easy way would be to break them out into some kind of constants. That would at least make the code more readable.

I store them in files (or resources), and then read and cache them on app start (or on change if it's a service or something).
Or, just put them into a big old SqlQueries class as consts or readonlys.

I have had success converting large dynamic queries to linq queries. (1K lines+) This works very well for reporting scenarios where you have a lot of dynamic filtering and dynamic grouping on a relatively small number of tables. Create an edmx for those tables and you can write excellent strong typed composable queries.
I found that performance actually improved and the resulting sql was a lot simpler.
I'm sure you would get similar mileage with Hibernate - other than being able to use linq ofcourse. But generaly speaking if the generated sql needs to be highly dynamic, then it is not a good candidate for a stored proc. writing dynamic sql inside a stored proc is the worst of both worlds. sql generation frameworks would be my preferred way to go here. If you dig Hibernate I think it would be a good solution.
That being said, if the queries are just simple strings with parameters, then just throw them in a stored proc and be done with it. -but then you miss out on the results being nice to work with objects.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.