How to handle HUGE SQL strings in the source code

I am working on a project currently where there are SQL strings in the code that are around 3000 lines.
The project is a java project, but this question could probably apply for any language.
Anyway, this is the first time I have ever seen something this bad.
The code base is legacy, so we can't suddenly migrate to Hibernate or something like that.
How do you handle very large SQL strings like that?
I know it's bad, but I don't know exactly what the best solution to suggest is.

It seems to me that making those hard-coded values into stored procedures and referencing the sprocs from code instead might be high yield and low effort.

Does the SQL have a lot of string concatenation for the variables?
If it doesn't, you can extract the queries and put them in resource files. But you'll have to remove the string concatenation at the line breaks.
The stored procedure approach you mentioned is very good, but sometimes when you need to understand what the SQL is doing, you have to switch from your workspace to your favorite SQL IDE. That's the only bad thing.
For my suggestion it would be like this:
String query = "select ......."+
3000 lines.
To
ResourceBundle bundle = ResourceBundle.getBundle("queries");
String query = bundle.getString( "customerQuery" );
Well that's the idea.
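A minimal sketch of that idea, assuming the queries live in a properties file keyed by name. The QueryBundle class and the customerQuery text are hypothetical; the sketch parses the properties text from a Reader so it can be run on its own, whereas ResourceBundle.getBundle("queries") would read queries.properties from the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper: looks up SQL text by key in a properties-style bundle.
//
// Example queries.properties content (a trailing backslash continues the line,
// and leading whitespace on the continuation line is dropped):
//
//   customerQuery = select id, name \
//       from customer \
//       where id = ?
final class QueryBundle {
    private final ResourceBundle bundle;

    QueryBundle(ResourceBundle bundle) {
        this.bundle = bundle;
    }

    // Parses properties text from any Reader (a file, a classpath resource, a string).
    static QueryBundle fromReader(Reader reader) {
        try {
            return new QueryBundle(new PropertyResourceBundle(reader));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read query bundle", e);
        }
    }

    // Throws MissingResourceException if the key is not defined.
    String get(String key) {
        return bundle.getString(key);
    }
}
```

The query still needs its values bound through a PreparedStatement; the bundle only moves the text out of the Java source.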

I guess the first question is, what are you supposed to do with it? If it's not broken, quietly close it up and pretend you've never seen it. Otherwise, refactor like mad - hopefully there's some exit condition somewhere that looks something like a contract.

The best thing I could come up with so far is to put the query into several stored procedures, the same way I would handle a method that's too long in Java.

I'm in the same spot you are... My plan was to pull the SQL into separate .sql files within the project and create a utility method to read the file in when I need the query.
String sql = "select asd,asdf,ads,asdf,asdf "
+ "from asdfj asfj as fasdkfjl asf "
+ "..........................."
+ "where user = #user and ........";
The query gets dumped into a file called usageReportByUser.sql
and the code becomes something like this.
String sql = util.queries("usageReportByUser");
Make sure it's done in a way that the files are not publicly accessible.
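A rough sketch of what such a utility method could look like, assuming one query per file named <queryName>.sql in a single directory. The SqlFiles class and its method names are made up for illustration; the name check is one simple way to keep callers from reading files outside that directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical utility: loads <name>.sql from a directory and caches the text.
final class SqlFiles {
    private final Path directory;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    SqlFiles(Path directory) {
        this.directory = directory;
    }

    // Returns the query text, reading the file only on first use.
    String query(String name) {
        return cache.computeIfAbsent(name, this::read);
    }

    private String read(String name) {
        // Only plain identifiers are accepted, so "../something" cannot escape the directory.
        if (!name.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Invalid query name: " + name);
        }
        try {
            byte[] bytes = Files.readAllBytes(directory.resolve(name + ".sql"));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read query " + name, e);
        }
    }
}
```

Usage would be along the lines of new SqlFiles(Paths.get("sql")).query("usageReportByUser"), with the directory kept outside anything the web server serves.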

Use the framework iBatis

I wrote a toolkit for this a while back and have used it in several projects. It allows you to put queries in mostly text files and generate bindings and documentation for them.
Check out this example and an example use (it's pretty much just a prepared statement compiled to a java class with early bound/typesafe query bindings).
It generates some nice javadocs, too, but I don't have any of those online at the moment.

I second the iBatis recommendation. At the least, you can pull the SQL out from Java code that most likely uses StringBuffer and appending or String concat into XML where it is just easier to maintain.
I did this for a legacy web app and I turned on debugging and ran unit tests for the DAOs and just copied the generated sql for each statement into the iBatis xml. Worked pretty smooth.

What I do in PHP is this:
$query = "SELECT * FROM table WHERE ";
$query .= "condition < 5 AND ";
$query .= "condition2 > 10 AND ";
and then, once you've finished layering on $query:
mysql_query($query);
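The same layering idea can be sketched in Java, since that is the language the question is about. This version collects the conditions in a list and joins them at the end, which avoids a dangling AND, and it keeps the values separate so they can be bound to a PreparedStatement. The QueryBuilder class and the table name table1 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a WHERE clause from optional conditions.
final class QueryBuilder {
    private final String base;
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    QueryBuilder(String base) {
        this.base = base;
    }

    // Adds one condition containing a single '?' placeholder and remembers its value.
    QueryBuilder where(String condition, Object value) {
        conditions.add(condition);
        parameters.add(value);
        return this;
    }

    // Joins the conditions with AND; returns the base query if there are none.
    String sql() {
        if (conditions.isEmpty()) {
            return base;
        }
        return base + " WHERE " + String.join(" AND ", conditions);
    }

    // Values in placeholder order, ready for PreparedStatement.setObject(index + 1, value).
    List<Object> parameters() {
        return parameters;
    }
}
```

For example, new QueryBuilder("SELECT * FROM table1").where("condition < ?", 5).where("condition2 > ?", 10) gives the SQL text from sql() and the two values from parameters().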

One easy way would be to break them out into some kind of constants. That would at least make the code more readable.

I store them in files (or resources), and then read and cache them on app start (or on change if it's a service or something).
Or, just put them into a big old SqlQueries class as consts or readonlys.

I have had success converting large dynamic queries (1K+ lines) to LINQ queries. This works very well for reporting scenarios where you have a lot of dynamic filtering and dynamic grouping on a relatively small number of tables. Create an edmx for those tables and you can write excellent strongly typed, composable queries.
I found that performance actually improved and the resulting SQL was a lot simpler.
I'm sure you would get similar mileage with Hibernate, other than being able to use LINQ of course. But generally speaking, if the generated SQL needs to be highly dynamic, then it is not a good candidate for a stored proc. Writing dynamic SQL inside a stored proc is the worst of both worlds. SQL generation frameworks would be my preferred way to go here. If you dig Hibernate, I think it would be a good solution.
That being said, if the queries are just simple strings with parameters, then just throw them in a stored proc and be done with it. But then you miss out on getting the results back as objects that are nice to work with.

Related

How to retrieve all SQL-queries from Java source code?

We have many Java Spring projects which use Sybase database.
We want to migrate it to MSSQL.
One of the tasks is to develop a script to find all SQL queries used in the projects' source code. Moreover, there is broad usage of stored procedures in the projects.
What is an appropriate approach to do so?
@Override
public void update(int id, Entity entity) {
    jdbcTemplate.update(
        "UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id",
        HashMapBuilder.<String, Object>builder()
            .put("id", id)
            .put("enabled", entity.enabled)
            .build()
    );
}
It is the easiest case.
First, we want to run a regex over the source code to find SQL by matching a list of SQL keywords.

In essence, you want to find any (SQL) string being fed to a jdbc call.
This means your tool must know what the jdbc methods are (e.g., "jdbcTemplate.update"), and which argument of each method is the string intended to be SQL. That's sort of easy since it is documented.
What is hard is to find the string, because you assemble it dynamically; there's no guarantee that the entire SQL string is actually sitting as a direct argument to the function call. It might be computed by combining SQL string fragments using "+" and arbitrary function calls.
This means you have to parse the Java in a compiler sense, know what the meaning of each symbol is, and trace values through the dataflows in the code.
There's no way on earth a regex can do this reliably. (You can do it badly, and maybe that's good enough for you; I suggest hunting for all jdbc method call names.)
There's a worse problem: once you've figured out what the SQL string is, you now need to know if it is MSSQL-compliant. That requires parsing the abstract string (remember, it is assembled from a bunch of fragments) using an MSSQL-compliant parser (again, no regex can do context-free parsing) and complaining about the ones that don't parse.
Even that may not be enough, if MSSQL has statements that look identical to Sybase statements but mean different things.
This is a really hard problem to solve well using automation. (There are research papers that describe all of the above activities.)
I think what you will have to do is find all SQL calls, and hand-inspect each for compatibility.
Next time, you should build your application with a database access layer. Then all the SQL calls are in one place.
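For the "do it badly and maybe that's good enough" route, a crude first pass might look like the sketch below. It is only a regex over string literals, so it has exactly the limits described above: it finds literals that start with a SQL keyword, and it misses fragments such as a literal beginning with WHERE, strings assembled with "+", and anything built by function calls. The class name and keyword list are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical crude scanner: reports Java string literals that begin with a SQL keyword.
// It does no parsing or dataflow analysis, so treat its output as a list of places to inspect by hand.
final class SqlLiteralScanner {
    // A double-quoted literal whose content starts with one of the keywords.
    // (?:[^"\\]|\\.)* consumes the literal body, allowing escaped characters.
    private static final Pattern SQL_LITERAL = Pattern.compile(
            "\"\\s*((?:SELECT|INSERT|UPDATE|DELETE|EXEC|CALL)\\b(?:[^\"\\\\]|\\\\.)*)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the content of every matching literal in the given source text.
    static List<String> find(String javaSource) {
        List<String> hits = new ArrayList<>();
        Matcher m = SQL_LITERAL.matcher(javaSource);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }
}
```

Run over each .java file, this yields candidate statements to review manually for Sybase-specific syntax.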

How to build a SQL string, with values filled in, from a statement with named parameters?

I'm looking for a utility to prepare a SQL statement with named parameters and values, but not execute it. I just want the resulting SQL statement, with values substituted for named params, as a java.lang.String object. I did not find anything in Spring or Apache Commons. [I know how to enable debug logging for java.sql.*] Because I'm querying a db instance on a mainframe, prepared statements are not allowed; the support has been disabled, for some strange reason. That decision is beyond my control or influence. Do you know of a utility that can help me? I guess I could roll my own utility if I had to, but I'd rather not.

This is like saying you'd like to see your Java code with the user's input hardcoded into the source.
That would be ridiculous, because you know that the user's input is never combined with the Java source. It's combined (so to speak) with the compiled Java app at runtime. There is never a time when input data becomes merged into the source.
It's the same way with prepared SQL statements.
During prepare(), the RDBMS receives the textual SQL string, and parses it internally and retains a sort of "bytecode" version of the query. In this internal representation, the parameter placeholders are noted, and the query cannot execute before values are provided.
During execute(), the RDBMS receives parameter values, and combines them with the bytecode. The parameters never see the original SQL text! That's the way it is supposed to work.

First, you should know that one of the reasons for prepared statements is security. The natural way of simply replacing placeholders with the textual representation of the parameters and then sending a plain string has been the cause of many SQL injection attacks. A classic example is
SELECT * FROM tab WHERE tabid = ?
With a parameter of 1; DELETE FROM tab and textual replacement of parameters, you turn a simple query into a delete-everything statement. Of course real attacks could be much more clever than that ...
It is really strange that for a mainframe database someone recommends plain SQL statements over prepared statements. In my experience, security concerns led to the opposite rule. You should really ask for the reason, and what the recommended approach is. It could be the use of a special library, or a framework or ... but if you can, do avoid textual replacement.
Edit:
If you are really stuck with textual replacement, you will have to roll your own utility. As explained above, I cannot imagine that a framework does that. In a real-world application, it can be made reasonably secure if you can validate all the inputs to avoid any possibility of SQL injection (no special characters, or only at known places). But if I were in your place, I would not try to mimic SQL prepared statements, and I would simply use String.format where the format string is the SQL query with placeholders in Formatter syntax.
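If you do end up rolling your own, a very small sketch might look like this. It is not a substitute for prepared statements: it assumes :name placeholders, renders numbers as-is, and quotes everything else as a string literal with single quotes doubled, which is the standard SQL escape but may not be sufficient for every database or data type. It also does not skip colons that appear inside string literals already in the SQL. The NamedSqlRenderer class is hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: substitutes :name placeholders with SQL literals as text.
final class NamedSqlRenderer {
    // A colon not preceded by another colon (so "::" casts are left alone), then an identifier.
    private static final Pattern PARAM = Pattern.compile("(?<!:):([A-Za-z_][A-Za-z0-9_]*)");

    static String render(String sql, Map<String, ?> params) {
        Matcher m = PARAM.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            if (!params.containsKey(name)) {
                throw new IllegalArgumentException("No value for :" + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as regex syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(literal(params.get(name))));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Numbers are emitted unquoted, null becomes NULL, anything else is a quoted string.
    static String literal(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "'" + value.toString().replace("'", "''") + "'";
    }
}
```

With this sketch, the string value 1; DELETE FROM tab from the example above ends up inside a quoted literal rather than as a second statement, but validating the inputs is still advisable.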

How to use antlr to create SQL Select Query parser?

I am using Squiggle to create dynamic select queries for PostgreSQL. At one point in my application I need to reverse engineer the generated select query and extract the columns, tables, joins, ORDER BY, GROUP BY and so on from it. I googled a bit and found General SQL Parser, which meets my needs, but I need an open source solution so that I can modify it as I wish. I googled a bit more and read somewhere that I need ANTLR to write my own parser, like the folks at Hibernate did. The problem is that I don't know how to use it and I can't find an easy-to-understand example. The more important question is: considering the scope of the problem (a select query), do I really need ANTLR to do this?
Thank you

Don't reinvent the wheel. Use for example JSqlParser (github/jsqlparser). It comes with examples of extracting tablenames.

How can I support the SQL GO statement in a Java / jtds application?

I'm working on a Java-based OSS app, SqlHawk, and one of its features is running upgrade SQL scripts against a server.
Microsoft have made it a convention to split a script into batches with the GO statement, which is a good idea but just asking for false matches on the string.
At the moment I have a very rudimentary:
// split where GO on its own on a line
Pattern batchSplitter = Pattern.compile("^GO", Pattern.MULTILINE);
...
String[] splitSql = batchSplitter.split(definition);
...
which kind of works but is prone to being tripped up by things like quoted GO statements or indentation issues.
I think the only way to make this truly reliable is to have an SQL parser in the app, but I have no idea how to go about this, or whether that might actually end up being less reliable (especially given this tool supports multiple DBMSs).
What ways could I solve this problem? Code examples would be very helpful to me here.
Relevant sqlHawk code on github.
Currently using jtds to execute the batches found in the scripts.

GO is a client batch separator command. You can replace it with ;. It should not be sent in your EXEC dynamic SQL.
USE master
GO --<----- the client actually sends the first batch to SQL Server and waits for a response
SELECT * from sys.databases
GO
This should be translated into
Application.Exec("USE master");
Application.Exec("SELECT * from sys.databases");
or you can write it this way:
Application.Exec("USE master;SELECT * from sys.databases");
More about GO
http://msdn.microsoft.com/en-us/library/ms188037(v=sql.90).aspx

Ok, so this isn't going to be exactly what you want, but you might find it a start. I released SchemaEngine (which forms the core of most of my products) as open source here. In there, you will find C# code that does what you want very reliably (i.e. not tripping up on strings, comments, etc.). It also supports the 'GO x' syntax to repeat a batch x times.
If you download that and have a look in /Atlantis.SchemaEngine/Helpers you'll find a class called BatchParser.cs which contains a method called ParseBatches - which does pretty much what it says on the tin.
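For a Java starting point, here is a small sketch of the same general idea. It is not a port of BatchParser.cs. It walks the script line by line, tracks whether it is inside a single-quoted string or a block comment, and only treats GO as a separator outside of those. It assumes T-SQL style quoting, does not handle nested block comments or quoted identifiers, and does not implement the 'GO x' repeat syntax. The GoBatchSplitter name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical splitter: breaks a script into batches at GO lines,
// ignoring GO inside multi-line string literals and /* ... */ comments.
final class GoBatchSplitter {
    // GO alone on a line (any case, any indentation), optionally followed by a -- comment.
    private static final Pattern GO_LINE =
            Pattern.compile("\\s*GO\\s*(--.*)?", Pattern.CASE_INSENSITIVE);

    static List<String> split(String script) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;
        boolean inBlockComment = false;

        for (String line : script.split("\\R")) {
            if (!inString && !inBlockComment && GO_LINE.matcher(line).matches()) {
                flush(current, batches);
                continue;
            }
            current.append(line).append('\n');
            // Update the string/comment state by scanning this line.
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                char next = i + 1 < line.length() ? line.charAt(i + 1) : '\0';
                if (inBlockComment) {
                    if (c == '*' && next == '/') { inBlockComment = false; i++; }
                } else if (inString) {
                    // A doubled quote ('') closes and immediately reopens the string.
                    if (c == '\'') inString = false;
                } else if (c == '-' && next == '-') {
                    break; // rest of the line is a comment
                } else if (c == '/' && next == '*') {
                    inBlockComment = true; i++;
                } else if (c == '\'') {
                    inString = true;
                }
            }
        }
        flush(current, batches);
        return batches;
    }

    // Adds the accumulated text as a batch unless it is blank.
    private static void flush(StringBuilder current, List<String> batches) {
        String batch = current.toString().trim();
        if (!batch.isEmpty()) batches.add(batch);
        current.setLength(0);
    }
}
```

Each returned batch can then be passed to a separate Statement.execute call. Supporting other DBMSs would need their own quoting and separator rules.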

Google App Engine and SQL LIKE

Is there any way to query the GAE datastore with a filter similar to the SQL LIKE statement? For example, if a class has a string field, and I want to find all instances that have some specific keyword in that string, how can I do that?
It looks like JDOQL's matches() doesn't work... Am I missing something?
Any comments, links or code fragments are welcome

As the GAE/J docs say, BigTable doesn't have such native support. You can use JDOQL String.matches for "something%" (i.e startsWith). That's all there is. Evaluate it in-memory otherwise.

If you have a lot of items to examine you want to avoid loading them at all. The best way would probably be to break down the inputs at write time. If you are only searching by whole words then that is easy.
For example, "Hello world" becomes "Hello", "world" - just add both to a multi-valued property. If you have a lot of text you want to avoid loading the multi-valued property because you only need it for the index lookup. You can do this by creating a "Relation Index Entity" - see Brett Slatkin's Google I/O talk for details.
You may also want to break down the input into 3-character, 4-character, etc. strings or stem the words - perhaps with a Lucene stemmer.
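The write-time breakdown could be sketched like this in plain Java. The SearchTokens class is hypothetical and says nothing about how the datastore entity is declared; it only produces the values you would store in the multi-valued property. Lowercasing is an added assumption so that lookups are case-insensitive, and no stemming is done.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns text into index tokens at write time.
final class SearchTokens {
    // Splits on anything that is not a letter or digit and lowercases the words.
    static Set<String> words(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // All substrings of length n of each word, for partial-word matching.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (String word : words(text)) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }
}
```

At query time the search keyword is normalized the same way and matched with an equality filter on the multi-valued property instead of a LIKE.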
