I am using Squiggle to create dynamic select queries for PostgreSQL. At one point in my application I need to reverse engineer the generated select query and extract the columns, tables, joins, orders, group by and so on from it. I googled a bit and found General SQL Parser, which meets my demands, but I need an open source solution so that I can modify it as I wish. I googled a bit more and read somewhere that I would need ANTLR to write my own parser, like the fellas on Hibernate did. The problem is that I don't know how to use it and can't find an easy to understand example. The more important question is: considering the scope of the problem (a select query), do I really need ANTLR to do this?
Thank you.
Don't reinvent the wheel. Use, for example, JSqlParser (github/jsqlparser). It comes with examples of extracting table names.
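Since the goal is pulling columns, tables, joins and orders out of a SELECT, a minimal sketch with JSqlParser might look like the following. The class names come from JSqlParser's classic API and can differ between versions, so check them against the version you actually depend on:

```java
import java.util.List;

import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.statement.select.PlainSelect;
import net.sf.jsqlparser.statement.select.Select;
import net.sf.jsqlparser.util.TablesNamesFinder;

public class SelectInspector {

    // All table names referenced anywhere in the query
    public static List<String> tablesOf(String sql) throws Exception {
        Select select = (Select) CCJSqlParserUtil.parse(sql);
        return new TablesNamesFinder().getTableList(select);
    }

    public static void main(String[] args) throws Exception {
        String sql = "SELECT c.name, o.total FROM customer c "
                   + "JOIN orders o ON o.customer_id = c.id "
                   + "ORDER BY o.total";

        System.out.println(tablesOf(sql));

        // Clause-by-clause access: select items, joins, order by
        PlainSelect body =
                (PlainSelect) ((Select) CCJSqlParserUtil.parse(sql)).getSelectBody();
        System.out.println(body.getSelectItems());
        System.out.println(body.getJoins());
        System.out.println(body.getOrderByElements());
    }
}
```

This avoids writing an ANTLR grammar entirely for the stated scope (a select query).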
I have to do a project with OpenNLP, strictly in the Italian language. Since it's almost impossible to find existing structures for this language, my idea is to create a simple model myself. Reading some posts on this platform, my plan is to try to do this using the model-builder addon.
First of all, is it possible to achieve my goal with this addon?
If so, referring to this other post, what kind of file is meant by "modelOutFile"? In my case I don't have an existing model.
N.B.: the addon uses some deprecated functions (such as nameFinderME.train()).
Naively, I tried to pass a simple empty file "model.bin" as the "modelOutFile", but of course I ran into an error:
Cannot invoke "java.util.Properties.getProperty(String)" because "manifest" is null
Furthermore, I used only a few names and sentences for the test (I only wanted to know whether this worked), not the large amount requested (at least 15,000 sentences).
I'm open to other suggestions instead of using the model-builder addon.
Hope someone can help me.
I want JR to create, if possible, a dynamic (for lack of a better term known to me) hyperlink:
This is the sql query which fetches the data:
select name,ID
from table
where condition1=65537;
What I want JR to do is to change the hyperlink reference expression, which is initially:
"https://www.some_site.com/item=X"
X
needs to be the ID number and be able to be parsed as a normal link
I tried the more obvious ideas:
X=$F{ID} /// doesn't work
X=$F{ID}.intValue() /// doesn't work
Is it:
a) even possible?
b) if possible, how?
I might be missing something very obvious, so apologies if this is something already addressed.
Sorry for the possible spam
"https://www.some_site.com/ID="+$F{ID}
is now the hyperlink reference expression, and it all works out in all possible export formats of Jasper.
We have many Java Spring projects which use Sybase database.
We want to migrate it to MSSQL.
One of the tasks is to develop a script to find all SQL queries used in the projects' source code. Moreover, there is broad usage of stored procedures in the projects.
What is an appropriate approach to do so?
@Override
public void update(int id, Entity entity) {
    jdbcTemplate.update(
        "UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id",
        HashMapBuilder.<String, Object>builder()
            .put("id", id)
            .put("enabled", entity.enabled)
            .build()
    );
}
It is the easiest case.
As a first idea, we want to regex the source code to find SQL by a list of SQL keywords.
In essence, you want to find any (SQL) string being fed to a jdbc call.
This means your tool must know what the jdbc methods are (e.g., "jdbcTemplate.update"), and which argument of each method is the string intended to be SQL. That's sort of easy, since it is documented.
What is hard is to find the string, because you assemble it dynamically; there's no guarantee that the entire SQL string is actually sitting as a direct argument to the function call. It might be computed by combining SQL string fragments using "+" and arbitrary function calls.
This means you have to parse the Java in a compiler sense, know what the meaning of each symbol is, and trace values through the dataflows in the code.
There's no way on earth a regex can do this reliably. (You can do it badly and maybe that's good enough for you, I suggest hunting for all jdbc method call names).
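If the "badly but maybe good enough" route is acceptable, that hunt can be a plain regex over the source files. A rough sketch (the template variable names and the method-name list are assumptions; extend both to match the conventions in your codebase):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JdbcCallFinder {
    // Crude heuristic: spot call sites of common JdbcTemplate methods.
    // It finds the call, not the SQL string itself; as argued above,
    // recovering the assembled SQL reliably needs real Java parsing.
    private static final Pattern JDBC_CALL = Pattern.compile(
            "\\b(?:jdbcTemplate|namedParameterJdbcTemplate)\\s*\\.\\s*"
            + "(query|queryForObject|queryForList|update|batchUpdate|execute)\\s*\\(");

    public static List<String> findCalls(String javaSource) {
        List<String> methods = new ArrayList<>();
        Matcher m = JDBC_CALL.matcher(javaSource);
        while (m.find()) {
            methods.add(m.group(1)); // jdbc method name at each call site
        }
        return methods;
    }
}
```

Running this over the `update` method shown in the question would flag its one `jdbcTemplate.update(...)` call site for hand inspection.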
There's a worse problem: once you've figured out what the SQL string is, you now need to know whether it is MSSQL-compliant. That requires parsing the abstract string (remember, it is assembled from a bunch of fragments) using an MSSQL-compliant parser (again, no regex can do context-free parsing) and complaining about the ones that don't parse.
Even that may not be enough, if MSSQL has statements that look identical to sybase statements, but mean different things.
This is a really hard problem to solve well using automation. (There are research papers that describe all of the above activities.)
I think what you will have to do is find all SQL calls, and hand-inspect each for compatibility.
Next time, you should build your application with a database access layer. Then all the SQL calls are in one place.
I'm working on a Java-based OSS app, SqlHawk, one of whose features is running upgrade SQL scripts against a server.
Microsoft has made it a convention to split a script into batches with the GO statement, which is a good idea but just asking for false matches on the string.
At the moment I have a very rudimentary:
// split where GO on its own on a line
Pattern batchSplitter = Pattern.compile("^GO", Pattern.MULTILINE);
...
String[] splitSql = batchSplitter.split(definition);
...
which kind of works but is prone to being tripped up by things like quoted GO statements or indentation issues.
I think the only way to make this truly reliable is to have an SQL parser in the app, but I have no idea how to go about this, or whether that might actually end up being less reliable (especially given this tool supports multiple DBMSs).
What ways could I solve this problem? Code examples would be very helpful to me here.
Relevant sqlHawk code on github.
Currently using jtds to execute the batches found in the scripts.
GO is a client-side batch separator command. You can replace it with ;. It should not be sent in your EXEC dynamic SQL.
USE master
GO --<----- the client actually sends the first batch to SQL Server and waits for a response
SELECT * from sys.databases
GO
This should be translated into:
Application.Exec("USE master");
Application.Exec("SELECT * from sys.databases");
or you can write it this way:
Application.Exec("USE master;SELECT * from sys.databases")
More about GO
http://msdn.microsoft.com/en-us/library/ms188037(v=sql.90).aspx
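For the common case, a line-based splitter is already more robust than a bare ^GO multiline regex, because it only treats a line as a separator when GO (optionally "GO n") is the whole trimmed line. A sketch only; it still does not handle GO inside string literals or block comments:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Split a T-SQL script on lines containing only GO (optionally "GO n"),
    // ignoring case and surrounding whitespace.
    public static List<String> splitBatches(String script) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : script.split("\r?\n", -1)) {
            if (line.trim().matches("(?i)GO(\\s+\\d+)?")) {
                if (current.length() > 0) {
                    batches.add(current.toString().trim());
                    current.setLength(0);
                }
            } else {
                current.append(line).append('\n');
            }
        }
        if (current.toString().trim().length() > 0) {
            batches.add(current.toString().trim());
        }
        return batches;
    }
}
```

On the example above, `splitBatches("USE master\nGO\nSELECT * from sys.databases\nGO")` yields the two batches to send separately.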
OK, so this isn't going to be exactly what you want, but you might find it a start. I released SchemaEngine (which forms the core of most of my products) as open source here. In there, you will find C# code that does what you want very reliably (i.e. not tripping up on strings, comments, etc.). It also supports the 'GO x' syntax to repeat a batch x times.
If you download that and have a look in /Atlantis.SchemaEngine/Helpers you'll find a class called BatchParser.cs which contains a method called ParseBatches - which does pretty much what it says on the tin.
I am working on a project currently where there are SQL strings in the code that are around 3000 lines.
The project is a java project, but this question could probably apply for any language.
Anyway, this is the first time I have ever seen something this bad.
The code base is legacy, so we can't suddenly migrate to Hibernate or something like that.
How do you handle very large SQL strings like that?
I know it's bad, but I don't know exactly what to suggest as a solution.
It seems to me that making those hard-coded values into stored procedures and referencing the sprocs from code instead might be high yield and low effort.
Does the SQL have a lot of string concatenation for the variables?
If it doesn't, you can extract the queries and put them in resource files. But you'll have to remove the string concatenation at the line breaks.
The stored procedure approach is very good, but sometimes when you need to understand what the SQL is doing, you have to switch from your workspace to your favorite SQL IDE. That's the only bad thing.
My suggestion would be to go from this:
String query = "select ......."+
3000 lines.
To
ResourceBundle bundle = ResourceBundle.getBundle("queries");
String query = bundle.getString( "customerQuery" );
Well that's the idea.
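One way to sketch that idea is with java.util.Properties, whose file format supports backslash line continuations, so a multi-line query stays readable without any Java string concatenation (the class and key names here are illustrative):

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public class QueryStore {
    private final Properties queries = new Properties();

    // In the real app the Reader would come from queries.properties on
    // the classpath; taking a Reader keeps the sketch self-contained.
    public QueryStore(Reader source) throws IOException {
        queries.load(source);
    }

    public String get(String key) {
        return queries.getProperty(key);
    }
}
```

In the properties file, end each line of the query with a backslash (`customerQuery = select id, name \` then `from customer` on the next line); Properties joins the lines, stripping the continuation line's leading whitespace.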
I guess the first question is, what are you supposed to do with it? If it's not broken, quietly close it up and pretend you've never seen it. Otherwise, refactor like mad - hopefully there's some exit condition somewhere that looks something like a contract.
The best thing I could come up with so far is to split the query into several stored procedures, the same way I would handle a method that's too long in Java.
I'm in the same spot you are... My plan was to pull the SQL into separate .sql files within the project and create a utility method to read the file in when I need the query.
string sql = "select asd,asdf,ads,asdf,asdf,"
+ "from asdfj asfj as fasdkfjl asf"
+ "..........................."
+ "where user = #user and ........";
The query gets dumped into a file called usageReportByUser.sql
and the code becomes something like this:
string sql = util.queries("usageReportByUser");
Make sure it's done in a way that the files are not publicly accessible.
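A minimal version of such a utility might look like this (class and directory names are illustrative; keeping the files inside the jar or in a directory the web server never serves covers the accessibility point):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SqlQueries {
    private final Path dir;

    public SqlQueries(Path dir) {
        this.dir = dir;
    }

    // Reads e.g. usageReportByUser.sql from the configured directory.
    public String load(String name) throws IOException {
        return new String(Files.readAllBytes(dir.resolve(name + ".sql")),
                          StandardCharsets.UTF_8);
    }
}
```

Usage then matches the snippet above: `String sql = new SqlQueries(Paths.get("sql")).load("usageReportByUser");`. Caching the strings on startup, as mentioned below, is an easy addition.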
Use the framework iBatis
I wrote a toolkit for this a while back and have used it in several projects. It allows you to put queries in mostly text files and generate bindings and documentation for them.
Check out this example and an example use (it's pretty much just a prepared statement compiled to a java class with early bound/typesafe query bindings).
It generates some nice javadocs, too, but I don't have any of those online at the moment.
I second the iBatis recommendation. At the least, you can pull the SQL out of Java code that most likely uses StringBuffer appends or String concatenation, into XML where it is just easier to maintain.
I did this for a legacy web app: I turned on debugging, ran unit tests for the DAOs, and just copied the generated SQL for each statement into the iBatis XML. Worked pretty smoothly.
What I do in PHP is this:
$query = "SELECT * FROM table WHERE ";
$query .= "condition < 5 AND ";
$query .= "condition2 > 10 AND ";
and then, once you've finished layering on $query:
mysql_query($query);
One easy way would be to break them out into some kind of constants. That would at least make the code more readable.
I store them in files (or resources), and then read and cache them on app start (or on change if it's a service or something).
Or just put them into a big old SqlQueries class as consts or readonly fields.
I have had success converting large dynamic queries (1K+ lines) to LINQ queries. This works very well for reporting scenarios where you have a lot of dynamic filtering and dynamic grouping on a relatively small number of tables. Create an edmx for those tables and you can write excellent strongly typed, composable queries.
I found that performance actually improved and the resulting sql was a lot simpler.
I'm sure you would get similar mileage with Hibernate, other than being able to use LINQ of course. But generally speaking, if the generated SQL needs to be highly dynamic, then it is not a good candidate for a stored proc; writing dynamic SQL inside a stored proc is the worst of both worlds. SQL generation frameworks would be my preferred way to go here. If you dig Hibernate, I think it would be a good solution.
That being said, if the queries are just simple strings with parameters, then just throw them in a stored proc and be done with it. But then you miss out on having the results be nice objects to work with.