Get the main where of query - java

I want to get the index of the main WHERE clause of a database query in Java.
How can I handle this with regex?
For example, in this query I want to get the second WHERE clause:
select u.id, (select x.id from XEntity where x.id = 200)
from UserEntity u
**where** u.id in (select g.id from AnotherEntity g where g.id = 100)
I think the main WHERE is the one for which the number of "(" characters and ")" characters after it is equal,
but I don't know how to find it with a regex.
With Best Regards

What Toote and David Brabant said is absolutely correct. Parsing SQL, especially complex SQL, using only regex is a very hard problem.
In terms of parsing SQL in Java, which seems to be the thrust of your question, there's a very good (if apparently unmaintained) library called JSQLParser. A more up-to-date version of this library can be found on GitHub (disclaimer: I made a very small contribution to this). Its main page shows an example of a visitor designed to consume the AST that the parser produces.
There is also a SQL grammar for ANTLR available in its grammar list. Or, if you're feeling adventurous, the H2 database supports a rather extensive range of SQL, including some proprietary features of, e.g., MySQL. You could modify its Parser to generate an appropriate structure for extracting the information you need.

Regular expressions are not very good at recognizing structures as complex as SQL queries can be, mainly because SQL is not a regular language: subqueries can be nested to arbitrary depth. That is exactly the issue you are running into: the WHERE can appear in a lot of places, and you want one in particular that depends on the overall structure of the query.
What you will need is an appropriate parser. The only JavaScript SQL parser I could find is not too complete, but you can always help develop it by making sure it fits your needs.
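The parenthesis-counting idea from the question can be sketched without regex by scanning the string and tracking nesting depth. This is only an illustration under strong assumptions: it handles single-quoted string literals but not comments, quoted identifiers, or dialect-specific syntax, and the class and method names (MainWhereFinder, indexOfMainWhere) are hypothetical. A real parser remains the robust option.

```java
final class MainWhereFinder {

    /** Returns the index of the first WHERE keyword at parenthesis depth 0, or -1 if none. */
    static int indexOfMainWhere(String sql) {
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {            // toggle on quotes; a doubled '' toggles twice
                inString = !inString;
                continue;
            }
            if (inString) {
                continue;               // ignore everything inside string literals
            }
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (depth == 0 && isWhereAt(sql, i)) {
                return i;
            }
        }
        return -1;
    }

    /** True if the whole word "where" (case-insensitive) starts at position i. */
    private static boolean isWhereAt(String sql, int i) {
        if (!sql.regionMatches(true, i, "where", 0, 5)) {
            return false;
        }
        int end = i + 5;
        boolean startOk = (i == 0) || !isIdentifierChar(sql.charAt(i - 1));
        boolean endOk = (end == sql.length()) || !isIdentifierChar(sql.charAt(end));
        return startOk && endOk;
    }

    private static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

For the query in the question, the returned index points at the WHERE that follows "from UserEntity u", because the two other occurrences sit inside parentheses.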

Related

Can I use JOOQ as an SQL parser?

I'm trying to parse a SELECT statement in Java. I'm familiar with JOOQ, and was hoping to use that. I know it's not explicitly designed as an SQL parser—it's actually a lot more than that, so I was thinking there might be a way to use its internal parsers to parse SELECT queries.
I saw some information on how to access some of JOOQ's internals using the Visitor pattern, but I need to navigate inside the query using a tree-like structure that will allow access to each part of the query individually. I don't want to use the Visitor pattern for all use cases.
Is this possible? How would I go about doing it?
Yes, you can. jOOQ has a parser that can be used:
Programmatically
As a CLI
Online, as a SQL dialect translator
As of jOOQ 3.17, there's an experimental model API which can be used to traverse your expression tree externally, e.g. using pattern matching, or internally using the new Traverser API. It is also still possible to traverse the expression tree using a VisitListener when rendering the expression tree back to SQL.
A full-fledged SQL parser is available from DSLContext.parser() and from DSLContext.parsingConnection() (see the manual's section about parsing connections for the latter).
The SQL Parsing API page gives this trivial example:
ResultQuery<?> query =
DSL.using(configuration)
.parser()
.parseResultQuery("SELECT * FROM (VALUES (1, 'a'), (2, 'b')) t(a, b)");
parseResultQuery is the method you need for a single SELECT query; use parse(String) if you may have multiple queries.

Filtering executed SQL queries on application server

First of all, I know this is bad practice but regardless I'm still looking for an answer.
In our web application we have a textarea where the user can write SQL to bring in custom data sets and view them in a chart. The way this works is essentially taking the written string and executing it as a query. What I'm looking for is everything I need to implement, security-wise, in our application server back end to disallow the execution of anything other than SELECT queries.
The user won't be able to execute just any SELECT query he wants, since the app server back end expects the returned result set to have 2 columns named X_FIELD and Y_FIELD, so we're not so much worried about the user being able to view data as about him executing SQL that will break the database.
What we thought of doing is parsing the string for keywords such as DROP, ALTER, CREATE etc. Are there specific things that we have to look out for? Is there a tool/library that automates this? We're using java for our back end code.
Filtering queries can be done at the application level but it requires much more database-specific expertise than creating separate security systems for each database.
As an example, I created an open source program that can do this for Oracle. It won't solve your problem but the code can at least help explain why this is a bad idea.
First, it's important to understand that Oracle SQL syntax is much more complicated than most programming languages, such as Java.
Oracle has 2175 keywords and almost none of them are reserved. Forget about parsing SQL - none of the existing 3rd party parsers are accurate enough to do this securely.
Luckily a full parser is not needed for this task. Oracle syntax is structured in such a way that any statement can be classified with only 8 tokens, excluding whitespace and comments.
But building a tokenizer and a statement classifier is still difficult. That solution will handle unusual kinds of selects, such as (select * from dual) or with asdf as (select 1 a from dual) select a from asdf;. But even a SELECT statement can cause changes to the database; either through PL/SQL hidden in a function or type, or locking rows through a for update.
And don't forget to remove the (sometimes optional) terminator. They work fine in most IDEs, but they are not allowed in dynamic SQL. Don't just remove the last characters, or the last token, because some SELECT statements allow semicolons in the middle.
That's a lot of work for just one database! If you want to use this method to implement security policies you need almost 100% accuracy. Very few people are fanatical enough about any database to build this. There's no chance you can do this for multiple databases.
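To make the classification idea concrete, here is a deliberately naive sketch that skips leading whitespace, comments, and opening parentheses and then looks at the first keyword. It is not the open source program mentioned above and is nowhere near accurate enough for security: it does not detect FOR UPDATE, PL/SQL hidden in functions, or terminators, and the names (NaiveSelectCheck, firstKeyword, looksLikeSelect) are hypothetical.

```java
import java.util.Locale;

final class NaiveSelectCheck {

    /** Skips leading whitespace, comments and '(' and returns the first word, upper-cased. */
    static String firstKeyword(String sql) {
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c) || c == '(') {
                i++;
            } else if (sql.startsWith("--", i)) {        // single-line comment
                int eol = sql.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;
            } else if (sql.startsWith("/*", i)) {        // block comment
                int close = sql.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;
            } else {
                break;
            }
        }
        int start = i;
        while (i < n && Character.isLetter(sql.charAt(i))) {
            i++;
        }
        return sql.substring(start, i).toUpperCase(Locale.ROOT);
    }

    /** Illustration only: a first keyword of SELECT or WITH does NOT prove the statement is harmless. */
    static boolean looksLikeSelect(String sql) {
        String keyword = firstKeyword(sql);
        return keyword.equals("SELECT") || keyword.equals("WITH");
    }
}
```

Even this small check needs comment and parenthesis handling to accept the unusual selects mentioned above, which hints at how much work a trustworthy classifier would be.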

Good Practices For Generating Variations of SQL Queries

I'm working on a (Java) project that requires different variations of SQL queries depending on the filters the user wants to use.
I have 4+ queries right now that all use the same tables, require the same table joins, and use the same "order by". Currently I have hard coded these queries into the code and I'm not happy with that. I would like to dynamically generate them but I'm having trouble figuring out a solution or if I even should bother to generate them.
Note: I can not use stored procedures.
EXAMPLE:
SELECT t1.column1, t2.column2, t3.column3 FROM
(SELECT column1, column2, sum(column3) FROM t1
WHERE X = Y
GROUP BY column1, column2
ORDER BY column1)
LEFT JOIN t2 on t1.column1 = t2.column1
LEFT JOIN t3 on t1.column2 = t3.column2
WHERE Y = Z AND A = B
ORDER BY t1.column1
The differences are in the WHERE, SELECT, and GROUP BY statements. I could put nested if-statements between the dynamic parts but that seems too messy.
if ()
"SELECT A"
else
"SELECT B"
+ "FROM T1"
if ()
"WHERE x = y"
"LEFT JOIN ..."
etc.
Doing something like this feels wrong. Should I just stick to hard coding them or is there a better solution?
EDIT: I included it in the tags but I wanted to note up here that I'm using Oracle.
I've had the same type of problem on projects I've done. In some of these cases I've used the Builder pattern to create dynamic SQL statements. One advantage of using a Builder is that you can unit test your Builder for all the combinations of your criteria. Yes, you will still have some conditional logic, but it will all be encapsulated in your SQL Builder.
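A minimal sketch of such a builder might look like the following. The class name ReportQueryBuilder, its methods, and the simplified table layout are assumptions made for illustration, not code from any of the projects mentioned. Values are collected as bind parameters so the generated text can be used with a PreparedStatement; column names are expected to come from your own code, never from user input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReportQueryBuilder {
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> params = new ArrayList<>();

    /** Adds "column = ?" to the WHERE clause and remembers the value to bind. */
    ReportQueryBuilder whereEquals(String column, Object value) {
        conditions.add(column + " = ?");
        params.add(value);
        return this;
    }

    /** Assembles the statement; the joins and ORDER BY are shared by every variation. */
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT t1.column1, t2.column2 FROM t1")
                .append(" LEFT JOIN t2 ON t1.column1 = t2.column1");
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.append(" ORDER BY t1.column1").toString();
    }

    /** Values to bind, in the same order as the '?' placeholders. */
    List<Object> params() {
        return Collections.unmodifiableList(new ArrayList<>(params));
    }
}
```

Because the conditional logic lives in one class, each filter combination can be covered by a plain unit test that compares the generated string and parameter list.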
There is more than one good solution for this: there are several tools for creating typesafe and easily manipulable SQL strings in Java.
But since you are using Oracle, the best way to go is probably prepared statements.
http://en.wikipedia.org/wiki/Prepared_statement
https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
If it is only a few (fewer than, say, ten) combinations, you could have the hard-coded SQL strings to choose from. For more than that, you want to programmatically build the statements on the fly. There is no good way to do this with just JDBC, so you might want to invest the time to look at some database access libraries.
To avoid having to manipulate SQL strings when dynamically building database queries, have a look at jOOQ. Or, if you want to go that way, an ORM (such as JPA) will also have a CriteriaBuilder.
You will still have the same conditional statements and logic, but at least you can work on Java objects instead of having to manipulate strings and worry that you get all keywords in the right order and all parens balanced.

Design pattern for constructing conditional SQL statement

I have an issue.
I have an SQL statement to which I need to append different types of "restrictions", or even add a join, depending on the user's search criteria.
This SQL will involve different tables, as it can search across one-to-many relationships; therefore the Hibernate ORM can't support my requirement.
May I know if there is a design pattern to help construct such SQL statements?
The design pattern that fits the problem of representing a language statement is the Interpreter pattern. But before you start to code your SQL parser, take a look at ANTLR.
And what is more important, ask yourself two questions:
Does the number of different SQL statements justify the effort of developing a general SQL interpreter solution instead of programming my 5-10 different queries with plain if-else statements?
Have I reviewed the Hibernate reference manual in detail?
I have a very similar requirement, where I have a context-free language to define the search criteria, parsed into ParseEntry objects in a ParseTree, which are analogous to the Restrictions. I use a SQLQueryGeneratorVisitor to visit the parse tree and generate the SQL query, and similarly a HibernateCriteriaGeneratorVisitor if the criteria need to be generated for a single entity. So I essentially used the Visitor pattern, making the parse tree and its entries visitable so that different types of criteria can be generated (SQL, Hibernate, or something else in the future).
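As a rough sketch of that Visitor arrangement: the node and visitor types below (SearchNode, EqualsNode, AndNode, SqlGeneratingVisitor) are hypothetical stand-ins, not the ParseEntry or SQLQueryGeneratorVisitor classes described above. A second visitor implementing the same interface could produce Hibernate criteria from the same tree.

```java
import java.util.ArrayList;
import java.util.List;

interface SearchNode {
    <R> R accept(SearchVisitor<R> visitor);
}

interface SearchVisitor<R> {
    R visitEquals(EqualsNode node);
    R visitAnd(AndNode node);
}

final class EqualsNode implements SearchNode {
    final String column;
    final Object value;

    EqualsNode(String column, Object value) {
        this.column = column;
        this.value = value;
    }

    @Override
    public <R> R accept(SearchVisitor<R> visitor) {
        return visitor.visitEquals(this);
    }
}

final class AndNode implements SearchNode {
    final SearchNode left;
    final SearchNode right;

    AndNode(SearchNode left, SearchNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(SearchVisitor<R> visitor) {
        return visitor.visitAnd(this);
    }
}

/** Renders the tree as a SQL condition with '?' placeholders and collects the bind values. */
final class SqlGeneratingVisitor implements SearchVisitor<String> {
    final List<Object> params = new ArrayList<>();

    @Override
    public String visitEquals(EqualsNode node) {
        params.add(node.value);
        return node.column + " = ?";
    }

    @Override
    public String visitAnd(AndNode node) {
        // Left side is rendered first, so params stay in placeholder order.
        String left = node.left.accept(this);
        String right = node.right.accept(this);
        return "(" + left + " AND " + right + ")";
    }
}
```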

Java coding best-practices for reusing part of a query to count

The implementing-result-paging-in-hibernate-getting-total-number-of-rows question triggered another question for me, about an implementation concern:
Now that you know you have to reuse part of the HQL query to do the count, how do you reuse it efficiently?
The differences between the two HQL queries are:
the selection is count(?), instead of the pojo or property (or list of)
the fetches should not happen, so some tables should not be joined
the order by should disappear
Are there other differences?
Do you have coding best-practices to achieve this reuse efficiently (concerns: effort, clarity, performance)?
Example for a simple HQL query:
select a from A a join fetch a.b b where a.id=66 order by a.name
select count(a.id) from A a where a.id=66
UPDATED
I received answers on:
using Criteria (but we use HQL mostly)
manipulating the String query (but everybody agrees it seems complicated and not very safe)
wrapping the query, relying on database optimization (but there is a feeling that this is not safe)
I was hoping someone would give options along another path, more related to String concatenation.
Could we build both HQL queries using common parts?
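One way to picture "common parts" is to keep the fragments separate and assemble both strings from them. The sketch below is only an illustration: the class PagedHqlQuery and its fields are hypothetical, and it assumes the counted property is called id. It reproduces the two example queries above.

```java
final class PagedHqlQuery {
    private final String alias;       // e.g. "a"
    private final String from;        // e.g. "from A a"
    private final String fetchJoins;  // used only by the list query
    private final String where;       // shared by both queries
    private final String orderBy;     // used only by the list query

    PagedHqlQuery(String alias, String from, String fetchJoins, String where, String orderBy) {
        this.alias = alias;
        this.from = from;
        this.fetchJoins = fetchJoins;
        this.where = where;
        this.orderBy = orderBy;
    }

    /** Full query: selection, fetch joins, restrictions and ordering. */
    String listQuery() {
        return "select " + alias + " " + from + fetchJoins + where + orderBy;
    }

    /** Count query: same from/where, but no fetch joins and no order by. */
    String countQuery() {
        return "select count(" + alias + ".id) " + from + where;
    }
}
```

Usage: new PagedHqlQuery("a", "from A a", " join fetch a.b b", " where a.id=66", " order by a.name") yields the two example queries. This only works while the where fragment does not reference aliases introduced by the fetch joins.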
Have you tried making your intentions clear to Hibernate by setting a projection on your (SQL?)Criteria?
I've mostly been using Criteria, so I'm not sure how applicable this is to your case, but I've been using
getSession().createCriteria(persistentClass).
setProjection(Projections.rowCount()).uniqueResult()
and letting Hibernate figure out the caching / reusing / smart stuff by itself. I'm not really sure how much smart stuff it actually does, though. Anyone care to comment on this?
Well, I'm not sure this is a best practice, but it is my practice :)
If I have a query like:
select A.f1,A.f2,A.f3 from A, B where A.f2=B.f2 order by A.f1, B.f3
And I just want to know how many results it will return, I execute:
select count(*) from ( select A.f1, ... order by A.f1, B.f3 )
And then get the result as an Integer, without mapping results to a POJO.
Parsing your query to remove some parts, like 'order by', is very complicated. A good RDBMS will optimize your query for you.
Good question.
Nice question. Here's what I've done in the past (many things you've mentioned already):
1. Check whether SELECT clause is present.
1.1. If it's not, add select count(*)
1.2. Otherwise check whether it has DISTINCT or aggregate functions in it. If you're using ANTLR to parse your query, it's possible to work around those but it's quite involved. You're likely better off just wrapping the whole thing with select count(*) from ().
2. Remove fetch all properties
3. Remove fetch from joins if you're parsing HQL as string. If you're truly parsing the query with ANTLR you can remove left join entirely; it's rather messy to check all possible references.
4. Remove order by
Depending on what you've done in 1.2 you'll need to remove / adjust group by / having.
The above applies to HQL, naturally. For Criteria queries you're quite limited with what you can do because it doesn't lend itself to manipulation easily. If you're using some sort of a wrapper layer on top of Criteria, you will end up with equivalent of (limited) subset of ANTLR parsing results and could apply most of the above in that case.
Since you'd normally hold on to the offset of your current page and the total count, I usually run the actual query with the given limit / offset first and only run the count(*) query if the number of results returned is greater than or equal to the limit AND the offset is zero (in all other cases I've either run the count(*) before or I've got all the results back anyway). This is an optimistic approach with regards to concurrent modifications, of course.
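The decision rule in that paragraph is small enough to write down directly. The helper below is a hypothetical illustration (CountDecision and needsCountQuery are made-up names), not code from the answer.

```java
final class CountDecision {

    /**
     * Decides whether a separate count(*) query is needed after fetching a page.
     * On the first page (offset 0) a short page means we already hold every row;
     * on later pages the total is assumed to have been counted earlier.
     */
    static boolean needsCountQuery(int returnedRows, int limit, int offset) {
        return offset == 0 && returnedRows >= limit;
    }
}
```

When needsCountQuery returns false on the first page, the total is simply the number of rows returned.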
Update (on hand-assembling HQL)
I don't particularly like that approach. When mapped as named query, HQL has the advantage of build-time error checking (well, run-time technically, because SessionFactory has to be built although that's usually done during integration testing anyway). When generated at runtime it fails at runtime :-) Doing performance optimizations isn't exactly easy either.
Same reasoning applies to Criteria, of course, but it's a bit harder to screw up due to well-defined API as opposed to string concatenation. Building two HQL queries in parallel (paged one and "global count" one) also leads to code duplication (and potentially more bugs) or forces you to write some kind of wrapper layer on top to do it for you. Both ways are far from ideal. And if you need to do this from client code (as in over API), the problem gets even worse.
I've actually pondered quite a bit on this issue. Search API from Hibernate-Generic-DAO seems like a reasonable compromise; there are more details in my answer to the above linked question.
In a freehand HQL situation I would use something like this, but it is not reusable as it is quite specific to the given entities:
int count = ((Number) session.createQuery("select count(*) from ....").uniqueResult()).intValue();
Do this once and adjust starting number accordingly till you page through.
For criteria though I use a sample like this
final Criteria criteria = session.createCriteria(clazz);
List<Criterion> restrictions = factory.assemble(command.getFilter());
for (Criterion restriction : restrictions)
    criteria.add(restriction);
criteria.add(Restrictions.conjunction());
if (this.projections != null)
    criteria.setProjection(factory.loadProjections(this.projections));
criteria.addOrder(command.getDir().equals("ASC") ? Order.asc(command.getSort()) : Order.desc(command.getSort()));
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if (scrollable.last()) { // returns true if there is a resultset
    genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
    criteria.setFirstResult(command.getStart())
            .setMaxResults(command.getLimit());
    genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
But this does the count every time by calling ScrollableResults:last().
