Good Practices For Generating Variations of SQL Queries

I'm working on a (Java) project that requires different variations of SQL queries depending on the filters the user wants to use.
I have 4+ queries right now that all use the same tables, require the same table joins, and use the same "order by". Currently I have hard coded these queries into the code and I'm not happy with that. I would like to dynamically generate them but I'm having trouble figuring out a solution or if I even should bother to generate them.
Note: I cannot use stored procedures.
EXAMPLE:
SELECT t1.column1, t2.column2, t3.column3 FROM
(SELECT column1, column2, sum(column3) FROM t1
WHERE X = Y
GROUP BY column1, column2
ORDER BY column1)
LEFT JOIN t2 on t1.column1 = t2.column1
LEFT JOIN t3 on t1.column2 = t3.column2
WHERE Y = Z AND A = B
ORDER BY t1.column1
The differences are in the WHERE, SELECT, and GROUP BY statements. I could put nested if-statements between the dynamic parts but that seems too messy.
if ()
"SELECT A"
else
"SELECT B"
+ "FROM T1"
if ()
"WHERE x = y"
"LEFT JOIN ..."
etc.
Doing something like this feels wrong. Should I just stick to hard coding them or is there a better solution?
EDIT: I included it in the tags but I wanted to note up here that I'm using Oracle.

I've had the same type of problem on projects I've done. In some of these cases I've used the Builder pattern to create dynamic SQL statements. One advantage of using a Builder is that you can unit test your Builder for all the combinations of your criteria. Yes, you will still have some conditional logic, but it will all be encapsulated in your SQL Builder.
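As a rough illustration of the Builder idea, here is a minimal sketch. The class and method names (ReportQueryBuilder, select, whereEquals) are made up for this example, and the joins are a simplified version of the ones in the question. It emits `?` placeholders and collects the values separately, so the result can be handed to a PreparedStatement. Column names are assumed to come from your own code, never from user input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical builder: the fixed parts (tables, joins, ORDER BY) live in one
// place, and only the SELECT list and WHERE conditions vary.
final class ReportQueryBuilder {
    private final List<String> selectColumns = new ArrayList<>();
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    ReportQueryBuilder select(String column) {
        selectColumns.add(column);
        return this;
    }

    // Adds "column = ?" and remembers the value for later binding.
    ReportQueryBuilder whereEquals(String column, Object value) {
        conditions.add(column + " = ?");
        parameters.add(value);
        return this;
    }

    String toSql() {
        if (selectColumns.isEmpty()) {
            throw new IllegalStateException("no columns selected");
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", selectColumns))
                .append(" FROM t1")
                .append(" LEFT JOIN t2 ON t1.column1 = t2.column1")
                .append(" LEFT JOIN t3 ON t1.column2 = t3.column2");
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.append(" ORDER BY t1.column1").toString();
    }

    // Values in the same order as the "?" placeholders in toSql().
    List<Object> parameters() {
        return Collections.unmodifiableList(new ArrayList<>(parameters));
    }
}
```

Each filter combination then becomes a short chain of calls, and a unit test can compare toSql() against the expected string for every combination.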

There is more than one good solution for this: several tools exist in Java for creating type-safe, easily manipulated SQL strings.
But since you are using Oracle, the best way to go is probably prepared statements.
http://en.wikipedia.org/wiki/Prepared_statement
https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html

If there are only a few (fewer than, say, ten) combinations, you could keep hard-coded SQL strings to choose from. For more than that, you want to build the statements programmatically on the fly. There is no good way to do this with just JDBC, so you might want to invest the time to look at some database access libraries.
To avoid having to manipulate SQL strings when dynamically building database queries, have a look at jOOQ. Or, if you want to go that way, an ORM (such as JPA) will also have a CriteriaBuilder.
You will still have the same conditional statements and logic, but at least you can work on Java objects instead of having to manipulate strings and worry that you get all keywords in the right order and all parens balanced.

Related

OpenJPA distinct on

In my DB schema I have conversations with several emails. I want to get the newest email from each conversation in a list of conversations. In PostgreSQL the query is:
select distinct on (conversation_id) *
from email
where conversation_id in (7085214, 7084964)
order by conversation_id, processing_date desc
OpenJPA:
(List<Email>) entityManager.createQuery(
    "SELECT distinct(email.conversation.id), email FROM Email email "
    + "WHERE email.conversation.id in :id "
    + "ORDER BY email.conversation.id, email.processingDate DESC")
    .setParameter("id", conversationIds);
It gives back a map of the conversation ids and the whole list of emails in the conversations.
How could I make it right?
Thanks

Use native SQL.
The only other way to do what you want is to develop a patch to OpenJPA that "teaches" it how to use the PostgreSQL extension DISTINCT ON in its JPQL parser and query generator. Most ORMs accept such extensions via dialect hooks. Don't expect this to be a simple task, though - unless you're writing a lot of these queries, native SQL is almost certain to be much easier.
You can't just use DISTINCT or DISTINCT ON like functions. They aren't; they're completely separate syntax. A JPQL engine would try to convert it into a true function call that'd fail at runtime - or in the case of distinct on, just fail to parse it in the first place.
BTW, DISTINCT ON is a bit like GROUP BY in some other vendors' databases, such as MySQL, where you're allowed to specify columns in the SELECT that don't appear in the GROUP BY or an aggregate. So in MySQL people probably do this by just producing a technically invalid query that MySQL accepts anyway; it's quite likely that the OpenJPA JPQL handler won't notice the problem, so it'll pass it through fine. This trick won't work for DISTINCT ON, and PostgreSQL is strictly standards-compliant about GROUP BY: it won't let you produce a non-deterministic query with GROUP BY.
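For the native SQL route, a small helper can build the query text with one positional placeholder per conversation id. This is only a sketch: the class name NewestEmailSql is made up, and the `?1, ?2, ...` placeholder style is an assumption you should check against your OpenJPA version.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper that builds the PostgreSQL-specific query text.
final class NewestEmailSql {
    // One numbered placeholder per id, so values are bound rather than concatenated.
    static String forConversationCount(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one conversation id");
        }
        String placeholders = IntStream.rangeClosed(1, count)
                .mapToObj(i -> "?" + i)
                .collect(Collectors.joining(", "));
        return "select distinct on (conversation_id) * from email"
                + " where conversation_id in (" + placeholders + ")"
                + " order by conversation_id, processing_date desc";
    }
}
```

The resulting string would then go to entityManager.createNativeQuery(sql, Email.class), with each id bound through setParameter(position, id) using 1-based positions before calling getResultList().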

Memcache implementation design

I am trying to implement memcache in my web application and wanted to get suggestions on whether what I am doing is right in terms of design.
I have a class SimpleDataAccessor which runs all my insert, update and select SQL queries. So any query that has to be performed is executed inside a method of this class.
So inside the method where I have my select query implementation I have a method which stores the result set in memcache like this:
storeinMC(resultset.getJSON(),sqlquery);
The sqlquery here is my key.
Also, before running the select query I check in memcache whether I already have a result set for that query:
String res = getRSFromMC(sqlquery);
if (res == null)
So I've tried to keep it plain and simple.
Do you see any issues with this?

As rai.skumar rightly pointed out, your SQL statements could be constructed differently (e.g. the WHERE clause could contain the same conditions in a different order, etc.)
So to overcome the above-mentioned issues, you need to parse your SQL and get all the relevant pieces from it. Then you can combine these pieces into a cache key.
You can take a look at SQL parsers such as ZQL, JSqlParser, and General SQL Parser for Java, which turn your SQL into Java objects.
Another option would be to use JPA instead of straight JDBC. For example, Hibernate has great JPA support and is fully capable of caching your queries.
If you feel closer to JDBC you could use MyBatis, which has a very JDBC-like syntax and caching support.

Consider the queries below:
String k1 = "Select * from table"; //Query1
String k2 = "Select * from TABLE"; // Query2 ; notice TABLE is in caps
Both of the above SQL queries are the same and will fetch the same data. But if they are used as keys in Memcached they will be stored in different places (as k1.equals(k2) will return false).
Also, even if you can somehow ensure that there are no typos or extra spaces, it won't be very efficient, as keys/queries could be very big.
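One way to keep the keys short is to hash the query text instead of using it directly; memcached keys are limited to 250 bytes and may not contain whitespace. The sketch below (QueryCacheKey is a made-up name) collapses runs of whitespace and then takes a SHA-256 digest. It deliberately does not fold case, since that would change string literals, so the k1/k2 example above still yields two keys. It also assumes your queries contain no literals where repeated spaces matter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns a SQL string into a fixed-length cache key.
final class QueryCacheKey {
    static String of(String sql) {
        // Assumption: whitespace differences are insignificant, including
        // inside quoted literals. Drop this step if that does not hold.
        String normalized = sql.trim().replaceAll("\\s+", " ");
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString(); // 64 hex characters, no whitespace
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

If the queries use bind parameters, the parameter values would have to be part of the hashed text as well, otherwise different lookups would share one key.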

Get the main where of query

I want to get the index of the main where clause of a database query in Java.
How can I handle this with regex?
For example, in this query I want to get the second where clause:
select u.id, (select x.id from XEntity where x.id = 200)
from UserEntity u
**where** u.id in (select g.id from AnotherEntity g where g.id = 100)
I think the main where is the one after which the number of "(" characters and ")" characters is equal,
but I don't know how I can get this with regex.
With Best Regards

What Toote and David Brabant said is absolutely correct. Parsing SQL, especially complex SQL, using only regex is a very hard problem.
In terms of parsing SQL in Java, which seems to be the thrust of your question, there's a very good (if apparently unmaintained) library called JSQLParser. A more up-to-date version of this library can be found on GitHub here (disclaimer: I made a very small contribution to this). The main page shows an example of a visitor designed to consume the output of the AST here.
There is also a grammar for ANTLR available in its grammar list. Or, if you're feeling adventurous, the H2 database supports a rather extensive range of SQL, including some proprietary features of, e.g., MySQL. You could modify its Parser to generate an appropriate structure for extracting the information you need.

Regular expressions are not very good at recognizing structures as complex as SQL queries can be, mainly because SQL is not a regular language: subqueries can nest to any depth, which is exactly the issue you are running into. The WHERE can appear in a lot of places and you want one in particular that depends on the overall structure of the query.
What you will need is an appropriate parser. The only JavaScript SQL parser I could find is not too complete, but you can always help develop it by making sure it fits your needs.
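If a full parser is more than you need, the parenthesis-counting idea from the question can be written as a plain character scan instead of a regex. This is only a sketch under stated assumptions (TopLevelWhere is a made-up name): it skips single-quoted literals and nested parentheses, but it knows nothing about comments or quoted identifiers.

```java
// Hypothetical scanner: finds the first WHERE keyword that is not inside
// parentheses or a single-quoted string literal.
final class TopLevelWhere {
    // Returns the index of the top-level WHERE, or -1 if there is none.
    static int indexOf(String sql) {
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // A doubled quote ('') toggles twice, so the state is unchanged.
                inString = !inString;
            } else if (inString) {
                continue;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (depth == 0 && isWhereAt(sql, i)) {
                return i;
            }
        }
        return -1;
    }

    // True if the whole word "where" (any case) starts at position i.
    private static boolean isWhereAt(String sql, int i) {
        if (!sql.regionMatches(true, i, "where", 0, 5)) {
            return false;
        }
        boolean startOk = i == 0 || !isIdentifierChar(sql.charAt(i - 1));
        boolean endOk = i + 5 == sql.length() || !isIdentifierChar(sql.charAt(i + 5));
        return startOk && endOk;
    }

    private static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

For the query in the question this returns the position of the where that follows "from UserEntity u", because the other two occurrences sit at parenthesis depth 1.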

one-to-many filtering in media database

With pms-mlx, I've been working on a media library to extend the functionality of the ps3 media server for some time now and just recently discovered a major flaw in my concept: it is not possible to filter when using more than one one-to-many property.
Before asking questions, I should explain how it works.
Here's the DB structure for the relevant part of the code. H2 is being used.
Every video is a file and has 0-n of the attached properties.
Once some videos have been stored, it is possible to view and play them in folders showing only a subset of all entries, by setting conditions. Here's an example:
To retrieve the wanted files, the query is being constructed like this
SELECT <all_properties>
FROM FILE, VIDEO
LEFT JOIN VIDEOAUDIO ON VIDEO.FILEID = VIDEOAUDIO.FILEID
LEFT JOIN SUBTITLES ON VIDEO.FILEID = SUBTITLES.FILEID
LEFT JOIN FILETAGS ON VIDEO.FILEID = FILETAGS.FILEID
LEFT JOIN FILEPLAYS ON VIDEO.FILEID = FILEPLAYS.FILEID
WHERE <whereClause>
ORDER BY <orderBy>
Only the where and order by parts are dynamic. The where clause is being created by replacing the condition names (c1, c2) by their corresponding SQL counterparts. For the above example this where clause will be generated:
WHERE VIDEO.FILEID = FILE.ID
AND ((VIDEO.NAME LIKE 'A%' OR VIDEO.NAME LIKE 'B%')
AND FILE.DATEINSERTEDDB > '2011-09-28 18:48:43')
When doing the query, many rows will be returned for each file. While iterating through the results, the 'new' data contained in each row will be added to create the object.
When the filter contains a one-to-many condition, the query is being done in two steps. First, the IDs of all the videos are being retrieved and then the data loaded.
What I've been missing until now is that if, e.g., two file tags are set and both should be met, nothing works anymore.
If setting the filter:
Resulting in this where clause:
WHERE VIDEO.FILEID = FILE.ID
AND ((VIDEO.FILEID = FILETAGS.FILEID
AND FILETAGS.KEY = 'Actor' AND FILETAGS.VALUE LIKE 'A%')
AND (VIDEO.FILEID = FILETAGS.FILEID
AND FILETAGS.KEY = 'Actor' AND FILETAGS.VALUE LIKE 'B%'))
which is never met, as a single row only contains one key and one value field.
Would it be possible to create a query where all the data for a single video would be contained on one row? Or substitute the generated SQL condition by something else?
I'm no DB expert and would love to hear from someone with a good idea :)
The code to generate the where clause is in the method formatFilter on line 445 of DBFileInfo.java. The loading of the data is being done on line 189 of DBVideoFileInfo.java if anybody is interested to see the code.

Your complex table relationships are a bit confusing for a simple example of the problem, but can you not just build up a bunch of criteria?
WHERE FILEID IN (SELECT [all files that have brad])
AND FILEID IN (SELECT [all files that have angelina])
AND FILEID IN (SELECT [all files that have elvis])
Surely that would only return files that have Brad, Angelina and Elvis in? (Probably not many).

You need your GUI to let the user specify whether the filter should work as AND or OR. In your last case, obviously, you need OR. So your WHERE clause will look like:
WHERE VIDEO.FILEID = FILE.ID
AND VIDEO.FILEID = FILETAGS.FILEID
AND FILETAGS.KEY = 'Actor'
AND (FILETAGS.VALUE LIKE 'A%' OR FILETAGS.VALUE LIKE 'B%')
Notice that the repetition of common conditions is eliminated and the common piece is moved out to the upper level of the WHERE.
I think you'll need both your GUI and SQL generator for this.
UPDATE: In case you need both 'A%' and 'B%' to be in the resulting movies, you need to change the WHERE clause to
WHERE VIDEO.FILEID = FILE.ID
AND EXISTS
-- check whether the movie has an 'A%' actor
(SELECT 1 FROM FILETAGS WHERE
VIDEO.FILEID = FILETAGS.FILEID
AND FILETAGS.KEY = 'Actor'
AND FILETAGS.VALUE LIKE 'A%')
AND EXISTS
-- check whether the same movie has a 'B%' actor
(SELECT 1 FROM FILETAGS WHERE
VIDEO.FILEID = FILETAGS.FILEID
AND FILETAGS.KEY = 'Actor'
AND FILETAGS.VALUE LIKE 'B%')
Anyway, these are just samples of what the queries may look like, depending on the user's input.
The original question, as I understood it, was about a particular situation, not a general solution. Clearly, in the general case, your application will need to employ different conditions reflecting the user's choice in the GUI.
For the general case, you'll need some form of "SQL Builder", like this one. Or you may want to employ some ORM tool which will build SQL semi-automatically for you. But if you don't know SQL very well, I'd advise you to start with an SQL builder to get some low-level SQL experience. Using an ORM properly requires a good understanding of SQL and related topics.
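To make the "both tags must match" case concrete, here is a minimal sketch of generating one EXISTS clause per tag condition. TagFilterSql is a made-up name, not code from pms-mlx; the table and column names are the ones used in the question. Values are left as `?` placeholders to be bound through a PreparedStatement, two per condition (key, then value pattern).

```java
import java.util.Collections;

// Hypothetical generator for the tag part of the WHERE clause.
final class TagFilterSql {
    private static final String EXISTS_TEMPLATE =
            "EXISTS (SELECT 1 FROM FILETAGS"
            + " WHERE FILETAGS.FILEID = VIDEO.FILEID"
            + " AND FILETAGS.KEY = ? AND FILETAGS.VALUE LIKE ?)";

    // One EXISTS per required tag, joined with AND so every tag must match.
    static String allTagsMatch(int tagConditionCount) {
        if (tagConditionCount < 1) {
            throw new IllegalArgumentException("need at least one tag condition");
        }
        return String.join(" AND ", Collections.nCopies(tagConditionCount, EXISTS_TEMPLATE));
    }
}
```

Because each EXISTS looks at FILETAGS independently, two conditions on the same key no longer compete for a single joined row.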

Java coding best-practices for reusing part of a query to count

The implementing-result-paging-in-hibernate-getting-total-number-of-rows question triggered another question for me, about an implementation concern:
Now that you know you have to reuse part of the HQL query to do the count, how do you reuse it efficiently?
The differences between the two HQL queries are:
the selection is count(?), instead of the pojo or property (or list of)
the fetches should not happen, so some tables should not be joined
the order by should disappear
Are there other differences?
Do you have coding best-practices to achieve this reuse efficiently (concerns: effort, clarity, performance)?
Example for a simple HQL query:
select a from A a join fetch a.b b where a.id=66 order by a.name
select count(a.id) from A a where a.id=66
UPDATED
I received answers on:
using Criteria (but we use HQL mostly)
manipulating the String query (but everybody agrees it seems complicated and not very safe)
wrapping the query, relying on database optimization (but there is a feeling that this is not safe)
I was hoping someone would give options along another path, more related to String concatenation.
Could we build both HQL queries using common parts?

Have you tried making your intentions clear to Hibernate by setting a projection on your (SQL?)Criteria?
I've mostly been using Criteria, so I'm not sure how applicable this is to your case, but I've been using
getSession().createCriteria(persistentClass).
setProjection(Projections.rowCount()).uniqueResult()
and letting Hibernate figure out the caching / reusing / smart stuff by itself.. Not really sure how much smart stuff it actually does though.. Anyone care to comment on this?

Well, I'm not sure this is a best practice, but it is my practice :)
If I have a query like:
select A.f1,A.f2,A.f3 from A, B where A.f2=B.f2 order by A.f1, B.f3
And I just want to know how many results I will get, I execute:
select count(*) from ( select A.f1, ... order by A.f1, B.f3 )
And then get the result as an Integer, without mapping the results to a POJO.
Parsing your query to remove some parts, like 'order by', is very complicated. A good RDBMS will optimize your query for you.
Good question.

Nice question. Here's what I've done in the past (many things you've mentioned already):
1. Check whether the SELECT clause is present.
1.1. If it's not, add select count(*)
1.2. Otherwise check whether it has DISTINCT or aggregate functions in it. If you're using ANTLR to parse your query, it's possible to work around those but it's quite involved. You're likely better off just wrapping the whole thing with select count(*) from ().
2. Remove fetch all properties
3. Remove fetch from joins if you're parsing HQL as string. If you're truly parsing the query with ANTLR you can remove left join entirely; it's rather messy to check all possible references.
4. Remove order by
5. Depending on what you've done in 1.2 you'll need to remove / adjust group by / having.
The above applies to HQL, naturally. For Criteria queries you're quite limited with what you can do because it doesn't lend itself to manipulation easily. If you're using some sort of a wrapper layer on top of Criteria, you will end up with equivalent of (limited) subset of ANTLR parsing results and could apply most of the above in that case.
Since you'd normally hold on to the offset of your current page and the total count, I usually run the actual query with the given limit / offset first and only run the count(*) query if the number of results returned is greater than or equal to the limit AND the offset is zero (in all other cases I've either run the count(*) before or I've got all the results back anyway). This is an optimistic approach with regard to concurrent modifications, of course.
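The decision described above fits in one small predicate. The names here (PagingCount, needsCountQuery) are invented for this sketch:

```java
// Hypothetical helper capturing the "count only when needed" rule.
final class PagingCount {
    // A separate count(*) is needed only on the first page, and only when
    // that page came back full. A short first page already is the whole
    // result; for later pages the total was fetched earlier.
    static boolean needsCountQuery(int offset, int limit, int returnedRows) {
        return offset == 0 && returnedRows >= limit;
    }
}
```

When the first page comes back with fewer rows than the limit, the total is simply the number of rows returned.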
Update (on hand-assembling HQL)
I don't particularly like that approach. When mapped as named query, HQL has the advantage of build-time error checking (well, run-time technically, because SessionFactory has to be built although that's usually done during integration testing anyway). When generated at runtime it fails at runtime :-) Doing performance optimizations isn't exactly easy either.
Same reasoning applies to Criteria, of course, but it's a bit harder to screw up due to well-defined API as opposed to string concatenation. Building two HQL queries in parallel (paged one and "global count" one) also leads to code duplication (and potentially more bugs) or forces you to write some kind of wrapper layer on top to do it for you. Both ways are far from ideal. And if you need to do this from client code (as in over API), the problem gets even worse.
I've actually pondered quite a bit on this issue. Search API from Hibernate-Generic-DAO seems like a reasonable compromise; there are more details in my answer to the above linked question.
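For completeness, if you do go the hand-assembled route, the two HQL strings can at least be derived from one set of parts so the where clause is written only once. This is a sketch with made-up names (PagedHql), assuming the entity has an id property and that the where clause does not reference aliases introduced by the fetch joins:

```java
// Hypothetical holder for the shared and query-specific HQL fragments.
final class PagedHql {
    private final String entity;     // e.g. "A"
    private final String alias;      // e.g. "a"
    private final String fetchJoins; // data query only, e.g. "join fetch a.b b"
    private final String where;      // shared by both queries, e.g. "a.id=66"
    private final String orderBy;    // data query only, e.g. "a.name"

    PagedHql(String entity, String alias, String fetchJoins, String where, String orderBy) {
        this.entity = entity;
        this.alias = alias;
        this.fetchJoins = fetchJoins;
        this.where = where;
        this.orderBy = orderBy;
    }

    // Full query: fetch joins and ordering included.
    String dataQuery() {
        StringBuilder hql = new StringBuilder("select ").append(alias)
                .append(" from ").append(entity).append(' ').append(alias);
        if (!fetchJoins.isEmpty()) hql.append(' ').append(fetchJoins);
        if (!where.isEmpty()) hql.append(" where ").append(where);
        if (!orderBy.isEmpty()) hql.append(" order by ").append(orderBy);
        return hql.toString();
    }

    // Count query: same entity and where clause, no fetches, no ordering.
    String countQuery() {
        StringBuilder hql = new StringBuilder("select count(").append(alias)
                .append(".id) from ").append(entity).append(' ').append(alias);
        if (!where.isEmpty()) hql.append(" where ").append(where);
        return hql.toString();
    }
}
```

With the values from the question's example, dataQuery() and countQuery() reproduce the two HQL strings shown there. The caveats above about runtime failures still apply, since nothing checks the fragments until the query runs.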

In a freehand HQL situation I would use something like this but this is not reusable as it is quite specific for the given entities
long count = ((Number) session.createQuery("select count(*) from ....").uniqueResult()).longValue();
Do this once and adjust starting number accordingly till you page through.
For criteria though I use a sample like this
final Criteria criteria = session.createCriteria(clazz);
List<Criterion> restrictions = factory.assemble(command.getFilter());
for (Criterion restriction : restrictions)
    criteria.add(restriction);
criteria.add(Restrictions.conjunction());
if (this.projections != null)
    criteria.setProjection(factory.loadProjections(this.projections));
criteria.addOrder(command.getDir().equals("ASC") ? Order.asc(command.getSort()) : Order.desc(command.getSort()));
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if (scrollable.last()) { // returns true if there is a result set
    genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
    criteria.setFirstResult(command.getStart())
            .setMaxResults(command.getLimit());
    genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
But this does the count every time by calling ScrollableResults:last().
