How to use Hibernate Query|Criteria.scroll() with DISTINCT without duplications

I have implemented ScrollableResults for a big DB table, and everything worked perfectly until I wanted to do the same for another table using joins.
The entity where I have the problem has some one-to-many associations, so I have to use DISTINCT to avoid duplicates. Everything works well when I obtain the query results using list(). But when I use scroll(), DISTINCT seems to be ignored completely and I get many duplicates.
Query query = gameSession.createQuery("SELECT DISTINCT c FROM City c JOIN FETCH c.inhabitans i");
This works well, list has no duplicates:
List<City> list = query.list();
This does not work (it returns many duplicates, as if DISTINCT were not used):
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
Everything is the same when I use Criteria instead of Query. I have found only 3 things about this particular problem:
A few questions like mine, without answers,
A bug report describing a case that could be exactly the same as mine, but that should have been fixed a long time ago,
A short comment in one of the SO answers saying that "DISTINCT_ROOT_ENTITY does not interact very well when scroll() is used".
This makes ScrollableResults useless for me, but I still need it because of the huge memory savings. Do you know how to scroll through results with DISTINCT? Or is there any workaround?
Hibernate version: 4.2.4; JDK 7; DB: MSSQL

Add an "order by" clause to your query with the Id of the root entity.
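A plain-Java sketch of why the ordering can matter, under the assumption (not confirmed by the source) that scroll() with a fetch join builds each root entity from consecutive rows that share the same root id. The class and method names are hypothetical, and rows are modeled as bare city ids rather than a real Hibernate result set; the HQL constant is just the question's query with the suggested ORDER BY appended.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollOrderSketch {
    // The question's query with the suggested ordering on the root entity id.
    public static final String HQL =
        "SELECT DISTINCT c FROM City c JOIN FETCH c.inhabitans i ORDER BY c.id";

    // Collapses each run of consecutive rows with the same city id into one entry,
    // which is all a forward-only reader can do without buffering the whole result.
    public static List<Long> collapseConsecutive(long[] cityIdPerRow) {
        List<Long> roots = new ArrayList<>();
        Long previous = null;
        for (long id : cityIdPerRow) {
            if (previous == null || previous != id) {
                roots.add(id);
            }
            previous = id;
        }
        return roots;
    }

    public static void main(String[] args) {
        long[] unordered = {1, 2, 1, 2}; // rows of the same city are scattered
        long[] ordered = {1, 1, 2, 2};   // rows of the same city are adjacent
        System.out.println(collapseConsecutive(unordered)); // duplicates survive
        System.out.println(collapseConsecutive(ordered));   // one entry per city
    }
}
```

With unordered rows the same city shows up more than once; once the rows are ordered by the root id, each city appears exactly once.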

Related

Performance issues while using hibernate 'Restrictions.in'

I have a table in Oracle: UserDetail[id, name, country]. The problem is to get all 'UserDetail' entities whose 'name' is in a given input list, using Hibernate. The most obvious solution is 'Restrictions.in' from the Hibernate Criteria API:
//Session Construction code
List<String> usernames = getUserNames();
Criteria criteria = session.createCriteria(UserDetail.class);
criteria.add(Restrictions.in("name", usernames)); //usernames -> List of usernames
List<UserDetail> users = criteria.list();
The question is: will there be any performance issues if the list (usernames) has around 10k entries and the database holds roughly 10 million users? I would like to know what the performance issues would be, and what alternative ways there are to fetch the data set with this kind of filter.
Thanks in advance

If this list of user names is determined through some kind of query, I would recommend embedding that query as a subquery instead, as that will usually perform better. If the list is provided by a user, there is not much you can do other than what you already have.
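If the list does have to be passed in, one common workaround is to split it into smaller batches and run one IN query per batch; Oracle, for instance, rejects IN lists with more than 1000 expressions (ORA-01795). The sketch below only shows the splitting step. The partition helper and the batch size of 1000 are assumptions for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

public class InListBatches {
    // Splits the input into consecutive sublists of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> values, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < values.size(); from += batchSize) {
            int to = Math.min(from + batchSize, values.size());
            batches.add(new ArrayList<>(values.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> usernames = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            usernames.add("user" + i);
        }
        for (List<String> batch : partition(usernames, 1000)) {
            // In the question's code, each batch would go into
            // Restrictions.in("name", batch) and the results would be accumulated.
            System.out.println(batch.size());
        }
    }
}
```

For a 10k list this means about ten round trips instead of one very large statement; whether that is actually faster depends on the database and should be measured.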

PostgreSQL multiple 'WHERE' conditions (1000+) request

I'm not a pro in SQL at all :)
Having a very critical performance issue.
Here is the info directly related to problem.
I have 2 tables in my DB: table condos and table items.
table condos has the fields:
id (PK)
name
city
country
table items:
id (PK)
name
multiple fields not related to issue
condo_id (FK)
I have 1000+ entities in condos table and 1000+ in items table.
The problem is how i perform items search
currently it is:
For example, i want to get all the items for city = Sydney
Perform a SELECT condos.condo_id FROM public.condos WHERE city = 'Sydney'
Make a SELECT * FROM public.items WHERE item.condo_id = ? for each condo_id i get in step 1.
The issue is that once I have 1000+ entities in the condos table, the request is performed 1000+ times, once for each condo_id that belongs to 'Sydney'. Executing these requests takes more than 2 minutes, which is a critical performance issue.
So, the question is:
What is the best way for me to perform such a search? Should I put 1000+ ids in a single WHERE clause? Or something else?
For add info, i use PostgreSQL 9.4 and Spring MVC.

Use a table join so that you do not need to perform an additional query. In your case you can join condos and items by condo_id, something like:
SELECT i.*
FROM public.items i join public.condos c on i.condo_id = c.condo_id
WHERE c.city = 'Sydney'
Note that performance tuning is a broad topic. It can vary from environment to environment and depends on how you structure the data in your tables and how you organize the data in your code.
Here are some other suggestions that may also help:
Try adding an index to the fields you use for sorting and searching, e.g. city in condos and condo_id in items. There is a good answer explaining how indexing works.
I also recommend running EXPLAIN on your query to see its query plan and check whether a full table scan is causing the performance issue.
Hope this helps.

Essentially, you need to eliminate the N+1 query problem and at the same time ensure that your city field is indexed. You have 3 mechanisms to choose from. One, the subselect approach, is already stated in one of the other answers you have received. Beyond it there are two more.
You can use what you have stated:
SELECT condos.condo_id FROM public.condos WHERE city = 'Sydney'
SELECT *
FROM public.items
WHERE items.condo_id IN (up to 1000 ids here)
The reason I say up to 1000 is that some databases limit the number of elements in an IN list.
You can also use a join to eliminate the N+1 selects:
SELECT *
FROM public.items join public.condos on items.condo_id=condos.condo_id and condos.city='Sydney'
How do the three queries differ?
Subselect query. Pro: you get everything at once. Con: performance may suffer if there are too many elements.
Simple IN clause. Pro: it effectively solves the N+1 problem. Con: it may lead to some extra queries compared to the subselect.
Joined query. Pro: you can initialize both Condo and Item in one go. Con: it leads to some data duplication on the Condo side.
If we look at a framework like Hibernate, in most cases the fetch strategy used is either the join or the IN strategy. Subselect is used rarely.
Also, if performance is critical, you may consider reading everything into memory and serving it from there. Judging from the content of these two tables, it should be fairly easy to load it into a Map.
Effectively, anything that solves your N+1 query problem is a solution in your case if we are talking about just 2 times 1000 queries. All three options are solutions.
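The in-memory suggestion could look roughly like this: load the items once and index them by condo_id, so each lookup is a map access instead of a query. The Item class and its fields are hypothetical stand-ins for the real rows, and the data in main is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ItemsByCondo {
    // Hypothetical minimal stand-in for a row of the items table.
    public static final class Item {
        public final long id;
        public final long condoId;

        public Item(long id, long condoId) {
            this.id = id;
            this.condoId = condoId;
        }
    }

    // Indexes all items by condo_id in a single pass over the list.
    public static Map<Long, List<Item>> indexByCondo(List<Item> items) {
        Map<Long, List<Item>> byCondo = new HashMap<>();
        for (Item item : items) {
            List<Item> bucket = byCondo.get(item.condoId);
            if (bucket == null) {
                bucket = new ArrayList<>();
                byCondo.put(item.condoId, bucket);
            }
            bucket.add(item);
        }
        return byCondo;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item(1, 10));
        items.add(new Item(2, 10));
        items.add(new Item(3, 20));
        Map<Long, List<Item>> byCondo = indexByCondo(items);
        System.out.println(byCondo.get(10L).size()); // items of condo 10
    }
}
```

This only pays off while the tables stay small enough to hold in memory and the cached copy can be kept reasonably fresh.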

You could use the first query as a subquery with the IN operator in the second query:
SELECT *
FROM public.items
WHERE items.condo_id IN (SELECT condos.condo_id
                         FROM public.condos
                         WHERE city = 'Sydney')

sql query into hibernate

I am a beginner at Hibernate and have read up a lot, but I'm stuck at this one point.
In my JSF app, where I'm introducing Hibernate, I have this SQL query that works in my database:
SELECT *
FROM CourseProduct
INNER JOIN Course
ON CourseProduct.number=Course.number
inner join Product
on CourseProduct.product=Product.product;
I am trying to do the same thing with hibernate for my JSF application. So far I came up with:
List results = session.createCriteria(Course.class)
.setFetchMode("product", FetchMode.JOIN)
.setFetchMode("number", FetchMode.JOIN)
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
.list();
Is this correct or completely wrong? Also, how do I access the fields from the results (if I even have to do that, since Hibernate populates the classes for me)? It seems that the results I get contain only the Course table and the value of the primary key in Product, but not the other 2 fields of the Product table.
EDIT
I guess I solved my own problem. It looks as though the above code is correct, I just didn't realize that in order to access the class Product I had to access it from the Set in the Course class! I just used an iterator to get the data I need in the get method for the set of Products in the Course class.

I guess I solved my own problem. It looks as though the above code is correct, I just didn't realize that in order to access the class Product I had to access it from the Set in the Course class! I just used an iterator to get the data I need in the get method for the set of Products in the Course class.
Update: I really solved the problem. I just got rid of Hibernate. The SQL query works fine; I got my data using a PreparedStatement and the result set in the regular old way (java.sql.DriverManager).
For some reason Hibernate didn't even like running my statement as native SQL (it kept giving me an exception trying to convert an Integer). I googled the problem and people say it's a bug in Hibernate!

Java coding best-practices for reusing part of a query to count

The implementing-result-paging-in-hibernate-getting-total-number-of-rows question triggered another question for me, about an implementation concern:
Now that you know you have to reuse part of the HQL query to do the count, how do you reuse it efficiently?
The differences between the two HQL queries are:
the selection is count(?), instead of the pojo or property (or list of)
the fetches should not happen, so some tables should not be joined
the order by should disappear
Are there other differences?
Do you have coding best-practices to achieve this reuse efficiently (concerns: effort, clarity, performance)?
Example for a simple HQL query:
select a from A a join fetch a.b b where a.id=66 order by a.name
select count(a.id) from A a where a.id=66
UPDATED
I received answers on:
using Criteria (but we use HQL mostly)
manipulating the String query (but everybody agrees it seems complicated and not very safe)
wrapping the query, relying on database optimization (but there is a feeling that this is not safe)
I was hoping someone would give options along another path, more related to String concatenation.
Could we build both HQL queries using common parts?
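As a rough sketch of what the common-parts idea could look like for the simple example above: the from/where fragment is shared, and only the select clause, the fetch join and the order by differ. The class, constants and method names are hypothetical, and the approach is plain string concatenation with no parsing or validation.

```java
public class PagedHql {
    // Shared parts: the root entity with its alias, and the filter.
    static final String FROM = "from A a";
    static final String WHERE = " where a.id=66";

    // Data query: adds the selection, the fetch join and the ordering.
    public static String dataQuery() {
        return "select a " + FROM + " join fetch a.b b" + WHERE + " order by a.name";
    }

    // Count query: same root and filter, but no fetch join and no ordering.
    public static String countQuery() {
        return "select count(a.id) " + FROM + WHERE;
    }

    public static void main(String[] args) {
        System.out.println(dataQuery());
        System.out.println(countQuery());
    }
}
```

This reproduces the two example queries exactly, but it gets harder to keep correct once the where clause references joined aliases that the count query would otherwise drop.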

Have you tried making your intentions clear to Hibernate by setting a projection on your (SQL?)Criteria?
I've mostly been using Criteria, so I'm not sure how applicable this is to your case, but I've been using
getSession().createCriteria(persistentClass).
setProjection(Projections.rowCount()).uniqueResult()
and letting Hibernate figure out the caching / reusing / smart stuff by itself.. Not really sure how much smart stuff it actually does though.. Anyone care to comment on this?

Well, I'm not sure this is a best practice, but it is my practice :)
If I have a query like:
select A.f1,A.f2,A.f3 from A, B where A.f2=B.f2 order by A.f1, B.f3
And I just want to know how many results I will get, I execute:
select count(*) from ( select A.f1, ... order by A.f1, B.f3 )
And then get the result as an Integer, without mapping the results to a POJO.
Parsing your query to remove some parts, like 'order by', is very complicated. A good RDBMS will optimize your query for you.
Good question.

Nice question. Here's what I've done in the past (many things you've mentioned already):
1. Check whether SELECT clause is present.
   1.1. If it's not, add select count(*)
   1.2. Otherwise check whether it has DISTINCT or aggregate functions in it. If you're using ANTLR to parse your query, it's possible to work around those but it's quite involved. You're likely better off just wrapping the whole thing with select count(*) from ().
2. Remove fetch all properties
3. Remove fetch from joins if you're parsing HQL as string. If you're truly parsing the query with ANTLR you can remove left join entirely; it's rather messy to check all possible references.
4. Remove order by
5. Depending on what you've done in 1.2 you'll need to remove / adjust group by / having.
The above applies to HQL, naturally. For Criteria queries you're quite limited with what you can do because it doesn't lend itself to manipulation easily. If you're using some sort of a wrapper layer on top of Criteria, you will end up with equivalent of (limited) subset of ANTLR parsing results and could apply most of the above in that case.
Since you'd normally hold on to the offset of your current page and the total count, I usually run the actual query with the given limit / offset first and only run the count(*) query if the number of results returned is greater than or equal to the limit AND the offset is zero (in all other cases I've either run the count(*) before or I've got all the results back anyway). This is an optimistic approach with regard to concurrent modifications, of course.
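The decision rule in the last paragraph can be written down as a small predicate. This is only a sketch of that rule as described; the class and method names are hypothetical.

```java
public class CountDecision {
    // True when a separate count(*) query is still needed after fetching a page:
    // only on the first page, and only if that page came back full.
    public static boolean needsCountQuery(int resultSize, int limit, int offset) {
        return offset == 0 && resultSize >= limit;
    }

    public static void main(String[] args) {
        // First page came back full: more rows may exist, so count.
        System.out.println(needsCountQuery(20, 20, 0));
        // First page came back short: resultSize already is the total.
        System.out.println(needsCountQuery(7, 20, 0));
        // Later page: the count was already taken when the first page was loaded.
        System.out.println(needsCountQuery(20, 20, 40));
    }
}
```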
Update (on hand-assembling HQL)
I don't particularly like that approach. When mapped as named query, HQL has the advantage of build-time error checking (well, run-time technically, because SessionFactory has to be built although that's usually done during integration testing anyway). When generated at runtime it fails at runtime :-) Doing performance optimizations isn't exactly easy either.
Same reasoning applies to Criteria, of course, but it's a bit harder to screw up due to well-defined API as opposed to string concatenation. Building two HQL queries in parallel (paged one and "global count" one) also leads to code duplication (and potentially more bugs) or forces you to write some kind of wrapper layer on top to do it for you. Both ways are far from ideal. And if you need to do this from client code (as in over API), the problem gets even worse.
I've actually pondered quite a bit on this issue. Search API from Hibernate-Generic-DAO seems like a reasonable compromise; there are more details in my answer to the above linked question.

In a freehand HQL situation I would use something like this, but it is not reusable as it is quite specific to the given entities:
long count = ((Number) session.createQuery("select count(*) from ....").uniqueResult()).longValue(); // count(*) comes back as a Long in Hibernate 3.2+, so a direct Integer cast would fail
Do this once and adjust starting number accordingly till you page through.
For criteria though I use a sample like this
final Criteria criteria = session.createCriteria(clazz);
List<Criterion> restrictions = factory.assemble(command.getFilter());
for (Criterion restriction : restrictions)
    criteria.add(restriction);
criteria.add(Restrictions.conjunction());
if (this.projections != null)
    criteria.setProjection(factory.loadProjections(this.projections));
criteria.addOrder(command.getDir().equals("ASC") ? Order.asc(command.getSort()) : Order.desc(command.getSort()));
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if (scrollable.last()) { // returns true if there is a result set
    genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
    criteria.setFirstResult(command.getStart())
            .setMaxResults(command.getLimit());
    genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
But this does the count every time by calling ScrollableResults:last().

Hibernate equivalent to EclipseLink's batch query hint?

One thing I like about EclipseLink is its batch query hint, for which I have yet to find a Hibernate equivalent.
Basically, doing a whole bunch of joins gets messy real quick and you end up querying far more data than you want (remember that if you join a person to 6 addresses, the person information is returned 6 times; now keep multiplying that out by extra joins).
Imagine a Person entity with 0:M collections of Address, Email, Phone and OrderHistory. Joining all of that is not good, but with the batch method:
List persons = entityManager.createQuery("select p from Person p")
    .setHint(QueryHints.BATCH, "p.address")
    .setHint(QueryHints.BATCH, "p.email")
    .setHint(QueryHints.BATCH, "p.phone")
    .setHint(QueryHints.BATCH, "p.orderHistory")
    .getResultList();
This will do a query on the Person table and that's it. When you first access an address record it will do a single query for the entire Address table. If you specified a where clause on the Person table, this same criteria will be used for the Address load too.
So instead of doing 1 query, you do 5.
If you were doing that with joins you might get it all in one query but you may very well be loading way more data because of the joins.
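To put rough numbers on the trade-off, here is a small arithmetic sketch. The collection sizes are made up for illustration, and it assumes that every collection is joined in one query (so the rows per person are the product of the collection sizes) versus loaded by its own query (so the rows are the sum). Class and method names are hypothetical.

```java
public class JoinVsBatchRows {
    // Rows returned for ONE person when all collections are joined in one query:
    // the cross product of the collection sizes. An empty collection still
    // yields one row with an outer join, hence the max(size, 1).
    public static long joinedRows(int... collectionSizes) {
        long rows = 1;
        for (int size : collectionSizes) {
            rows *= Math.max(size, 1);
        }
        return rows;
    }

    // Rows returned when each collection is loaded by its own query:
    // one person row plus the sum of the collection sizes.
    public static long batchedRows(int... collectionSizes) {
        long rows = 1;
        for (int size : collectionSizes) {
            rows += size;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical person: 6 addresses, 3 emails, 2 phones, 10 orders.
        System.out.println(joinedRows(6, 3, 2, 10));  // 6 * 3 * 2 * 10 = 360
        System.out.println(batchedRows(6, 3, 2, 10)); // 1 + 6 + 3 + 2 + 10 = 22
    }
}
```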
Anyway, I've gone looking in the Hibernate docs for an equivalent to this but don't see one. Is there one?

There isn't one.

There are two things I know of that might help:
1) hibernate.default_batch_fetch_size
2) Criteria.setFetchMode and Criteria.setFetchSize
