As the title states, I want to retrieve at most, for example, 1000 rows, but if the query's result would be 1001, I would like to know that in some way. I have seen examples that check the number of rows in the result with a second query, but I would like to have it in the same query I use to get the 1000 rows. I am using Hibernate and Criteria to retrieve results from my database. The database is MS SQL Server.
What you want is not possible in a generic way.
The two usual patterns for pagination are:
use two queries: a first one that counts, and a second one that fetches a page of results
use only one query, fetching one row more than what you show on the page
With the first pattern, your pagination has more functionality: you can display the total number of pages and let the user jump directly to the page they want, but you get this possibility at the cost of an additional SQL query.
With the second pattern, you can only tell the user whether or not there is at least one more page of data. The user can then jump to the next page (or any previous page they have already seen).
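The second pattern maps directly to your case: ask Hibernate for one row more than your cap (e.g. criteria.setMaxResults(max + 1)) and trim the result yourself. A minimal sketch of that trimming logic, independent of Hibernate or any database:

```java
import java.util.List;

public class LimitProbe {
    // `fetched` is assumed to come from a query capped at max + 1 rows,
    // e.g. criteria.setMaxResults(max + 1).list() with Hibernate Criteria.
    static <T> List<T> cap(List<T> fetched, int max) {
        return wasTruncated(fetched, max) ? fetched.subList(0, max) : fetched;
    }

    // True if the query hit the cap, i.e. at least one more row exists.
    static boolean wasTruncated(List<?> fetched, int max) {
        return fetched.size() > max;
    }
}
```

With max = 1000, cap() returns at most 1000 rows and wasTruncated() tells you that a row 1001 existed, all from a single query.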
You want two pieces of information that come from two distinct queries:
select count(*) from ...
select col1, col2 from ...
You cannot do it in a single executed Criteria or JPQL query.
But you can do it with a native SQL query (using a subquery), in a way that differs according to the DBMS used.
By doing so, you would make your code more complex and more dependent on a specific DBMS, and you would probably not really gain anything in terms of performance.
I think you should rather use a count query followed by a second query to fetch the rows.
And if later you want to use the result of the count to fetch further results, favor the pagination mechanisms provided by Hibernate rather than doing it in a custom way.
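If you go with the count-plus-page pattern, the count result also gives you the total page count for the paginator; a trivial helper (pure arithmetic, no Hibernate involved):

```java
public class PageMath {
    // Total pages needed to show totalRows rows at pageSize rows per page
    // (integer ceiling division).
    static long totalPages(long totalRows, int pageSize) {
        return (totalRows + pageSize - 1) / pageSize;
    }
}
```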
Related
I want to get all data from offset to limit from a table with about 40 columns and 1,000,000 rows. I tried to index the id column via Postgres and to fetch the result of my select query via Java and an EntityManager.
My query needs about 1 minute to get the results, which is a bit too long. I tried to use a different index and also limited my query to 100 rows, but it still needs this much time. How can I fix this? Do I need a better index, or is anything wrong with my code?
CriteriaQuery<Entity> q = entityManager.getCriteriaBuilder().createQuery(Entity.class);
q.from(Entity.class);
TypedQuery<Entity> query = entityManager.createQuery(q);
List<Entity> entities = query.setFirstResult(offset).setMaxResults(limit).getResultList();
Right now you probably do not utilize the index at all. There is some ambiguity in how a Hibernate limit/offset translates into database operations (see this comment for the Postgres case). It may imply overhead, as described in detail in a reply to this post.
If offset and limit relate directly to the values of the id column, you could use that relation in a query of the form
SELECT e
FROM Entity e
WHERE e.id >= :offset AND e.id < :offset + :limit
Given that the number of records asked for is significantly smaller than the total number of records in the table, the database will use the index.
The next thing is that 40 columns is quite a lot. If you actually need significantly fewer for your purpose, you could define a restricted entity with just the required attributes and query for that one. This should remove some more overhead.
If you are still not within your performance requirements, you could choose to use a plain JDBC connection/query instead of Hibernate.
By the way, you could log the actual SQL issued by JPA/Hibernate and use it to get an execution plan from Postgres; this will show you what the query actually looks like and whether an index is utilized. You could also monitor the database's query execution times to get an idea of which fraction of the processing time is consumed by the database and which by your Java client plus data-transfer overhead.
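To see the SQL Hibernate actually issues, you can enable statement logging; the property names below are standard Hibernate settings, though where you set them depends on your configuration (persistence.xml, hibernate.cfg.xml, or application.properties):

```
hibernate.show_sql=true
hibernate.format_sql=true
```

You can then paste the logged statement into psql, prefixed with EXPLAIN ANALYZE, to see whether the index is used and where the time goes.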
There is also a technique that mimics offset+limit paging, using paging based on the first record's key of each page.
Map<Integer, String> mapPageTopRecNoToKey = new HashMap<>();
Then search for records >= the page's key and load page size + 1 records to find the key of the next page.
Going from page 1 to page 5 would take a bit more work, but would still be fast.
This of course is a terrible kludge, but at the time the technique was indeed a speed improvement on some databases.
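A sketch of this keyset technique, with an in-memory sorted list standing in for the actual query (the real SQL would be something like SELECT id FROM t WHERE id >= :fromKey ORDER BY id LIMIT :pageSize + 1):

```java
import java.util.List;
import java.util.stream.Collectors;

public class KeysetPaging {
    // In-memory stand-in for the keyset query: take up to pageSize + 1
    // ids that are >= the page's starting key.
    static List<Long> fetchPagePlusOne(List<Long> sortedIds, long fromKey, int pageSize) {
        return sortedIds.stream()
                .filter(id -> id >= fromKey)
                .limit(pageSize + 1L)
                .collect(Collectors.toList());
    }

    // The extra row, if present, is the first key of the next page;
    // null means there is no next page.
    static Long nextPageKey(List<Long> pagePlusOne, int pageSize) {
        return pagePlusOne.size() > pageSize ? pagePlusOne.get(pageSize) : null;
    }
}
```

Because the WHERE clause is a range on an indexed key, the database never has to skip over `offset` rows the way OFFSET does.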
In your case it would also be worth selecting only the needed fields in JPQL: select e.a, e.b is considerably faster than fetching the whole entity.
I have a JPA method in my repository that tries to find entities with a where clause. The problem is that I have a huge data set, and when I try to send more than 32k elements in the IN-list clause, I receive an error. I found that this is a PostgreSQL driver limitation, but I can't find a workaround.
I tried a Pageable request, but it is hard to send only 30k at a time out of 8 million records. Is there any possibility to send more than 30k objects in the IN-list of my where clause?
List<Object> findAllByIdIn(List<Long> ids);
No, you don't want to do that, especially if you plan to send 8 million identifiers. Working around the IN-statement or bind-parameter limit is inefficient. Consider the following:
Thousands of bind parameters will result in megabytes of SQL. It will take considerable time to send that SQL text to the database. In fact, the database might take longer to read the SQL text than to execute the query, as per Tom's answer to the "Limit and conversion very long IN list: WHERE x IN ( ,,, ...)" question.
SQL parsing will be inefficient. Not only do megabytes of SQL text take time to read, but with a growing bind-parameter count each query will usually have a distinct number of bound parameters. This distinct bound-parameter count causes each query to be parsed and planned separately (see this article, which explains it).
There is a hard limit on the number of bound parameters in a SQL statement. You just discovered it: 32760.
For those types of queries it's usually better to create a temporary table: create it before your query, insert all the identifiers into it, and join it with the entity table. This join is equivalent to the IN condition, except that the SQL text stays short.
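As a sketch (PostgreSQL syntax; the table and column names here are made up for illustration), the temporary-table approach could look like this:

```sql
-- Session-scoped scratch table for the identifiers; ON COMMIT DROP
-- requires creating it inside a transaction.
CREATE TEMPORARY TABLE wanted_ids (id bigint PRIMARY KEY) ON COMMIT DROP;

-- Batch-insert the ids, e.g. via JDBC addBatch()/executeBatch()
INSERT INTO wanted_ids (id) VALUES (1), (2), (3);

-- The join replaces the huge IN (...) list
SELECT e.*
FROM my_entity e
JOIN wanted_ids w ON w.id = e.id;
```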
It's important to understand where these 8 million identifiers are loaded from. If you are pulling them from the database in a previous query just to pass them back to the next query, you most likely want to write a stored procedure instead. There is possibly a flaw in your current approach; JPA is not always the right tool for the job.
I have code where I get data from various sources and sort and order it before sending it to the user.
I fetch the data by firing a query containing multiple joins into a list of DTOs, then fire another query, which also contains multiple joins, into a second list of DTOs. Then I combine both DTO lists to present to the user.
Query 1:
Select * from TABLE1, TABLE2....
Query 2:
Select * from TABLE5, TABLE7....
dto1.addAll(dto2);
dto1.sort(Comparator....);
The reason I sort again programmatically is the following:
Query 1 returns sorted data, let's assume
1,2,3,4
Query 2 returns sorted data, let's assume
1,2,3,4
After combining both lists, I get
1,2,3,4,1,2,3,4
Expected data:
1,1,2,2,3,3,4,4
My question is: in which case will performance be better?
fetch sorted data from both queries, combine the lists, and then sort and order them again
fetch unsorted data from both queries, combine the lists, and then sort and order only once
In the first case the data gets sorted three times; in the second case it is sorted only once.
When I tested with hundreds of thousands of records in the tables, I didn't find much difference; the second case was a bit faster than the first.
So, in case of efficiency and performance, which one should be recommended?
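For reference, the combine-then-sort-once variant (the second option) in plain Java, using the sample data from the question:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CombineAndSort {
    // Combine both (unsorted) result lists and sort only once
    // in the application, instead of sorting in each query and again here.
    static List<Integer> combine(List<Integer> a, List<Integer> b) {
        List<Integer> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.naturalOrder());
        return all;
    }
}
```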
Do it all in MySQL:
( SELECT ... )
UNION ALL
( SELECT ... )
ORDER BY ...
Don't worry about sorting in the two SELECTs; wait until the end to do it.
ALL assumes there are no duplicates you need to get rid of.
This approach may be fastest simply because it is a single SQL request to the database. And because it does only one sort.
I think all three approaches will have similar performance. You might get slightly higher speed with one or another, but I don't think the difference will be significant.
Now, in terms of load, that's a different story. Are you more limited by CPU resources (in your local machine) or by database resources (in the remote DB server)? Most of the time the database will be sitting there idle while your application will be processing a lot of other stuff. If that's the case, I would prefer to put the load on the database, rather than the application itself: that is, I would let the database combine and sort the data in a single SQL call; then the application would simply use the ready-to-use data.
Edit on Dec 22, 2018:
If both queries run on the same database, you can run them as a single one and combine the results using a CTE (Common Table Expression). For example:
with
x (col1, col2, col3, col4, ...) as (
select * from TABLE1, TABLE2... -- query 1
union all
select * from TABLE5, TABLE7... -- query 2
)
select * from x
order by col1
The ORDER BY at the end operates over the combined result. Alternatively, if your database doesn't support CTEs, you can write:
select * from (
select * from TABLE1, TABLE2... -- query 1
union all
select * from TABLE5, TABLE7... -- query 2
) x
order by col1
I think the second approach performs better: you run a sorting algorithm once after merging the two lists, so you don't need the database to sort at all, and neither query carries the sorting cost.
If you retrieve the data in sorted order and then run a sorting algorithm on it again, it must add some extra cost, although that cost is negligible.
I have a complex query that requires a full-text search on some fields and basic restrictions on other fields. Hibernate Search documentation strongly advises against adding database query restrictions to a full text search query and instead recommends putting all of the necessary fields into the full-text index. The problem I have with that is that the other fields are volatile; values can change every minute or so and those updates to the database may occur outside of the JVM doing the search, so there is a high likelihood that the local Lucene index would be out of date with respect to those fields.
Looking for strategy recommendations here. The best I've come up with so far is to join the results manually: first execute the database query (fetching only object IDs), then execute the full-text search and somehow efficiently filter the Lucene results by the set of object IDs from the database. Of course, I don't know how many results I'll get from each separate query, so I'm worried about performance and memory. It could be tens of thousands of rows apiece in the worst case.
I am quite interested in other ideas for this as we have a very similar scenario.
We only needed to show at most 50 result rows, with a couple of lookups per row. We run the query against the Lucene index, with the DB primary-key ids stored in the index, and then pull the lookups out of the database per row. It's still performant for us.
As you seem to want to process more than a few rows and lookups, I did consider an alternative: timestamp any DB row updates. That would allow you to query the DB for stale index entries and then iteratively trigger a reindex of the related documents.
I have the same problem and run a separate Lucene query and a separate Criteria query. If I run the Criteria query first, I use the resulting ids in a custom IdFilter for the Lucene search, which checks whether each hit is in the id collection from the first query. However, this approach does not scale well, because in my case too the number of results after the first query can be huge, and the filter is limited to 1024 ids. I did not find a good solution, but I change the order of my two queries depending on the expected number of results: the first query should be the one that filters out the most results.
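The in-memory intersection described here can be sketched as follows, assuming you have the Lucene hit ids in ranked order and the set of ids the database query allowed (as noted above, this only stays viable while both result sets are of moderate size):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ResultIntersection {
    // Keep only the Lucene hits whose ids the database query allowed,
    // preserving Lucene's relevance order.
    static List<Long> intersect(List<Long> luceneHitIds, Set<Long> dbAllowedIds) {
        return luceneHitIds.stream()
                .filter(dbAllowedIds::contains)
                .collect(Collectors.toList());
    }
}
```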
You can run a scheduled index update based on the last-modified date.
I need to add a paginator to my Hibernate application. I applied it to some of my database operations, which I perform using Criteria, by setting Projection.count(). This is working fine.
But when I use HQL to query, I can't seem to find an efficient method to get the result count.
If I do query.list().size() it takes a lot of time, and I think Hibernate loads all the objects into memory.
Can anyone suggest an efficient method to retrieve the result count when using HQL?
You'll have to use another query and Query#iterate(). See section 14.16, Tips & Tricks:
You can count the number of query results without returning them:
( (Integer) session.createQuery("select count(*) from ....").iterate().next() ).intValue()
I've done similar things in the past. If you want to get a total count of records for the paginator, I'd suggest a separate query that you run first. This query should just do a count and return the total.
As you suspect, in order to count the records of your main query, Hibernate does have to load all the records, although it will do its best not to load all the data for each record. This still takes time.
If you can get away with it (because even a count query can take time if your where clauses are inefficient), just check whether you got a full page of records back, and show an indicator that there could be more results on the next page. That's the fastest method, because you only query for each page as you need it.