Does using the query.getResultList() method hurt performance? - java

I'm using the Java Persistence API to develop a standalone application. I need to fetch single results as well as multiple results from the database. I have a couple of questions.
First:
I'm using the query.getResultList() method to retrieve both single and multiple results. I want to know whether this method retrieves the entire result set or only part of it (as a limit query would, e.g. SELECT * FROM some_db.some_table LIMIT 100 OFFSET 120;). If it retrieves everything, does that hurt performance? (In this case I'm thinking about a database whose tables contain around 10000 records each.)
Second:
What is the difference in impact between query.getResultList() and query.getSingleResult()?

You should separate the DB-side and server-side parts.
If you transfer 100 rows from the DB to the server, it does not matter for the subsequent server-side processing how huge your table is.
So I take it you are trying to understand the overhead of query.getResultList() when fetching just one row. IMHO, you should not worry about that.
Based on the docs, the only small difference between query.getResultList() and query.getSingleResult() shows up when the query returns 0 rows:
query.getSingleResult()
... if there are no results at all a NoResultException is thrown.
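For example, a minimal sketch of the usual guard (the em EntityManager, Person entity, and id variable are assumptions, not from the question):
// getSingleResult() throws NoResultException on zero rows;
// getResultList() just returns an empty list, so a null-safe lookup is:
List<Person> rows = em.createQuery(
        "SELECT p FROM Person p WHERE p.id = :id", Person.class)
    .setParameter("id", id)
    .getResultList();
Person result = rows.isEmpty() ? null : rows.get(0);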
Update:
With the Persistence API, try creating the query this way:
getEntityManager().createQuery(query)
    .setFirstResult(offset) // offset
    .setMaxResults(limit)   // limit
    .getResultList();

Related

Hibernate JPA EntityManager.createQuery() performance

I have a query with two IN clauses. The first IN clause takes around 125 values and the second takes around 21000. It's implemented using the JPA CriteriaBuilder.
The query itself executes very fast and returns results within seconds. The only problem is that entityManager.createQuery(CriteriaQuery) takes around 12-13 minutes to return.
I searched all over SO; all the threads are about the performance of Query.getResultList. None of them discusses the performance of entityManager.createQuery(CriteriaQuery). If you have seen this behavior before, please let me know how to resolve it.
My JDK version is 1.7, and the javaee-api dependency version is 6.0. The application is deployed on JBoss EAP 6.4, but that's not the concern for now, as I am testing my code with JUnit using an EntityManager connected to an actual Oracle database. If you need more information, let me know.
A hybrid approach is to dynamically create a query and then save it as a named query in the entity manager factory.
At that point it becomes just like any other named query that may have been declared statically in metadata. While this may seem like a good compromise, it turns out to be useful in only a few specific cases. The main advantage it offers is for queries that are not known until runtime but are then reissued repeatedly. Once the dynamic query becomes a named query, it only bears the processing cost once.
It is implementation-specific whether that cost is paid when the query is registered as a named query, or deferred until the first time it is executed.
A dynamic query can be turned into a named query by using EntityManagerFactory.addNamedQuery().
Keep us informed of the result, and good luck.
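For illustration, a hedged sketch of that approach (requires JPA 2.1; the Person entity and query string are assumptions):
// Build the query dynamically once, then register it on the factory.
TypedQuery<Person> dynamic = em.createQuery(
        "SELECT p FROM Person p WHERE p.lastName = :name", Person.class);
emf.addNamedQuery("Person.byLastName", dynamic);
// Later executions reuse the registered form instead of re-processing it.
List<Person> people = em.createNamedQuery("Person.byLastName", Person.class)
        .setParameter("name", "Smith")
        .getResultList();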
I observed that having a single query with 21 IN clauses (each with 1000 expressions), all combined with OR, made the query run more slowly. I tried another approach: executing each IN clause as a separate query. These 21 individual queries performed better overall.
Another issue I observed was that the CriteriaBuilder query was slow when the result set was huge (something like 20K rows). I solved that by adding a query hint to my typed query:
TypedQuery.setHint("org.hibernate.fetchSize", 5000);
Hope it will help others.
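A rough sketch of that chunking approach (the Person entity and the loadIds() helper are hypothetical):
// Run one query per chunk of at most 1000 IN values instead of a single
// query with 21 OR-combined IN clauses.
List<Long> ids = loadIds(); // ~21000 values; hypothetical helper
List<Person> all = new ArrayList<>();
for (int i = 0; i < ids.size(); i += 1000) {
    List<Long> chunk = ids.subList(i, Math.min(i + 1000, ids.size()));
    all.addAll(em.createQuery(
            "SELECT p FROM Person p WHERE p.id IN (:ids)", Person.class)
        .setParameter("ids", chunk)
        .getResultList());
}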
The relevant code in Hibernate is not designed for binding a large number of parameters:
for ( ImplicitParameterBinding implicitParameterBinding : parameterMetadata.implicitParameterBindings() ) {
    implicitParameterBinding.bind( jpaqlQuery );
}
Unfortunately, you need to find a different approach if you want to do something similar.

How to get the number of results in an App Engine query before actually iterating through them all

In my Google App Engine app I need to fetch and return a potentially large number of entities from a datastore query in response to a service call GET request. This call may return potentially thousands of entities and MBs of serialized data.
The first portion of the response packet communicates how many entities are in the serialized results, followed by all of the serialized entities. Currently I am iterating through all the entities in the query with a QueryResultIterator up to a maximum page size limit, after which I return a cursor that can be used to continue fetching where the previous call left off (if the maximum was reached and there are still results in the query). As I iterate through the results, I save them in a list. Once I've either exhausted the query results or reached the maximum page size, I can then get the number of entities from the size of this list. But then I have to iterate through this list again to serialize each of the entities and write the results to the response output stream.
I don't know that this is the most efficient method to perform this operation. Is there a way I can get the number of entities in a query's results before actually iterating through them all or fetching them directly into a list? (The list method doesn't work anyway because I'm using cursors, which require the use of QueryResultIterator.)
QueryResultIterator has a method getIndexList(). Would this be a less costly way to get the number of entities in the query's results? I'm assuming this list would contain exactly one index object for each entity in the query's results. Also, I'd need this list to only contain the indexes for the entities after the current cursor position for the iterator. Is my understanding correct, or would this method not do what I think it would?
A list of just indexes would require much less memory than loading a list of whole entities. Although, I don't know if this list would be limited at all by the query's prefetch or chunk sizes, or if I'd want to use the query's limit parameter at all because I would only be interested in knowing how many entities were in the results up to the maximum page size plus one (to know there are still more results and provide a cursor to continue).
Currently I'm setting the prefetch and chunk size (to the size of my page limit), but I'm not using the limit or offset parameters since I'm using cursors instead. From what I understand cursors are preferable to offset/limit. Would setting the limit parameter affect continuing a query with a cursor?
Clearly I have quite a few questions as to how GAE datastore queries work and how they're affected by changing parameters, so any insights are appreciated. The documentation for App Engine APIs is often sparse, as in one-sentence descriptions of methods stating pretty much what can be deduced from the method signature; they don't generally go into much detail otherwise. Maybe the way I'm doing it currently is just fine after all. It works as is, but I'm trying to optimize the service call to get the best response time possible for my client application.
UPDATE: By the way, I am using Objectify v3 in my app and to perform this query. There are several places where I am required to use the low-level datastore API, including for geo-location queries (with geomodel) and projection queries (which aren't supported in Objectify v3). So if there is a good way to do this using Objectify, that would be ideal. Otherwise I can use the low-level API, but it's always messier this way.
Both the low-level api and Objectify have a count() method (look at the javadocs for details). However, counting can be a very expensive and lengthy operation - it costs 1 small op for every number returned. For example, count() returning 5000 costs 5000 small ops (plus 1 read for the query), and takes as long as it would take to do a keys-only scan of all 5000 (which is what GAE actually does).
If you absolutely must have an exact count, you probably need to aggregate this value yourself by incrementing/decrementing a (possibly sharded) counter. This gets very tricky when you are dealing with filtered queries.
There is no one right solution here. Google searches give you totals like "About 119,000,000 results" which are deliberately inexact and almost certainly precalculated. For smaller result sets, using count() can be acceptable - but you might want to apply a limit() so that you never break the bank. You can always say "More than 500 results..."
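If a capped count is enough, a low-level sketch might look like this (pageSize is an assumption):
// Cap the count at pageSize + 1 so the keys-only scan never walks the
// whole result set; anything above pageSize just means "more results".
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query qry = new Query("Person"); // com.google.appengine.api.datastore.Query
int capped = ds.prepare(qry)
        .countEntities(FetchOptions.Builder.withLimit(pageSize + 1));
boolean hasMore = capped > pageSize;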
If you want to fetch the number of records, you can use the following code:
com.google.appengine.api.datastore.Query qry = new com.google.appengine.api.datastore.Query("EntityName");
com.google.appengine.api.datastore.DatastoreService datastoreService = DatastoreServiceFactory.getDatastoreService();
int totalCount = datastoreService.prepare(qry).countEntities(FetchOptions.Builder.withDefaults());
And if you want a filter, you can use:
qry.addFilter("firstName", FilterOperator.EQUAL, firstName);
I hope this helps.

Large SQL dataset query using java

I have the following configuration:
SQL Server 2008
Java as backend technology - Spring + Hibernate
Basically what I want to do is a select with a where clause on a table. The problem is the table has about 700M entries and the query takes a really long time.
Can you please give some pointers on where to optimize the query, or what sort of techniques I can use to improve performance?
Thanks.
Using indexes is the standard technique used to deal with this problem. As requested, here are some pointers that should get you started:
http://odetocode.com/articles/70.aspx
http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
http://www.petri.co.il/introduction-to-sql-server-indexes.htm
The first thing I do in this case is isolate whether it is the amount of data I am returning that is the problem or not (an i/o issue). A simple non-scientific way to do this is change your query to just return the count:
select count(*) --just return a count, no data!
from MyTable
inner join MyOtherTable on ...
where ...
If this runs very quickly, it tells you your indexes are in order (assuming no sub-selects in your WHERE clause). If not, then you need to work on indexes, the WHERE clause, or your query construction itself (JOINs being done, etc).
Once that is satisfactory, add back in your SELECT clause. If it is slow, you are going to have to look at your data access pattern:
Can you return fewer columns?
Can you return fewer rows at once?
Is there caching you can do in the application layer?
Is this query a candidate for partitioned/materialized views (if your database supports those)?
I would run Profiler to find the exact query that is being generated. ORMs can create less than optimal queries. Once you know the query, you can run it in SSMS and see the execution plan. This will give you clues as to where you have performance problems.
Several things can cause performance problems:
Lack of correct indexing (foreign keys should be indexed if you have joins, as should the criteria in the WHERE clause)
Lack of sargability in the WHERE clause, forcing the query to bypass existing indexes
Returning more columns than are needed
Correlated subqueries and scalar functions that cause row-by-agonizing-row operations
Returning too much data (will anybody really be looking at 1 million returned records? You only want to return the amount you show on the page, not the whole possible recordset)
Locking and blocking
There's more (after all, whole very long books are written on this subject), but that should be enough to get you started on where to look.
You should add indexes on the columns you often use to restrict the results. Another thing to consider is pagination of the result set.
Regardless of the specific DB, I would do the following:
run an EXPLAIN ANALYZE
make sure you have an index for the columns that are part of your WHERE clause
If the indexes are OK, it's very likely that you are fetching a lot of records from disk, which is very slow: if you really cannot refine your query so that you fetch fewer records, consider clustering your table to improve the disk locality of your records.

Using Hibernate's ScrollableResults to slowly read 90 million records

I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate:
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
    .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
while (results.next())
    storeInFile(results.get()[0]);
The problem is the above will try and load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap space exceptions :(.
So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (well I'd love it to not).
I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels like it will be inefficient though and will start taking a ridiculously long time when I'm calling setFirstResult on the 89 millionth row...
UPDATE: setFirstResult/setMaxResults doesn't work, it turns out to take an unusably long time to get to the offsets like I feared. There must be a solution here! Isn't this a pretty standard procedure?? I'm willing to forgo Hibernate and use JDBC or whatever it takes.
UPDATE 2: the solution I've come up with, which works OK but not great, is basically of the form:
select * from person where id > <offset> and <other_conditions> limit 1
Since I have other conditions, even all in an index, it's still not as fast as I'd like it to be... so still open for other suggestions..
Using setFirstResult and setMaxResults is your only option that I'm aware of.
Traditionally a scrollable result set would only transfer rows to the client on an as-required basis. Unfortunately, MySQL Connector/J actually fakes it: it executes the entire query and transports the results to the client, so the driver actually has the entire result set loaded in RAM and will drip-feed it to you (evidenced by your out-of-memory problems). You had the right idea; these are just shortcomings in the MySQL Java driver.
I found no way to get around this, so I went with loading large chunks using the regular setFirstResult/setMaxResults methods. Sorry to be the bringer of bad news.
Just make sure to use a stateless session so there's no session-level cache or dirty tracking, etc.
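For reference, a minimal sketch of one such batch with a StatelessSession (the Person entity, offset, batchSize, and storeInFile are assumptions carried over from the question):
// A StatelessSession has no first-level cache and no dirty checking, so
// each fetched row can be garbage-collected as soon as you drop it.
StatelessSession ss = sessionFactory.openStatelessSession();
try {
    List<Person> batch = ss.createQuery("from Person p order by p.id")
            .setFirstResult(offset)
            .setMaxResults(batchSize)
            .list();
    for (Person p : batch) {
        storeInFile(p);
    }
} finally {
    ss.close();
}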
EDIT:
Your UPDATE 2 approach is the best you're going to get unless you break away from MySQL Connector/J. Though there's no reason you can't raise the limit on the query: provided you have enough RAM to hold the index, this should be a fairly cheap operation. I'd modify it slightly to grab a batch at a time, and use the highest id of that batch to grab the next batch.
Note: this will only work if other_conditions use equality (no range conditions allowed) and the index has id as its last column.
select *
from person
where id > <max_id_of_last_batch> and <other_conditions>
order by id asc
limit <batch_size>
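A driver loop for that keyset batching might look like this sketch (the Person entity, storeInFile, and batchSize are assumptions; your <other_conditions> are omitted):
// Walk the table in id order, restarting after the highest id seen so far,
// so MySQL never has to skip over rows the way setFirstResult() would.
long lastId = 0;
List<Person> batch;
do {
    batch = session.createQuery(
            "from Person p where p.id > :lastId order by p.id asc")
        .setParameter("lastId", lastId)
        .setMaxResults(batchSize)
        .list();
    for (Person p : batch) {
        storeInFile(p);
        lastId = p.getId();
    }
    session.clear(); // drop the first-level cache between batches
} while (!batch.isEmpty());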
You should be able to use a ScrollableResults, though it requires a few magic incantations to get working with MySQL. I wrote up my findings in a blog post (http://www.numerati.com/2012/06/26/reading-large-result-sets-with-hibernate-and-mysql/) but I'll summarize here:
"The [JDBC] documentation says:
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
This can be done using the Query interface (this should work for Criteria as well) in version 3.2+ of the Hibernate API:
Query query = session.createQuery(hql); // hql is your query string
query.setReadOnly(true);
// MIN_VALUE gives a hint to the JDBC driver to stream the results
query.setFetchSize(Integer.MIN_VALUE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
// iterate over the results
while (results.next()) {
    Object row = results.get();
    // process the row, then release the reference
    // you may need to evict() as well
}
results.close();
This allows you to stream over the result set, however Hibernate will still cache results in the Session, so you’ll need to call session.evict() or session.clear() every so often. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand."
Set the fetch size on the query to an optimal value, as shown below.
Also, when caching is not required, it may be better to use StatelessSession.
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
    .setReadOnly(true)
    .setFetchSize(1000) // <<--- !!!!
    .setCacheable(false)
    .scroll(ScrollMode.FORWARD_ONLY);
With MySQL Connector/J, the fetch size must be Integer.MIN_VALUE; otherwise it won't work. This is to be taken literally from the official reference: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
Actually you could have gotten what you wanted -- low-memory scrollable results with MySQL -- if you had used the answer mentioned here:
Streaming large result sets with MySQL
Note that you will have problems with Hibernate lazy-loading because it will throw an exception on any queries performed before the scroll is finished.
With 90 million records, it sounds like you should be batching your SELECTs. I've done this with Oracle when doing the initial load into a distributed cache. Looking at the MySQL documentation, the equivalent seems to be the LIMIT clause: http://dev.mysql.com/doc/refman/5.0/en/select.html
Here's an example:
SELECT * from Person
LIMIT 200, 100
This would return rows 201 through 300 of the Person table.
You'd need to get the record count from your table first and then divide it by your batch size and work out your looping and LIMIT parameters from there.
The other benefit of this would be parallelism - you can execute multiple threads in parallel on this for faster processing.
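A rough sketch of that parallel batching (the row counts and the processRange helper are hypothetical):
// Hypothetical sizes; in practice get totalRows from a COUNT query first.
final long totalRows = 90_000_000L;
final long batchSize = 100_000L;
ExecutorService pool = Executors.newFixedThreadPool(4);
for (long offset = 0; offset < totalRows; offset += batchSize) {
    final long start = offset;
    pool.submit(new Runnable() {
        @Override
        public void run() {
            // processRange is a hypothetical helper that runs
            // "SELECT ... LIMIT start, batchSize" on its own connection.
            processRange(start, batchSize);
        }
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS); // caller handles InterruptedException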
Processing 90 million records also doesn't sound like the sweet spot for using Hibernate.
The problem could be that Hibernate keeps references to all objects in the session until you close it. That has nothing to do with query caching. It might help to evict() the objects from the session after you are done writing each one to the file. If they are no longer referenced by the session, the garbage collector can free the memory and you won't run out of memory anymore.
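A small sketch of that, assuming the Person entity and storeInFile from the question:
// Evict each object after writing it so the session drops its reference
// and the garbage collector can reclaim the memory.
while (results.next()) {
    Person p = (Person) results.get(0);
    storeInFile(p);
    session.evict(p); // remove the instance from the session cache
}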
I propose not just sample code, but a Hibernate-based query template that does this workaround for you (pagination, scrolling, and clearing of the Hibernate session).
It can also easily be adapted to use an EntityManager.
I've used the Hibernate scroll functionality successfully before without it reading the entire result set in. Someone said that MySQL does not do true scroll cursors, but it claims to based on JDBC's dmd.supportsResultSetType(ResultSet.TYPE_SCROLL_INSENSITIVE), and searching around, it seems other people have used it. Make sure it's not caching the Person objects in the session - I've used it on SQL queries where there was no entity to cache. You can call evict at the end of the loop to be sure, or test with a SQL query. Also, play around with setFetchSize to optimize the number of trips to the server.
I recently worked on a problem like this and wrote a blog post about how to face it. It's quite similar; I hope it is helpful to anyone.
I use a lazy-list approach with partial acquisition. I replaced the query's limit and offset pagination with manual pagination.
In my example, the select returns 10 million records; I fetch them and insert them into a "temporary table":
create or replace function load_records ()
returns VOID as $$
BEGIN
    drop sequence if exists temp_seq;
    create temp sequence temp_seq;
    insert into tmp_table
    SELECT linea.*
    FROM (
        select nextval('temp_seq') as ROWNUM, * from table1 t1
        join table2 t2 on (t2.fieldpk = t1.fieldpk)
        join table3 t3 on (t3.fieldpk = t2.fieldpk)
    ) linea;
END;
$$ language plpgsql;
After that, I can paginate without counting each row, using the assigned sequence instead:
select * from tmp_table where counterrow >= 9000000 and counterrow <= 9025000
From the Java perspective, I implemented this pagination through partial acquisition with a lazy list, that is, a list that extends AbstractList and implements the get() method. The get method can use a data access interface to fetch the next set of data and release the memory heap:
@Override
public E get(int index) {
    if (bufferParcial.size() <= (index - lastIndexRoulette)) {
        lastIndexRoulette = index;
        bufferParcial.removeAll(bufferParcial);
        bufferParcial = new ArrayList<E>();
        bufferParcial.addAll(daoInterface.getBufferParcial());
        if (bufferParcial.isEmpty()) {
            return null;
        }
    }
    return bufferParcial.get(index - lastIndexRoulette);
}
On the other hand, the data access interface uses a query to paginate and implements a method to iterate progressively, 25000 records at a time, until all of them are covered.
Results for this approach can be seen here:
http://www.arquitecturaysoftware.co/2013/10/laboratorio-1-iterar-millones-de.html
Another option if you're "running out of RAM" is to request, say, one column instead of the entire object (see How to use hibernate criteria to return only one element of an object instead the entire object?); this saves a lot of CPU processing time to boot.
For me it worked properly only when setting useCursors=true; otherwise the scrollable result set ignores all fetch-size settings. In my case it was 5000, but the scrollable result set fetched millions of records at once, causing excessive memory usage. The underlying DB is MS SQL Server:
jdbc:jtds:sqlserver://localhost:1433/ACS;TDS=8.0;useCursors=true

Hibernate Hql find result size for paginator

I need to add a paginator to my Hibernate application. I applied it to some of my database operations, which I perform using Criteria, by setting Projection.count(). This works fine.
But when I query with HQL, I can't seem to find an efficient method to get the result count.
If I do query.list().size(), it takes a lot of time, and I think Hibernate loads all the objects into memory.
Can anyone please suggest an efficient method to retrieve the result count when using HQL?
You'll have to use another query and Query#iterate(). See section 14.16. Tips & Tricks:
You can count the number of query results without returning them:
( (Integer) session.createQuery("select count(*) from ....").iterate().next() ).intValue()
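A paginator therefore typically pairs two queries, roughly like this sketch (the Person entity, pageIndex, and pageSize are assumptions; note that recent Hibernate versions return Long rather than Integer from count):
// One cheap count query for the paginator total...
Long total = (Long) session.createQuery("select count(*) from Person")
        .uniqueResult();
// ...and one bounded query for the current page.
List<Person> page = session.createQuery("from Person p order by p.id")
        .setFirstResult(pageIndex * pageSize)
        .setMaxResults(pageSize)
        .list();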
I've done similar things in the past. If you want a total record count for the paginator, I'd suggest a separate query that you run first; it should just do the count and return the total.
As you suspect, in order to count the records of your main query, Hibernate does have to load all the records, although it does its best not to load all the data for each record. But this still takes time.
If you can get away with it (because even a count query can take time if your WHERE clauses are inefficient), just check whether you got a full page of records back, and put up an indicator of some sort to show that there could be more results on the next page. That's the fastest method because you only query for each page as you need it; a sketch of this trick follows.
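A sketch of that trick: fetch pageSize + 1 rows and use the surplus row only as a has-more flag (the Person entity, pageIndex, and pageSize are assumptions):
// Ask for one extra row beyond the page; if it arrives, there is a next page.
List<Person> rows = session.createQuery("from Person p order by p.id")
        .setFirstResult(pageIndex * pageSize)
        .setMaxResults(pageSize + 1)
        .list();
boolean hasMore = rows.size() > pageSize;
if (hasMore) {
    rows = rows.subList(0, pageSize); // show only the real page
}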
