I have a question I would like some help with.
I have the following query running from Java:
SELECT DISTINCT field1, field1
from tblTableA WITH (NOLOCK)
WHERE criteriaField='CONSTANT TEXT'
I run it with JPA:
Query qry = entMgr.createNativeQuery(myQry) ;
List sqlResult = qry.getResultList() ;
Now, qry.getResultList() takes too long to run: 75 seconds or more. Yes, it returns close to 700,000 records, but the same query run on WebLogic 10 using EJB 2 completes in less than 5 seconds.
Can anyone help resolve this issue? It seems there may be a configuration I am missing, or a technique I am not following.
I have read something about using jbosscmp-jdbc.xml.
I don't have that in my setup, but I found out that there is a lazy-loading feature that can be configured. However, I am not sure how to configure the query I am running in that XML file.
Also, can this be done with annotations instead of an XML file?
I would try to run this query inside a non-transactional method:
@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
List getResults(..){
Query qry = entMgr.createNativeQuery(myQry) ;
return qry.getResultList() ;
}
This is sometimes not allowed, depending on the environment. It is mainly used to optimize queries that are expected to return large result sets which would later be managed by the PersistenceContext (so basically when you would use HQL instead of native SQL).
But I would give it a try.
You are performing this select query within a transaction scope. I found an old JIRA ticket on JBoss's site. As the ticket suggests, there is a potential problem around the flush. If you perform a query with EJB3, a flush is performed or attempted automatically for all the objects you retrieve with your native query. The idea seems to be to avoid getting stale objects from the database. But in your case, it is not applicable. Set the flush mode to COMMIT and see if the performance improves.
query.setFlushMode( FlushModeType.COMMIT );
Also turn off the Hibernate logging and see if that makes any difference.
I have a question. Where did these methods go?
Dialect.supportsTemporaryTables();
Dialect.generateTemporaryTableName();
Dialect.dropTemporaryTableAfterUse();
Dialect.getDropTemporaryTableString();
I've tried browsing the git history for Dialect.java, but had no luck. I found that something called MultiTableBulkIdStrategy was created, but I couldn't find any example of how to use it.
To the point: I have legacy code (using Hibernate 4.3.11) which does a batch delete from multiple tables using a temporary table. Those tables may hold 1,000 rows, but they may also hold 10 million rows. So, to make sure I don't kill the DB with some crazy delete, I create a temp table into which I put 1000 ids at a time (using a select query with some condition), and then use this temp table to delete data from 4 tables. This runs in a while loop until all data matching the condition has been deleted.
The transaction is committed after each cycle.
To make it more complicated, this code has to run on top of MySQL, MariaDB, Oracle, PostgreSQL, SQL Server and H2.
It was done using native SQL, with the methods mentioned above. But now I can't find a way to refactor it.
My first try was to create a query using a nested select like this:
delete from TABLE where id in (select id from TABLE where CONDITION limit 1000)
but this is way slower, as I have to run the select query multiple times for each delete, and limit is not supported in a nested select in HQL.
Any ideas or pointers?
Thanks.
The methods were present in version 4.3.11 but removed in version 5.0.0. It seems a bit unusual that they were removed rather than deprecated - the background is on this Jira ticket.
To quote from this:
Long term, I think the best approach is to remove the Dialect method
intended to support table tabled in a piecemeal fashion and to make
MultiTableBulkIdStrategy be a fully self-contained contract.
The methods were removed in this commit.
So it seems that getDefaultMultiTableBulkIdStrategy() is the intended replacement for these methods - but I'm not entirely clear on how, as it currently has no Javadoc. Guess you could try to work it out from the source code ...or if all else fails, perhaps try to contact Steve Ebersole, who implemented the change?
I have a query which has 2 IN clauses. The first IN clause takes around 125 values and the second takes around 21,000 values. It's implemented using JPA CriteriaBuilder.
The query itself executes very fast and returns results within seconds. The only problem is that entityManager.createQuery(CriteriaQuery) takes around 12-13 minutes to return.
I searched all over SO, but all the threads are about the performance of Query.getResultList; none of them discuss the performance of entityManager.createQuery(CriteriaQuery). If you have seen such behavior before, please let me know how to resolve it.
My JDK version is 1.7. The dependency version of javaee-api is 6.0. The application is deployed on JBoss EAP 6.4, but that's not a concern for now, as I am testing my code with JUnit using an EntityManager connected to an actual Oracle database. If you require more information, kindly let me know.
A hybrid approach is to dynamically create a query and then save it as a named query in the entity manager factory.
At that point it becomes just like any other named query that may have been declared statically in metadata. While this may seem like a good compromise, it turns out to be useful in only a few specific cases. The main advantage it offers is if there are queries that are not known until runtime, but then reissued repeatedly. Once the dynamic query becomes a named query it will only bear the cost of processing once.
It is implementation-specific whether that cost is paid when the query is registered as a named query, or deferred until the first time it is executed.
A dynamic query can be turned into a named query by using the EntityManagerFactory addNamedQuery() method.
Keep us informed of the result, and good luck.
I observed that a single query with 21 IN clauses (each with 1000 expressions), all combined with OR, ran slowly. I tried another approach: executing each IN clause as part of a separate query. These 21 individual queries performed better overall.
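A minimal sketch of the splitting step, assuming the IN values are already held in a Java list. The names InClauseChunks and partition are hypothetical helpers, not part of JPA or Hibernate; each resulting chunk would be bound to its own query and the results merged by the caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits IN-clause values into chunks of at most chunkSize elements. */
final class InClauseChunks {
    private InClauseChunks() {}

    public static <T> List<List<T>> partition(List<T> values, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(values.subList(start, end)));
        }
        return chunks;
    }
}
```

With 21,000 values and a chunk size of 1000 this yields 21 chunks, one per query.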
Another issue I observed was that a query built with CriteriaBuilder was slow when the result set was huge (something like 20K rows). I solved this by adding a query hint to my typed query:
typedQuery.setHint("org.hibernate.fetchSize", 5000);
Hope it will help others.
The code in Hibernate is not designed for binding lots of parameters:
for ( ImplicitParameterBinding implicitParameterBinding : parameterMetadata.implicitParameterBindings() ) {
implicitParameterBinding.bind( jpaqlQuery );
}
Unfortunately, you need to find a different approach if you want to do something similar.
I have Java code that uses Spring to connect to an Oracle DB and execute SQL. I have a query that takes a long time to execute (20 minutes, sometimes more). I have an ExecutorService with a thread that executes the query and processes the results. If I set a timeout on the DB and Spring, the system times out correctly but returns nothing before that. If I run the query from SQL*Plus, it returns values. The timeout is set to 3 times what the query takes to execute in SQL Developer.
Any ideas!?
Assuming that your Spring query is using bind variables, are you using bind variables when you execute the query in SQL*Plus/ SQL Developer? Or are you using literals?
What version of Oracle are you using?
Have you checked to see whether the query plans for the two environments are different?
20 minutes for a query in Oracle? I'll bet you don't have appropriate indexes on the columns in your WHERE clause.
The dead giveaway is to do an EXPLAIN PLAN on the query. If you see a full table scan (TABLE ACCESS FULL in Oracle's plan output), take appropriate measures.
If you can run the same query in SQL*Plus and see it return in a reasonable time, then I'm incorrect and the problem is due to something else that you did in Java code.
I don't see why you need a separate thread for a query. I'd run the code straight, without a thread, and see how it behaves. If you aren't indexed properly, add some; if the query brings back too much data, add WHERE clauses to restrict it. You've taken extraordinary measures without really understanding what the root cause is.
I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate:
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
.setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
while (results.next())
storeInFile(results.get()[0]);
The problem is the above will try and load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap space exceptions :(.
So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (well I'd love it to not).
I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels like it will be inefficient though and will start taking a ridiculously long time when I'm calling setFirstResult on the 89 millionth row...
UPDATE: setFirstResult/setMaxResults doesn't work, it turns out to take an unusably long time to get to the offsets like I feared. There must be a solution here! Isn't this a pretty standard procedure?? I'm willing to forgo Hibernate and use JDBC or whatever it takes.
UPDATE 2: the solution I've come up with which works ok, not great, is basically of the form:
select * from person where id > <offset> and <other_conditions> limit 1
Since I have other conditions, even all in an index, it's still not as fast as I'd like it to be... so still open for other suggestions..
Using setFirstResult and setMaxResults is your only option that I'm aware of.
Traditionally a scrollable result set would only transfer rows to the client as required. Unfortunately, MySQL Connector/J actually fakes it: it executes the entire query and transports it to the client, so the driver has the entire result set loaded in RAM and drip-feeds it to you (as evidenced by your out-of-memory problems). You had the right idea; it's just a shortcoming of the MySQL Java driver.
I found no way to get around this, so I went with loading large chunks using the regular setFirstResult/setMaxResults methods. Sorry to be the bringer of bad news.
Just make sure to use a stateless session so there's no session level cache or dirty tracking etc.
EDIT:
Your UPDATE 2 is the best you're going to get unless you break out of MySQL Connector/J. Though there's no reason you can't up the limit on the query. Provided you have enough RAM to hold the index, this should be a somewhat cheap operation. I'd modify it slightly: grab a batch at a time, and use the highest id of that batch to grab the next batch.
Note: this will only work if other_conditions use equality (no range conditions allowed) and the index has id as its last column.
select *
from person
where id > <max_id_of_last_batch> and <other_conditions>
order by id asc
limit <batch_size>
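The batching loop around this query might be sketched as follows. This is only a sketch under assumptions: KeysetPager and its parameter names are hypothetical, and fetchBatch stands in for whatever actually runs the query above (Hibernate, plain JDBC, ...) and returns rows with id greater than the given value, ordered by id ascending.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

/** Hypothetical helper: walks a table in id order, one batch at a time. */
final class KeysetPager {
    private KeysetPager() {}

    /**
     * fetchBatch.apply(lastId, batchSize) must return at most batchSize rows
     * with id > lastId, ordered by id ascending.
     * Returns the number of rows handed to the handler.
     */
    public static <R> long forEachRow(BiFunction<Long, Integer, List<R>> fetchBatch,
                                      ToLongFunction<R> idOf,
                                      long startAfterId,
                                      int batchSize,
                                      Consumer<R> handler) {
        long lastId = startAfterId;
        long processed = 0;
        while (true) {
            List<R> batch = fetchBatch.apply(lastId, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            for (R row : batch) {
                handler.accept(row);
                processed++;
            }
            // The highest id of this batch is the lower bound for the next one.
            lastId = idOf.applyAsLong(batch.get(batch.size() - 1));
            if (batch.size() < batchSize) {
                break; // a short batch means there are no more rows
            }
        }
        return processed;
    }
}
```

Only one batch is referenced at a time, so memory use stays bounded by the batch size as long as the handler does not hold on to the rows.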
You should be able to use a ScrollableResults, though it requires a few magic incantations to get working with MySQL. I wrote up my findings in a blog post (http://www.numerati.com/2012/06/26/reading-large-result-sets-with-hibernate-and-mysql/) but I'll summarize here:
"The [JDBC] documentation says:
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
This can be done using the Query interface (this should work for Criteria as well) in version 3.2+ of the Hibernate API:
Query query = session.createQuery(hql); // hql: the HQL query string
query.setReadOnly(true);
// MIN_VALUE gives hint to JDBC driver to stream results
query.setFetchSize(Integer.MIN_VALUE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
// iterate over results
while (results.next()) {
    Object[] row = results.get();
    // process row then release reference
    // you may need to evict() as well
}
results.close();
This allows you to stream over the result set, however Hibernate will still cache results in the Session, so you’ll need to call session.evict() or session.clear() every so often. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand."
Set the fetch size on the query to an optimal value, as shown below.
Also, when caching is not required, it may be better to use a StatelessSession.
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
.setReadOnly(true)
.setFetchSize( 1000 ) // <<--- !!!!
.setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
The fetch size must be Integer.MIN_VALUE, otherwise it won't work.
This is taken literally from the official reference: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
Actually you could have gotten what you wanted -- low-memory scrollable results with MySQL -- if you had used the answer mentioned here:
Streaming large result sets with MySQL
Note that you will have problems with Hibernate lazy-loading because it will throw an exception on any queries performed before the scroll is finished.
With 90 million records, it sounds like you should be batching your SELECTs. I've done this with Oracle when doing the initial load into a distributed cache. Looking at the MySQL documentation, the equivalent seems to be using the LIMIT clause: http://dev.mysql.com/doc/refman/5.0/en/select.html
Here's an example:
SELECT * from Person
LIMIT 200, 100
This would return rows 201 through 300 of the Person table.
You'd need to get the record count from your table first and then divide it by your batch size and work out your looping and LIMIT parameters from there.
The other benefit of this would be parallelism - you can execute multiple threads in parallel on this for faster processing.
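As a rough sketch of the bookkeeping described above, the helper below turns a row count and a batch size into the sequence of LIMIT clauses to append to the query. LimitWindows and limitClauses are hypothetical names; the sketch assumes the row count was fetched beforehand and that the query has a stable ORDER BY, since otherwise the windows are not guaranteed to return disjoint rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes MySQL "LIMIT offset, count" windows for a table. */
final class LimitWindows {
    private LimitWindows() {}

    public static List<String> limitClauses(long totalRows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> clauses = new ArrayList<>();
        // One window per batch; the last window may cover fewer rows than batchSize.
        for (long offset = 0; offset < totalRows; offset += batchSize) {
            clauses.add("LIMIT " + offset + ", " + batchSize);
        }
        return clauses;
    }
}
```

Each clause can be handed to a separate worker, which is where the parallelism would come from. Note that the question's author found large offsets to be very slow on MySQL, so this is best suited to tables where skipping rows is cheap.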
Processing 90 million records also doesn't sound like the sweet spot for using Hibernate.
The problem could be that Hibernate keeps references to all objects in the session until you close the session. That has nothing to do with query caching. Maybe it would help to evict() the objects from the session after you are done writing them to the file. If they are no longer referenced by the session, the garbage collector can free the memory and you won't run out of memory anymore.
This is more than sample code: I propose a Hibernate-based query template that does this workaround for you (pagination, scrolling and clearing the Hibernate session).
It can also easily be adapted to use an EntityManager.
I've used the Hibernate scroll functionality successfully before without it reading the entire result set in. Someone said that MySQL does not do true scroll cursors, but it claims to, based on the JDBC dmd.supportsResultSetType(ResultSet.TYPE_SCROLL_INSENSITIVE) call, and from searching around it seems that other people have used it. Make sure it's not caching the Person objects in the session; I've used it on SQL queries where there was no entity to cache. You can call evict at the end of the loop to be sure, or test with a SQL query. Also play around with setFetchSize to optimize the number of trips to the server.
I recently worked on a problem like this, and I wrote a blog post about how to tackle it. It is very similar; I hope it is helpful to someone.
I use a lazy-list approach with partial acquisition: I replaced the query's limit/offset pagination with manual pagination.
In my example, the select returns 10 million records; I fetch them and insert them into a "temporary table":
create or replace function load_records ()
returns VOID as $$
BEGIN
drop sequence if exists temp_seq;
create temp sequence temp_seq;
insert into tmp_table
SELECT linea.*
FROM
(
select nextval('temp_seq') as ROWNUM,* from table1 t1
join table2 t2 on (t2.fieldpk = t1.fieldpk)
join table3 t3 on (t3.fieldpk = t2.fieldpk)
) linea;
END;
$$ language plpgsql;
After that, I can paginate without counting each row, using the assigned sequence instead:
select * from tmp_table where counterrow >= 9000000 and counterrow <= 9025000
From the Java perspective, I implemented this pagination through partial acquisition with a lazy list, that is, a list that extends AbstractList and implements the get() method. The get method can use a data access interface to fetch the next set of data and release the heap memory:
@Override
public E get(int index) {
    // The requested index lies beyond the current buffer: load the next chunk.
    if (bufferParcial.size() <= (index - lastIndexRoulette))
    {
        lastIndexRoulette = index;
        // Drop the old chunk so its memory can be reclaimed.
        bufferParcial.clear();
        bufferParcial = new ArrayList<E>();
        bufferParcial.addAll(daoInterface.getBufferParcial());
        if (bufferParcial.isEmpty())
        {
            return null;
        }
    }
    return bufferParcial.get(index - lastIndexRoulette);
}
On the other hand, the data access interface uses a query to paginate and implements a method that iterates progressively, 25,000 records at a time, until all of them have been read.
The results of this approach can be seen here:
http://www.arquitecturaysoftware.co/2013/10/laboratorio-1-iterar-millones-de.html
Another option if you're "running out of RAM" is to request just, say, one column instead of the entire object (see "How to use hibernate criteria to return only one element of an object instead the entire object?"). This also saves a lot of CPU time.
For me it worked properly when setting useCursors=true. Otherwise the scrollable result set ignores the configured fetch size; in my case it was 5000, but the scrollable result set fetched millions of records at once, causing excessive memory usage. The underlying DB is MS SQL Server.
jdbc:jtds:sqlserver://localhost:1433/ACS;TDS=8.0;useCursors=true
We have a Java EE application that currently uses an Informix DB. Our code hits the DB with queries like
"select first 10 * from test"
Now, as far as I know, Oracle does not support 'first 10 *' kind of statements. We have more than 1000 queries like this. Should we change them manually, or is there some customization we can apply?
This is a good reason for either only using standard SQL as much as possible, or for isolating those dependencies into stored procedures (yes, I know that doesn't help you in this specific case, I just thought I'd mention it for future reference).
I suspect you'll have to change each one individually, although a simple search over your source code for "select " or "first " will be a good start.
Then you can decide how you want to change them, since you may also still want it to work on Informix.
For what it's worth, I think you get the same effect with Oracle's
select * from ( select * from mytable ) where rownum <= 10
I would be farming the job of dynamically constructing a query (based on a template) out to another layer which can return a different query based on which database you have configured. Then, when you also want to support DB2 (for example), it's a simple matter of changing just that layer.
For example, have a call like:
gimmeRowLimitedSqlQuery ("* from test",10);
which would give you either of:
select first 10 * from test
select * from test where rownum <= 10
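A minimal sketch of what such a layer could look like, assuming only the two databases discussed here. RowLimitedSql, Dialect and rowLimited are hypothetical names chosen for illustration. The Oracle branch wraps the query in a subquery, like the rownum example earlier in this answer, so that an ORDER BY inside the body would be applied before the row limit.

```java
/** Hypothetical helper: builds a row-limited query for the configured database. */
final class RowLimitedSql {
    private RowLimitedSql() {}

    enum Dialect { INFORMIX, ORACLE }

    /** selectBody is everything after the SELECT keyword, e.g. "* from test". */
    public static String rowLimited(Dialect dialect, String selectBody, int maxRows) {
        switch (dialect) {
            case INFORMIX:
                return "select first " + maxRows + " " + selectBody;
            case ORACLE:
                // rownum is assigned before ORDER BY, so limit the outer query.
                return "select * from (select " + selectBody + ") where rownum <= " + maxRows;
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }
}
```

Supporting another database would then mean adding one enum constant and one case, leaving the calling code untouched.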
I should also mention, although I realise your query is just an example, that SQL can return rows in any order if you don't specify order by so
select first 10 * from test
makes little sense, especially if you may be running it on different DBMSs.
You could write an extension to the JDBC driver to modify the queries on the fly, but that is probably overkill, so a careful search and replace over the source code to modify all queries would be more appropriate.
Oracle has the concept of ROWNUM for limiting results. You will have to update your queries for this.
TOP-n and Pagination queries are a little bit more complex than just using ROWNUM. For example, you might be surprised that you don't get the expected results when using ROWNUM with ORDER BY in the same query.
Check http://www.oracle.com/technology/oramag/oracle/07-jan/o17asktom.html for more info on those types of queries in Oracle.