I need to perform a batch insert of 1,000,000+ rows using NamedParameterJdbcTemplate in Spring JDBC.
So my code is like this:
public void insert(Collection<Person> entities) {
    SqlParameterSource[] params = SqlParameterSourceUtils.createBatch(entities.toArray());
    namedJdbcTemplate.batchUpdate(insertSql, params);
}
Actually I use this batch insert for 5 different tables which are related. The problem is that if I insert about 1,000,000 rows for one of these tables, the application keeps trying to insert them for a long time and eventually throws an OutOfMemoryError.
I think it's because I didn't provide a batch size (i.e. a portion of the total rows in the provided collection that is committed after each insert), but I don't know how to set this parameter through my NamedParameterJdbcTemplate.
Or maybe there are other suggestions for how this can be accomplished?
Thanks.
UPDATED: I use SimpleDataSource and DataSourceTransactionManager in my configuration
UPDATED: I tried to use SingleConnectionDataSource and called setAutoCommit(false) on it. Then, after the batchUpdate of 100k rows in the code above, I called:
try {
    // commit manually on the shared connection held by the SingleConnectionDataSource
    ((JdbcTemplate) dbTemplate.getJdbcOperations()).getDataSource().getConnection().commit();
} catch (SQLException e) {
    // the exception is currently swallowed
}
and it works better, but I don't like having to use such code. Maybe there is a better solution? And for 1,000,000 rows it still fails with an out-of-memory error during batchUpdate.
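For illustration, a minimal sketch of splitting the insert into fixed-size chunks so that only one chunk of parameter sources is held in memory at a time (the chunkSize parameter and the List-based splitting are assumptions, not part of the original code):
public void insertInChunks(List<Person> entities, int chunkSize) {
    for (int from = 0; from < entities.size(); from += chunkSize) {
        int to = Math.min(from + chunkSize, entities.size());
        // build the parameter batch only for the current chunk
        SqlParameterSource[] params =
                SqlParameterSourceUtils.createBatch(entities.subList(from, to).toArray());
        namedJdbcTemplate.batchUpdate(insertSql, params);
    }
}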
Related
I have a table with approximately 62,000,000 rows, and I need to select data from it and export it to a .txt or .csv file.
My query limits the result to approximately 60,000 rows.
When I run the query on my development machine, it eats all the memory and I get a java.lang.OutOfMemoryError.
At the moment I use Hibernate for the DAO, but I can change to a pure JDBC solution if you recommend it.
My pseudo-code is:
List<Map> list = myDao.getMyData(Params param); // the program crashes here
initFile();
for (Map map : list) {
    util.append(map); // this writes the row to the file
}
closeFile();
Any suggestion on how I should write my file?
Note: I use .setResultTransformer(Transformers.ALIAS_TO_ENTITY_MAP); to get a Map instead of an Entity.
You could use Hibernate's ScrollableResults. See the documentation here: http://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch11.html#objectstate-querying-executing-scrolling
This uses server-side cursors, if your database engine / database driver supports them. For this to work, be sure you set the following properties:
// "query" here is a Hibernate Query created from the session, e.g. session.createQuery(...)
query.setReadOnly(true);
query.setCacheable(false);

ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
while (results.next()) {
    SomeEntity entity = results.get()[0];
    // process the entity here
}
results.close();
Lock the table and then perform the selection and export in subsets, appending to the results file. Ensure you unconditionally unlock when done.
Not nice, but the task will run to completion even on servers or clients with limited resources.
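A rough JDBC sketch of that idea (the table, column, and file names are assumptions; the LOCK TABLE syntax shown is Oracle-style; the subsets are handled here via the driver's fetch size rather than explicit range queries; imports and exception declarations are omitted like in the other snippets):
try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false);
    try (Statement st = con.createStatement();
         Writer out = new BufferedWriter(new FileWriter("export.csv"))) {
        st.execute("LOCK TABLE my_table IN SHARE MODE"); // lock is held until commit/rollback
        st.setFetchSize(1000); // stream rows instead of buffering the whole result in memory
        try (ResultSet rs = st.executeQuery("SELECT id, col1, col2 FROM my_table ORDER BY id")) {
            while (rs.next()) {
                out.write(rs.getString("id") + ";" + rs.getString("col1") + ";"
                        + rs.getString("col2") + "\n");
            }
        }
    } finally {
        con.commit(); // unconditionally releases the table lock
    }
}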
I'm having an issue where executeUpdate always returns the value 1 even though there is no record to be updated.
First I retrieve several records, do a bit of calculation, and then update the status of some of the retrieved records.
The JPA update code:
private int executeUpdateStatusToSuccess(Long id, Query updateQuery) {
    updateQuery.setParameter(1, getSysdateFromDB());
    updateQuery.setParameter(2, id);
    int cnt = updateQuery.executeUpdate();
    return cnt; // always returns 1
}
The update query:
UPDATE PRODUCT_PARAM SET STATUS = 2, DATA_TIMESTAMP=? WHERE ID = ? AND STATUS=-1
Note that the STATUS column practically never has a value < 0. I'm purposely adding this condition just to show that even though it shouldn't have updated any record, executeUpdate() still returns 1.
As an additional note, there is no update process anywhere between the data retrieval and the update. It's all done within my local environment.
Any advice on whether I'm missing anything here, or if there's some configuration parameter that I need to check?
EDIT:
For the JPA I'm using EclipseLink.
For the database I'm using Oracle 10g with driver ojdbc5.jar.
In the end I had to look into the EclipseLink JPA source code. It turns out the system actually executes this line
return Integer.valueOf(1);
from the code inside the basicExecuteCall method of the DatabaseAccessor class, shown below:
if (isInBatchWritingMode(session)) {
// if there is nothing returned and we are not using optimistic locking then batch
//if it is a StoredProcedure with in/out or out parameters then do not batch
//logic may be weird but we must not batch if we are not using JDBC batchwriting and we have parameters
// we may want to refactor this some day
if (dbCall.isNothingReturned() && (!dbCall.hasOptimisticLock() || getPlatform().canBatchWriteWithOptimisticLocking(dbCall) )
&& (!dbCall.shouldBuildOutputRow()) && (getPlatform().usesJDBCBatchWriting() || (!dbCall.hasParameters())) && (!dbCall.isLOBLocatorNeeded())) {
// this will handle executing batched statements, or switching mechanisms if required
getActiveBatchWritingMechanism().appendCall(session, dbCall);
//bug 4241441: passing 1 back to avoid optimistic lock exceptions since there
// is no way to know if it succeeded on the DB at this point.
return Integer.valueOf(1);
} else {
getActiveBatchWritingMechanism().executeBatchedStatements(session);
}
}
One easy hack is to not use batch writing. I tried turning it off in persistence.xml and the update then returns the expected value, which is 0.
<property name="eclipselink.jdbc.batch-writing" value="none" />
I'm hoping for a better solution, but this one will do for now in my situation.
I know this question and answer are pretty old, but since I stumbled upon the same problem recently and figured out a solution for my use case (keep batch writing enabled and still get the updated row count for some queries), I figured my solution might be helpful to somebody else in the future.
Basically, you can use a query hint to signal that a specific query does not support batch execution. The code to do this looks something like this:
import org.eclipse.persistence.config.HintValues;
import org.eclipse.persistence.config.QueryHints;
import javax.persistence.Query;

public class EclipseLinkUtils {

    public static void disableBatchWriting(Query query) {
        // tell EclipseLink not to batch this particular query
        query.setHint(QueryHints.BATCH_WRITING, HintValues.FALSE);
    }
}
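For instance, applied to the update query from the question (a hypothetical call site, just to illustrate the effect):
EclipseLinkUtils.disableBatchWriting(updateQuery);
int cnt = updateQuery.executeUpdate(); // now reflects the actual number of updated rows instead of the placeholder 1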
I have a DB fetch call with Spring JdbcTemplate, and the number of rows to be fetched is around 1 million. It takes too much time iterating over the result set. After debugging the behavior I found that it processes some rows like a batch, then waits for some time, then takes another batch of rows and processes them. It seems the row processing is not continuous, so the overall time runs into minutes. I have used the default configuration for the data source. Please help.
[Edit]
Here is some sample code
this.prestoJdbcTempate.query(query, new RowMapper<SomeObject>() {
    @Override
    public SomeObject mapRow(final ResultSet rs, final int rowNum) throws SQLException {
        System.out.println(rowNum);
        SomeObject obj = new SomeObject();
        obj.setProp1(rs.getString(1));
        obj.setProp2(rs.getString(2));
        ....
        obj.setProp8(rs.getString(8));
        return obj;
    }
});
As most of the comments tell you, one million records is useless and unrealistic to show in any UI; if this is a real business requirement, you need to educate your customer.
Network traffic between the application and the database server is a key factor in performance in scenarios like this. There is one optional parameter that can really help you here, at least to a certain extent: the fetch size.
Example:
Connection connection = ...; // obtain your connection
Statement statement = connection.createStatement();
statement.setFetchSize(1000); // configure the fetch size
Most JDBC database drivers have a low fetch size by default, and tuning this can help you in this situation. But beware of the following:
Make sure your JDBC driver supports the fetch size.
Make sure your JVM heap setting (-Xmx) is large enough to handle the objects created as a result of this.
Finally, select only the columns you need to reduce network overhead.
In Spring, JdbcTemplate lets you set the fetchSize.
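A small sketch of that (the dataSource, query string, and rowMapper are placeholders, not from the original post):
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
jdbcTemplate.setFetchSize(1000); // hint to the driver to retrieve rows in chunks of ~1000
List<SomeObject> rows = jdbcTemplate.query(query, rowMapper);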
I'm trying to update all 4000 of my ProfileEntity objects, but I am getting the following exception:
javax.persistence.QueryTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
this is my code:
public synchronized static void setX4all()
{
    em = EMF.get().createEntityManager();
    Query query = em.createQuery("SELECT p FROM ProfileEntity p");
    List<ProfileEntity> usersList = query.getResultList();
    int a, b, x;
    for (ProfileEntity profileEntity : usersList)
    {
        a = profileEntity.getA();
        b = profileEntity.getB();
        x = func(a, b);
        profileEntity.setX(x);
        em.getTransaction().begin();
        em.persist(profileEntity);
        em.getTransaction().commit();
    }
    em.close();
}
I'm guessing that it takes too long to query all of the records from ProfileEntity.
How should I do it?
I'm using Google App Engine so no UPDATE queries are possible.
Edited 18/10
In these 2 days I tried:
Using Backends, as Thanos Makris suggested, but I got to a dead end. You can see my question here.
Reading the DataNucleus suggestion on Map-Reduce, but I really got lost.
I'm looking for a different direction. Since I'm only going to do this update once, maybe I can update manually every 200 objects or so.
Is it possible to query for the first 200 objects, then the next 200 objects, and so on?
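For illustration, a rough sketch of that kind of paging with a plain JPA query (the offset variable and the page size of 200 are just examples):
Query query = em.createQuery("SELECT p FROM ProfileEntity p");
query.setFirstResult(offset); // e.g. 0, 200, 400, ...
query.setMaxResults(200);     // page size
List<ProfileEntity> page = query.getResultList();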
Given your scenario, I would advise running a native update query:
Query query = em.createNativeQuery("update ProfileEntity pe set pe.X = 'x'");
query.executeUpdate();
Please note: here the query string is SQL, i.e. update table_name set ...
This will work better.
Change the update process to use something like Map-Reduce. This means everything is done in the datastore. The only problem is that appengine-mapreduce is not fully released yet (though you can easily build the jar yourself and use it in your GAE app; many others have done so).
If you want to call setX for all objects, it's better to use an update statement (i.e. native SQL) through the JPA entity manager instead of fetching all objects and updating them one by one.
Maybe you should consider using the Task Queue API, which enables you to execute tasks of up to 10 minutes. If you need to update such a large number of entities that Task Queues do not fit, you could also consider using Backends.
Put the transaction outside of the loop:
em.getTransaction().begin();
for (ProfileEntity profileEntity : usersList) {
...
}
em.getTransaction().commit();
Your class does not behave very well: JPA is not suitable for bulk updates done this way. You are just starting a lot of transactions in rapid sequence and producing a lot of load on the database. A better solution for your use case would be a bulk update query that sets all the objects without loading them into the JVM first (depending on your object structure and laziness, you would load much more data than you think).
See the Hibernate reference:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-direct
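For illustration, a minimal sketch of such a bulk update through the entity manager (assuming func(a, b) can be expressed directly in the query; the expression shown is just a placeholder):
int updated = em.createQuery(
        "UPDATE ProfileEntity p SET p.x = p.a + p.b") // placeholder for func(a, b) written in JPQL
        .executeUpdate();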
Some of the queries we run have 100,000+ results, and it takes forever to load them and then send them to the client. So I'm using ScrollableResults to get a paged results feature. But we're topping out at roughly 50k results (never exactly the same number of results).
I'm on an Oracle9i database, using the Oracle 10 drivers and Hibernate is configured to use the Oracle9 dialect. I tried with the latest JDBC driver (ojdbc6.jar) and the problem was reproduced.
We also followed some advice and added an ordering clause, but the problem was reproduced.
Here is a code snippet that illustrates what we do:
final int pageSize = 50;

Criteria crit = sess.createCriteria(ABC.class);
crit.add(Restrictions.eq("property", value));
crit.setFetchSize(pageSize);
crit.addOrder(Order.asc("property"));
ScrollableResults sr = crit.scroll();
...
...
ArrayList page = new ArrayList(pageSize);
do {
    for (Object entry : page)
        sess.evict(entry); // to avoid having our memory just explode out of proportion
    page.clear();
    for (int i = 0; i < pageSize && !metLastRow; i++) {
        if (sr.next())
            page.add(sr.get(0));
        else
            metLastRow = true;
    }
    metLastRow = metLastRow ? metLastRow : sr.isLast();
    sendToClient(page);
} while (!metLastRow);
So, why does the result set tell me it's at the end when it should have so many more results?
Your code snippet is missing important pieces, like the definitions of resultSet and page. But I wonder anyway, shouldn't the line
if (resultSet.next())
be rather
if (sr.next())
?
As a side note, AFAIK cleaning up superfluous objects from the persistence context could be achieved simply by calling
session.flush();
session.clear();
instead of looping through the collection of objects to evict each one separately. (Of course, this requires that the query is executed in its own independent session.)
Update: OK, next round of guesses :-)
Can you actually check which rows are sent to the client and compare that against the result of the equivalent SQL query run directly against the DB? It would be good to know whether this code retrieves (and sends to the client) all rows up to a certain limit, or only some rows (like every 2nd) from the whole result set, or ... That could shed some light on the root cause.
Another thing you could try is
crit.setFirstResult(0).setMaxResults(200000);
As I had the same issue in a large project whose code is based on List<E> instances, I wrote a really limited List implementation with only iterator support, to browse a ScrollableResults without refactoring all the service implementations and method prototypes.
This implementation is available in my IterableListScrollableResults.java Gist
It also regularly flushes Hibernate entities from the session. Here is a way to use it, for instance when exporting all non-archived entities from the DB to a text file with a for loop:
Criteria criteria = getCurrentSession().createCriteria(LargeVolumeEntity.class);
criteria.add(Restrictions.eq("archived", Boolean.FALSE));
criteria.setReadOnly(true);
criteria.setCacheable(false);
List<E> result = new IterableListScrollableResults<E>(getCurrentSession(),
        criteria.scroll(ScrollMode.FORWARD_ONLY));
for (E entity : result) {
    dumpEntity(file, entity);
}
I hope it may help.