Persist performance gradually decreasing - java

Given: a method in a @Stateless EJB, a JTA MySQL data source, and a list of about 5000 entities which I'm trying to persist in a loop:
List<MyObj> objList = entityManager.createNamedQuery("Q1", MyObj.class).getResultList();
for (MyObj obj : objList) {
    long startTime = Calendar.getInstance().getTimeInMillis();
    entityManager.persist(obj);
    long endTime = Calendar.getInstance().getTimeInMillis();
    logger.info("Object saved in " + (endTime - startTime) + " ms");
}
entityManager.close();
The log shows the save time per entity gradually increasing from 15 ms up to 180 ms. I believe the MySQL server settings are more than adequate for this task: it shows only an insignificant increase in CPU and I/O load. Flushing the EntityManager after each persist has no effect.
What could be the reason for such a performance decrease? Please point me in the right direction.

It looks like the slowdown is caused by the growing number of entities in the persistence context (i.e. in its session cache). flush() doesn't help here because it doesn't remove entities from the persistence context.
If you need to process a large number of entities using a single persistence context, it's recommended to clear the context periodically with clear(), as sketched below.
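A minimal sketch of that pattern, assuming the same @Stateless bean and injected EntityManager as in the question (BATCH_SIZE is an illustrative value, often aligned with hibernate.jdbc.batch_size):
private static final int BATCH_SIZE = 50; // illustrative value

public void persistAll(List<MyObj> objList) {
    int i = 0;
    for (MyObj obj : objList) {
        entityManager.persist(obj);
        if (++i % BATCH_SIZE == 0) {
            entityManager.flush();  // push pending inserts to the database
            entityManager.clear();  // detach managed entities so the context stays small
        }
    }
}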

Related

How can I get JPA/Entity Manager to make parallel queries instead of lumping them into one batch?

Inside the doGet method in my servlet I'm using a JPA TypedQuery to retrieve my data. I'm able to get the data I want through an HTTP GET request. The method that gets the data takes roughly 10 seconds, and when I make a single request all is good. The problem occurs when I get multiple requests at the same time. If I make 4 requests at the same time, all 4 queries are lumped together, and it takes 40 seconds to get the data back for all of them. How can I get JPA to make 4 separate queries in parallel? Is this something that needs to be set in the persistence.xml, or is it a code-related issue? Note: I've also tried executing this code in a thread. A link and some appropriate terminology to increase my understanding would be appreciated.
Thanks!
String sequenceNo = request.getParameter("sequenceNo");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("mydbcon");
EntityManager em = emf.createEntityManager();
try {
    long startTime = System.currentTimeMillis();
    List<Myeo> returnData = methodToGetData(em);
    System.out.println(sequenceNo + " " + (System.currentTimeMillis() - startTime));
    String myJson = new Gson().toJson(returnData);
    resp.getOutputStream().print(myJson);
    resp.getOutputStream().flush();
} finally {
    resp.getOutputStream().close();
    if (em.isOpen())
        em.close();
}
4 simultaneous request samples
localhost/myservlet/mycodeblock?sequenceNo=A
localhost/myservlet/mycodeblock?sequenceNo=B
localhost/myservlet/mycodeblock?sequenceNo=C
localhost/myservlet/mycodeblock?sequenceNo=D
resulting print statements
A 38002
B 38344
C 38785
D 39065
What I want
A 9002
B 9344
C 9785
D 10065
If you make 4 separate GET requests, they should be handled in parallel; they should not be lumped together, since they run in different transactions.
If that is not what you observe, check whether you have configured a database connection pool size or a servlet thread pool size that serializes the calls to the DBMS.
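One more point worth checking, sketched below under the assumption that the snippet above runs inside doGet: it calls Persistence.createEntityManagerFactory on every request. Creating an EntityManagerFactory is expensive and can itself serialize requests, so create it once per application, for example in a ServletContextListener ("mydbcon" is the persistence-unit name from the question; the listener class name is illustrative):
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class JpaBootstrapListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // build the factory once at startup and share it via the servlet context
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("mydbcon");
        sce.getServletContext().setAttribute("emf", emf);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        EntityManagerFactory emf =
                (EntityManagerFactory) sce.getServletContext().getAttribute("emf");
        if (emf != null) emf.close();
    }
}
Each request then fetches the shared factory from the servlet context and creates a short-lived EntityManager from it.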

Bulk insert/update using Stateless session - Hibernate

I have a requirement to insert/update more than 15,000 rows in 3 tables, so that's 45k total inserts.
I used StatelessSession in Hibernate after reading online that it is best for batch processing, as it doesn't have a context cache.
StatelessSession session = sessionFactory.openStatelessSession();
Transaction transaction = session.beginTransaction();
for (Employee e : emplList) {
    session.insert(e);
}
transaction.commit();
But this code takes more than an hour to complete.
Is there a way to save all the entity objects in one go?
Save the entire collection rather than doing it one by one?
Edit: Is there any other framework that can offer a quick insert?
Cheers!!
You should read this article by Vlad Mihalcea:
How to batch INSERT and UPDATE statements with Hibernate
You need to make sure that you've set the Hibernate property:
hibernate.jdbc.batch_size
so that Hibernate can batch these inserts; otherwise they'll be executed one at a time.
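For reference, a minimal persistence.xml sketch (the property names are standard Hibernate settings; the values are only illustrative):
<property name="hibernate.jdbc.batch_size" value="50"/>
<!-- optional, often improves batching when several entity types are interleaved -->
<property name="hibernate.order_inserts" value="true"/>
<property name="hibernate.order_updates" value="true"/>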
There is no way to insert all entities in one go. Even if you could call something like session.save(emplList), Hibernate would still save them one by one internally.
According to the Hibernate User Guide, StatelessSession does not use the batching feature:
The insert(), update(), and delete() operations defined by the StatelessSession interface operate directly on database rows. They cause the corresponding SQL operations to be executed immediately. They have different semantics from the save(), saveOrUpdate(), and delete() operations defined by the Session interface.
Instead, use a normal Session and clear the cache from time to time. Actually, I suggest you measure your code first and only then make changes like using hibernate.jdbc.batch_size, so you can see how much each tweak improved your load.
Try to change it like this:
Session session = sessionFactory.openSession();
Transaction transaction = session.beginTransaction();
int count = 0;
int step = 0;
int stepSize = 1_000; // keep in sync with hibernate.jdbc.batch_size
long start = System.currentTimeMillis();
for (Employee e : emplList) {
    session.save(e);
    count++;
    if (++step == stepSize) {
        long elapsed = System.currentTimeMillis() - start;
        long linesPerSecond = stepSize * 1_000L / Math.max(elapsed, 1); // avoid division by zero
        StringBuilder msg = new StringBuilder();
        msg.append("Step time: ");
        msg.append(elapsed);
        msg.append(" ms Lines: ");
        msg.append(count);
        msg.append("/");
        msg.append(emplList.size());
        msg.append(" Lines/Second: ");
        msg.append(linesPerSecond);
        System.out.println(msg.toString());
        start = System.currentTimeMillis();
        step = 0;
        session.flush();  // send the batched inserts to the database
        session.clear();  // then clear the session cache (clear alone would drop pending inserts)
    }
}
transaction.commit();
About hibernate.jdbc.batch_size: you can try different values, including very large ones, depending on the underlying database and the network configuration. For example, I use a value of 10,000 on a 1 Gbps network between the app server and the database server, which gives me 20,000 records per second.
Set stepSize to the same value as hibernate.jdbc.batch_size.

Spring JDBC template ROW Mapper is too slow

I have a DB fetch call with Spring's JdbcTemplate, and around one million rows need to be fetched. Iterating over the result set takes too much time. Debugging the behavior, I found that it processes a batch of rows, then waits for some time, then takes another batch and processes it. Row processing does not seem to be continuous, so the overall time runs into minutes. I have used the default configuration for the data source. Please help.
[Edit]
Here is some sample code
this.prestoJdbcTempate.query(query, new RowMapper<SomeObject>() {
    @Override
    public SomeObject mapRow(final ResultSet rs, final int rowNum) throws SQLException {
        System.out.println(rowNum);
        SomeObject obj = new SomeObject();
        obj.setProp1(rs.getString(1));
        obj.setProp2(rs.getString(2));
        // ...
        obj.setProp8(rs.getString(8));
        return obj;
    }
});
As most of the comments tell you, one million records is useless and unrealistic to show in any UI; if this is a real business requirement, you need to educate your customer.
Network traffic between the application and the database server is a key performance factor in scenarios like this. There is one optional parameter that can really help you here, at least to a certain extent: the fetch size.
Example:
// obtain a connection, e.g. from your DataSource
Connection connection = dataSource.getConnection();
Statement statement = connection.createStatement();
statement.setFetchSize(1000); // configure the fetch size
Most JDBC drivers use a low fetch size by default, and tuning it can help in this situation. But beware of the following:
Make sure your JDBC driver supports fetch size.
Make sure your JVM heap setting (-Xmx) is large enough to handle the objects created as a result.
Finally, select only the columns you need, to reduce network overhead.
In Spring, JdbcTemplate lets you set the fetchSize, as sketched below.
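A minimal sketch of both ideas, assuming a plain JdbcTemplate over the question's data source (dataSource and query are stand-ins): set the fetch size once, and stream rows with a RowCallbackHandler instead of collecting a million mapped objects in a list.
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class LargeResultSetExample {
    void streamRows(DataSource dataSource, String query) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        jdbcTemplate.setFetchSize(1000); // hint the driver to pull rows in larger chunks

        RowCallbackHandler handler = rs -> {
            // called once per row; nothing accumulates in memory
            String prop1 = rs.getString(1);
            String prop2 = rs.getString(2);
            // ... hand the row off to downstream processing here
        };
        jdbcTemplate.query(query, handler);
    }
}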

Hibernate query returns stale data

I have a Hibernate query (Hibernate 3) that only reads data from the database. The database is updated by a separate application, and the query result does not reflect the changes in the database.
With a bit of research, I think it may have something to do with the Hibernate L2 cache (I don't think it's the L1 cache, since I always open a new session and close it when done).
Session session = sessionFactoryWrapper.getSession();
List<FlowCount> result = session.createSQLQuery(flowCountSQL).list();
session.close();
I tried disabling the second-level cache in the Hibernate config file, but it's not working:
<property name="hibernate.cache.use_second_level_cache">false</property>
<property name="hibernate.cache.use_query_cache">false</property>
<property name="cache.provider_class">org.hibernate.cache.NoCacheProvider</property>
I also added session.setCacheMode(CacheMode.Refresh); after Session session = sessionFactoryWrapper.getSession(); to force a refresh of the L1 cache, but it is still not working...
Is there another way to pick up the changes in the database? Am I doing something wrong on how to disable the cache? Thanks.
Update:
I did another experiment by monitoring the database query log:
Run the code the 1st time. Check the log: the query shows up.
Wait a few minutes. The data is changed by another application; I verified this through MySQL Workbench. To distinguish it from the previous query, I add a dummy condition.
Run the code the 2nd time. Check the log: the query shows up again.
Both times I used the same query, but since the data has changed, the result should be different, but somehow it's not...
In order to force a L1 cache refresh you can use the refresh(Object) method of Session.
From the Hibernate Docs,
Re-read the state of the given instance from the underlying database.
It is inadvisable to use this to implement long-running sessions that
span many business tasks. This method is, however, useful in certain
special circumstances. For example:
where a database trigger alters the object state upon insert or update
after executing direct SQL (e.g. a mass update) in the same session
after inserting a Blob or Clob
Moreover, you mentioned that you added session.setCacheMode(CacheMode.Refresh) to force a refresh of the L1 cache. This won't work because CacheMode has nothing to do with the L1 cache. From the Hibernate Docs again:
CacheMode controls how the session interacts with the second-level cache and query cache.
Without a second-level cache and query cache, Hibernate will always fetch all data from the database in a new session, as sketched below.
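A minimal sketch of that behavior, assuming sessionFactoryWrapper simply hands out sessions from a plain SessionFactory (flowCountSQL is the query from the question):
Session session = sessionFactory.openSession();
try {
    // a brand-new session has an empty first-level cache, so this hits the database
    List<FlowCount> result = session.createSQLQuery(flowCountSQL).list();
    // for an entity already managed by this session, refresh() re-reads its database state:
    // session.refresh(someManagedEntity);
} finally {
    session.close();
}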
You can check which query exactly is executed by Hibernate by enabling the DEBUG log level for the org.hibernate package (and the TRACE level for org.hibernate.type if you want to see bound variables).
How old is the change that the query is reflecting? If it shows the changes after some time, it might have to do with how you obtain your session.
I am not familiar with the SessionFactoryWrapper class; is this a custom class that you wrote? Are you somehow caching the session object longer than necessary? If so, the query will reuse the objects that have already been loaded into the session. This is the idea behind the repeatable-read semantics that Hibernate guarantees.
You can clear the session before running your query and it will then return the latest data.
Hibernate's built-in connection pooling mechanism is bugged.
Replace it with a production-quality alternative like c3p0.
I had the exact same issue, where stale data was returned, until I started using c3p0; an illustrative configuration follows.
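For reference, a hibernate.cfg.xml fragment for switching to c3p0 (this requires the hibernate-c3p0 artifact on the classpath; the values are examples, not recommendations):
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">20</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>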
Just in case it IS the 1st Level Cache
Can you show the query you make?
See following Bugs:
https://hibernate.atlassian.net/browse/HHH-9367
https://jira.grails.org/browse/GRAILS-11645
Additional:
http://howtodoinjava.com/2013/07/01/understanding-hibernate-first-level-cache-with-example/
http://www.dineshonjava.com/p/cacheing-in-hibernate-first-level-and.html#.VhZ7o3VElhE
Repeatable finder problem caused by Hibernate's 1st Level Cache
To be clear, both tests succeed, which is not logical at all:
userByEmail('foo@bar.com').email != 'foo@bar.com'
Complete Test
@Issue('https://jira.grails.org/browse/GRAILS-11645')
class FirstLevelCacheSpec extends IntegrationSpec {

    def sessionFactory

    def setup() {
        User.withNewSession {
            User user = new User(email: 'test@test.org', password: 'test-password')
            user.save(flush: true, failOnError: true)
        }
    }

    private void updateObjectInNewSession() {
        User.withNewSession {
            def u = User.findByEmail('test@test.org', [cache: false])
            u.email = 'foo@bar.com'
            u.save(flush: true, failOnError: true)
        }
    }

    private User userByEmail(String email) {
        User.findByEmail(email, [cache: false])
    }

    def "test first update"() {
        when: 'changing the object in another session'
        updateObjectInNewSession()

        then: 'retrieving the object by changed identifier (no 2nd level cache)'
        userByEmail('foo@bar.com').email == 'foo@bar.com'
    }

    def "test stale object in 1st level"() {
        when: 'changing the object after pulling objects to cache by finder'
        userByEmail('test@test.org')
        updateObjectInNewSession()

        then: 'retrieving the object by changed identifier (no 2nd level cache)'
        userByEmail('foo@bar.com').email != 'foo@bar.com'
    }
}

Update all objects in JPA entity

I'm trying to update all 4000 objects in ProfileEntity, but I am getting the following exception:
javax.persistence.QueryTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
this is my code:
public synchronized static void setX4all() {
    em = EMF.get().createEntityManager();
    Query query = em.createQuery("SELECT p FROM ProfileEntity p");
    List<ProfileEntity> usersList = query.getResultList();
    int a, b, x;
    for (ProfileEntity profileEntity : usersList) {
        a = profileEntity.getA();
        b = profileEntity.getB();
        x = func(a, b);
        profileEntity.setX(x);
        em.getTransaction().begin();
        em.persist(profileEntity);
        em.getTransaction().commit();
    }
    em.close();
}
I'm guessing that querying all of the records from ProfileEntity takes too long.
How should I do it?
I'm using Google App Engine so no UPDATE queries are possible.
Edited 18/10
In these 2 days I tried:
Using Backends, as Thanos Makris suggested, but I reached a dead end. You can see my question here.
Reading the DataNucleus suggestion on Map-Reduce, but I really got lost.
I'm looking for a different direction. Since I'm only going to do this update once, maybe I can update manually every 200 objects or so.
Is it possible to query for the first 200 objects, then the second 200, and so on?
Given your scenario, I would advise running a native update query:
Query query = em.createNativeQuery("update ProfileEntity pe set pe.X = 'x'");
query.executeUpdate();
Please note: here the query string is native SQL, i.e. UPDATE table_name SET ...
This will work better.
Change the update process to use something like Map-Reduce. This means all the work is done in the datastore. The only problem is that appengine-mapreduce is not fully released yet (though you can easily build the jar yourself and use it in your GAE app; many others have done so).
If you want to call setX(...) on all objects, it is better to use an update statement (i.e. native SQL) via the JPA entity manager instead of fetching all objects and updating them one by one.
Maybe you should consider using the Task Queue API, which enables you to execute tasks for up to 10 minutes; a sketch of enqueuing such a task follows. If the number of entities to update is so large that Task Queues do not fit, you could also consider the use of Backends.
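A minimal sketch of enqueuing such a task with the GAE Task Queue API (the worker URL and parameter name are illustrative; the worker servlet behind that URL would update one chunk of entities per task):
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

Queue queue = QueueFactory.getDefaultQueue();
// each task processes one chunk; the worker reads "offset" and updates e.g. 200 entities
queue.add(TaskOptions.Builder.withUrl("/tasks/updateX").param("offset", "0"));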
Put the transaction outside of the loop:
em.getTransaction().begin();
for (ProfileEntity profileEntity : usersList) {
    ...
}
em.getTransaction().commit();
Your code does not behave well: JPA is not suitable for bulk updates performed this way. You are starting a lot of transactions in rapid sequence and producing a lot of load on the database. A better solution for your use case would be a bulk update query (sketched below) that sets all the objects without loading them into the JVM first (depending on your object structure and laziness, you would otherwise load much more data than you think).
See the Hibernate reference:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-direct
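As a sketch, such a bulk JPQL update could look like this (it only works when the new value can be computed inside the query; the expression standing in for func(a, b) is illustrative, and, as the asker noted, GAE's datastore rejects UPDATE queries, so this applies to regular SQL databases):
em.getTransaction().begin();
int updated = em.createQuery(
        "UPDATE ProfileEntity p SET p.x = p.a + p.b") // illustrative stand-in for func(a, b)
    .executeUpdate();
em.getTransaction().commit();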
