JPA query getting slower on each loop iteration in Tomcat - java

I have a simple select query which returns 200 rows. The query is executed 1437 times in a loop.
Technology: Java 8, Spring Boot 2.1.3.RELEASE, Tomcat, Hibernate
With each iteration the query becomes slower: the first query takes 55 ms and the last takes 702 ms.
However, when I run the same loop under JUnit (@RunWith(SpringJUnit4ClassRunner.class)), the queries do not slow down; every query takes roughly 37 ms.
[Log of first and last query when running in Tomcat]
[Log of first and last query when running under JUnit]

As you can see in the logs, one difference is that the EntityManager is not closed after each iteration on Tomcat (but it is closed under JUnit). After a thousand iterations the entity manager holds a lot of objects in memory, and operations on such a loaded persistence context become expensive. Memory pressure also grows with each iteration.
I would try to clear the context more often (e.g. after every iteration), or at least increase the available memory, to rule out GC coming into play too often.
See also this answer
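A minimal sketch of what the suggested fix could look like, assuming a Spring-managed EntityManager; the entity, service, and query here are invented for illustration and are not taken from the question:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Illustrative entity; the question does not show the real mapping.
@Entity
class MyEntity {
    @Id
    Long id;
    Long groupId;
}

@Service
public class ReportService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional(readOnly = true)
    public void processAll(List<Long> groupIds) {
        for (Long groupId : groupIds) {
            List<MyEntity> rows = entityManager
                    .createQuery("select e from MyEntity e where e.groupId = :gid", MyEntity.class)
                    .setParameter("gid", groupId)
                    .getResultList();
            process(rows);
            // Detach everything loaded so far. Without this, Hibernate's
            // dirty checking scans an ever-growing persistence context,
            // which matches the per-iteration slowdown described above.
            entityManager.clear();
        }
    }

    private void process(List<MyEntity> rows) {
        // domain logic goes here
    }
}

With the clear() in place, each iteration starts from an empty persistence context, so flush-time dirty checking stays proportional to one query's rows instead of growing with every loop pass.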

I added entityManager.clear() after each query, and this solved the problem.
Thanks Cascader!!
The result is really impressive: the first query takes 73 ms, and the times now go the opposite way, "down" to 1 ms for the last query.

Related

SpringBoot repository find response time increases with many concurrent calls, while manual database search not affected

I am trying to improve the response time of my REST APIs (I am using Spring Boot 2.1.3 with Groovy 2.4.14 and MSSQL). I noticed that at certain time periods the more popular GET APIs take much longer than they should (>4 seconds as opposed to 0.3 seconds). I've looked into CPU usage, memory, blocking threads, blocking DBs, DeferredResult, fetching schemes, Spring Boot and JPA settings, etc.; none of these were a bottleneck or were even relevant (the database search is a simple repository.findById() for a domain object with a few primitive fields).
List<Object> getObjectForId(String id) {
    curCallCount++                                  // instrumentation: calls currently in flight
    List<Object> objList = objectRepository.findAllById(id)
    curCallCount--
    objList                                         // Groovy implicit return
}
The issue seems to be that the more calls to the service that have not yet exited at the time of a new call, the longer the response time of that call; the correlation is almost linear: with 50 in-flight calls, repository.findById() takes 5 seconds, and with 200 it takes 20 seconds. Meanwhile, with those same 200 concurrent calls, a manual database query is still fast (0.3 seconds).
Is this expected for Spring repository calls? Where is this overhead from repository.findById() coming from in an environment where there are many concurrent calls to the service, even though the manual database search performance is not affected?
Don't know about the API side, but I would certainly start by looking at the compilation/recompilation rate in SQL Server and at your plan cache usage. Using a string variable might be passing all your parameters in as nvarchar(max) and limiting the reuse of your query plans.
The issue was the Hikari pool size being too small (10 by default), so that when there are more than 10 concurrent calls, requests must wait for a free connection. Increasing it (to 150, for example) resolved the issue.
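For reference, with Spring Boot's default HikariCP data source the pool size can be raised in application.properties; 150 is simply the value mentioned above, not a general recommendation:

spring.datasource.hikari.maximum-pool-size=150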

Slow Hibernate query on first execution

I have a (very) complicated application which translates a GET request into a number of Hibernate queries against an Oracle DB.
It basically retrieves attributes of an object which are scattered across ~100 tables.
I have to stay under a maximum request time even for edge cases (i.e. big result sets).
In the edge cases, performance is extremely slow on the first call (i.e. after some time has passed).
After that, the query is much faster, even when I flush both the buffer cache and the shared pool.
This applies to the SAME GET request, i.e. the same object requested. Requesting another object, but the same attributes, again takes a long time on the first call.
For example, same query, same conditions, total rows fetched in the (low) thousands:
first call: 26,000 ms
first call after flush of buffer cache/shared pool: 2,800 ms
second call after flush: 1,200 ms
From researching the web, I already discovered that flushing the pool does not necessarily really flush it, so I cannot rely on that.
As a caveat, I am a developer and have good working knowledge of Oracle, but am not a DBA and do not have access to a full DBA.
I suspect the following reasons for the slow first execution:
Oracle does hard parses which take a long time (the queries executed may contain multiple thousands of parameters): I was unable to find out how long a "bad" hard parse could take. Enterprise Manager tells me it only did 1 hard parse on my queries across multiple executions though, so this seems unlikely.
the queries themselves take a long time, but get cached and the caches are not emptied by my actions (maybe disk caching?): Again, Enterprise Manager disagrees and shows very low query times overall.
I did suspect Hibernate/Java reasons at first (lots of objects to create, after all), but that seems unlikely given the huge differences in performance.
I am at a loss on how to proceed performance tuning and am looking for helpful reading material and/or different ideas on why the first execution is so slow.
The first query frequently takes much more time than any subsequent ones in Oracle DB.
It doesn't seem to be good practice to rely solely on the Oracle cache in such circumstances. That said, it may come in handy if you can mimic the query up front by executing a dummy one (perhaps right after the application launches); that can reduce the execution time of any subsequent equal call.
Although such a warm-up might help boost performance, the more reliable way would be to introduce a programmatic cache at the application level. It can be used for entities or any other non-persistent objects that are repeatedly fetched.
Please note that if the scope of the problem is limited to the database, it would be a perfect candidate for a question on Database Administrators Stack Exchange.
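As a sketch of the warm-up idea mentioned above, assuming a Spring Boot application (the MyEntity query and the dummy key are invented; the real warm-up statement would be one of the application's own heavy queries):

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WarmUpConfig {

    // Fire one representative query at startup so the first real request
    // does not pay the full parse/cache-population cost by itself.
    @Bean
    ApplicationRunner warmUp(EntityManagerFactory emf) {
        return args -> {
            EntityManager em = emf.createEntityManager();
            try {
                em.createQuery("select e from MyEntity e where e.id = :id")
                  .setParameter("id", -1L) // dummy key: same statement shape, tiny result
                  .getResultList();
            } finally {
                em.close();
            }
        };
    }
}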

JDBC Insert after Delete slows down performance

I run JDBC queries in a sequence: INSERT, DELETE, INSERT, DELETE, etc. I insert one million records in batches of 1,000, then delete those million records in a single query, then insert them again. In this case I am interested only in insertion performance.
When I run this in a loop of, say, 10 iterations, the first iteration is fastest (11 seconds), and each subsequent iteration's insert is a few seconds slower than the previous one. However, when I run the same sequence repeatedly without the loop, the insertion times are very similar.
Any idea why?
for (number of iterations) {
    // insert a million records here, batch size is 1000;
    // a prepared statement is used and clearBatch() is called after
    // every 1000 inserts
    // at the end the prepared statement is closed and connection.commit()
    // is called
    // Thread.sleep(1000) called here
    // everything is inserted now in the DB, so delete what has been
    // inserted in a single query; connection.commit() called again after
    // the delete
    // Thread.sleep(1000), then repeat the same actions until the loop finishes
}
Sorry, I don't have the code with me.
Any idea why the insertion gets slower with every iteration?
I can't be sure without the code, but I think you have a memory leak so that the extra time is due to garbage collection. That would explain why it is faster to run the same code several times without a loop. Run the program with GC logging enabled (-XX:+PrintGC) and see what happens.
To eliminate database issues you may want to test with another (new) table and replace the delete with truncate table, just in case.
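Since the original code is not available, here is a hedged reconstruction of the loop with the suggested TRUNCATE swap; the table, columns, and JDBC URL argument are all invented, and note that on some databases TRUNCATE is DDL and commits implicitly:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class InsertLoop {
    public static void main(String[] args) throws Exception {
        // args[0] is assumed to be a JDBC URL with credentials
        try (Connection con = DriverManager.getConnection(args[0])) {
            con.setAutoCommit(false);
            for (int i = 0; i < 10; i++) {
                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO scratch_table (id, payload) VALUES (?, ?)")) {
                    for (int row = 1; row <= 1_000_000; row++) {
                        ps.setInt(1, row);
                        ps.setString(2, "value-" + row);
                        ps.addBatch();
                        if (row % 1000 == 0) {
                            ps.executeBatch();
                            ps.clearBatch();
                        }
                    }
                    con.commit();
                }
                // TRUNCATE deallocates pages instead of logging row-by-row
                // deletes, ruling out table bloat as the cause of the slowdown.
                try (Statement st = con.createStatement()) {
                    st.executeUpdate("TRUNCATE TABLE scratch_table");
                }
                con.commit();
            }
        }
    }
}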

Java - DB2 Performance Improvements

We have a SELECT statement which takes approx. 3 seconds to execute. We are calling this DB2 query inside a nested while loop.
Ex:
while (hashmap1.hasNext()) {
    while (hashmap2.hasNext()) {
        // SQL query executed here
    }
}
The problem is that the outer while loop executes approx. 1200 times and the inner while loop 200 times, which means the SQL will be called 1200 * 200 = 240,000 times. Each iteration of the outer loop takes approx. 150 seconds, so 1200 * 150 secs = 50 hrs.
We can afford only around 12-15 hrs before we kick off the next process.
Is there any way to do this process quickly? Any new technology which can help us in fetching these records faster from DB2.
Any help would be highly appreciated.
Note: We have already looked into all possible ways to cut down the number of iterations.
Sounds to me like you're trying to use the middle tier for something that the database itself is better suited for. It's a classic "N+1" query problem.
I'd rewrite this logic to execute entirely on the database as a properly indexed JOIN. That'll not only cut down on all that network back and forth, but it'll bring the database optimizer to bear and save you the expense of bringing all that data to the middle tier for processing.
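A hedged sketch of that rewrite (table and column names are invented; the real join keys depend on what the two maps actually hold):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JoinInsteadOfLoop {
    public static void main(String[] args) throws Exception {
        // One set-based statement replaces 1200 * 200 individual lookups.
        String sql =
            "SELECT a.outer_key, b.inner_key, d.payload " +
            "FROM outer_keys a " +
            "JOIN inner_keys b ON b.batch_id = a.batch_id " +
            "JOIN detail d ON d.outer_key = a.outer_key " +
            "AND d.inner_key = b.inner_key";
        try (Connection con = DriverManager.getConnection(args[0]); // JDBC URL assumed in args[0]
             PreparedStatement ps = con.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process each joined row; all the matching happened inside DB2
            }
        }
    }
}

With indexes on the join keys, the DB2 optimizer can pick a hash or merge join and return the combined result in one round trip instead of 240,000.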

executeQuery taking six times as long to run as opposed to when query is run in TOAD

I inherited a...well, I guess I can call it a piece-of-#### Struts application, and am tasked with optimizing a Levey-Jennings process that checks if our quality control standards are up to snuff.
The process itself runs fine, but there has always been a huge spike in performance time even if the dataset is small. I tested time between each part of the algorithm and discovered that the big time hog was Java's executeQuery() method.
Most recently I ran the application and logged the execution time to be 10 seconds. The executeQuery() took six of those seconds by itself. Curious to see what the problem was, I took the query into TOAD and ran it verbatim -- it only took 1 second to run.
I ran an even larger dataset, which took 60 seconds to run in the Levey-Jennings application -- however, in TOAD, it took 10.
Is this a problem with the query at all, or is using executeQuery() typically a precursor to extreme slowdown?
When you run a query in TOAD (or any other IDE), the tool wants to show you results as fast as possible. Typically it displays a grid with between 10 and 40 rows. To show you those first 10-40 rows as fast as possible, it hints the query or changes the optimizer environment to produce the first rows as fast as possible.
Here you can see more information about the FIRST_ROWS hint: http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements006.htm#SQLRF50302
The query in your application likely doesn't use a FIRST_ROWS hint. It wants ALL the rows as fast as possible. It doesn't care if the first row shows up immediately. So, the optimizing environment for those two queries is different.
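To make the difference concrete (the table name is made up; FIRST_ROWS(n) and ALL_ROWS are standard Oracle optimizer hints):

public class HintExample {
    // What an IDE effectively optimizes for: the first screenful, fast.
    static final String FIRST_ROWS_SQL =
        "SELECT /*+ FIRST_ROWS(25) */ * FROM qc_results WHERE run_id = ?";

    // What a JDBC executeQuery() that drains the whole ResultSet pays for.
    static final String ALL_ROWS_SQL =
        "SELECT /*+ ALL_ROWS */ * FROM qc_results WHERE run_id = ?";
}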
It also doesn't help that TOAD displays the time it took to produce the first rows, which leads you to think that that's the time it takes to get all the rows. There is an option to navigate to the last row, though; press that and you'll see that it actually takes longer.
Hope this helps.
Regards,
Rob.
