I have a service that does some accounting calculations (generating annual reports). It is somewhat complex, since we use formulas that are parsed and interpreted, so the Java code itself is complex, but I have managed to optimize it several times (we use SonarQube and CodeMetrics). The problem is that I have a DB call inside a for loop, which now that I think about it is a problem (I've always been told that read/write operations take longer, so reduce them as much as possible), although at first glance it looks harmless (I just fetch what I need). Recently we noticed a performance issue. Maybe it's because the DB is now larger (although I'm fairly certain I did my tests with large datasets), or maybe it's because we're now on lockdown and working over a VPN, which may have affected the response time.
Anyway, what I did was this: instead of having multiple findByXYZ() calls inside the loops (on inspection, it turned out to be about 60 DB calls by the time the loops were done), I used two findAll() calls up front and then, inside the loops, filtered with a stream. With this solution I removed about 60 unnecessary DB calls and the response time improved by 1-2 seconds, sometimes a few hundred milliseconds. My question is: is this a good approach? Or are there variables I'm not taking into consideration that could be causing the issue, such as having the server and the DB on the same network versus on two different networks, and the lag that can cause?
Before
//1st loop
for(..) {
...
Optional<X> neededXInThisLoop = xDao.findByXYZ(x,y,z);
...
}
//2nd loop
for(..) {
...
List<Y> neededYsInThisLoop = yDao.findByX2Y2Z2(x2,y2,z2);
...
}
After
List<X> allXs = xDao.findAll();
List<Y> allYs = yDao.findAll();
//1st loop
for(..) {
...
Optional<X> neededXInThisLoop = allXs.stream().filter(...).findFirst();
...
}
//2nd loop
for(..) {
...
List<Y> neededYsInThisLoop = allYs.stream().filter(...).collect(Collectors.toList());
...
}
Your hunch is very much right. The "after" is much more efficient than the "before", and you should minimize DB calls as much as possible (do as much as you can in SQL, then use streams to further transform the result).
DB calls inside for loops (or other repetitive structures) are a big code smell and can cause serious performance problems.
Ideally you should not call xDao.findAll() but use something like xDao.findAllByXYZ(), which delivers the already-filtered list, which you then simply map to Java POJOs.
SQL (or whatever other data manipulation language you use) does a ton of optimizations. Use it for its intended purpose.
You can read more about the different ways Spring supports JPA repositories in the official Spring documentation. You can, for example, simply name your method in the JpaRepository findAllBy____ (your condition here), or use a @Query annotation to specify a fully fledged SQL or JPQL query, and Spring takes care of the rest.
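For illustration, a minimal sketch of such a repository (the entity name X and the attributes x, y, z are the placeholders from the question, not real names):

import java.util.List;
import java.util.Optional;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface XRepository extends JpaRepository<X, Long> {

    // Derived query: Spring generates WHERE x = ?1 AND y = ?2 AND z = ?3,
    // so the filtering happens in the database, not in Java.
    Optional<X> findByXAndYAndZ(String x, String y, String z);

    // The same thing spelled out explicitly as JPQL via @Query.
    @Query("select e from X e where e.x = :x and e.y = :y and e.z = :z")
    List<X> findAllByXYZ(@Param("x") String x, @Param("y") String y, @Param("z") String z);
}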
Let me try to explain the risk.
Suppose your DB currently has 1,000 records in that specific table, and with those 60 DB calls you were filtering that down to 100 records.
Now imagine the table contains 1M records: findAll() pulls all 1M records across the wire, and then you apply your filtering logic with Java 8 streams.
Filtering 1M records in memory is itself very slow, on top of the cost of loading them.
So my only suggestion is this: right now you have a limited number of records in the table, which is why you see a performance improvement with findAll().
Once that number grows, your performance will surely decrease.
You can also read about findByX and findAllByX at the link below:
https://spring.io/blog/2017/06/20/a-preview-on-spring-data-kay#improved-naming-for-crud-repository-methods
Related
I have some doubts regarding a function that updates multiple entities, but does it one by one. Of course there could be a latency problem if we are working with a remote DB, but aside from that, I worry that we could get an OutOfMemoryError because of the number of entities we are updating in one single transaction. My code goes something like the one below.
EntityHeader entityHeader = entityHeaderService.findById(id);
for (EntityDetail entityDetail : entityHeader.getDetails()) {
    for (Entity entity : entityDetail.getEntities()) {
        entity.setState(true);
        entityService.update(entity); // one update per entity, all in the same transaction
    }
}
This is an example, and we also have a similar case in another method, but with inserts instead. These methods can update or insert up to 2k or 3k entities in one transaction. So my question is: should we start using batch operations, or is the number of entities not big enough to worry about? Also, would it perform better if done as a batch operation?
When optimizing, always ask yourself if it is worth the time. For example:
Are these method some batch that run nightly or are something that get called quite often?
Is the performance gain high enough or is negligible?
Anyway, ~3k entities in one transaction doesn't sound bad, but there are benefits to JDBC batching even at those numbers (and it is quite easy to achieve).
It's hard to say exactly when you should worry about an OutOfMemoryError, as it depends on how much memory you give the JVM and how big the entities you are updating are. Just to give you some numbers: I personally ran into memory trouble when I had to insert somewhere between 10,000 and 100,000 rows in the same transaction with 4 GB of memory; I had to flush and clear the Hibernate session cache every once in a while.
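A rough sketch of what that can look like, reusing the EntityHeader/EntityDetail model from the question (the batch size of 50 is an arbitrary choice; for true JDBC batching you would also set hibernate.jdbc.batch_size to a matching value, and this assumes the header's collections are initialized before the first clear()):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Sketch only: flushing and clearing the persistence context periodically
// keeps memory bounded when touching thousands of entities in one transaction.
@Service
public class EntityBatchService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void activateAll(EntityHeader header) {
        int count = 0;
        for (EntityDetail detail : header.getDetails()) {
            for (Entity entity : detail.getEntities()) {
                entity.setState(true);
                entityManager.merge(entity); // reattach in case an earlier clear() detached it
                if (++count % 50 == 0) {     // 50 is an arbitrary batch size
                    entityManager.flush();   // push the pending UPDATEs to the DB
                    entityManager.clear();   // detach entities so the session cache doesn't grow
                }
            }
        }
    }
}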
I am hitting a REST API to get data from a service. I transform this data and store it in a database. I will have to do this on some interval, say every 15 minutes, and make sure the database has the latest information.
I am doing this in a Java program. I am wondering whether, after I have queried all the data, it would be better to do:
1. SELECT statements to compare against the transformed data, then UPDATEs (DELETE all records associated with what changed and INSERT the new ones), or
2. DELETE ALL and INSERT ALL every time.
Option 1 has the potential to involve a lot fewer transactions: a guaranteed SELECT on all records, because we are comparing, but potentially not many UPDATEs, since I don't expect the data to change much. The downside is that it has to compare all records to detect a change.
I am planning on doing this with Spring Boot, a JPA layer, and possibly Postgres.
The short answer is "It depends. Test and see for your usecase."
The longer answer: this feels like premature optimization, and the general response to premature optimization is "don't." Especially in DB realms like this, what would be best in one situation can be awful in another. There are a number of factors, including (but not limited to) schema, indexes, HDD backing speed, concurrency, amount of data, network speed, latency, and so on:
1. First, get it working.
2. Identify what's wrong → get a metric.
3. Measure against that metric.
4. Make any obvious or necessary changes.
5. Repeat steps 1 through 4 as appropriate.
The first question I would ask of you is "What does better mean?" Once you define that, the path forward will likely become clearer.
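To make "get a metric and measure" concrete, a crude sketch (syncByCompare and syncByReplace are hypothetical stand-ins for options 1 and 2; nothing like them exists in your code yet):

// Crude sketch: run both strategies against production-like data and compare.
long start = System.nanoTime();
syncByCompare(transformedData);   // hypothetical: option 1, SELECT/compare/UPDATE
long compareMs = (System.nanoTime() - start) / 1_000_000;

start = System.nanoTime();
syncByReplace(transformedData);   // hypothetical: option 2, DELETE ALL + INSERT ALL
long replaceMs = (System.nanoTime() - start) / 1_000_000;

System.out.printf("compare: %d ms, replace: %d ms%n", compareMs, replaceMs);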
In my app I have Case, and for each Case there can be 0 to 2 Claims. If a Case has 0 claims it runs pretty fast, with 1 claim it slows down, and with 2 it is awfully slow. Any idea how to make this faster? I didn't know whether my Case and Claim were going back and forth causing infinite recursion, so I added @JsonManagedReference and @JsonBackReference, but that doesn't seem to help much with speed. Any ideas? Here is my Case.java:
@Entity
public class Case {
    @OneToMany(mappedBy = "_case", fetch = FetchType.EAGER)
    @Fetch(FetchMode.JOIN)
    @JsonManagedReference(value = "case-claim")
    public Set<Claim> claims;
}
In Claim.java:
@Entity
public class Claim implements Cloneable {
    @ManyToOne(optional = true)
    @JoinColumn(name = "CASE_ID")
    @JsonBackReference(value = "case-claim")
    private Case _case;
}
Output with 0 claims:
https://gist.github.com/elmatt/2cafbe7ecb1fa0b7f6a8
Output with 2 claims:
https://gist.github.com/elmatt/b000bc28909453effc95
Your problem has nothing to do with the relationship between Case and Claim.
FYI: 300 ms is not "pretty fast." Your problem is that you expect Hibernate to magically and quickly deliver a complex object hierarchy to you, with no particular effort on your part. I view ORM as "The Big Lie": it is super easy to use and works great on toy problems, but tends to fail miserably when you try to scale to interesting applications (like yours).
Don't abandon Hibernate, but realize that you are going to need to work harder than you thought you would in order to make it work for you.
I happen to work in a similar data domain (post-adjudication healthcare claim analysis and processing). You should be able to select this kind of data in well under 10 ms per claim (with all associated dimensions) using MySQL on modest hardware, from a table with over a billion claims, with the DB hosted on a separate server from the app.
How do you get from where you are to where you should be?
1. Minimize the number of round-trips to the database by minimizing the number of separate queries that are executed.
2. Hand-craft your important queries to grab just the rows and joins that you actually need (see the sketch after this list).
3. Use explain plan on every query to make sure that it hits the tables in the right order and every step is appropriately supported by an index.
4. Consider partitioning your big tables and include the partition criteria in your queries to enable partition-pruning to focus the query on the proper data.
5. Be very hesitant to let hibernate manage your relationships between your entities. I generally do not let hibernate deal with any relationships.
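As a concrete sketch of point 2, a hand-crafted JPQL query that fetches a case together with its claims in a single round trip (this assumes the Case/Claim mapping from the question and an id attribute on Case):

import javax.persistence.EntityManager;

public class CaseQueries {
    // One explicit query, one round trip: "join fetch" pulls the claims in
    // the same select instead of issuing a separate query per association.
    public Case loadCaseWithClaims(EntityManager em, long caseId) {
        return em.createQuery(
                "select distinct c from Case c "
              + "left join fetch c.claims "
              + "where c.id = :id", Case.class)
            .setParameter("id", caseId)
            .getSingleResult();
    }
}

Run explain plan on the SQL this generates, just as you would with any hand-written query.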
A few years ago I worked on a product, an iPhone app, where the user walks through workflows (e.g., a nurse taking a patient's vitals) and each screen made a round trip to the app server to execute the workflow step and get the data for the next screen. Think about how little data you can work with on an iPhone screen; yet the DB portion of the round trip generally took 2-5 seconds to execute. Everyone there took it for granted, because "that is how long it has always taken." I dug into the code and found that each step was pulling in a significant portion of the database, most of which was then never used by the business logic.
The only time they tweaked the default Hibernate behavior was when they got an exception due to too many joins (yes, MySQL has a limit of something like 61 tables in one query).
The approach of creating your Java data model and simply ORM'ing it into the database generally works just fine on configuration data and the like, but tends to perform terribly for complex data models involving your transactional data. This is what is biting you now.
Your problem is totally fixable, and can be attacked incrementally - you don't have to tear apart the whole application to start making things better.
Can you enable Hibernate logging and provide the output? It should show the SQL queries being executed against your DB. Information about which DB you are using would also be useful. Once you have those, I would recommend profiling the queries to ensure your DB is set up appropriately. It sounds like a non-indexed query.
The size of the datasets (number of rows and so on) would also help in narrowing down possible issues.
I would also recommend timing the actual Hibernate call (which could be as crude as a log statement immediately before and after) versus the overall processing, to identify whether it really is Hibernate or some other processing; without further information and context, that is not clear here. A sketch of how to turn on the SQL logging follows.
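For example, with Spring Boot this is just two logger levels (logger names as in Hibernate 5 / Spring Boot; they vary slightly between Hibernate versions, and with plain Hibernate you would set hibernate.show_sql and hibernate.format_sql instead):

# application.properties: log every SQL statement Hibernate executes
logging.level.org.hibernate.SQL=DEBUG
# also log bound parameter values (very verbose, enable temporarily)
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE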
Now that you've posted your queries, we can see what is happening. The structure of your entities is more complex than the code snippet originally posted suggested: there are references to Person, Activities, HealthPlan and others in there.
As others have commented, your query triggers a very large select of a lot of data, due to the nature of your model.
I recommend creating named queries for claims and then loading them using the ID of the Case.
You should also review your Hibernate model and switch to FetchType.LAZY; otherwise Hibernate will create large queries such as the one you posted. The catch is that if you try to access a related entity outside of the transaction you will get a LazyInitializationException, so you will need to consider each use case and ensure you load the data you need. Two common mistakes with Hibernate are to use FetchType.EAGER everywhere, or to start the transaction too early, just to avoid this. There is no single correct design approach, but I normally do the following:
JSP -> Controller -> [TX BOUNDARY] Service -> DAO
Your service method(s) should encapsulate the business logic needed to load the data you require before passing it back to the controller. A sketch of the named-query and lazy-fetch changes is shown below.
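A minimal sketch of those two changes, reusing the entities from the question as two separate files shown together (the named query name and the assumption that Case has an id attribute are mine):

import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.NamedQuery;
import javax.persistence.OneToMany;

@Entity
public class Case {
    // LAZY instead of EAGER + FetchMode.JOIN: claims are no longer dragged
    // into every query that touches a Case.
    @OneToMany(mappedBy = "_case", fetch = FetchType.LAZY)
    public Set<Claim> claims;
}

@Entity
@NamedQuery(name = "Claim.byCaseId",
            query = "select cl from Claim cl where cl._case.id = :caseId")
public class Claim {
    @ManyToOne(fetch = FetchType.LAZY, optional = true)
    @JoinColumn(name = "CASE_ID")
    private Case _case;
}

// Inside a transactional service method, load claims only when a use case needs them:
// List<Claim> claims = em.createNamedQuery("Claim.byCaseId", Claim.class)
//                        .setParameter("caseId", caseId)
//                        .getResultList();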
Again, per the other answer, I think you're expecting too much of Hibernate. It is a powerful tool but you need to understand how it works to get the best from it.
I am learning Spring Framework, and it is pretty awesome.
I want to use Java multithreading, but I don't know how to do it with the Spring Framework.
Here is the code for service:
//StudentService.java
public List<Grade> loadGradesForAllStudents(Date date) {
    try {
        List<Grade> grades = new ArrayList<Grade>();
        List<Student> students = loadCurrentStudents(); // LOAD FROM THE DB
        for (Student student : students) { // I WANT TO USE MULTITHREADING FOR THIS PART
            // LOAD FROM DB (MANY JOINS)
            History studentHistory = loadStudentHistory(student.getStudentId(), date);
            // CALCULATION PART
            Grade calculatedGrade = calcStudentGrade(studentHistory, date);
            grades.add(calculatedGrade);
        }
        return grades;
    } catch (Exception e) {
        ...
        return null;
    }
}
And without multithreading, it is pretty slow.
I guess the for loop causes the slowness, but I don't know how to approach this problem. If you could give me a useful link or some example code, I'd appreciate it.
I figured out that the method loadStudentHistory is pretty slow (around 300 ms) compared to calcStudentGrade (around 30 ms).
Using multithreading for this is a bad idea in an application with concurrent users, because instead of each request using one thread and one connection, each request now uses multiple threads and multiple connections. It doesn't scale as the number of users grows.
When I look at your example I see two possible issues:
1) You have too many round trips between the application and the database, and each of those trips takes time.
2) It's not clear whether each query uses a separate transaction (you don't say where the transactions are demarcated in the example code). If each of your queries creates its own transaction, that could be wasteful, because each transaction has overhead associated with it.
Using multithreading will not do much to help with #1 (and if it does help, it will put more load on the database), and it will either have no effect on #2 or make it worse (depending on the current transaction scopes: if the queries were in the same transaction before, with multiple threads they will have to be in different transactions). And, as already explained, it won't scale up.
My recommendations:
1) Make the service transactional, if it is not already, so that everything it does happens within one transaction. Remove the exception-catching/null-returning code (which interferes with how Spring uses exceptions to roll back transactions) and introduce an exception handler so that anything thrown from controllers is caught and logged. That will minimize your overhead from creating transactions and make your exception handling cleaner.
2) Create one query that brings back the list of your students. That way the query is sent to the database once, and the result set is read back in chunks (according to the fetch size on the result set). You can customize the query to fetch only what you need, so you don't have an excessive number of joins. Run explain plan on the query and make sure it uses indexes. You will have a faster query and far fewer round trips, which will make a big speed improvement. A sketch combining both recommendations follows.
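A sketch of both recommendations combined (StudentRepository, its findAllWithHistory method, and the getHistory() accessor are assumptions for illustration, not code from the question):

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class StudentService {

    // hypothetical Spring Data repository
    private final StudentRepository studentRepository;

    public StudentService(StudentRepository studentRepository) {
        this.studentRepository = studentRepository;
    }

    // One transaction around the whole call, one query for all the data.
    @Transactional(readOnly = true)
    public List<Grade> loadGradesForAllStudents(Date date) {
        // Hypothetical @Query method, e.g.
        // "select s from Student s join fetch s.history h where h.date = :date"
        List<Student> students = studentRepository.findAllWithHistory(date);

        List<Grade> grades = new ArrayList<>();
        for (Student student : students) {
            // calcStudentGrade is the existing calculation method from the question
            grades.add(calcStudentGrade(student.getHistory(), date));
        }
        return grades;
    }
}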
The simple solution is called streams; these enable you to iterate in parallel, for example:
students.stream().parallel().forEach( student -> doSomething(student));
This will give you a noticeable performance boost, but it won't remove the database query overhead. If your DB management system takes about 300 ms to return results, then either you're using ORM on big databases or your queries are highly inefficient; I recommend re-analyzing your current solution.
I currently have Hibernate set up in my project. It works well for most things. However, today I needed a query to return a couple hundred thousand rows from a table, about two-thirds of the total rows in the table. The problem is that the query takes about 7 minutes. Using straight JDBC and executing what I assumed was an identical query, it takes less than 20 seconds. Because of this, I assume I am doing something completely wrong. I'll list some code below.
DetachedCriteria criteria = DetachedCriteria.forClass(MyObject.class);
criteria.add(Restrictions.eq("booleanFlag", false));
List<MyObject> list = getHibernateTemplate().findByCriteria(criteria);
Any ideas on why it would be slow and/or what I could do to change it?
You have probably answered your own question already, use straight JDBC.
Hibernate creates, at best, an instance of some object for every row, or worse, multiple object instances for each row. Hibernate has some really degenerate code-generation and instantiation behavior that can be difficult to control, especially with large data sets, and it gets even worse if you have any of the caching options enabled.
Hibernate is not suited to large result sets, and processing hundreds of thousands of rows as objects isn't very performance-oriented either.
Raw JDBC gives you just that: raw types for the row's columns, which is orders of magnitude less data.
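As a sketch of the plain-JDBC route (the table and column names, and the MyObject constructor, are guesses based on the criteria above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class MyObjectJdbcLoader {
    // A forward-only, read-only ResultSet with an explicit fetch size streams
    // rows in chunks instead of materializing hundreds of thousands of mapped
    // objects at once.
    public List<MyObject> loadUnflagged(Connection conn) throws SQLException {
        String sql = "select id, boolean_flag from my_object where boolean_flag = false";
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(1000); // read in chunks of 1000 rows
            try (ResultSet rs = ps.executeQuery()) {
                List<MyObject> result = new ArrayList<>();
                while (rs.next()) {
                    // hypothetical constructor; map only the columns you need
                    result.add(new MyObject(rs.getLong("id"), rs.getBoolean("boolean_flag")));
                }
                return result;
            }
        }
    }
}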
I'm not sure Hibernate is the right thing to use if you need to pull hundreds of thousands of records. The query execution time might be under 20 seconds, but the fetch time will be huge and consume a lot of memory. After you get all those records, how do you output them? It's far more data than you could display to a user. Hibernate isn't really a good solution for data-warehouse-style data crunching.
You probably have several references to other classes in your MyObject class, and in your mapping you set eager loading or something like that. It's very hard to find the issue from the code you posted, because the code itself is OK.
It would probably be better for you to use Hibernate Profiler - http://hibernateprofiler.com/ . It will show you the problems with your mappings, configuration and queries.