I have some doubts about a function that updates multiple entities, one by one. Of course, there could be a latency problem if we are working with a remote DB, but aside from that, I am worried that we could get an OutOfMemoryError because of the number of entities we are updating in a single transaction. My code looks something like this:
EntityHeader entityHeader = entityHeaderService.findById(id);
for (EntityDetail entityDetail : entityHeader.getDetails()) {
    for (Entity entity : entityDetail.getEntities()) {
        entity.setState(true);
        entityService.update(entity);
    }
}
This is just an example; we also have a similar case in another method, but with inserts instead. These methods can update or insert up to 2k or 3k entities in one transaction. So my question is: should we start using batch operations, or is the number of entities not big enough to worry about? Also, would it perform better as a batch operation?
When optimizing things, always ask yourself if it is worth the time, e.g.:
Are these methods some batch job that runs nightly, or something that gets called quite often?
Is the performance gain high enough, or is it negligible?
Anyway, ~3k entities in one transaction doesn't sound bad, but there are benefits to JDBC batching even at those numbers (and it is quite easy to achieve).
It's hard to tell when you should worry about an OutOfMemoryError, as it depends on how much memory you are giving the JVM and how big the entities you are updating are. Just to give you some numbers: I personally ran into memory trouble when I had to insert between roughly 10,000 and 100,000 rows in the same transaction with 4 GB of memory; I had to flush and clear the Hibernate session cache every once in a while.
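That flush-and-clear pattern can be sketched as follows. The Session type here is a hypothetical stand-in so the example is self-contained; in real Hibernate code you would call session.flush() followed by session.clear() instead of flushAndClear():

```java
import java.util.ArrayList;
import java.util.List;

class BatchFlushSketch {
    // Hypothetical stand-in for an ORM session that tracks pending entities
    // in a first-level cache, as Hibernate's Session does.
    static class Session {
        int flushCount = 0;
        List<Object> firstLevelCache = new ArrayList<>();
        void update(Object entity) { firstLevelCache.add(entity); }
        void flushAndClear() { flushCount++; firstLevelCache.clear(); }
    }

    static final int BATCH_SIZE = 50;

    // Update entities one by one, but flush and clear the session every
    // BATCH_SIZE operations so the first-level cache cannot grow unbounded.
    static int updateAll(Session session, List<Object> entities) {
        int i = 0;
        for (Object entity : entities) {
            session.update(entity);
            if (++i % BATCH_SIZE == 0) {
                session.flushAndClear();
            }
        }
        session.flushAndClear(); // flush the final partial batch
        return session.flushCount;
    }
}
```

With 3k entities and a batch size of 50, memory held by the session stays bounded at ~50 entities instead of growing to 3k.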
Related
I'm running Hibernate 4.1 with Javassist runtime instrumentation running on top of Hikari pool, on top of Oracle 12c. JDK is 1.7.
I have a query that runs pretty fast on the database and fetches about 500 entities in Hibernate. The query runtime, according to JProfiler, is quite small, about 11 ms, but Query.list takes about 7 seconds in total.
I've tried removing all filters, and most of the time is spent in Javassist and other Hibernate-related reflection calls (like AbstractProxyHandler and such). I read that the reflection overhead should be pretty small, but here it seems far from small.
Could you please advise what could be the bottleneck here?
Make sure the object you are retrieving does not have sub-objects that are being fetched lazily as a SELECT instead of eagerly as a JOIN. This can result in a behavior known as SELECT N + 1, where Hibernate ends up running a query to get the 500 objects from their respective table, then an additional query for each object to get the child object. If you have 4 or 5 relationships that are being fetched as SELECT statements for each record, and you have 500 records, suddenly you're running around 2000 queries in order to get the List.
I would recommend turning on the SQL logging for Hibernate to see which queries it's running. If it dumps out a really long list of SELECT queries when you're fetching your list, look at your mapping file to see how your relationships are set up. Try adjusting them to be a fetch="join" and see if those queries go away and if the performance improves.
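To make the arithmetic concrete, here is a small sketch (the entity and property names are made up) contrasting the statement counts of a lazy SELECT-per-child mapping with a single join-fetch query:

```java
class FetchJoinSketch {
    // N+1 pattern: one query for the parents, then one additional query
    // per parent for each lazily loaded child -> 1 + N statements.
    static final String LAZY_QUERY = "from Order o";

    // Join-fetch pattern: parents and children in a single statement.
    static final String JOIN_FETCH_QUERY =
        "select distinct o from Order o join fetch o.lineItems";

    // Rough statement counts for n parents, each with one lazy child collection.
    static int statementsWithLazySelect(int n) { return 1 + n; }

    static int statementsWithJoinFetch(int n) { return 1; }
}
```

At 500 records with 4 or 5 lazy relationships each, the lazy variant runs into the thousands of statements mentioned above, while the join-fetch variant stays at one.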
Here are some possibly related Stack Overflow questions that may be able to provide more specific details.
Hibernate FetchMode SELECT vs JOIN
What is N+1 SELECT query issue?
Something else to note about profilers and other tools of that nature: often, when tracking down a performance issue, a particular block of code will show up as where the most time is being spent. The common conclusion people jump to is that the block of code in question is slow. In your case, you seem to be observing Hibernate's reflective code as being where the time is spent. It is very important to consider that this code may not itself be slow; it is where the time is being spent because it is called frequently due to algorithmic complexity. In many problems I have seen, serialization or reflection appears to be slow, when in reality that code is used to communicate with an external resource, and that resource is being called thousands of times when it should only be called a handful. Making thousands of queries to fetch your list will result in your sampling showing that a lot of time is spent in the code that processes those queries. Be careful not to mistake code that is called often due to a design or configuration issue for code that is slow. The problem very likely does not lie in Hibernate's use of reflection, since reflection generally isn't slow on the order of seconds.
My Java code reads an Excel file and writes (inserts) its data into an Oracle database.
For example, I need to read similar cells in 2,000 rows of the Excel file; my code reads them, inserts them into the database, and then commits.
The first ~1,000 rows insert very fast, but the next 1,000 rows take very long.
Possibly the reason is a lack of memory.
So I am thinking of committing frequently while the data is loading (e.g. a commit after every 50 rows read).
Is this good practice, or are there other ways to solve the problem?
Commits are for atomic operations in the database. You don't just throw them around because you feel like it. Each transaction is generally (depending on isolation level, but assuming serial isolation) a distinct, all-or-nothing operation.
If you don't know what is causing a database transaction to take a long time, you should read the logs or talk to someone who knows how to diagnose the cause of the slowdown and remedy it. The most likely reason is bad configuration.
The bottom line is, people have transactions that insert 100,000 or even millions of rows as a single transaction without causing issues. And generally, it is better not to commit often for performance reasons.
Databases always have to be consistent, i.e. only commit if your data is consistent, even if your program crashes afterwards.
(If you don't need that consistency, then why do you use a DB?)
PS: You won't go out of memory that fast.
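Rather than committing every 50 rows, one common alternative (a sketch; the table and column names are made up) is to keep the single atomic commit but send the inserts to the database in JDBC batches:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class ExcelInsertSketch {
    static final int BATCH_SIZE = 500;

    // One transaction, one commit; inserts are sent in batches of BATCH_SIZE.
    static void insertRows(Connection conn, List<String[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO imported_rows (col_a, col_b) VALUES (?, ?)")) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send to the DB, but do not commit yet
                }
            }
            ps.executeBatch();         // flush the final partial batch
            conn.commit();             // single atomic commit at the end
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    // Number of executeBatch() round trips for a given row count.
    static int batchRoundTrips(int rows) {
        return rows / BATCH_SIZE + 1; // full batches plus the final flush
    }
}
```

This keeps the all-or-nothing semantics of one transaction while cutting the number of driver round trips: 2,000 rows become 5 executeBatch() calls instead of 2,000 single-row statements.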
I am learning Spring Framework, and it is pretty awesome.
I want to use Java multithreading, but I don't know how to do it with the Spring Framework.
Here is the code for service:
//StudentService.java
public List<Grade> loadGradesForAllStudents(Date date) {
    try {
        List<Grade> grades = new ArrayList<Grade>();
        List<Student> students = loadCurrentStudents(); // LOAD FROM THE DB
        for (Student student : students) { // I WANT TO USE MULTITHREAD FOR THIS PART
            // LOAD FROM DB (MANY JOINS)
            History studentHistory = loadStudentHistory(student.getStudentId(), date);
            // CALCULATION PART
            Grade calculatedGrade = calcStudentGrade(studentHistory, date);
            grades.add(calculatedGrade);
        }
        return grades;
    } catch (Exception e) {
        ...
        return null;
    }
}
And without multithreading, it is pretty slow.
I guess the for loop causes the slowness, but I don't know how to approach the problem. If you can give me a useful link or example code, I'd appreciate it.
I figured out that the method loadStudentHistory is pretty slow (around 300 ms) compared to calcStudentGrade (around 30 ms).
Using multithreading for this is a bad idea in an application with concurrent users, because instead of each request using one thread and one connection, each request now uses multiple threads and multiple connections. It doesn't scale as the number of users grows.
When I look at your example I see two possible issues:
1) You have too many round trips between the application and the database, where each of those trips takes time.
2) It's not clear whether each query is using a separate transaction (you don't say where the transactions are demarcated in the example code). If each of your queries creates its own transaction, that could be wasteful, because each transaction has overhead associated with it.
Using multithreading will not do much to help with #1 (and if it does help it will put more load on the database) and will either have no effect on #2 or make it worse (depending on the current transaction scopes; if you had the queries in the same transaction before, using multiple threads they'll have to be in different transactions). And as already explained it won't scale up.
My recommendations:
1) Make the service transactional, if it is not already, so that everything it does is within one transaction. Remove the exception-catching/null-returning stuff (which interferes with how Spring wants to use exceptions to rollback transactions) and introduce an exception-handler so that anything thrown from controllers will be caught and logged. That will minimize your overhead from creating transactions and make your exception-handling cleaner.
2) Create one query that brings back a list of your students. That way the query is sent to the database once, then the resultset results are read back in chunks (according to the fetch size on the resultset). You can customize the query to get back only what you need so you don't have an excessive number of joins. Run explain-plan on the query and make sure it uses indexes. You will have a faster query and a much smaller number of round trips, which will make a big speed improvement.
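As a rough sketch of recommendation #2 (the entity names and the query are hypothetical), replacing one history query per student with a single join-fetch query changes the round-trip math dramatically:

```java
class GradeQuerySketch {
    // Hypothetical single query that fetches each student together with the
    // history rows needed for grading, replacing one query per student.
    static final String STUDENTS_WITH_HISTORY =
        "select distinct s from Student s "
      + "join fetch s.historyEntries h "
      + "where h.date = :date";

    // Round trips: one per student before, one in total after.
    static int roundTripsPerStudent(int students) { return students; }

    static int roundTripsSingleQuery(int students) { return 1; }

    // Rough latency estimate at ~300 ms per round trip (the number from the question).
    static int estimatedMillis(int roundTrips) { return roundTrips * 300; }
}
```

At 100 students and the ~300 ms per call measured in the question, the per-student approach costs around 30 seconds in round trips alone, while the single-query approach pays that latency once, with no extra threads or connections.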
The simple solution is called streams, which enable you to iterate in parallel, for example:
students.stream().parallel().forEach( student -> doSomething(student));
This will give you a noticeable performance boost, but it won't remove the database-query overhead. If your DB management system takes about 300 ms to return results, you're either using an ORM on big databases or your queries are highly inefficient; I recommend re-analyzing your current solution.
I'm working on Java EE projects using Hibernate as the ORM, and I have come to a phase where I have to perform some mathematical calculations on my classes, like SUM, COUNT, addition, and division.
I have two solutions:
Select my classes and apply those operations programmatically in my code
Do the calculations in my named queries
In terms of performance and speed, which one is better?
Thank you.
If you are going to load the same entities that you want to do the aggregation on from the database in the same transaction, then the performance will be better if you do the calculation in Java.
It saves you one round-trip to the database, because in that case you already have the entities in memory.
Other benefits are:
Easier to unit-test the calculation because you can stick to a Java-based unit testing framework
Keeps the logic in one language
Will also work for collections of entities that haven't been persisted yet
But if you're not going to load the same set of entities that you want to do the calculation on, then you will get a performance improvement in almost any situation if you let the database do the calculation. The more entities are involved, the bigger the performance benefit.
Imagine doing a summation over all line items in this year's orders, perhaps several million of them.
It should be clear that having to load all these entities into the memory of the Java process across a TCP connection (even if it is within the same machine) first will take more time, and more memory, than letting the database perform the calculation.
And if your mapping requires additional queries per entity, then Hibernate would have at least one extra round-trip to the database for every entity, in which case the performance benefits of calculating things in SQL on the database would be even bigger.
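A minimal sketch of the two options (with hypothetical entity and field names): pushing the aggregation into a query versus summing entities that are already in memory:

```java
import java.util.List;

class AggregationSketch {
    // Option 1: push the aggregation to the database. One round trip,
    // one number comes back, no entities are loaded into memory.
    static final String SUM_IN_DB =
        "select sum(li.amount) from LineItem li where li.order.year = :year";

    // Option 2: aggregate in Java. Only sensible when the entities are
    // already loaded in the same transaction anyway.
    static long sumInJava(List<Long> alreadyLoadedAmounts) {
        return alreadyLoadedAmounts.stream()
                                   .mapToLong(Long::longValue)
                                   .sum();
    }
}
```

The in-memory version is trivially unit-testable with plain Java, which is one of the benefits listed above; the query version wins as soon as the entities would have to be loaded solely for the calculation.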
Are these calculations on the entities (or data)? If yes, then you can indeed go for queries (or, even faster, use SQL queries instead of HQL). From a performance perspective, IMO, stored procedures shine, but people don't use them very often with Hibernate.
Also, if you have some frequent, repetitive calculations, try using caching in your application.
I have a relatively simple object model:
ParentObject
Collection<ChildObject1>
ChildObject2
The MySQL operation when saving this object model does the following:
Update the ParentObject
Delete all previous items from the ChildObject1 table (about 10 rows)
Insert all new ChildObject1 (again, about 10 rows)
Insert ChildObject2
The objects / tables are unremarkable - no strings, rather mainly ints and longs.
MySQL is currently saving about 20-30 instances of the object model per second. When this goes into production it's going to be doing upwards of a million saves, which at current speeds is going to take 10+ hours, which is no good to me...
I am using Java and Spring. I have profiled my app, and the bottleneck is in the calls to MySQL by a long distance.
How would you suggest I increase the throughput?
You can get some speedup by tracking a dirty flag on your objects (especially your collection of child objects). You only delete/update the dirty ones. Depending on what % of them change on each write, you might save a good chunk.
The other thing you can do is perform bulk writes via batch updating on the prepared statement (look at PreparedStatement.addBatch()). This can be an order of magnitude faster, but it might not be record by record; e.g. it might look something like:
delete all dirty-flagged children as a single batch command
update all parents as a single batch command
insert all dirty-flagged children as a single batch command.
Note that since you're dealing with millions of records you're probably not going to be able to load them all into a map and dump them at once, you'll have to stream them into a batch handler and dump the changes to the db 1000 records at a time or so. Once you've done this the actual speed is sensitive to the batch size, you'll have to determine the defaults by trial-and-error.
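The three batched statements above can be sketched with plain JDBC as follows (table and column names are made up); the chunksNeeded helper shows how many chunks a million records would need at 1000 records per chunk:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BulkSaveSketch {
    // Save one chunk: each record is a {parentId, value} pair.
    static void saveChunk(Connection conn, List<long[]> dirtyChildren,
                          List<long[]> parents) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement del = conn.prepareStatement(
                 "DELETE FROM child1 WHERE parent_id = ?");
             PreparedStatement upd = conn.prepareStatement(
                 "UPDATE parent SET value = ? WHERE id = ?");
             PreparedStatement ins = conn.prepareStatement(
                 "INSERT INTO child1 (parent_id, value) VALUES (?, ?)")) {
            for (long[] c : dirtyChildren) { del.setLong(1, c[0]); del.addBatch(); }
            del.executeBatch(); // 1) delete dirty-flagged children as one batch
            for (long[] p : parents) { upd.setLong(1, p[1]); upd.setLong(2, p[0]); upd.addBatch(); }
            upd.executeBatch(); // 2) update parents as one batch
            for (long[] c : dirtyChildren) { ins.setLong(1, c[0]); ins.setLong(2, c[1]); ins.addBatch(); }
            ins.executeBatch(); // 3) re-insert dirty-flagged children as one batch
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    // Number of chunks needed to stream totalRecords through the batch handler.
    static int chunksNeeded(int totalRecords, int chunkSize) {
        return (totalRecords + chunkSize - 1) / chunkSize;
    }
}
```

Each chunk costs three batched round trips instead of one statement per row, and streaming 1000 records at a time keeps memory bounded as the note above suggests.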
Deleting any existing ChildObject1 records from the table and then inserting the ChildObject1 instances from the current state of your ParentObject seems unnecessary to me. Are the values of all of the child objects different from what was previously stored?
A better solution might involve only modifying the database when you need to, i.e. when there has been a change in state of the ChildObject1 instances.
Rolling your own persistence logic for this type of thing can be hard (your persistence layer needs to know the state of the ChildObject1 objects when they were retrieved to compare them with the versions of the objects at save-time). You might want to look into using an ORM like Hibernate for something like this, which does an excellent job of knowing when it needs to update the records in the database or not.