I have an issue where my Spring Boot application's performance is very slow when inserting data. I am extracting a large subset of data from one database and inserting it into another database.
The following is my entity.
@Entity
@Table(name = "element")
public class VXMLElementHistorical {

    @Id
    @Column(name = "elementid")
    private long elementid;

    @Column(name = "elementname")
    private String elementname;

    // Getter/setter methods...
}
I have configured a JPA repository
public interface ElementRepository extends JpaRepository<Element, Long> {
}
and call the save() method with my object
@Transactional
public void processData(List<sElement> hostElements)
        throws DataAccessException {
    List<Element> elements = new ArrayList<Element>();
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elements.add(element);
    }
    try {
        elementRepository.save(elements);
    } catch (DataAccessException e) {
        // catch etc...
    }
}
What is happening is that each item takes between 6 and 12 seconds to insert. I have turned on Hibernate trace logging and statistics, and when I call the save function Hibernate performs two queries, a select and an insert. The select query takes 99% of the overall time.
I have run the select query directly on the database and the result returns in nanoseconds, which leads me to believe it is not an indexing issue; however, I am no DBA.
I have created a load test in my dev environment, and with similar load sizes the overall process time is nowhere near as long as in my prod environment.
Any suggestions?
Instead of creating a list of elements and saving those, save the individual elements. Every now and then do a flush and clear to prevent dirty checking from becoming a bottleneck.
@PersistenceContext
private EntityManager entityManager;

@Transactional
public void processData(List<sElement> hostElements)
        throws DataAccessException {
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elementRepository.save(element);
        if ((i % 50) == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }
    entityManager.flush(); // flush the last records
}
You want to flush + clear every x elements (here it is 50, but you might want to find your own best number).
Now, as you are using Spring Boot, you might also want to add some additional properties, like configuring the batch size:
spring.jpa.properties.hibernate.jdbc.batch_size=50
This will, if your JDBC driver supports it, convert 50 single insert statements into one large batch insert.
See also https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
As @M. Deinum said in a comment, you can improve performance by calling flush() and clear() after a certain number of inserts, like below.
int i = 0;
for (Element element : elements) {
    dao.save(element);
    if (++i % 20 == 0) {
        dao.flushAndClear();
    }
}
Since loading the entities seems to be the bottleneck and you really just want to do inserts (i.e. you know the entities don't exist in the database), you probably shouldn't use the standard save method of Spring Data JPA. The reason is that it performs a merge, which triggers Hibernate to load an entity that might already exist in the database.
Instead, add a custom method to your repository which does a persist on the entity manager. Since you are setting the Id in advance, make sure you have a version property so that Hibernate can determine that this indeed is a new entity.
This should make the select go away.
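A minimal sketch of such a custom method, assuming a fragment interface named ElementInsertRepository (the names here are illustrative; the fragment-plus-Impl pattern is the standard Spring Data mechanism for custom methods):

```java
// Fragment interface declaring an insert method that bypasses merge
public interface ElementInsertRepository {
    Element insert(Element entity);
}

// Implementation picked up by Spring Data via the "Impl" naming convention
public class ElementInsertRepositoryImpl implements ElementInsertRepository {

    @PersistenceContext
    private EntityManager entityManager;

    @Override
    @Transactional
    public Element insert(Element entity) {
        // persist() never issues a preceding SELECT, unlike merge(), which is
        // what save() falls back to for entities with a pre-assigned id
        entityManager.persist(entity);
        return entity;
    }
}

// The main repository simply extends both interfaces
public interface ElementRepository
        extends JpaRepository<Element, Long>, ElementInsertRepository {
}
```

Calling elementRepository.insert(element) then writes the row without the extra load, provided Hibernate can tell the entity is new (hence the advice about the version property).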
Other advice given in other answers is worth considering as a second step:
enable batching.
experiment with intermediate flushing and clearing of the session.
saving one instance at a time without gathering them in a collection, since the call to merge or persist doesn't actually trigger writing to the database; only the flushing does (this is a simplification, but it shall do for this context)
Related
I want to delete all records with some lineId in order to save other records with the same lineId (as a refresh), but after deleting I can't save anything. There is no error, but my records do not end up in the database.
When I don't have the deleting code, everything saves correctly.
public void deleteAndSaveEntities(List<Entity> entities, Long lineId) {
    deleteEntities(lineId);
    saveEntities(entities);
}

private void deleteEntities(Long lineId) {
    List<Entity> entitiesToDelete = entityRepository.findAllByLineId(lineId);
    entityRepository.deleteAll(entitiesToDelete);
}

private void saveEntities(List<Entity> entities) {
    entityRepository.saveAll(entities);
}
Actually you want to update the entries that have the lineId. Try it as:
First fetch by find..().
Make the related changes on those entries.
Then save them.
As thomas mentioned, Hibernate reorders the statements within the transaction for performance reasons and executes the delete after the update.
I would commit the transaction between these two operations.
Add a @Transactional over deleteEntities and saveEntities.
But be aware that @Transactional does not work when a method is invoked from within the same object.
You must inject the service into itself and then call the methods on the self-reference.
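A sketch of that self-injection wiring (class and bean names are illustrative; @Lazy on the constructor parameter breaks the circular dependency):

```java
@Service
public class EntityService {

    private final EntityRepository entityRepository;
    private final EntityService self; // proxy-backed reference to this bean

    public EntityService(EntityRepository entityRepository,
                         @Lazy EntityService self) {
        this.entityRepository = entityRepository;
        this.self = self;
    }

    public void deleteAndSaveEntities(List<Entity> entities, Long lineId) {
        // Calls go through the Spring proxy, so each @Transactional starts
        // (and commits) its own transaction
        self.deleteEntities(lineId);
        self.saveEntities(entities);
    }

    @Transactional
    public void deleteEntities(Long lineId) {
        entityRepository.deleteAll(entityRepository.findAllByLineId(lineId));
    }

    @Transactional
    public void saveEntities(List<Entity> entities) {
        entityRepository.saveAll(entities);
    }
}
```

Note the transactional methods must be public: Spring's proxy-based @Transactional is silently ignored on private methods and on plain this.method() calls.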
I have the following code:
private void sendList(List<Data> myData) {
    myData.forEach(x -> {
        sendData(x);
    });
}

@Transactional
private void sendData(Data myData) {
    // do some changes in the myData object and insert into the table
}
Currently, it commits after the insert completes for every Data object, but I would like to commit every 500 records.
Is it possible to do this?
Firstly, you don't need a loop to save the data; use saveAll instead (that's the power of CrudRepository). Secondly, you should enable batching. By default it isn't switched on, so you have to add some parameters to application.properties:
spring.jpa.properties.hibernate.jdbc.batch_size=500
spring.jpa.properties.hibernate.order_inserts=true
The first property tells Hibernate to collect inserts in batches of 500. The order_inserts property tells Hibernate to take the time to group inserts by entity, creating larger batches.
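Note that batching alone does not change when the commit happens: with one @Transactional method, everything still commits once at the end. If you genuinely need a commit every 500 records, split the list into chunks yourself and save each chunk in its own transaction. The chunking part is plain Java (a sketch; the Spring wiring around it is an assumption about your setup):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {

    // Split a list into consecutive chunks of at most `size` elements
    public static <T> List<List<T>> chunks(List<T> list, int size) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < list.size(); i += size) {
            result.add(new ArrayList<>(list.subList(i, Math.min(i + size, list.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            data.add(i);
        }
        // 1050 records in chunks of 500 -> three chunks (500, 500, 50)
        System.out.println(chunks(data, 500).size()); // prints 3
    }
}
```

Each chunk can then be handed to a public method annotated with @Transactional(propagation = Propagation.REQUIRES_NEW) that calls repository.saveAll(chunk), giving one commit per chunk; as noted elsewhere in this thread, that method must be invoked through the Spring proxy (e.g. a self-injected reference), not via this.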
While streaming over a "data provider" I need to insert a fairly large number of entities into the database, say around 100,000. This whole step needs to be transactional.
To simplify my use-case as much as possible let's assume this is my code:
@Transactional
public void execute() {
    for (int i = 0; i < 100000; i++) {
        carRepository.save(new Car());
    }
}
The problem with this code is that, even though it's clear I have no use for the Car entities after the insert statement is generated, each entity stays attached to the persistence context and is held in memory until the transaction is done.
I would like to make sure the created entities become eligible for garbage collection. For this I currently see two solutions:
create a native insert query on the repository
Inject the EntityManager in the service and call em.detach(car) after every insert
I tend to prefer the second option as I would not have to manage the native insert statement as the entity changes.
Can you confirm I am taking the correct approach, or suggest a better alternative?
You can find in the Hibernate documentation the recommended way to insert data in batches: when making new objects persistent, flush() and then clear() the session regularly in order to control the size of the first-level cache. Thus the following approach is recommended:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    session.save(new Car());
    if (i % 20 == 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
You can try using the saveAndFlush(S entity) method from Spring Data JPA's JpaRepository instead of save().
For concurrency purposes, I have a requirement to update the state column of a row to USED while selecting it from the AVAILABLE pool.
I was thinking of trying @Modifying and @Query (a query that updates the state based on the WHERE clause).
That is all fine, but since this is an update query it doesn't return the updated data.
So, is it possible in Spring Data to update and return a row, so that whoever reads the row first can use it exclusively?
My update query is something like UPDATE MyObject o SET o.state = 'USED' WHERE o.id = (select min(id) from MyObject a where a.state='AVAILABLE'), so basically the lowest available id is marked used. There is the option of locking, but that requires exception handling, and if an exception occurs, another thread has to try again, which is not approved in my scenario.
You need to explicitly declare a transaction to prevent other transactions from reading the values involved until it is committed. The isolation level with the best performance that allows this is READ_COMMITTED, which doesn't allow dirty reads from other transactions (it suits your case). So the code will look like this:
Repo:
@Repository
public interface MyObjectRepository extends JpaRepository<MyObject, Long> {

    @Modifying
    @Query("UPDATE MyObject o SET o.state = 'USED' WHERE o.id = :id")
    void lockObject(@Param("id") long id);

    @Query("select min(a.id) from MyObject a where a.state = 'AVAILABLE'")
    Integer minId();
}
Service:
@Transactional(isolation = Isolation.READ_COMMITTED)
public MyObject findFirstAvailable() {
    Integer minId;
    if ((minId = repo.minId()) != null) {
        repo.lockObject(minId);
        return repo.findOne(minId);
    }
    return null;
}
I suggest using multiple transactions plus optimistic locking.
Make sure your entity has an attribute annotated with @Version.
In the first transaction, load the entity, mark it as USED, and close the transaction.
This will flush and commit the changes and make sure nobody else touched the entity in the meantime.
In the second transaction you can now do whatever you want to do with the entity.
For these small transactions I find it clumsy to move them to separate methods just so I can use @Transactional. I therefore use the TransactionTemplate instead.
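A sketch of what that could look like with TransactionTemplate (the class name, the derived-query method findFirstByStateOrderByIdAsc, and the assumption that MyObject carries a @Version field are all illustrative, not from the question):

```java
@Service
public class ObjectAllocator {

    private final MyObjectRepository repo;
    private final TransactionTemplate tx;

    public ObjectAllocator(MyObjectRepository repo,
                           PlatformTransactionManager transactionManager) {
        this.repo = repo;
        this.tx = new TransactionTemplate(transactionManager);
    }

    public MyObject allocate() {
        // Transaction 1: load the lowest AVAILABLE row and mark it USED.
        // The @Version check at commit time throws an optimistic-locking
        // exception if another thread claimed the same row first.
        MyObject allocated = tx.execute(status -> {
            MyObject candidate = repo.findFirstByStateOrderByIdAsc("AVAILABLE");
            if (candidate == null) {
                return null; // pool exhausted
            }
            candidate.setState("USED");
            return repo.save(candidate);
        });

        // Transaction 2 (or plain code): work with the exclusively-held entity
        return allocated;
    }
}
```

The key point is that the first transaction commits before any further work begins, so the version check has already settled who owns the row.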
Let's say I have a List of entities:
List<SomeEntity> myEntities = new ArrayList<>();
SomeEntity.java:
@Entity
@Table(name = "entity_table")
public class SomeEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;

    private int score;

    public SomeEntity() {}

    public SomeEntity(long id, int score) {
        this.id = id;
        this.score = score;
    }
}
MyEntityRepository.java:
@Repository
public interface MyEntityRepository extends JpaRepository<SomeEntity, Long> {
    List<SomeEntity> findAllByScoreGreaterThan(int score);
}
So when I run:
myEntityRepository.findAllByScoreGreaterThan(10);
Then Hibernate will load all of the matching records into memory. There are millions of records, so I don't want that. Also, in order to intersect, I would then need to compare each record in the result set to my List.
In native MySQL, what I would have done in this situation is:
create a temporary table and insert into it the entities' ids from the List;
join this temporary table with entity_table, apply the score filter, and pull only the entities that are relevant to me (the ones that were in the list in the first place).
This way I gain a big performance increase, avoid any OutOfMemoryErrors, and let the database machine do most of the work.
Is there a way to achieve such an outcome with Spring Data JPA's query methods (with hibernate as the JPA provider)? I couldn't find in the documentation or in SO any such use case.
I understand you have a set of entity_table identifiers and you want to find each entity_table row whose identifier is in that subset and whose score is greater than a given value.
So the obvious question is: how did you arrive at the initial subset, and couldn't you just add the criteria of that query to the query that also checks for "score is greater than x"?
But if we ignore that, I think there are two possible solutions. If the list of some_entity identifiers is small (what exactly "small" means depends on your database), you could just use an IN clause and define your method as:
List<SomeEntity> findByScoreGreaterThanAndIdIn(int score, Set<Long> ids);
If the number of identifiers is too large to fit in an IN clause (or you're worried about the performance of using an IN clause) and you need to use a temporary table, the recipe would be:
Create an entity that maps to your temporary table, and a Spring Data JPA repository for it:
@Entity
@Table(name = "temp_table")
class TempEntity {
    @Id
    private Long entityId;
}

interface TempEntityRepository extends JpaRepository<TempEntity, Long> { }
Use its save method to save all the entity identifiers into the temporary table. As long as you enable insert batching this should perform all right; how to enable it differs per database and JPA provider, but for Hibernate, at the very least set the hibernate.jdbc.batch_size property to a sufficiently large value. Also flush() and clear() your entityManager regularly, or all your temp-table entities will accumulate in the persistence context and you'll still run out of memory. Something along the lines of:
int count = 0;
for (SomeEntity someEntity : myEntities) {
    tempEntityRepository.save(new TempEntity(someEntity.getId()));
    if (++count % 1000 == 0) {
        entityManager.flush();
        entityManager.clear();
    }
}
Add a find method to your SomeEntityRepository that runs a native query which selects from entity_table and joins to the temp table:
@Query(value = "SELECT t.* FROM entity_table t INNER JOIN temp_table tt ON t.id = tt.entity_id WHERE t.score > ?1", nativeQuery = true)
List<SomeEntity> findByScoreGreaterThan(int score);
Make sure you run both methods in the same transaction: create a method in a @Service class, annotate it with @Transactional(propagation = Propagation.REQUIRES_NEW), and have it call both repository methods in succession. Otherwise your temp table's contents will be gone by the time the SELECT query runs and you'll get zero results.
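Putting the two steps together, the service method could be sketched like this (the class name is illustrative; the repositories are the ones described above):

```java
@Service
public class IntersectionService {

    private final TempEntityRepository tempEntityRepository;
    private final SomeEntityRepository someEntityRepository;

    @PersistenceContext
    private EntityManager entityManager;

    public IntersectionService(TempEntityRepository tempEntityRepository,
                               SomeEntityRepository someEntityRepository) {
        this.tempEntityRepository = tempEntityRepository;
        this.someEntityRepository = someEntityRepository;
    }

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public List<SomeEntity> findRelevant(List<SomeEntity> myEntities, int score) {
        int count = 0;
        for (SomeEntity someEntity : myEntities) {
            tempEntityRepository.save(new TempEntity(someEntity.getId()));
            if (++count % 1000 == 0) {
                entityManager.flush();
                entityManager.clear();
            }
        }
        entityManager.flush(); // push the last partial batch into the temp table
        return someEntityRepository.findByScoreGreaterThan(score);
    }
}
```

Because both the inserts and the join query run inside the one REQUIRES_NEW transaction, the temporary table is still populated when the SELECT executes.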
You might be able to avoid native queries by giving your temp-table entity a @ManyToOne to SomeEntity, since then you can join in JPQL; I'm just not sure whether you could avoid actually loading the SomeEntitys in order to insert them in that case (or whether creating a new SomeEntity with just an id would work). But since you say you already have a list of SomeEntity, that's perhaps not a problem.
I need something similar myself, so will amend my answer as I get a working version of this.
You can:
1) Run a paginated native query via JPA (remember to add an ORDER BY clause to it) and process a fixed number of records at a time.
2) Use a StatelessSession (see the Hibernate documentation).
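For option 2, a sketch of the StatelessSession variant (assuming Hibernate 5 APIs; a stateless session keeps no persistence context, so entities never accumulate in memory):

```java
// Unwrap the Hibernate SessionFactory from the JPA EntityManagerFactory
SessionFactory sessionFactory = entityManagerFactory.unwrap(SessionFactory.class);

StatelessSession session = sessionFactory.openStatelessSession();
try {
    // ScrollableResults streams rows one at a time instead of loading them all
    ScrollableResults results = session
            .createQuery("select e from SomeEntity e where e.score > :score")
            .setParameter("score", 10)
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        SomeEntity entity = (SomeEntity) results.get(0);
        // process each entity; it is never attached, so memory use stays flat
    }
    results.close();
} finally {
    session.close();
}
```

The trade-off is that a StatelessSession performs no dirty checking, no cascading, and no first-level caching, so it only suits read-and-process or bulk-write loops like this one.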