I have configured hibernate to batch insert/update entities via the following properties:
app.db.props.hibernate.jdbc.batch_size=50
app.db.props.hibernate.batch_versioned_data=true
app.db.props.hibernate.order_inserts=true
app.db.props.hibernate.order_updates=true
(Ignore the app.db.props prefix; it is stripped by Spring.) I can confirm that the properties are reaching Hibernate, because simple batches work as expected, as confirmed by logging at the datasource level. The proxy below produces logging that shows batches are happening:
ProxyDataSourceBuilder.create(dataSource)
        .asJson().countQuery()
        .logQueryToSysOut().build();
Logs (notice batchSize)...
{"name":"", "connection":121, "time":1, "success":true, "type":"Prepared", "batch":true, "querySize":1, "batchSize":18, "query":["update odm.status set first_timestamp=?, last_timestamp=?, removed=?, conformant=?, event_id=?, form_id=?, frozen=?, group_id=?, item_id=?, locked=?, study_id=?, subject_id=?, verified=? where id=?"], "params":[...]}
However, when inserting a more complex object model involving a hierarchy of 1-* relationships, Hibernate is not ordering the inserts (and thus not batching). With a model like EntityA -> EntityB -> EntityC, Hibernate inserts each parent and its children and then moves on to the next parent, rather than batching each entity class.
That is, what I see is interleaved inserts for each type:
insert EntityA...
insert EntityB...
insert EntityC...
insert EntityA...
insert EntityB...
insert EntityC...
repeat...
But what I would expect is a single pass, using a batched insert, for each type.
It seems the cascading relationship is preventing the ordering of inserts (and thus the batching), but I can't figure out why. Hibernate should be capable of recognizing that all instances of EntityA can be inserted at once, then all instances of EntityB, and so on.
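For what it's worth, hibernate.order_inserts conceptually regroups the queued insert statements by entity type before flushing, which is what turns the interleaved pattern above into per-type batches. A toy sketch of that regrouping (plain Java, no Hibernate involved; the statement strings are made up):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OrderInserts {
    // Regroup interleaved insert statements by entity type, preserving
    // the original order within each type (a stable grouping).
    static List<String> orderInserts(List<String> interleaved) {
        Map<String, List<String>> byType = new LinkedHashMap<>();
        for (String insert : interleaved) {
            // e.g. "EntityA" from "insert EntityA 1" (toy statement format)
            String type = insert.split(" ")[1];
            byType.computeIfAbsent(type, k -> new ArrayList<>()).add(insert);
        }
        return byType.values().stream().flatMap(List::stream).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> interleaved = List.of(
            "insert EntityA 1", "insert EntityB 1", "insert EntityC 1",
            "insert EntityA 2", "insert EntityB 2", "insert EntityC 2");
        // Prints the per-type runs that make JDBC batching possible
        System.out.println(orderInserts(interleaved));
    }
}
```

Only once the statements form contiguous per-type runs can the JDBC layer batch them, which is why ordering and batching fail together here.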
Related
I have an @Entity containing a few @OneToMany relationships, but since they consist of collections of enums, I'm using @ElementCollection. The entity has an id that is generated at the database level (MySQL).
Here is a small example I just made up that corresponds to the structure of my entity.
@Entity
public class Student {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    @ElementCollection(targetClass = Language.class)
    @CollectionTable(name = "student_languages", joinColumns = @JoinColumn(name = "student_id"))
    private Set<Language> languages;

    @ElementCollection(targetClass = Module.class)
    @CollectionTable(name = "student_modules", joinColumns = @JoinColumn(name = "student_id"))
    private Set<Module> modules;

    @ElementCollection(targetClass = SeatPreference.class)
    @CollectionTable(name = "student_seats", joinColumns = @JoinColumn(name = "student_id"))
    private Set<SeatPreference> seatPreference;

    [...]
}
I know that GenerationType.IDENTITY disables batching, but I thought that applied to the main entity only, not to the element collections too. I have to bulk import a few entities (~20k), each with a handful of properties, but Hibernate seems to generate one insert for each element in the sets, making the import impossibly slow (between 10 and 20 inserts per record).
I have now spent so long trying to make this faster that I'm considering just generating an SQL file that I can import into the database manually.
Is there no way to instruct Hibernate to batch inserts for the @ElementCollection fields? Am I doing something wrong?
Basically, it seems Hibernate will not help with @ElementCollection batching, but you can use SQL bulk inserts.
It seems you are on MySQL, which does support bulk inserts; its JDBC driver can automatically rewrite the individual insert statements into one bulk statement if you enable the rewriteBatchedStatements property.
So in your case, what you need to do is tell Hibernate to enable batching and to order the batched inserts and updates:
hibernate.jdbc.batch_size=100
hibernate.order_inserts=true
hibernate.order_updates=true
This ensures that when inserting data into the DB, the insert statements generated by Hibernate are executed in a batch and are ordered.
So the SQL generated by Hibernate will be something like this:
insert into student_languages (student_id, languages) values (1,1)
insert into student_languages (student_id, languages) values (1,2)
insert into student_languages (student_id, languages) values (1,3)
insert into student_languages (student_id, languages) values (1,4)
Next, tell the JDBC driver to rewrite the individual inserts into a bulk insert by setting rewriteBatchedStatements=true in the connection URL:
jdbc:mysql://db:3306/stack?useSSL=false&rewriteBatchedStatements=true
This instructs the driver to rewrite the inserts into bulk form, so the several SQL statements above are rewritten into something like this:
insert into student_languages (student_id, languages) values (1,1),(1,2),(1,3),(1,4)
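As a rough illustration (plain Java, not the actual driver code), the rewrite amounts to joining the per-row value tuples onto a shared insert prefix:

```java
import java.util.List;

public class RewriteBatch {
    // Collapse single-row "INSERT ... VALUES (...)" statements that share
    // the same prefix into one multi-row insert, mimicking what the MySQL
    // driver does when rewriteBatchedStatements=true is enabled.
    static String rewrite(String insertPrefix, List<String> valueTuples) {
        return insertPrefix + " values " + String.join(",", valueTuples);
    }

    public static void main(String[] args) {
        String sql = rewrite(
            "insert into student_languages (student_id, languages)",
            List.of("(1,1)", "(1,2)", "(1,3)", "(1,4)"));
        // prints: insert into student_languages (student_id, languages) values (1,1),(1,2),(1,3),(1,4)
        System.out.println(sql);
    }
}
```

The real driver additionally handles parameter binding and packet-size limits; this sketch only shows the shape of the transformation.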
Note that this may not work if you are using old versions of the MySQL driver or Hibernate.
I tested this with both MySQL and MariaDB, and Hibernate does batch the inserts into the collection table. But it's not visible to the naked eye; you have to use datasource-proxy to see it:
INFO com.example.jpa.AddStudents - Adding students
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener -
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["insert into student (name) values (?)"]
Params:[(Smith)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener -
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["insert into student (name) values (?)"]
Params:[(Snow)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener -
Name:dataSource, Connection:3, Time:78, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:6
Query:["insert into student_languages (student_id, language) values (?, ?)"]
Params:[(6,2),(6,0),(6,1),(7,0),(7,4),(7,3)]
INFO com.example.jpa.AddStudents - Added
The SEQUENCE ID generator is considered the best choice for Hibernate. It doesn't create lock contention the way the TABLE generator does, and it allows batching. It is unfortunate that MySQL still doesn't support sequences (MariaDB does).
Am I doing something wrong?
Hibernate is optimized for small-scale changes in the database. It maintains a first-level cache and also supports a second-level cache which will only hinder performance for large-scale operations. Therefore, indeed, you might be better off using JDBC or jOOQ for this particular operation as was suggested in the comments.
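Whichever lower-level API you drop down to, the usual import loop chunks the ~20k records so each round trip carries one JDBC batch. A toy sketch of just the chunking arithmetic (plain Java, no database involved; the batch size of 100 is an assumption):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedImport {
    // Split the records into fixed-size chunks; each chunk would be sent
    // as one JDBC batch (addBatch per row, then a single executeBatch).
    static <T> List<List<T>> chunks(List<T> records, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            result.add(records.subList(i, Math.min(i + batchSize, records.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) records.add(i);
        // 200 round trips instead of 20 000 single-row statements
        System.out.println(chunks(records, 100).size());
    }
}
```

Combined with rewriteBatchedStatements, each of those chunks would then collapse further into one multi-row INSERT on the wire.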
I used MySQL 8.0.3, MariaDB 10.5.13 and Hibernate 5.6.3.Final.
There is a method which returns an entity from the database via JPA. This entity has a list of other entities, with fetch type LAZY. When I want to add an object to this list, I get this exception:
Caused by: Exception [EclipseLink-7242] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.ValidationException
Exception Description: An attempt was made to traverse a relationship using indirection that had a null Session. This often occurs when an entity with an uninstantiated LAZY relationship is serialized and that lazy relationship is traversed after serialization. To avoid this issue, instantiate the LAZY relationship prior to serialization.
So to overcome this, I can initialize the list by calling .size() on it. The thing is, I don't really need these objects to be fetched from the database, so I would like to do something like this:
fetchedEntity.setMyLazyFetchList(new ArrayList<>()); which works fine. I can then access my list, but the problem is that the set method triggers the same SELECT queries as fetchedEntity.getMyLazyFetchList().size() does. These queries are useless, since I'm setting the field to a new list, so why are they executed?
Method fetching entity
public Competitor findAndInitializeEmptyGroups(Integer idCompetitor) {
    Competitor entity = em.find(Competitor.class, idCompetitor);
    System.out.println("Before set ");
    entity.setGroupCompetitorList(new ArrayList<>());
    System.out.print("After set lazy list size ");
    System.out.print(entity.getGroupCompetitorList().size());
    return entity;
}
Lazy fetch list field in entity (Competitor)
@OneToMany(mappedBy = "idCompetitor")
private List<GroupCompetitor> groupCompetitorList = new ArrayList<>();
The other side of the relationship (GroupCompetitor):
@JoinColumn(name = "id_competitor", referencedColumnName = "id_competitor")
@ManyToOne(optional = false)
private Competitor idCompetitor;
What the logs say:
Info: Before set
Fine: SELECT id_group_competitor, id_competitor, id_group_details FROM group_competitor WHERE (id_competitor = ?)
bind => [43]
Fine: SELECT id_group_details, end_date, start_date, version, id_competition, id_group_name FROM group_details WHERE (id_group_details = ?)
bind => [241]
...
many more SELECTs
Info: After set lazy list size
Info: 0
After replacing line
entity.setGroupCompetitorList(new ArrayList<>());
with
entity.getGroupCompetitorList().size();
The logs are the same, except the list now contains the fetched entities:
Info: Before set
Fine: SELECT id_group_competitor, id_competitor, id_group_details FROM group_competitor WHERE (id_competitor = ?)
bind => [43]
Fine: SELECT id_group_details, end_date, start_date, version, id_competition, id_group_name FROM group_details WHERE (id_group_details = ?)
bind => [241]
...
many more SELECTs
Info: After set lazy list size
Info: 44
So my question is: why are SELECT queries executed when I do entity.setGroupCompetitorList(new ArrayList<>());? I don't want them, for performance reasons. Is there any way to eliminate this, and what exactly causes this behavior?
Using:
EclipseLink JPA 2.1
GlassFish 4.1
Java 8
You can't avoid fetching a list that is a member of an entity if you want to add an element and have the JPA provider persist it. The JPA provider has to track the owner, the ownee, and handle any cascading (which I don't see you have defined, but I doubt there's a different code path for each combination of cascading options). The simplest way for the provider is to have the list in memory and then decide what operation to perform on the DB at commit/flush time.
I believe the cause of your original exception about traversing a LAZY relationship is accessing it outside of a managed context. Once you return from an EJB method, the entity you're returning becomes detached. You have to reattach it to another EntityManager, or make sure all the lazy relationships you're about to use have been loaded before you leave the method. Calling fetchedEntity.getMyLazyFetchList().size() is an example of that and works fine in the single-entity case. If you want to force the load of a LAZY relationship across a list of entities, I suggest you read up on LEFT JOIN FETCH clauses. I'm assuming here that your findAndInitializeEmptyGroups() method is in an EJB, judging by what looks like an injected EntityManager em, and that the methods get the default @TransactionAttribute(REQUIRED) treatment, since I don't see any annotations to the contrary.
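To make the LEFT JOIN FETCH idea concrete, a finder along these lines might work (a sketch using the entity names from the question; the id attribute name, method name, and EclipseLink behavior are assumptions, not tested):

```java
public Competitor findWithGroups(Integer idCompetitor) {
    // JOIN FETCH loads the lazy collection in the same query, so the
    // list is fully populated before the entity becomes detached.
    return em.createQuery(
            "SELECT c FROM Competitor c "
          + "LEFT JOIN FETCH c.groupCompetitorList "
          + "WHERE c.id = :id", Competitor.class)
        .setParameter("id", idCompetitor)
        .getSingleResult();
}
```

This avoids both the N+1 selects triggered by .size() and the null-Session exception after serialization.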
Now, let's go back to your original problem:
I want to add object to this list
The problem you're trying to solve is adding an element to a list without fetching the entire list. You're using a mappedBy attribute, meaning you've created a bidirectional relationship. If getGroupCompetitorList() returns an unordered list (a "bag" in Hibernate speak), then you don't have to load the list at all. Try something like this:
Change GroupCompetitor's Integer idCompetitor to a @ManyToOne(fetch = FetchType.LAZY) Competitor competitor. Adjust the getters and setters accordingly.
Change Competitor's groupCompetitorList mappedBy to competitor. Add get/set methods.
Then you can add to the list from the child side with a method like this in the EJB:
public void addNewGroupCompetitorToCompetitor(Competitor comp, GroupCompetitor gComp) {
    gComp.setCompetitor(comp);
    em.persist(gComp);
    em.flush();
}
The next time you fetch the Competitor and traverse entity.getGroupCompetitorList() (while it is managed by an EntityManager), it should contain the new GroupCompetitor you've added. This kind of thing gets more complicated if comp is a new entity that has not been persisted yet, but that's the basic idea. It might need some adjustment to work correctly with EclipseLink, but I do the same kind of operation with Hibernate as the JPA provider and it works.
This question already has answers here:
Spring Data JPA Update #Query not updating?
(5 answers)
Let's suppose to have this situation:
We have Spring Data configured in the standard way: there is a Repository object, an Entity object, and everything works well.
Now, for some complex reasons, I have to use the EntityManager (or JdbcTemplate, or whatever sits at a lower level than Spring Data) directly to update the table associated with my Entity, using a native SQL query. So I'm not using the Entity object, but simply doing a manual database update on the table backing my entity (more precisely, the table from which the entity gets its values; see the next rows).
The reason is that I had to bind my Spring Data Entity to a MySQL view that makes a UNION of multiple tables, rather than directly to the table I need to update.
What happens is:
In a functional test, I call the "manual" update method (on the table from which the MySQL view is created) as described above (through the EntityManager), and if I then call a simple Repository.findOne(objectId), I get the old object (not the updated one). I have to call EntityManager.refresh(object) to get the updated object.
Why?
Is there a way to "synchronize" (out of the box) objects (or force some refresh) in spring-data? Or am I asking for a miracle?
I'm not being ironic; maybe I'm not expert enough, and this is just my ignorance. If so, please explain why, and (if you like) share some deeper knowledge about this amazing framework.
If I make a simple Repository.findOne(objectId) I get the old object (not the updated one). I have to call EntityManager.refresh(object) to get the updated object.
Why?
The first-level cache is active for the duration of a session. Any entity previously retrieved in the context of a session will be served from the first-level cache unless there is a reason to go back to the database.
Is there a reason to go back to the database after your SQL update? Well, as the book Pro JPA 2 notes (p199) regarding bulk update statements (either via JPQL or SQL):
The first issue for developers to consider when using these [bulk update] statements is that the persistence context is not updated to reflect the results of the operation. Bulk operations are issued as SQL against the database, bypassing the in-memory structures of the persistence context.
which is what you are seeing. That is why you need to call refresh to force the entity to be reloaded from the database as the persistence context is not aware of any potential modifications.
The book also notes the following about using Native SQL statements (rather than JPQL bulk update):
■ CAUTION: Native SQL update and delete operations should not be executed on tables mapped by an entity. The JPQL operations tell the provider what cached entity state must be invalidated in order to remain consistent with the database. Native SQL operations bypass such checks and can quickly lead to situations where the in-memory cache is out of date with respect to the database.
Essentially then, should you have a second-level cache configured, updating any entity currently in that cache via a native SQL statement is likely to leave stale data in the cache.
In Spring Boot JpaRepository:
If our modifying query changes entities contained in the persistence context, then that context becomes outdated.
In order to fetch the latest state from the database, use @Modifying(clearAutomatically = true).
The @Modifying annotation has a clearAutomatically attribute that defines whether the underlying persistence context should be cleared after executing the modifying query.
Example:
@Modifying(clearAutomatically = true)
@Query("UPDATE NetworkEntity n SET n.network_status = :network_status WHERE n.network_id = :network_id")
int expireNetwork(@Param("network_id") Integer network_id, @Param("network_status") String network_status);
Based on the way you described your usage, fetching from the repository should return the updated object without the need to refresh it, as long as the method that used the EntityManager to merge is annotated with @Transactional.
Here's a sample test:
@DirtiesContext(classMode = ClassMode.AFTER_CLASS)
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = ApplicationConfig.class)
@EnableJpaRepositories(basePackages = "com.foo")
public class SampleSegmentTest {

    @Resource
    SampleJpaRepository segmentJpaRepository;

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    @Test
    public void test() {
        Segment segment = new Segment();
        ReflectionTestUtils.setField(segment, "value", "foo");
        ReflectionTestUtils.setField(segment, "description", "bar");
        segmentJpaRepository.save(segment);

        assertNotNull(segment.getId());
        assertEquals("foo", segment.getValue());
        assertEquals("bar", segment.getDescription());

        ReflectionTestUtils.setField(segment, "value", "foo2");
        entityManager.merge(segment);

        Segment updatedSegment = segmentJpaRepository.findOne(segment.getId());
        assertEquals("foo2", updatedSegment.getValue());
    }
}
I'm working with JPA 2 and Hibernate 3, using MySQL as the database. There is a class TestB as follows:
@SQLInsert(sql = "INSERT IGNORE INTO testB (....) VALUES (?,?,?,?,?,?)")
class TestB {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(unique = true)
    private String ccc;
}
Starting the transaction:
@Transactional
List<TestB> list = ...
repository.save(list);
But unfortunately, this is a bulk insert, so I cannot hold all of the data in memory. What I chose to do instead is just pass the data to the database and let the database decide what to do.
For pure SQL, INSERT IGNORE works just fine. But through Hibernate, I tried two things:
1. INSERT IGNORE
2. INSERT ... ON DUPLICATE KEY UPDATE (...)
Neither works. The errors for each are:
1. The database returned no natively generated identity value.
2. NonUniqueObjectException.
Both happen for the duplicate entry, not the first one.
I assume the first error occurs because after the insert, Hibernate should assign an ID to the proxy object but cannot. I assume the second error occurs because two objects with the same ID cannot exist in the same session.
How can I resolve this issue?
In the first case, Hibernate inserts data into the table as long as there are no constraint violations or SQL errors. If there is an exception (say, a unique key violation), then because you used INSERT IGNORE, the DB doesn't insert anything, so there is no natively generated ID; Hibernate therefore throws a system exception with the error "The database returned no natively generated identity value".
One possible solution: one way of handling this error is to catch HibernateSystemException, which is thrown when the insert fails and is ignored by the MySQL DB.
Since this is a Hibernate system exception, Hibernate internally seems to mark the transaction for rollback even if only one of the ignored inserts fails. I am still trying to find a solution to that part.
I have a couple of objects that are mapped to database tables using Hibernate: BatchTransaction and Transaction. BatchTransaction's table (batch_transactions) has a foreign key reference to transactions, named transaction_id.
In the past I used a batch runner that made internal calls to run the batch transactions and completed the reference from BatchTransaction to Transaction once the transaction finished. After a Transaction has been inserted, I just call batchTransaction.setTransaction(txn), since I have a @ManyToOne mapping from BatchTransaction to Transaction.
I am changing the batch runner so that it executes its transactions through a Web service. The ID of the newly inserted Transaction will be returned by the service and I'll want to update transaction_id in BatchTransaction directly (rather than using the setter for the Transaction field on BatchTransaction, which would require me to load the newly inserted item unnecessarily).
It seems like the most logical way to do it is to use SQL rather than Hibernate, but I was wondering if there's a more elegant approach. Any ideas?
Here's the basic mapping.
BatchQuery.java
@Entity
@Table(name = "batch_queries")
public class BatchQuery
{
    @ManyToOne
    @JoinColumn(name = "query_id")
    public Query getQuery()
    {
        return mQuery;
    }
}
Query.java
@Entity
@Table(name = "queries")
public class Query
{
}
The idea is to update the query_id column in batch_queries without setting the "query" property on a BatchQuery object.
Using a direct SQL update, or an HQL update, is certainly feasible.
Not seeing the full problem, it looks to me like you might be making a modification to your domain that is worth documenting in your domain model: you may be moving to a BatchTransaction that holds just the transaction id as a member, not the full Transaction.
If, in other activities, BatchTransaction still needs to hydrate that Transaction, I'd consider adding a separate mapping for the transaction id and making it the managing mapping (mark the Transaction association as insertable = false and updatable = false).
If BatchTransaction will no longer be concerned with the full Transaction, just remove that association after adding the transaction id field.
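A sketch of that dual mapping, using the BatchQuery/Query names from the example (the field names are assumptions, and this is untested against any particular provider):

```java
@Entity
@Table(name = "batch_queries")
public class BatchQuery {
    // Writable scalar mapping: this is the one you set directly with
    // the id returned by the web service, without loading the Query.
    @Column(name = "query_id")
    private Long queryId;

    // Read-only association over the same column: hydrated when needed,
    // but never written, so the two mappings cannot conflict on
    // insert/update.
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "query_id", insertable = false, updatable = false)
    private Query query;

    // getters/setters omitted
}
```

With this shape, saving a BatchQuery writes only the plain queryId column, while code that genuinely needs the associated Query can still traverse the read-only association.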
As you have written, we can use SQL to achieve a solution to the above problem, but I suggest not updating primary keys via SQL.
Since you are changing the key, you are effectively creating an altogether new object. For this, you can first delete the existing object with the previous key, and then insert a new object with the updated key (in your case, transaction_id).