I have an @Entity containing a few @OneToMany relationships, but since they consist of collections of enums, I'm using @ElementCollection. The entity has an id that is generated at the database level (MySQL).
Here is a small example I just made up that corresponds to the structure of my entity.
@Entity
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    @ElementCollection(targetClass = Language.class)
    @CollectionTable(name = "student_languages", joinColumns = @JoinColumn(name = "student_id"))
    private Set<Language> languages;

    @ElementCollection(targetClass = Module.class)
    @CollectionTable(name = "student_modules", joinColumns = @JoinColumn(name = "student_id"))
    private Set<Module> modules;

    @ElementCollection(targetClass = SeatPreference.class)
    @CollectionTable(name = "student_seats", joinColumns = @JoinColumn(name = "student_id"))
    private Set<SeatPreference> seatPreference;

    [...]
}
I know that GenerationType.IDENTITY disables batching, but I thought that applied only to the main entity, not to the collection properties too. I have to bulk import a few entities (~20k), each with a handful of collection elements, but Hibernate seems to be generating one insert for each element in the sets, making the import impossibly slow (between 10 and 20 inserts per record).
I have now spent so long trying to make this faster that I'm considering just generating an SQL file that I can manually import into the database.
Is there no way to instruct Hibernate to batch the inserts for @ElementCollection fields? Am I doing something wrong?
Basically, it seems Hibernate will not help with @ElementCollection batching, but you can rely on SQL bulk inserts.
You are on MySQL, which supports bulk inserts, and its JDBC driver can automatically rewrite individual insert statements into one bulk statement if you enable the rewriteBatchedStatements property.
So in your case you need to tell Hibernate to enable batching and to order the batched inserts and updates:
hibernate.jdbc.batch_size=100
hibernate.order_inserts=true
hibernate.order_updates=true
This ensures that the insert statements generated by Hibernate are executed in a batch and are ordered.
So the SQL generated by Hibernate will be something like this:
insert into student_languages (student_id, languages) values (1,1)
insert into student_languages (student_id, languages) values (1,2)
insert into student_languages (student_id, languages) values (1,3)
insert into student_languages (student_id, languages) values (1,4)
Next, you need to tell the JDBC driver to rewrite the individual inserts into a bulk insert by setting rewriteBatchedStatements=true in the connection URL:
jdbc:mysql://db:3306/stack?useSSL=false&rewriteBatchedStatements=true
This instructs the driver to rewrite the inserts into bulk form, so the several SQL statements above will be rewritten into something like this:
insert into student_languages (student_id, languages) values (1,1),(1,2),(1,3),(1,4)
As a note, this may not work if you are using old versions of the MySQL driver or of Hibernate.
I tested this with both MySQL and MariaDB, and Hibernate does in fact batch the inserts into the collection table. It just isn't visible to the naked eye; you have to use something like DataSource-Proxy to see it:
INFO com.example.jpa.AddStudents - Adding students
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener -
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["insert into student (name) values (?)"]
Params:[(Smith)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener -
Name:dataSource, Connection:3, Time:1, Success:True
Type:Prepared, Batch:False, QuerySize:1, BatchSize:0
Query:["insert into student (name) values (?)"]
Params:[(Snow)]
DEBUG n.t.d.l.l.SLF4JQueryLoggingListener -
Name:dataSource, Connection:3, Time:78, Success:True
Type:Prepared, Batch:True, QuerySize:1, BatchSize:6
Query:["insert into student_languages (student_id, language) values (?, ?)"]
Params:[(6,2),(6,0),(6,1),(7,0),(7,4),(7,3)]
INFO com.example.jpa.AddStudents - Added
The SEQUENCE ID generator is considered the best choice for Hibernate. It doesn't create the lock contention the TABLE generator does, and it allows batching. It is unfortunate that MySQL still doesn't support sequences (MariaDB does).
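For comparison, here is a minimal sketch of a SEQUENCE-based mapping (this assumes a database that has sequences, e.g. MariaDB 10.3+; the generator and sequence names are made up):

@Entity
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "student_seq")
    @SequenceGenerator(name = "student_seq", sequenceName = "student_seq", allocationSize = 50)
    private Integer id;
}

With an allocationSize of 50, Hibernate can hand out 50 ids per database round trip, so the entity inserts themselves become batchable as well.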
As for "Am I doing something wrong?":
Hibernate is optimized for small-scale changes in the database. It maintains a first-level cache and also supports a second-level cache which will only hinder performance for large-scale operations. Therefore, indeed, you might be better off using JDBC or jOOQ for this particular operation as was suggested in the comments.
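For what it's worth, here is a minimal plain-JDBC sketch of such a bulk import, reusing the collection table from the question (the URL, credentials, and the shape of the input data are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;

public class StudentLanguageImport {

    // inserts all (student_id, language) rows in one JDBC batch
    public static void importLanguages(Map<Integer, List<Integer>> languagesByStudent)
            throws SQLException {
        String url = "jdbc:mysql://db:3306/stack?useSSL=false&rewriteBatchedStatements=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "insert into student_languages (student_id, language) values (?, ?)")) {
            con.setAutoCommit(false);
            for (Map.Entry<Integer, List<Integer>> e : languagesByStudent.entrySet()) {
                for (Integer language : e.getValue()) {
                    ps.setInt(1, e.getKey());
                    ps.setInt(2, language);
                    ps.addBatch();
                }
            }
            ps.executeBatch(); // the driver rewrites the batch into multi-row inserts
            con.commit();
        }
    }
}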
I used MySQL 8.0.3, MariaDB 10.5.13 and Hibernate 5.6.3.Final.
Related
Just a quick question, in case something stands out immediately.
We're migrating an EAR/EJB application from WebLogic 11g to the latest WebSphere Liberty (22.x), also upgrading several of the frameworks, including JPA to 2.2. This also changes the JPA implementation to EclipseLink. We came from com.oracle.weblogic.11g.modules:javax.persistence:1.0.0.0_1-0-2. The underlying DB is MS SQL Server.
I'm running into some weirdness where related objects are intermittently not resolved/queried.
Just as an example, we have entities whose columns hold reference-data codes or similar lookups. Say I have an entity called PaymentRecordT with a status code that refers to a ref table which also holds a textual description. Something like this:
SQL:
CREATE TABLE [PAYMENT_RECORD_T](
    [PAYMENT_ID] [int] NOT NULL,
    ...
    [PAYMENT_STATUS_CD] [CHAR](8) NOT NULL,
    ...
)

ALTER TABLE [PAYMENT_RECORD_T] WITH CHECK
    ADD CONSTRAINT [FK_PAYM4] FOREIGN KEY ([PAYMENT_STATUS_CD])
    REFERENCES [RECORD_STATUS_T] ([REC_STAT_CD])
GO

CREATE TABLE [RECORD_STATUS_T] (
    [REC_STAT_CD] [CHAR](8) NOT NULL,
    [REC_STAT_DSC] [VARCHAR](60) NOT NULL,
    CONSTRAINT [PK_RECORD_STATUS_T] PRIMARY KEY CLUSTERED (
        [REC_STAT_CD] ASC
    ) WITH (PAD_INDEX = OFF...) ON [PRIMARY]
) ON [PRIMARY]
GO
Java:
#Table(name = "PAYMENT_RECORD_T")
#Entity
public class PaymentRecordT {
...
#ManyToOne
#PrimaryKeyJoinColumn(name = "payment_status_cd", referencedColumnName = "REC_STAT_CD")
private RecordStatusT recordStatusT;
}
#Table(name = "RECORD_STATUS_T")
#Entity
public class RecordStatusT {
#Column(name = "REC_STAT_CD")
#Id
private String recStatCd;
#Column(name = "REC_STAT_DSC")
#Basic
private String recStatDsc;
}
Other relations in our app might not be primary-key relations but loose relations, in which case it's just @JoinColumn, but the pattern would be the same.
My 'weirdness' is the following:
So in this example I have a list of 10 payment records, each of which has such a record status, which is in fact NOT NULL in the database. When I do the initial retrieval via an EJB method, it grabs the 10 records and I also get the correctly resolved/queried record statuses.
Then I add a new record via an EJB method (TRANSACTION_REQUIRED). After the add method returns, I can query the new payment record in the database via SSMS. It's committed, it looks 100% correct, and it contains a correct record status code.
Now I run the retrieval method again and I get the 11 records as I would expect. Only the 11th (newly inserted) record has recordStatusT as null.
When I restart the app, all goes well again for the retrieval of all 11 records, but for subsequent additions the outcome again seems 'undefined'.
In the JDBC logging I can see that during the original retrieval the record_status_t table was queried, but the second time around it was not, and I have no explanation why.
I played with FetchType.EAGER and read up on caching etc., but I'm not getting anywhere.
Any ideas?
Thanks for your time
Carsten
I solved the problem by ensuring that after inserts/updates the objects aren't queried from the cache.
In the end, rather than doing it with a query hint, I disabled caching for the entity involved using the @Cacheable annotation, like so:
#Table(name = "PAYMENT_RECORD_T")
#Entity
#Cacheable(false)
public class PaymentRecordT {
...
#ManyToOne
#PrimaryKeyJoinColumn(name = "payment_status_cd", referencedColumnName = "REC_STAT_CD")
private RecordStatusT recordStatusT;
}
I still feel there should be a better solution. EclipseLink tracks the inserts/updates, so it should be able to track what needs rereading from the DB and what doesn't. I still feel I don't fully understand the entire picture, but this works for me and it's reasonably clean.
I can leave the considerable amount of read-only data/objects cacheable and mark the few that are changeable as non-cacheable.
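For reference, the query-hint route mentioned above would look roughly like this, using EclipseLink's eclipselink.refresh hint (a sketch only; the query itself is made up):

TypedQuery<PaymentRecordT> q = em.createQuery(
        "SELECT p FROM PaymentRecordT p", PaymentRecordT.class);
// ask EclipseLink to refresh cached instances from the database
q.setHint("eclipselink.refresh", "true");
List<PaymentRecordT> records = q.getResultList();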
Thanks for reading
Carsten
I have configured Hibernate to batch inserts/updates of entities via the following properties:
app.db.props.hibernate.jdbc.batch_size=50
app.db.props.hibernate.batch_versioned_data=true
app.db.props.hibernate.order_inserts=true
app.db.props.hibernate.order_updates=true
(Ignore the app.db.props prefix; it is removed by Spring.) I can confirm that the properties are reaching Hibernate, because simple batches work as expected, as confirmed by logging via the datasource directly. The proxy below produces logging that shows batches are happening.
ProxyDataSourceBuilder.create(dataSource)
        .asJson().countQuery()
        .logQueryToSysOut().build();
Logs (notice batchSize)...
{"name":"", "connection":121, "time":1, "success":true, "type":"Prepared", "batch":true, "querySize":1, "batchSize":18, "query":["update odm.status set first_timestamp=?, last_timestamp=?, removed=?, conformant=?, event_id=?, form_id=?, frozen=?, group_id=?, item_id=?, locked=?, study_id=?, subject_id=?, verified=? where id=?"], "params":[...]}
However, when inserting a more complex object model involving a hierarchy of one-to-many relationships, Hibernate is not ordering the inserts (and thus not batching them). With a model like EntityA -> EntityB -> EntityC, Hibernate inserts each parent and its children and then moves on to the next parent, rather than batching per entity class.
I.e. what I see is interleaved inserts for each type...
insert EntityA...
insert EntityB...
insert EntityC...
insert EntityA...
insert EntityB...
insert EntityC...
repeat...
But what I would expect is a single pass for each type, using batched inserts.
It seems the cascading relationship is preventing the ordering of inserts (and thus the batching), but I can't figure out why. Hibernate should be capable of understanding that all instances of EntityA can be inserted at once, then all of EntityB, and so on.
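One workaround worth trying, as a sketch: persist and flush one level of the hierarchy at a time, so each flush contains inserts for a single table only and can be batched. This assumes cascading is disabled so each level is persisted explicitly; getBs()/getCs() are made-up accessors:

for (EntityA a : parents) {
    em.persist(a);
}
em.flush(); // this flush contains only EntityA inserts

for (EntityA a : parents) {
    for (EntityB b : a.getBs()) {
        em.persist(b);
    }
}
em.flush(); // only EntityB inserts

for (EntityA a : parents) {
    for (EntityB b : a.getBs()) {
        for (EntityC c : b.getCs()) {
            em.persist(c);
        }
    }
}
em.flush(); // only EntityC inserts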
I have a project to maintain. Its persistence layer uses JPA and Hibernate, and it runs on a MySQL server; the database is relational in theory only, since the engine is MyISAM on all tables.
I have some foreign key relationships mapped as @ManyToOne relationships on my entities.
Now the problem is that some of those columns are supposed to be foreign keys in order to be mapped correctly, but they aren't enforced as such (the engine being MyISAM), and some of them hold wrong values such as negative ones (-1), 0, or references to nonexistent parent rows.
@Entity
public class EntityA {

    @ManyToOne
    @JoinColumn(name = "COL_FK")
    private EntityB b;
}
In the DB, possible values for COL_FK include 0, -1, and IDs of nonexistent parents.
I can neither change the DB structure nor edit the data within the columns. All I can do is change the code.
How can I tell Hibernate to ignore those values and not throw a RuntimeException when I fetch a list, just because one of its elements contains a wrong foreign key value?
Thanks.
UPDATE:
@Embeddable
public class EntityA {

    @ManyToOne
    @JoinColumn(name = "idClient")
    @NotFound(action = NotFoundAction.IGNORE)
    private ClientBO idClient;
}
Stack trace:
WARNING: org.springframework.orm.jpa.JpaObjectRetrievalFailureException: Unable to find xx.xxx.xx.xxx.ClientBO with id 210; nested exception is javax.persistence.EntityNotFoundException: Unable to find xx.xx.xx.xxx.ClientBO with id 210
Annotate your association with
@NotFound(action = NotFoundAction.IGNORE)
Note that this is one more hack on top of an already ugly solution, though. Hibernate heavily relies on transactions (as it should), and MyISAM, AFAIK, doesn't support transactions. I guess you already know this, but fixing the database would be a much better choice:
ALTER TABLE ... ENGINE=InnoDB.
MyISAM accepts the syntax for FOREIGN KEYs but does not implement them. It also ignores any commands relating to transactions (such as COMMIT).
MyISAM does handle "relations": it supports INDEXes and JOINs. It just doesn't do the extra things that FOREIGN KEYs provide.
I'm working with JPA 2 and Hibernate 3, using MySQL as the database. There is a class TestB, as follows:
@SQLInsert(sql = "INSERT IGNORE INTO testB (....) VALUES (?,?,?,?,?,?)")
class TestB {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(unique = true)
    private String ccc;
}
For the transaction start:
@Transactional
List<TestB> list = ...
repository.save(list);
But unfortunately this is a bulk insert, so I cannot hold all of the data in memory. What I chose to do is just pass the data to the database and let the database decide what to do.
For pure SQL, INSERT IGNORE works just fine. But with Hibernate, I tried two things:
1. INSERT IGNORE
2. INSERT ... ON DUPLICATE KEY UPDATE (...)
Neither works. The errors for each are:
1. The database returned no natively generated identity value.
2. NonUniqueObjectException.
Both happen for the duplicate entry, not the first one.
I assume the first error occurs because after the insert Hibernate should assign an ID to the proxy object, but it can't. I assume the second error occurs because two objects with the same ID cannot exist in the same session.
How can I resolve this issue?
In the first case, Hibernate tries to insert the data into the table if there are no constraint violations or SQL errors. If there is an exception (say, a unique key violation), then because you used INSERT IGNORE the DB doesn't insert anything, so there is no natively generated ID; Hibernate throws a system exception with the error "The database returned no natively generated identity value".
One possible solution: one way of handling this error is to catch the HibernateSystemException, which is thrown when the insert fails and is ignored by the MySQL DB.
Since this is a Hibernate system exception, Hibernate internally seems to mark the transaction for rollback even if only one of the ignored inserts fails. I am trying to find a solution to this as well.
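A sketch of that approach with Spring (this assumes a TransactionTemplate so each row gets its own transaction; TestBImporter and TestBRepository are illustrative names):

import java.util.List;
import org.springframework.orm.hibernate5.HibernateSystemException;
import org.springframework.transaction.support.TransactionTemplate;

public class TestBImporter {

    private final TransactionTemplate tx;
    private final TestBRepository repository;

    public TestBImporter(TransactionTemplate tx, TestBRepository repository) {
        this.tx = tx;
        this.repository = repository;
    }

    public void importAll(List<TestB> list) {
        for (TestB row : list) {
            try {
                // one transaction per row, so a rollback only affects that row
                tx.execute(status -> repository.save(row));
            } catch (HibernateSystemException e) {
                // INSERT IGNORE skipped a duplicate; no generated id came back
            }
        }
    }
}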
I have a couple of objects that are mapped to tables in a database using Hibernate: BatchTransaction and Transaction. BatchTransaction's table (batch_transactions) has a foreign key reference to transactions, named transaction_id.
In the past I have used a batch runner that used internal calls to run the batch transactions and complete the reference from BatchTransaction to Transaction once the transaction was complete. After a Transaction had been inserted, I just called batchTransaction.setTransaction(txn), so I have a @ManyToOne mapping from BatchTransaction to Transaction.
I am changing the batch runner so that it executes its transactions through a Web service. The ID of the newly inserted Transaction will be returned by the service and I'll want to update transaction_id in BatchTransaction directly (rather than using the setter for the Transaction field on BatchTransaction, which would require me to load the newly inserted item unnecessarily).
It seems like the most logical way to do it is to use SQL rather than Hibernate, but I was wondering if there's a more elegant approach. Any ideas?
Here's the basic mapping.
BatchQuery.java
@Entity
@Table(name = "batch_queries")
public class BatchQuery
{
    @ManyToOne
    @JoinColumn(name = "query_id")
    public Query getQuery()
    {
        return mQuery;
    }
}
Query.java
@Entity
@Table(name = "queries")
public class Query
{
}
The idea is to update the query_id column in batch_queries without setting the "query" property on a BatchQuery object.
Using a direct SQL update, or an HQL update, is certainly feasible.
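For instance, the HQL variant could look like this (a sketch; it assumes BatchQuery has an id property, and queryId/batchQueryId are illustrative variables). getReference supplies the association without loading the Query row:

// bulk HQL update writes the FK column directly, bypassing the loaded entity
em.createQuery("update BatchQuery b set b.query = :q where b.id = :id")
        .setParameter("q", em.getReference(Query.class, queryId))
        .setParameter("id", batchQueryId)
        .executeUpdate();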
Not seeing the full problem, it looks to me like you might be making a modification to your domain that's worth documenting in the domain itself. You may be moving to a BatchTransaction that holds just the transaction ID as a member, rather than the full Transaction.
If, in other activities, BatchTransaction will still need to hydrate that Transaction, I'd consider adding a separate mapping for the transaction ID and making that the managing mapping (make the Transaction association insertable = false and updatable = false), as sketched below.
If BatchTransaction will no longer be concerned with the full Transaction, just remove that association after adding the transaction ID field.
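Here is a sketch of that dual mapping, in the terms of the BatchQuery example above (field names assumed):

@Entity
@Table(name = "batch_queries")
public class BatchQuery
{
    // managing mapping: this is what gets written to the query_id column
    @Column(name = "query_id")
    private Long mQueryId;

    // read-only association, still available for hydration
    @ManyToOne
    @JoinColumn(name = "query_id", insertable = false, updatable = false)
    private Query mQuery;
}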
As you have written, we can use SQL to achieve a solution to the above problem, but I would suggest not updating primary keys via SQL.
Since you are changing the key, you are effectively creating an altogether new object. For this, you can first delete the existing object with the previous key, and then insert a new object with the updated key (in your case, transaction_id).