JPA/Hibernate Bulk/Batch Insert using MySQL with Database-Generated IDs

Okay, I've searched forever and I can't seem to find a good way of accomplishing batch inserts with JPA/Hibernate and MySQL.
I want to be able to save/insert many records at once using JPA, but by default batching behavior is disabled if you use GenerationType.IDENTITY. I'm aware that you can switch to GenerationType.SEQUENCE, but that isn't available on MySQL, and creating new tables to use GenerationType.TABLE is not an option in my scenario.
So in the end, I need an efficient way of doing batch/bulk inserts using JPA/Hibernate, MySQL, and database-generated IDs. I know it's possible to do this efficiently because I can do it with a JDBC connection, but I'd really like not to have to write my own JDBC queries for each of my repositories.
Anyone know how to accomplish this?
I'm okay with not getting the updated entities with their IDs back (think void saveAll() instead of List<User> saveAll()). My main requirement is that this happens in one or two big queries instead of saving each entity iteratively, as it does now when I call saveAll.
I can include more if needed, but my entity looks like this:
@Entity
@Builder
@Getter
@Setter
@With
@AllArgsConstructor
@NoArgsConstructor
@EqualsAndHashCode(callSuper = false, exclude = "id")
@Table(name = "user")
@ToString(callSuper = true, onlyExplicitlyIncluded = true)
public class User {
    @Id
    @ToString.Include
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "uID")
    private long id;
    private String name;
}
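For reference, these are the settings that normally enable JDBC batching in Hibernate (shown as Spring Boot properties); with GenerationType.IDENTITY Hibernate silently ignores them for inserts, which is exactly the problem here:

# Hibernate JDBC batching (ignored for inserts when IDs are IDENTITY-generated)
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true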

There is no way to accomplish JDBC batching on insert with Hibernate when using the identity generation strategy, because for Hibernate every entity must have a PK value assigned after a persist/insert.
You can use Hibernate SPIs to implement this yourself, though. Take a look at how Hibernate implements inserts in org.hibernate.persister.entity.AbstractEntityPersister#insert(java.lang.Object, java.lang.Object[], java.lang.Object, org.hibernate.engine.spi.SharedSessionContractImplementor). You can reduce the complexity if you only implement this for a few known entities that use just a handful of features.

The IDENTITY generator disables Hibernate's and JPA's JDBC batch inserts. Since sequences are not supported in MySQL, there is no way to bulk/batch insert records using MySQL and Spring Data JPA alone; I've written a blog post about this. But this is not the end of the road: we can use the JDBC template or QueryDSL-SQL instead (the blog shows how to implement both approaches).
If you need type-safe, easy-to-write code, choose QueryDSL-SQL; otherwise choose the JDBC template.
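As a minimal sketch of the JDBC template approach, assuming a Spring JdbcTemplate and the user(uID, name) table from the question (note that generated IDs are not returned, which matches the void saveAll() requirement):

import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class UserBatchWriter {

    private final JdbcTemplate jdbcTemplate;

    public UserBatchWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Inserts all users in JDBC batches of 100; the DB generates the IDs. */
    public void saveAll(List<User> users) {
        // Pair this with rewriteBatchedStatements=true on the MySQL JDBC URL
        // so the driver rewrites each batch into a single multi-row INSERT.
        jdbcTemplate.batchUpdate(
                "INSERT INTO user (name) VALUES (?)",
                users,
                100, // batch size
                (ps, user) -> ps.setString(1, user.getName()));
    }
}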

Related

WebLogic to Liberty with JPA upgrade - related entities intermittently not being queried

Just a quick question, in case something stands out immediately.
We're migrating an EAR/EJB application from WebLogic 11g to the latest WebSphere Liberty (22.x), also upgrading several of the frameworks, including JPA to 2.2. This also changes the JPA implementation to EclipseLink. We came from com.oracle.weblogic.11g.modules:javax.persistence:1.0.0.0_1-0-2. The underlying DB is MS SQL Server.
And I'm running into some weirdness with regard to related objects not being resolved/queried intermittently.
Just as an example, we have entities where the columns hold reference-data codes or similar lookups. Say I have an entity called PaymentRecordT, and it has a status code which refers to a ref table that also holds a textual description. Something like this:
SQL:
CREATE TABLE [PAYMENT_RECORD_T] (
    [PAYMENT_ID] [INT] NOT NULL,
    ...
    [PAYMENT_STATUS_CD] [CHAR](8) NOT NULL,
    ...
)
ALTER TABLE [PAYMENT_RECORD_T] WITH CHECK ADD CONSTRAINT [FK_PAYM4] FOREIGN KEY ([PAYMENT_STATUS_CD])
REFERENCES [RECORD_STATUS_T] ([REC_STAT_CD])
GO
CREATE TABLE [RECORD_STATUS_T] (
    [REC_STAT_CD] [CHAR](8) NOT NULL,
    [REC_STAT_DSC] [VARCHAR](60) NOT NULL,
    CONSTRAINT [PK_RECORD_STATUS_T] PRIMARY KEY CLUSTERED (
        [REC_STAT_CD] ASC
    ) WITH (PAD_INDEX = OFF...) ON [PRIMARY]
) ON [PRIMARY]
GO
Java:
#Table(name = "PAYMENT_RECORD_T")
#Entity
public class PaymentRecordT {
...
#ManyToOne
#PrimaryKeyJoinColumn(name = "payment_status_cd", referencedColumnName = "REC_STAT_CD")
private RecordStatusT recordStatusT;
}
#Table(name = "RECORD_STATUS_T")
#Entity
public class RecordStatusT {
#Column(name = "REC_STAT_CD")
#Id
private String recStatCd;
#Column(name = "REC_STAT_DSC")
#Basic
private String recStatDsc;
}
Other relations in our app might not be primary-key relations but loose relations, in which case it's just @JoinColumn, but the pattern would be the same.
My 'weirdness' is the following:
In this example I have a list of 10 payment records, each of which has such a record status, which is actually NOT NULL in the database. When I do the initial retrieval via an EJB method, it grabs the 10 records and I also get the correctly resolved/queried record statuses.
Then I add a new record via an EJB method (TRANSACTION_REQUIRED). After the add method returns, I can query the new payment record in the database via SSMS. It's committed, it looks 100% correct, and it contains a correct record status code.
Now I run the retrieval method again and I get the 11 records as I would expect. Only the 11th (newly inserted) record will have recordStatusT as null.
When I restart the app, all goes well again for the retrieval of all 11 records. But for subsequent additions the outcome again seems 'undefined'.
In the JDBC logging I can see that during the original retrieval of the records the RECORD_STATUS_T table was queried, but the second time around it was not, and I have no explanation why.
I played with FetchType.EAGER and read up on caching etc., but I'm not getting anywhere.
Any ideas?
Thanks for your time
Carsten
I solved the problem by ensuring that after inserts/updates the objects aren't being queried from the cache.
In the end, rather than doing it with a query hint, I disabled caching for the entity involved using the @Cacheable annotation, like so:

@Table(name = "PAYMENT_RECORD_T")
@Entity
@Cacheable(false)
public class PaymentRecordT {
    ...
    @ManyToOne
    @PrimaryKeyJoinColumn(name = "payment_status_cd", referencedColumnName = "REC_STAT_CD")
    private RecordStatusT recordStatusT;
}
I still feel like there should be a better solution. EclipseLink tracks the inserts/updates, so it should be able to track what needs re-reading from the DB and what doesn't. I still feel like I don't fully understand the entire picture, but this works for me and it's reasonably clean.
I can leave the considerable amount of read-only data/objects cacheable and mark the few that are changeable as non-cacheable.
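For reference, the query-hint route mentioned above would look roughly like this (a sketch using EclipseLink's "eclipselink.refresh" hint to force a re-read from the database for one query, instead of disabling caching for the whole entity):

// em is a JPA EntityManager backed by EclipseLink.
List<PaymentRecordT> records = em.createQuery(
        "SELECT p FROM PaymentRecordT p", PaymentRecordT.class)
    .setHint("eclipselink.refresh", true) // EclipseLink-specific query hint
    .getResultList();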
Thanks for reading
Carsten

Best Way to SELECT subset of properties for associated @Entity

I am trying to choose the properties of associated entities that will be loaded.
For example:
@Entity
@Getter @Setter
public class Book {
    @Id @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @Column(name = "first")
    private String first;

    @Column(name = "second")
    private String second;

    @OneToMany(mappedBy = "book", cascade = CascadeType.ALL)
    private List<Page> pages = new ArrayList<>();
}

@Entity
@Getter @Setter
public class Page {
    @Id @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @Column(name = "number")
    private Integer number;

    @Column(name = "content")
    private String content;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "book_id")
    private Book book;
}
I would like to have just a SELECT on the Book's first and the content of all associated Page entities:
book
- first
- pages
- content
For example, in PostgreSQL this might look something like:

SELECT book.first,
       array_agg(page.content || ' ')
FROM book
LEFT JOIN page
    ON page.book_id = book.id
GROUP BY book.first
I've done my research on how you could go about doing this; the most-mentioned solution is to use Spring Data JPA projections (ref).
I did the following:

public interface FindAllBookProjection {
    String getFirst();
    List<PageProjection> getPages();

    interface PageProjection {
        Integer getNumber();
    }
}
The issue with projections is that they cause the famous N+1 select problem, as the pages are loaded lazily.
I also couldn't find a way to use projections with @Query in the @Repository for the nested properties.
There are other mentions online of using @EntityGraph. From what I understand, @EntityGraph will not work, as it also selects all properties, specifying only which entity associations should be loaded.
There are also suggestions to use other libraries like Blaze-Persistence (ref) or Hibernate's ResultTransformer (ref). I would prefer to use only Spring Data JPA instead of introducing another library for this functionality, and writing ResultTransformers seems like adding a lot of boilerplate code to the mix.
To summarize, my question is: what is the best way to choose which properties are selected for @Entity associations? The main goal is to avoid pulling an unnecessary amount of data from the database. The example above is for demonstration; the data I am working with includes over 10 columns and spans 3-4 entities. Having control over the data means better performance.
I am the creator of Blaze-Persistence, and I can tell you that if there were an easy way to do this, I would not have created Entity Views. You can read some of the articles I wrote about this, and you will realize that doing this yourself, as you already figured, will require lots of boilerplate code. Doing it efficiently will require even more code, and in the end you will probably end up with a solution that is inferior to Blaze-Persistence Entity Views in one way or another, and that probably has no documentation.
Spring Data Projections is just limited, and as I have tried to outline many times before, Blaze-Persistence Entity Views is like Spring Data Projections on steroids.
If you have "just" one or two simple mappings, you might be able to get this done by introducing a special @Immutable @Entity, maybe even with @Subselect in this particular case, to model what you need, but believe me, this only works well on a small scale. Apart from that, Blaze-Persistence, which works on top of JPA/Hibernate, enables the use of a lot of advanced SQL features which you usually can't use from plain JPA/Hibernate.
In my opinion, ORM libraries should work with whole objects; that means loading all the data into the program and then transforming/filtering according to logic.
For specific use cases where performance really suffers, I'd use entities in the database, such as views/procedures.
In your case, I'd create a view:
CREATE VIEW book_content AS
SELECT book.first AS [first],
       array_agg(page.content || ' ') AS [content]
FROM book
LEFT JOIN page
    ON page.book_id = book.id
GROUP BY book.first
And then create a @Repository and @Entity for it in Spring.
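A sketch of what that read-only mapping could look like (@Immutable is Hibernate-specific; BookContent and its column names simply mirror the view above, and the grouping key doubles as a makeshift ID):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.annotations.Immutable;

// Read-only entity mapped onto the book_content view.
@Entity
@Immutable // no inserts/updates are ever issued through this mapping
@Table(name = "book_content")
public class BookContent {
    @Id
    @Column(name = "first")
    private String first; // the view's grouping key, used here as the ID

    @Column(name = "content")
    private String content;

    // getters omitted
}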

Spring Boot Entity DynamicUpdate

I am writing an API in Spring Boot and have an issue with partial updates of an entity. When I want to update, for example, just the name of a user, Spring sees the other fields as null and replaces the existing data with nulls in the database. As I read in the documentation, @DynamicUpdate should fix this issue, but it is not working for me.
Here is my user entity:
@Entity
@Table(name = "users")
@DynamicUpdate
public class User {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    // other fields...
}
Use merge instead:

User en = sessionFactory.getCurrentSession().get(User.class, id);
en.setName("abc");
sessionFactory.getCurrentSession().merge(en);
Performance issue addressed by @Entity(dynamicUpdate = true): in a large table with many columns (a legacy design), or one containing large data volumes, updating unmodified columns is absolutely unnecessary and has a great impact on system performance.
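The same load-then-modify pattern expressed with Spring Data JPA (a sketch; UserService and userRepository are assumed names for a service and a standard Spring Data repository):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class UserService {

    private final UserRepository userRepository; // assumed Spring Data repository

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    // Load the managed entity and change only what is needed; dirty checking
    // flushes the UPDATE, and with @DynamicUpdate only changed columns are set.
    @Transactional
    public void renameUser(Long id, String newName) {
        User user = userRepository.findById(id)
                .orElseThrow(() -> new IllegalArgumentException("No user " + id));
        user.setName(newName); // other fields remain untouched
    }
}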

Hibernate @Synchronize does not seem to be working

I have two entities: one maps to a database table, the other to a database view.
The data of the view depends on the table.

@javax.persistence.Table(name = "BOOKING_INFO", schema = "BOOKING")
@Entity
public class BookingInfo extends AbstractBooking {

@javax.persistence.Table(name = "BOOKING_VIEW", schema = "BOOKING")
@Entity
@Immutable
public class BookingView extends AbstractBooking {
This works fine in most cases. However, when we write (insert or update) the BookingInfo entity and then run queries (in my case a count) on the BookingView entity, we get stale data.
Why this happens is clear to me (Hibernate caching; it only flushes when it detects that a select needs some data flushed).
So if I were to run a query on the BookingInfo entity, it would trigger a flush.
I have found the @Synchronize annotation in Hibernate, which sounds like it should fix this problem, like this:
#javax.persistence.Table(name = "BOOKING_VIEW", schema = "BOOKING")
#Entity
#Immutable
#Synchronize("BOOKING.BOOKING_INFO")
public class BookingView extends AbstractBooking {
However, this does not make any difference (the flush only happens at the end of the transaction). Also, the documentation I have found about this annotation is quite lacking and not very helpful.
EDIT: I also tried @Synchronize("BOOKING_INFO") (without the schema name, and also lowercase), but that made no difference.
The docs say that it is mostly used with @Subselect, but it is not a must (and I don't want that).
Has anyone ever successfully used this annotation?
Is there any other way to handle database views in Hibernate?
Am I missing something else?
Thanks to a colleague, we were able to debug and fix this. The problem was that our Hibernate naming strategy lowercased our table names, so the correct annotation is:

@Synchronize("BOOKING.booking_info")
How to debug this: set breakpoints in Hibernate's ActionQueue class, in the areTablesToBeUpdated methods.
There we saw that it compared "BOOKING.BOOKING_VIEW" to "BOOKING.booking_view".
We think this is a bug in Hibernate, because it should either apply the naming strategies to the values from @Synchronize as well, or compare them case-insensitively (which could theoretically lead to too many flushes if you have a crazy database that uses tables with the same name differentiated only by casing).
Created a Hibernate issue: https://hibernate.atlassian.net/browse/HHH-10002

Why does Hibernate re-implement functionality that databases already have?

For example:
#Table(name = "stock", catalog = "mkyong", uniqueConstraints = {
#UniqueConstraint(columnNames = "STOCK_NAME"),
#UniqueConstraint(columnNames = "STOCK_CODE") })
or
#Column(name = "STOCK_NAME", unique = true, nullable = false, length = 20)
Constraints like 'unique', 'nullable', and even field length are core database features. Why include this here? Also (although this may hurt some) I'd wager that a database's implementation of such constraints, particularly in mainstream commercial DBs like Oracle, is probably better than whatever the OSS Hibernate devs can come up with.
Is it wise to use this type of stuff in Hibernate, or is it better practice to put constraints and such in the database? It seems that if you use these Hibernate features, you're practically treating the database as a file system, so what's the point? Usage of this is everywhere, but I've yet to find documentation explaining why you'd do it.
It does not re-implement them; it has the option to validate the data model against the schema, or to create it.
The hibernate.hbm2ddl.auto configuration property is the one that allows you to create the schema based on the mappings:
"Automatically validates or exports schema DDL to the database when the SessionFactory is created. With create-drop, the database schema will be dropped when the SessionFactory is closed explicitly."
e.g. validate | update | create | create-drop
This is very useful if you want your data model to be in the central place, rather than the database structure.
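In a plain Hibernate setup this is a single configuration property; Spring Boot exposes the same switch as spring.jpa.hibernate.ddl-auto (a sketch):

# Validate the annotated mappings against the existing schema at startup
hibernate.hbm2ddl.auto=validate
# Spring Boot equivalent
spring.jpa.hibernate.ddl-auto=validate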
Hibernate can create a database schema based on those annotations for you.
