Optimize JPA performance when inserting many entities with an embedded primary key

I naively implemented a web service that consumes a list of JSON objects and stores them in a SQL database using
springframework.data.jpa (JPA & Hibernate). However, the solution performs poorly, and the profiler hinted that the main problem lies in creating the entities from the JSON objects one by one.
The code below is simplified, but basically: for each JSON object in the incoming list, two entities are created: DataEntity and IdentityEntity. The former holds the data of interest and uses the latter as its compound primary key, made up of a time id and a person id (which act as foreign keys).
I'd like to speed up the storage process. I've determined with a profiler that too many flush operations are performed, one after the insert of each new entity. Since I need to insert thousands of records at a time, this causes the performance issue. Can I perhaps do all the inserts in one transaction, or what other ways are there to optimize this?
Data class (I have many similar classes):
@Entity
public class DataEntity {
    @EmbeddedId
    private IdentityEntity identity;

    private Double data;
}
Embeddable entity:
@Embeddable
public class IdentityEntity implements Serializable {
    @NonNull
    private Long personId;

    @NonNull
    private Long datetimeId;
}
JPA repository:
@Repository
public interface DataRepository extends JpaRepository<DataEntity, IdentityEntity> {}
Simplified controller:
public class DataController {

    @Autowired
    private DataRepository dataRepository;

    @Autowired
    private DatetimeRepository datetimeRepository;

    @PostMapping("/upload")
    public void upload(... List<DataJson> items) {
        PersonEntity person = getPerson(...); // fast enough
        for (DataJson i : items) { // begin transaction here?
            saveNewEntity(i, person.getId());
        }
    }

    private void saveNewEntity(DataJson json, Long personId) {
        TimeEntity savedDatetime = datetimeRepository.save(new TimeEntity(json.getDatetime()));
        IdentityEntity mi = new IdentityEntity(personId, savedDatetime.getId());
        DataEntity entry = new DataEntity(mi, json.getData());
        dataRepository.save(entry);
    }
}
Edit: After further digging with the profiler, I've discovered that another time-consuming operation might be the transaction management itself. Although I haven't implemented or configured any transaction behavior, I suspect that Spring Boot applies some default for the Hibernate ORM. I'm beginning to think that a transaction is created in every iteration of the loop, which is the first performance issue and also causes the second one: at the end of each transaction everything is flushed and written to the DB.

Yep. All the methods in SimpleJpaRepository are annotated with @Transactional, so each save() call currently runs in its own transaction.
Simply add a @Transactional annotation to your upload method.
... or
First create all of the objects and then save them in one go using the save(Iterable<S> entities) method.
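A rough sketch of how that could look, assuming Spring Data JPA 2.x (where the bulk method is called saveAll; on 1.x it is save(Iterable)) and keeping the simplified signatures from the question:

@PostMapping("/upload")
@Transactional // one transaction for the whole request instead of one per repository call
public void upload(... List<DataJson> items) {
    PersonEntity person = getPerson(...); // as in the question
    List<DataEntity> entries = new ArrayList<>(items.size());
    for (DataJson i : items) {
        TimeEntity savedDatetime = datetimeRepository.save(new TimeEntity(i.getDatetime()));
        entries.add(new DataEntity(new IdentityEntity(person.getId(), savedDatetime.getId()), i.getData()));
    }
    dataRepository.saveAll(entries); // one call; everything is flushed once at commit
}

If Hibernate still issues the inserts one by one, enabling JDBC batching (e.g. spring.jpa.properties.hibernate.jdbc.batch_size=50 together with spring.jpa.properties.hibernate.order_inserts=true) usually helps as well, as long as the ids are not generated with the IDENTITY strategy.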

Related

Persisting nested or related test objects created with the builder pattern and JPA/Hibernate

Let's take the following classes which are a simplification of more complex classes and their relationships.
@Data
@Builder
public class UserAccount {
    private String username;
    private String password;
    private Language contactLanguage;

    public static UserAccount.UserAccountBuilder defaultUserAccount() {
        return UserAccount.builder()
                .username("default_username")
                .password("default_password")
                .contactLanguage(defaultLanguage().build());
    }
}
@Data
@Builder
public class Language {
    private String name;
    private String iso2;
    private String iso3;

    public static Language.LanguageBuilder defaultLanguage() {
        return Language.builder()
                .name("default_language_name")
                .iso2("default_iso2")
                .iso3("default_iso3");
    }
}
Using Lombok's #Builder annotation, I can easily construct an object like this, especially for testing:
UserAccount.builder()
        .username("foo")
        .password("bar")
        .contactLanguage(Language.builder()
                .name("English")
                .iso2("EN")
                .iso3("ENG")
                .build())
        .build();

// Or even like this...
defaultUserAccount().build();
This works fine for unit tests or any tests where such generated objects are only required to exist in memory.
However, I'd also like to use this approach for integration tests with an underlying database (using Spring Boot 2.4 + JPA + Hibernate). And this is where some issues come up that I haven't been able to solve so far. Let's have a look:
Each UserAccount needs to have a contactLanguage, but Language lives on its own; other entities might use it as well. When constructing a user account with defaultUserAccount().build(), persisting this entity fails because the Language object has not been persisted yet. There is no persist cascade on contactLanguage because I don't want just "any" Language to be created upon creating a UserAccount.
My only idea would be to use defaultLanguage().build() and persist it before defaultUserAccount().build(). But I feel that this will become complex and flaky as soon as there are more levels of nested builders or relationships to other entities.
Another thing is: even if I managed to persist the defaultLanguage, I would run into a collision as soon as another test calls defaultUserAccount().build(), because then the language already exists and cannot be inserted again.
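To illustrate, the workaround I have in mind looks roughly like this (just a sketch; languageRepository with its findByIso2 method and userAccountRepository are hypothetical helpers, not part of the classes above):

// Reuse the default Language if a previous test already persisted it,
// otherwise persist it now; then build the UserAccount against the managed instance.
Language language = languageRepository.findByIso2("default_iso2")
        .orElseGet(() -> languageRepository.save(defaultLanguage().build()));

UserAccount userAccount = defaultUserAccount()
        .contactLanguage(language) // replace the in-memory default with the persisted one
        .build();
userAccountRepository.save(userAccount);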
Are there any patterns or approaches for persisting such test data objects?
Update #1
After more searching, I found this question on SO which looks almost identical.

Returning Entity from Service method is a bad practice?

I've heard that when you want to return some object from a service method, you should define a DTO object (or a POJO generated from a JSON Schema) instead of using an Entity.
To make it clear, here is the example:
We have an entity and a jpa repository for it:
@Data
@Entity
@Table(name = "tables")
public class Table {
    @Id
    private Long id;
    private String brand;
}
This is a bad practice:
@Service
public class MyService {

    @Autowired
    private TableRepository tableRepository;

    @Transactional
    public Table create() {
        Table table = new Table();
        // Some logic for creating and saving the table
        return table;
    }
}
This is a good practice:
@Service
public class MyService {

    @Autowired
    private TableRepository tableRepository;

    @Transactional
    public TableDTO create() {
        Table table = new Table();
        // Some logic for creating and saving the table
        // Logic for converting the Table object to a TableDTO object
        return tableDTO;
    }
}
Why is this so?
Thank you!
You probably mean a DTO (Data Transfer Object), not a DAO (Data Access Object). Let me clarify:
Data Transfer Object:
A POJO that represents a piece of information. Usually it aggregates data.
Data Access Object:
An object that accesses some kind of persistent storage to retrieve information; some consider it a synonym of Repository, some don't.
Entity:
An object that represents data that has been retrieved from the database.
Why is returning an Entity from the Service considered a bad practice?
The reason is that the Entity is something very close to the database. It contains the primary key, someone could guess your database structure from it, and the data returned for a query can be verbose. Hence, it is preferable to have some kind of logic, usually a mapper, that hides the primary key and aggregates data so the result is less verbose and does not expose the DB structure. Also, while the Entity is built on the table structure, the DTO can be customized to the caller's needs: usually it contains exactly the data that is needed for some action and nothing more. Suppose you have third-party software that calls your backend services: you should not expose the DB structure (Entities) to that service. It is better to define a contract with the minimal information this third-party service needs to operate, and expose only that part of the information, hiding all the rest.
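As a minimal sketch of the "good practice" variant from the question (the fields exposed by TableDTO and the mapping are just an assumption for illustration):

// DTO exposing only what the caller needs; the primary key stays internal
public class TableDTO {

    private final String brand;

    public TableDTO(String brand) {
        this.brand = brand;
    }

    public String getBrand() {
        return brand;
    }
}

@Service
public class MyService {

    @Autowired
    private TableRepository tableRepository;

    @Transactional
    public TableDTO create() {
        Table table = new Table();
        // Some logic for creating the table
        table = tableRepository.save(table);
        // Map the entity to the DTO instead of returning the entity itself
        return new TableDTO(table.getBrand());
    }
}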
Hope that's a little bit more clear now.
Edit:
Of course there are other good reasons for using DTOs instead of Entities; this is only an introductory explanation of the subject.

Getting data from View using CrudRepository

For this question, I am not looking for a solution but for a direction in which I can take myself ahead, hence I am not sharing any code.
I am preparing a REST API and I have a PostgreSQL database set up locally, which has 2 tables and one view over those 2 tables.
Normally, when I want to get any data from the DB, I use the following code (for the sake of clarity):
DataRepository class:
public interface DataRepository extends CrudRepository<Data, String>{}
DataService class:
@Service
public class DataService {

    @Autowired
    private DataRepository repo;

    public Data getData(String id) {
        return repo.findById(id).orElse(null);
    }
}
DataController class:
@RestController
public class DataController {

    @Autowired
    private DataService service;

    @RequestMapping("/{id}")
    public Data getData(@PathVariable String id) {
        return service.getData(id);
    }
}
Data class:
@Entity
public class Data {
    @Id
    private String id;
    private String name;
    // respective getter and setter methods
}
Now I want to retrieve data from the view, so what should the approach be for that?
Should we use the same approach of creating Model, Service, Controller and Repository classes?
Can we use CrudRepository to achieve the same?
I searched in a lot of places, but didn't find anything useful.
Let me know if anyone has any clue on this.
The reading methods of a CrudRepository should work fine with a view. For the writing methods, the view needs to be updatable.
If you only want to read, but not to write to the repository, you can create a ReadOnlyRepository by copying the source code of the CrudRepository and removing all the writing methods.
Note that JPA will still try to persist changes made to managed entities.
To avoid that, and also to avoid the cost of dirty checking, you can mark your entities as immutable if you are using Hibernate.
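A small sketch of how both ideas can look together, assuming Hibernate is the JPA provider; the view name data_view and the exact shape of the read-only base interface are assumptions:

// Entity mapped onto the database view; @Immutable tells Hibernate to skip dirty checking
@Entity
@org.hibernate.annotations.Immutable
@Table(name = "data_view")
public class DataView {
    @Id
    private String id;
    private String name;
    // getters only, no setters
}

// Base repository exposing only reading methods
@NoRepositoryBean
public interface ReadOnlyRepository<T, ID> extends Repository<T, ID> {
    Optional<T> findById(ID id);
    Iterable<T> findAll();
}

public interface DataViewRepository extends ReadOnlyRepository<DataView, String> {}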

Wicket - Serialization of persisted and non-persisted JPA entities

I know that when using Wicket with JPA frameworks it is not advisable to serialize entities that have already been persisted to the database (because of problems with lazy fields and to save space). In such cases we are supposed to use LoadableDetachableModel. But what about the following use case?
Suppose we want to create a new entity (say, a Contract) which will consist, among other things, of persisted entities (say, a Client which is selected from a list of clients stored in the DB). The entity under creation is a model object of some Wicket component (say, a Wizard). In the end (when we finish our wizard) we save the new entity to the DB. So my question is: what is the best generic solution to the serialization problem for such model objects? We can't use an LDM because the entity is not in the DB yet, but we don't want our inner entities (like Client) to be serialized wholly either.
My idea was to implement a custom Wicket serializer that checks whether the object is an entity and whether it is persisted. If so, store only its id; otherwise use the default serialization. Similarly, when deserializing, use the stored id and get the entity from the DB, or deserialize using the default mechanism. I'm not sure, though, how to do that in a generic way. My next thought was that if we can do it, then we do not need any LDM anymore; we can just store all our entities in simple org.apache.wicket.model.Model models and our serialization logic will take care of them, right?
Here's some code:
@Entity
class Client {
    String clientName;

    @ManyToOne(fetch = FetchType.LAZY)
    ClientGroup group;
}

@Entity
class Contract {
    Date date;

    @ManyToOne(fetch = FetchType.LAZY)
    Client client;
}
class ContractWizard extends Wizard {
    ContractWizard(String markupId, IModel<Contract> model) {
        super(markupId);
        setDefaultModel(model);
    }
}
Contract contract = DAO.createEntity(Contract.class);
ContractWizard wizard = new ContractWizard("wizard", ?);
How to pass the contract? If we just say Model.of(contract), the whole contract will be serialized along with the inner client (and it can be big); moreover, if we access contract.client.group after deserialization we can bump into this problem: https://en.wikibooks.org/wiki/Java_Persistence/Relationships#Serialization.2C_and_Detaching
So I wonder how people go about solving such issues, I'm sure it's a fairly common problem.
I guess there are 2 approaches to your problem:
a.) Only save the stuff the user actually sees in Models. In your example that might be "contractStartDate", "contractEndDate" and a list of clientIds. That's the main approach if you don't want your database objects in your view.
b.) Write your own LoadableDetachableModel and make sure you only serialize transient objects, for example like this (assuming that any negative id is not saved to the database):
public class MyLoadableDetachableModel extends LoadableDetachableModel<MyObject> {

    private MyObject myObject;
    private Integer id;

    public MyLoadableDetachableModel(MyObject myObject) {
        this.myObject = myObject;
        this.id = myObject.getId();
    }

    @Override
    protected MyObject load() {
        if (id < 0) {
            // not persisted yet: keep serving the transient instance
            return myObject;
        }
        return myObjectDao.getMyObjectById(id); // injected DAO
    }

    @Override
    protected void onDetach() {
        super.onDetach();
        if (myObject != null) {
            id = myObject.getId();
            if (id >= 0) {
                // once persisted, drop the reference and reload by id next time
                myObject = null;
            }
        }
    }
}
The downside of this is that you'll have to make your database objects Serializable, which is not really ideal and can lead to all kinds of problems. You would also need to decouple the references to other entities from the transient object by using a ListModel.
Having worked with both approaches, I personally prefer the first. From my experience, the whole business of injecting DAO objects into Wicket can lead to disaster. :) I would only use this in view-only projects that aren't too big.
Most projects I know of just accept serializing referenced entities (e.g. your Clients) along with the edited entity (Contract).
Using conversations (keeping a Hibernate/JPA session open over several requests) is a nice alternative for applications with complex entity relations:
The Hibernate session and its entities are kept separate from the page and are never serialized. The component just keeps an identifier to fetch its conversation.

Save embedded entity with Objectify

I have two entities.
@Entity
public class Recipe {
    @Id
    private Long id;
    private List<Step> steps;
}

@Entity
public class Step {
    @Id
    private Long id;
    private String instruction;
}
And the following Cloud Endpoint:
@ApiMethod(
        name = "insert",
        path = "recipe",
        httpMethod = ApiMethod.HttpMethod.POST)
public Recipe insert(Recipe recipe) {
    ofy().save().entities(recipe.getSteps()).now(); // superfluous?
    ofy().save().entity(recipe).now();
    logger.info("Created Recipe with ID: " + recipe.getId());
    return ofy().load().entity(recipe).now();
}
I'm wondering how I can skip the step where I have to save the embedded entity first. The id of neither entity is set; I want Objectify to create those automatically. But if I don't save the embedded entity I get an exception:
com.googlecode.objectify.SaveException: Error saving com.devmoon.meadule.backend.entities.Recipe@59e4ff19: You cannot create a Key for an object with a null @Id. Object was com.devmoon.meadule.backend.entities.Step@589a3afb
Since my object structure will get a lot more complex, I need to find a way to skip this manual step.
I presume you are trying to create real embedded objects, not separate objects stored in the datastore and linked. Your extra save() is actually saving separate entities; you don't want that.
You have two options:
1. Don't give your embedded object an id. Don't give it @Entity and don't give it an id field (or at least eliminate @Id). It's just a POJO. 90% of the time, this is what people want with embedded objects.
2. Allocate the id yourself with the allocator, typically in your (non-default) constructor.
Assuming you want a true embedded entity with a real key, #2 is probably what you should use. Keep in mind that this key is somewhat whimsical, since you can't actually load it; only the container object can be looked up in the datastore.
I suggest going one step further and never using automatic id generation for any entities, ever. Always use the allocator in the (non-default) constructor of your entities. This ensures that entities always have a valid, stable id. If you always allocate the id before a transaction starts, it also fixes the duplicate entities that can be created when a transaction gets retried. Populating null ids is just a bad idea all around and really should not have been added to GAE.
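A sketch of option 2, allocating the id in the non-default constructor (assuming Objectify's id allocator is available via ObjectifyService.factory()):

@Entity
public class Step {
    @Id
    private Long id;
    private String instruction;

    private Step() {
        // no-arg constructor required by Objectify for loading
    }

    public Step(String instruction) {
        // allocate a real, stable id up front instead of relying on null-id autogeneration
        this.id = ObjectifyService.factory().allocateId(Step.class).getId();
        this.instruction = instruction;
    }
}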
The concept of embedding is that the embedded content is persisted inside the main entity.
Is this the behaviour you are trying to configure?
The default behaviour for a collection (List) of @Entity-annotated classes is to reference them instead of embedding them. In your current configuration, the List<Step> field does not have any annotation overriding that default, so each Step is a separate entity related to the Recipe.
The error you are getting is because Objectify, when it saves the Recipe entity, tries to get the key of each Step to create the relationship (and save the keys in the Recipe entity); but if a Step entity has not been saved to the datastore yet, it does not have a key.
If you are trying to persist the steps inside the recipe entity, you need to set up Objectify like this:
@Entity
public class Recipe {
    @Id
    private Long id;
    private List<Step> steps;
}

public class Step {
    private Long id;
    private String instruction;
}
As you can see, I removed the @Id annotation (an embedded entity does not require an id because it lives inside another entity) and the @Entity annotation from the Step class. With this configuration, Objectify saves the Step objects inside the Recipe entity.
Source: https://code.google.com/p/objectify-appengine/wiki/Entities#Embedded_Object_Native_Representation
