Why does Hibernate @OneToOne execute multiple select queries instead of one? - java

Here are my entities:
@Entity
public class ProductStateEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false)
    private Integer id;
    @OneToOne
    @JoinColumn(name = "product_id", nullable = false)
    private ProductEntity product;
    // other fields
}
@Entity
public class ProductEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false)
    private Integer id;
    // other fields
}
If I load the entity this way:
session.get(ProductStateEntity.class, 10);
Hibernate generates this SQL:
SELECT product_states.id, product_states.product_id, products.id, -- other columns
FROM product_states
INNER JOIN products ON product_states.product_id=products.id
WHERE product_states.id=10
So far, so good: a single query with an INNER JOIN.
If I run an HQL query instead:
session.createQuery("from ProductStateEntity where id = :id")
    .setParameter("id", 10)
    .list();
Hibernate generates this SQL:
SELECT product_states.id, product_states.product_id, -- other columns
FROM product_states
WHERE product_states.id=10;
SELECT products.id, -- other columns
FROM products
WHERE products.id=10
In this case, 2 queries are executed: first against product_states, then against products.
That's not all. Now let's run a query that fetches 4 records by 4 ids at once:
session.createQuery("from ProductStateEntity where id in :ids")
    .setParameter("ids", Arrays.asList(10, 11, 12, 13))
    .list();
Hibernate generates this SQL:
SELECT product_states.id, product_states.product_id, -- other columns
FROM product_states
WHERE product_states.id IN (10, 11, 12, 13);
SELECT products.id, -- other columns
FROM products
WHERE products.id=10;
SELECT products.id, -- other columns
FROM products
WHERE products.id=11;
SELECT products.id, -- other columns
FROM products
WHERE products.id=12;
SELECT products.id, -- other columns
FROM products
WHERE products.id=13;
In this case, 5 queries are executed. First, a query against product_states returns the ids of all the products, and then one query per product is executed to load each of the 4 products.
Now add join fetch to the previous query:
session.createQuery("from ProductStateEntity p join fetch p.product where p.id in :ids")
    .setParameter("ids", Arrays.asList(10, 11, 12, 13))
    .list();
Hibernate generates this SQL:
SELECT product_states.id, products.id, product_states.product_id, -- other columns
FROM product_states
INNER JOIN products ON product_states.product_id=products.id
WHERE product_states.id IN (10, 11, 12, 13)
Thus only 1 query is executed, with an INNER JOIN, which is what I want to achieve.
So the questions are:
Why does join fetch have to be specified explicitly in createQuery? Could it be made the default behavior? After all, a single query with a join is better than many queries.
Why, without join fetch, are the additional select queries not combined into one with id in (...)? Instead, Hibernate runs the selects one at a time. Can this be customized?

N+1 selects are what Hibernate does by default here. A @OneToOne association is fetched eagerly by default; session.get() applies the mapping's fetch plan and can use a join, but an HQL query is executed as written, so the eager association is then loaded with one extra select per row. As for why these are the defaults, the documentation states:
These defaults make sense for most associations in the majority of applications
To change this behaviour globally you can set hibernate.default_batch_fetch_size; there are plenty of discussions online about how to choose a proper value and why.
One more thing: it is a common opinion that a fetch join is the solution to every problem, but it is not. We must remember the Cartesian product problem.
The right fetching strategy depends on how the application works, on the environment (e.g. latency of DB connections), on the data model, and on many other things. There is no single solution that is good for everyone, which is why Hibernate offers several fetching strategies.
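The query counts reported in the question can be put side by side with a small back-of-the-envelope model. This is only a sketch in plain Java, not Hibernate code: the class and method names are made up for illustration, and the batch formula assumes one IN query per full or partial batch, which simplifies how Hibernate actually splits batches.

```java
/** Rough model of how many SELECT statements each fetching strategy issues. */
final class FetchQueryCount {
    private FetchQueryCount() {}

    /** One query for the parent rows, then one query per associated row (N+1). */
    static int nPlusOne(int parents) {
        return 1 + parents;
    }

    /**
     * One query for the parent rows, then one IN query per batch.
     * Assumption: batches are simply ceil(parents / batchSize).
     */
    static int batched(int parents, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        return 1 + (parents + batchSize - 1) / batchSize;
    }

    /** With join fetch, everything comes back in a single joined query. */
    static int joinFetch() {
        return 1;
    }

    public static void main(String[] args) {
        int parents = 4; // the four ids 10, 11, 12, 13 from the question
        System.out.println("n+1:        " + nPlusOne(parents));
        System.out.println("batch (16): " + batched(parents, 16));
        System.out.println("join fetch: " + joinFetch());
    }
}
```

For the four ids in the question this gives 5 queries without any tuning, 2 with a batch size of 16, and 1 with join fetch. Which of these is cheapest in practice still depends on row width and latency, as the answer points out.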

Related

Spring JPA - Hibernate: Batch insert execute too much select nextval (‘sequence’)

I'm trying to improve the performance of my web application. I use Spring JPA 2.3.0, Hibernate 5.4.15.Final and Postgres 12, and manage transactions with @Transactional. The web app is deployed on AWS Beanstalk and runs multiple instances at the same time, but the database instance does not scale. I use the bigserial type for the ID columns of my tables.
For instance, I have a STUDENTS table whose ID is bigserial, plus some other columns.
I ran into a problem when using
@GeneratedValue(strategy = GenerationType.IDENTITY)
Hibernate could not batch the inserts when saving a list of entities.
So I tried
@GeneratedValue(strategy = GenerationType.AUTO, generator = "students_id_seq")
@SequenceGenerator(name = "students_id_seq", sequenceName = "students_id_seq")
together with these properties:
hibernate.id.new_generator_mappings=false
hibernate.jdbc.batch_size=10
hibernate.order_inserts=true
hibernate.order_updates=true
hibernate.batch_versioned_data=true
Hibernate now seems to batch the inserts, but the problem is that it executes select nextval ('students_id_seq') many times. If an entity list has 30 items, Hibernate executes select nextval 30 times and the batch insert query 3 times.
Some statistics:
If using GenerationType.IDENTITY:
  save(entity):
    insert into ... : executed once
  saveAll(n entities):
    insert into ... : executed n times
If using GenerationType.SEQUENCE / GenerationType.AUTO:
  save(entity):
    select nextval ('students_id_seq'): executed once
    insert into ... : executed once
  saveAll(n entities):
    select nextval ('students_id_seq'): executed n times
    insert into ... : executed n/batch_size times
In conclusion, when using GenerationType.AUTO or GenerationType.SEQUENCE with allocationSize = 1:
when inserting one entity, the number of queries doubles (from a single insert query to two queries: select nextval and insert);
when batch inserting with batch_size = 10, the application executes over 10% more queries than with IDENTITY (n select nextval queries plus n/10 batch inserts, instead of n inserts).
My question is: is there any way to batch insert without executing so many select nextval queries? Something like GenerationType.IDENTITY, which does not execute select nextval: just batch insert and let the sequence in the database assign the IDs.
When I test with GenerationType.SEQUENCE and allocationSize=1 (GenerationType.AUTO), the application executes too many select nextval queries; I think it is even worse than the IDENTITY strategy.
And for some reasons I don't want to use allocationSize: it may lead to duplicate primary key errors when running insert queries manually, when migrating data, or in some other cases.
After some research, I found a way to get a list of values from a sequence:
select nextval ('students_id_seq') from generate_series(1,10);
When batch inserting, we could replace 10 with entityList.size(), or with the number of entities in entityList that do not have an ID yet, so that we fetch just enough values and do not create large gaps between IDs. But I'm not sure whether Hibernate supports this; if it does, please point me to the documentation.
Thank you
https://discourse.hibernate.org/t/batch-insert-execute-too-much-select-nextval-sequence/4232
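The generate_series idea from the question could be wrapped in a small helper that builds the statement for a given number of entities. This is a hypothetical sketch: SequenceBatchSql and nextValuesSql are made-up names, not a Hibernate feature, and the resulting string would still have to be executed by the application itself (for example through JDBC or a native query).

```java
import java.util.regex.Pattern;

/** Builds a Postgres statement that reserves several sequence values in one round trip. */
final class SequenceBatchSql {
    // Only plain identifiers are accepted, because the name is concatenated into SQL.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private SequenceBatchSql() {}

    static String nextValuesSql(String sequenceName, int count) {
        if (!SAFE_NAME.matcher(sequenceName).matches()) {
            throw new IllegalArgumentException("unsafe sequence name: " + sequenceName);
        }
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        return "select nextval('" + sequenceName + "') from generate_series(1, " + count + ")";
    }

    public static void main(String[] args) {
        // 30 entities without an id, as in the example from the question
        System.out.println(nextValuesSql("students_id_seq", 30));
    }
}
```

The count would be the number of entities that still lack an id, so no more values are consumed than are needed.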
What you are looking for is the HiLo algorithm for id generation.
For every id generated from a sequence it generates a number of ids on the client without accessing the database.
You configure it on your entity like this:
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo_sequence_generator")
@GenericGenerator(
    name = "hilo_sequence_generator",
    strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
    parameters = {
        @Parameter(name = "sequence_name", value = "hilo_sequence"),
        @Parameter(name = "initial_value", value = "1"),
        @Parameter(name = "increment_size", value = "3"),
        @Parameter(name = "optimizer", value = "hilo")
    })
@Id
private Long id;
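To see what this buys, here is a simplified plain-Java model of the hi/lo idea with increment_size = 3. It does not use Hibernate: the AtomicLong stands in for the database sequence, HiLoSketch is a made-up name, and the block arithmetic is an assumption about how ids are numbered, so the ids a real application gets may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified hi/lo model: one sequence call yields a whole block of ids. */
final class HiLoSketch {
    private final AtomicLong databaseSequence = new AtomicLong(0); // stand-in for the DB sequence
    private final int incrementSize;
    private long next;       // next id to hand out
    private long upperLimit; // first id that is outside the current block
    private int sequenceCalls;

    HiLoSketch(int incrementSize) {
        if (incrementSize < 1) {
            throw new IllegalArgumentException("incrementSize must be >= 1");
        }
        this.incrementSize = incrementSize;
    }

    long nextId() {
        if (next >= upperLimit) {
            // Block exhausted (or first call): this is the only "database access".
            long hi = databaseSequence.incrementAndGet();
            sequenceCalls++;
            upperLimit = hi * incrementSize + 1;
            next = upperLimit - incrementSize;
        }
        return next++;
    }

    int sequenceCalls() {
        return sequenceCalls;
    }

    public static void main(String[] args) {
        HiLoSketch generator = new HiLoSketch(3);
        for (int i = 0; i < 7; i++) {
            System.out.println("id = " + generator.nextId());
        }
        System.out.println("sequence calls = " + generator.sequenceCalls());
    }
}
```

In this model, seven ids cost three sequence calls instead of seven. A larger increment_size reduces the calls further, at the price of larger gaps when a block is only partly used.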
I have some experience with this. I was inserting more than 128,000 records,
and my goal was to reduce the time this took. I will summarize the case below:
The code did not call persist() or save() explicitly. The records were saved when the @Transactional method exited.
I was using Hibernate batch inserts; these are the settings in the config map:
spring.jpa.properties.hibernate.jdbc.batch_size: "40"
spring.jpa.properties.hibernate.order_inserts: "true"
spring.jpa.properties.hibernate.order_updates: "true"
spring.main.allow-bean-definition-overriding: "true"
I modified the allocation size in my entity Id configuration as below:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "mappingentry_sql_generator")
@SequenceGenerator(name = "mappingentry_sql_generator", sequenceName = "mappingentry_id_seq", allocationSize = 40)
private Long id;
Note: I set allocationSize in the sequence generator equal to the batch_size value in the settings.
I also altered my sequence mappingentry_id_seq to increment by 40.
After these changes the time dropped from 55 seconds to 20 seconds, which was a big improvement.
The only thing I did not understand is that when I checked the values of the id column generated by the sequence, I found no gaps: every ID exceeded the previous one by 1, not 40. That is what I am currently trying to understand.
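One plausible explanation for the missing gaps is that each sequence value is treated as the upper bound of a block of 40 ids, and the ids inside the block are handed out in memory. The sketch below models that in plain Java. It is an assumption-laden illustration, not Hibernate's implementation: PooledSketch is a made-up name, the fake sequence is assumed to return 40, 80, 120, ..., and the handling of the very first value is simplified.

```java
/** Simplified "pooled" model: each sequence value is the top of a block of ids. */
final class PooledSketch {
    private final int allocationSize;
    private long sequenceValue; // last value returned by the stand-in DB sequence
    private long next = 1;      // next id to hand out
    private long hi = 0;        // highest id of the current block
    private int sequenceCalls;

    PooledSketch(int allocationSize) {
        if (allocationSize < 1) {
            throw new IllegalArgumentException("allocationSize must be >= 1");
        }
        this.allocationSize = allocationSize;
    }

    /** Stand-in for "select nextval(...)" on a sequence that increments by allocationSize. */
    private long nextSequenceValue() {
        sequenceCalls++;
        sequenceValue += allocationSize;
        return sequenceValue;
    }

    long nextId() {
        if (next > hi) {
            hi = nextSequenceValue();       // e.g. 40, then 80, then 120
            next = hi - allocationSize + 1; // e.g. 1, then 41, then 81
        }
        return next++;
    }

    int sequenceCalls() {
        return sequenceCalls;
    }

    public static void main(String[] args) {
        PooledSketch generator = new PooledSketch(40);
        long previous = 0;
        for (int i = 0; i < 100; i++) {
            long id = generator.nextId();
            if (id != previous + 1) {
                System.out.println("gap before id " + id);
            }
            previous = id;
        }
        System.out.println("last id = " + previous + ", sequence calls = " + generator.sequenceCalls());
    }
}
```

Under these assumptions the sequence jumps by 40, but the ids stay consecutive because the jump only marks where the next block begins. Gaps would appear only when a block is abandoned part-way, for example after a restart.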

How to filter child collection in JPQL query?

I've the following DB model:
Category -< ProductCategory >- Product -< Variant
(Category has many-to-many relationship with Product and Product has one-to-many relationship with Variant)
Now I need to get all Category records that have products with active variants. I'm getting these objects via the following JPQL query:
@Query("select distinct c from Category c join c.products as p join p.variants as pv where pv.active = true")
It works well in that it returns the right categories; however, every Category contains all of its products, not only those with active variants.
How can I filter out the inactive products (or variants) in a single query?
Here's a Postgres script with the database structure and sample data. For the given data, two categories (CAT 1, CAT 2), two products (PROD 1, PROD 2) and three variants (VAR 1, VAR 2, VAR 3) should be returned.
I had exactly the same problem and it took me a while to find out how this works. The child list should be filtered when you add FETCH after your JOIN like this:
SELECT DISTINCT c FROM Category c JOIN FETCH c.products as p join p.variants as pv where pv.active = true
I had the same problem, and I found that the FetchType in the @OneToMany annotation matters. It needs to be set to LAZY. If it is EAGER, JPA generates its own SQL query to fetch the child collection from the table and does not filter it for you.

Hibernate: multiple selects @OneToMany

I have three entities:
Users(id, username, password...); BoughtItems(id, userid, itemcat, itemid); ItemCustomizations(id(id of item), slot(id of color), color)
When I fetch like this in BoughtItems.java:
@OneToMany(fetch = FetchType.EAGER, mappedBy = "id")
@OrderBy("slot")
private List<ItemCustomization> customColors = new ArrayList<>();
Hibernate queries the database once for each item:
Hibernate: select customcolo0_.itemid as itemid1_2_0_, customcolo0_.itemid
as itemid1_3_0_, customcolo0_.slot as slot2_3_0_, customcolo0_.itemid as
itemid1_3_1_, customcolo0_.slot as slot2_3_1_, customcolo0_.color as
color3_3_1_ from shop_customizations customcolo0_ where
customcolo0_.itemid=? order by customcolo0_.slot
My questions:
How can I optimize my mappings to reduce the load on the MySQL server?
Can I somehow reduce the one-query-per-item pattern to a single query?
I believe you are looking for Hibernate's batch size feature. See example here.
Batch size defines how many collections are loaded at once. For example, if you have 10 BoughtItems in the session and batch size = 5, only two SQL queries will be fired to fetch the customColors collections for all BoughtItems.
See Possible fetching strategies in Hibernate
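The arithmetic in the batch size example (10 items, batch size 5, two queries) can be illustrated with a small plain-Java sketch that splits the owners into batches and builds one IN statement per batch. This is not Hibernate's loader: BatchInSql is a made-up name, the table and column names are taken from the log in the question, and the SQL Hibernate really generates will look different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Builds one "in (?, ?, ...)" statement per batch of owning rows. */
final class BatchInSql {
    private BatchInSql() {}

    static List<String> statements(int ownerCount, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> sql = new ArrayList<>();
        for (int start = 0; start < ownerCount; start += batchSize) {
            // The last batch may be smaller than batchSize.
            int size = Math.min(batchSize, ownerCount - start);
            String placeholders = String.join(", ", Collections.nCopies(size, "?"));
            sql.add("select * from shop_customizations where itemid in (" + placeholders + ")");
        }
        return sql;
    }

    public static void main(String[] args) {
        // 10 BoughtItems in the session, batch size 5
        for (String statement : statements(10, 5)) {
            System.out.println(statement);
        }
    }
}
```

With 10 owners and a batch size of 5 the sketch produces two statements with five placeholders each, instead of ten single-row selects.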

Java JPQL, join table, many to many relation

We are working on a school project and are trying to use JPA entities.
We have several tables in our MySQL database; the ones we are using are taxonomy, term, campaign and term_campaign_relationship.
We have made entities for campaign, term and taxonomy. We would like to get all the terms that are related to a campaign, filtered by taxonomy id.
We have used the @ManyToMany annotation with @JoinTable and joinColumns.
With that we get all the terms that are related to the campaign.
But how do we get the result filtered by taxonomy id only?
The SQL statement below shows the result we are looking for:
SELECT
t.id, t.term_name, t.taxonomy_id
FROM
term t
INNER JOIN
taxonomy ty ON ty.id = t.taxonomy_id
INNER JOIN
campaign c ON c.id IN (SELECT
tc.campaign_id
FROM term_campaign_relationship tc
WHERE tc.term_id = t.id
)
WHERE c.id = 1 AND ty.id = 1;
This is from our campaign entity in Java:
@ManyToMany
@JoinTable(
    name = "term_campaign_relationship",
    joinColumns = {@JoinColumn(name = "campaign_id", referencedColumnName = "id")},
    inverseJoinColumns = {@JoinColumn(name = "term_id", referencedColumnName = "id")})
private Collection<Term> programTypes;
Please advise if we are missing anything.
Is this the solution you are looking for?
Query query = entityManager.createQuery("Select tax from taxonomy tax where tax.id = :arg1");
query.setParameter("arg1", 1);
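If the goal is the terms of one campaign restricted to one taxonomy, as in the SQL from the question, a JPQL query that joins through the programTypes collection might be closer. The sketch below only holds the query string. It assumes an entity named Campaign and a Term.taxonomy association, neither of which is shown in the question, so the names would need adjusting to the real mappings.

```java
/** Holds a JPQL query string; entity and field names are partly assumed. */
final class TermQueries {
    private TermQueries() {}

    // Assumed: entity Campaign owning the programTypes collection from the question,
    // and a hypothetical Term.taxonomy association backed by term.taxonomy_id.
    static final String TERMS_BY_CAMPAIGN_AND_TAXONOMY =
            "select t from Campaign c join c.programTypes t "
            + "where c.id = :campaignId and t.taxonomy.id = :taxonomyId";

    public static void main(String[] args) {
        System.out.println(TERMS_BY_CAMPAIGN_AND_TAXONOMY);
    }
}
```

It would be run with entityManager.createQuery(...) and two setParameter calls, one for campaignId and one for taxonomyId, in the same style as the snippet above.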

Hibernate JPA: one table has one-to-many relationships with more than 10 tables, which causes performance problems

We have one basic table, T_RESOURCE, which contains the basic resource information, and several other tables related to it. There are two kinds of relationship. The first kind is a table such as T_RESOURCE_TAGS, which holds the tags of a resource in T_RESOURCE; a resource can have more than one tag. The second kind goes through a middle table: T_FILES is connected to T_RESOURCE through T_RESOURCE_FILES. T_RESOURCE_TAGS is in effect also a middle table, but there is no T_TAGS table corresponding to T_FILES. In total we have 11 relationships of these two kinds.
We have an entity class for T_RESOURCE in which all the related tables are mapped like this:
@Field(index=Index.YES, analyze=Analyze.NO, store=Store.NO)
@IndexedEmbedded(indexNullAs = Field.DEFAULT_NULL_TOKEN)
@LazyCollection(LazyCollectionOption.FALSE)
@OneToMany
@JoinTable(name="T_RESOURCE_FILES",joinColumns={ @JoinColumn(name="RESOURCE_ID", referencedColumnName="ID") },inverseJoinColumns={ @JoinColumn(name="FILE_ID", referencedColumnName="ID", unique=true) })
private List<TFILE> files;
@IndexedEmbedded(indexNullAs = Field.DEFAULT_NULL_TOKEN)
@LazyCollection(LazyCollectionOption.FALSE)
@ElementCollection
@CollectionTable(name = "T_RESOURCE_TAG", joinColumns = @JoinColumn(name = "RESOURCE_ID"))
@Column(name = "tag")
private List<String> tags;
We then load the full resource information from T_RESOURCE for all the resource ids (RESOURCE_ID) that match the user's search criteria. We get the results using a native query:
em.createNativeQuery(sql.toString(),TRESOUCE.class).setFirstResult(firstResult).setMaxResults(pageSize).getResultList();
The SQL is something like:
"select * from T_RESOURCE where id in ('1','2','3','4') order by ..."
This works, but it takes a long time to load the tags from T_RESOURCE_TAGS, the files from T_RESOURCE_FILES, and the other 9 relationships, because they are loaded like this:
select * from T_RESOURCE_TAGS where asset_id = '1';
select * from T_RESOURCE_TAGS where asset_id = '2';
select * from T_RESOURCE_TAGS where asset_id = '3';
select * from T_RESOURCE_TAGS where asset_id = '4';
So the more resources match the search criteria, the more time it takes.
I also tried join fetch, but there seems to be a limit on the number of join fetches, so it did not work.
I would also like to separate these relationships from the TRESOUCE class and load them separately, the same way the basic TRESOURCE information is loaded. For information: the middle tables have no unique id per record, only resource_id plus the related id (or directly the name) for files, tags, etc.
Could you please share your ideas? Thanks very much.
Some databases, like Postgres, have a very useful array type. You can store the tags in a single column, without an additional table. Hibernate supports mapping custom data types such as arrays to Java objects. This can significantly improve performance.
But Oracle arrays are a different case: you have to specify the array length when creating the table.
You can improve performance by using the org.hibernate.annotations.BatchSize annotation on your collection.
@Column(name = "tag")
@BatchSize(size=20)
private List<String> tags;
will cause it to execute the following statement when loading the object:
select * from T_RESOURCE_TAGS where asset_id IN ('1', '2', '3', '4');
This will save several round trips to the database and speed up your application.
