Hibernate - Why shouldn't we use FETCH JOIN with scroll()?

I'm reading the Hibernate documentation (version 5.1), and I came across this sentence:
Fetch joins should not be used in paged queries (e.g. setFirstResult()
or setMaxResults()), nor should they be used with the scroll() or
iterate() features.
(http://docs.jboss.org/hibernate/orm/5.1/userguide/html_single/Hibernate_User_Guide.html#hql-explicit-join)
I recently worked on a project with a large database (2 TB), and for many batch jobs we had no other option than to use FETCH JOIN with the scroll() method (short of abandoning Hibernate and going back to plain SQL queries).
I understand that paged queries can't guarantee a complete result (when fetching a collection, for example).
But when using scroll() (which relies on database cursors, if I'm not mistaken), if we sort the result by the root entities' IDs, I can't see why Hibernate couldn't guarantee the coherence and completeness of the result.
What is the reason for this restriction?
What is the risk for the fetched data?
Is there a way to prevent it (such as sorting by the root entities' IDs)?
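To make the concern concrete, here is a minimal plain-Java sketch (not Hibernate code, and not a description of Hibernate's internals). It assumes that a fetch join returns one row per (root, child) pair, so a root entity is spread over several consecutive rows. The names FetchJoinRowsSketch, Row, page and group are hypothetical and exist only for this illustration. Cutting the row list, as row-based paging does, can leave a root with a partial collection, while reading all rows in root-ID order lets each collection be reassembled completely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FetchJoinRowsSketch {

    /** One row of a joined result set: the root id is repeated once per child. */
    static final class Row {
        final long parentId;
        final String child;

        Row(long parentId, String child) {
            this.parentId = parentId;
            this.child = child;
        }
    }

    /** Row-based paging: limits joined rows, not root entities. */
    static List<Row> page(List<Row> rows, int first, int max) {
        int from = Math.min(first, rows.size());
        int to = Math.min(from + max, rows.size());
        return rows.subList(from, to);
    }

    /** Groups rows by root id; a collection is complete only if all of its rows were read. */
    static Map<Long, List<String>> group(List<Row> rows) {
        Map<Long, List<String>> byParent = new LinkedHashMap<>();
        for (Row r : rows) {
            byParent.computeIfAbsent(r.parentId, id -> new ArrayList<>()).add(r.child);
        }
        return byParent;
    }

    public static void main(String[] args) {
        // Hypothetical joined rows, already ordered by root id.
        List<Row> joined = List.of(
                new Row(1, "a"), new Row(1, "b"), new Row(1, "c"),
                new Row(2, "d"), new Row(2, "e"));

        // Limiting to 2 rows: root 1 shows up with only 2 of its 3 children.
        System.out.println(group(page(joined, 0, 2))); // {1=[a, b]}

        // Reading every row in root-id order: each collection is complete.
        System.out.println(group(joined)); // {1=[a, b, c], 2=[d, e]}
    }
}
```

Whether Hibernate's scroll() performs this kind of regrouping for a given query is exactly what the question asks; the sketch only shows why row order and row boundaries matter.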

Related

Can apache ignite key-value API use an index to delete records with a transaction?

Assume a use case where we have 3 caches, called companies, employees and customers. A company record owns multiple employee records and multiple customer records. Employees have their own primary key and a foreign key companyId which references their owner company record. We use Ignite to place an index on the Employee companyId field, either using SQL CREATE INDEX or the @QuerySqlField(index = true) annotation. The customer cache is set up in exactly the same way as the employee cache.
Let's say we want to delete the company, so we want to run the following SQL statements within a single transaction to ensure the company and its child records are either entirely deleted or entirely not deleted if the transaction fails (so we keep referential integrity).
DELETE FROM companies where id=companyId2Delete;
DELETE FROM employees where companyId=companyId2Delete;
DELETE FROM customers where companyId=companyId2Delete;
We therefore need to do this within a transaction in ignite, as multiline SQL statements don't work in ignite. However for transactions to work with ignite using SQL, the 3 caches need to be defined with atomicity=TRANSACTIONAL_SNAPSHOT (see here and here). The docs also say that TRANSACTIONAL_SNAPSHOT is in beta release and should not be used for production.
If we use the key-value API only, we can use atomicity type TRANSACTIONAL instead, which is fully supported (see here). However the SQL delete statements which delete from employees and customers use the index we created in each cache on companyId, so the delete is efficient. If we delete these records using the key-value API only within a transaction, I don't see any way of using the index on companyId (which appears to only be available to the SQL). Presumably we'd have to scan the entire employees table instead, and same for customers, which would be very slow. We could use an affinity key but that would only ensure that entities within the same company are stored on the same ignite node, it presumably wouldn't use the index properly.
What's the best way to do this?
Is there a way to get the key-value API to use the index on companyId when removing the records? (bearing in mind companyId is not the primary key for employees and customers?)
Within a single transaction - i.e. between tx = ignite.transactions().txStart() and tx.commit() - could I (1) use SQL queries to get all the primary keys in customers and employees with companyId=companyId2Delete (which uses the index), (2) hold these keys temporarily in memory and (3) use IgniteCache.removeAll(keys) to delete these keys? Would this be supported with atomicity TRANSACTIONAL because I'm only using SQL to query and not manipulate data? What happens if another request tries to add an employee to the company at the same time?
Could I use 100% SQL with TRANSACTIONAL_SNAPSHOT and hope it comes out of beta soon?
Edit. It looks like ignite are removing Multiversion Concurrency Control (MVCC), see here and here. MVCC is the functionality that powers TRANSACTIONAL_SNAPSHOT (see here). It therefore looks to me that ignite is going to drop support for transactions with SQL.

Unfortunately, it is currently not possible to use a SQL index from Cache API operations, nor to perform SQL queries inside a Cache API transaction.
As for the removal of MVCC, it looks like that applies only to the 2.x generation. AFAIK, in Ignite 3.x these features (MVCC transactions, SQL index sharing with the Cache API) should be present.

How to turn off select before SaveAll()?

I need to turn off the select before saveAll() in a Spring Boot application with Hibernate and JPA, to boost performance with a high number of records.
I've found a method using JPQL with good performance (delete + save of 10k records in 30s), but I'd like to stay with Hibernate and JPA.
What I need my Java code to do is delete all the records of a table with deleteAll(), then saveAll() the records from another one. When I do that the classic way (deleteAll(), findAll() and then saveAll()), I get poor performance during saveAll(), because it runs a select for every record in the list before saving it.
I'd like to prevent the code from executing all those selects before saving the records. Is that possible without using EntityManager or EntityManagerFactory?

Write two native queries, a DELETE and an INSERT ... SELECT, using the @Query annotation on your repository.
It's the best way to resolve these issues. If you only have to copy records from one table to another, it makes no sense to use JPA and load thousands of objects. Using findAll() could cause out-of-memory errors.

JPA entity graphs and pagination

In my current project we have multiple search pages in the system where we fetch a lot of data from the database to be shown in a large table element in the UI. We're using JPA for data access (our provider is Hibernate). The data for most of the pages is gathered from multiple database tables - around 10 in many cases - including some aggregate data from OneToMany relationships (e.g. "number of associated entities of type X"). In order to improve performance, we're using result set pagination with TypedQuery.setFirstResult() and TypedQuery.setMaxResults() to lazy-load additional rows from the database as the user scrolls the table. As the searches are very dynamic, we're using the JPA CriteriaQuery API to build the queries. However, we're currently somewhat suffering from the N+1 SELECT problem. It's pretty bad in some cases actually, as we might be iterating through 3 levels of nested OneToMany relationships, where on each level the data is lazy-loaded. We can't really declare those collections as eager loaded in the entity mappings, as we're only interested in them in some of our pages. I.e. we might fetch data from the same table in several different pages, but we're showing different data from the table and from different associated tables in different pages.
In order to alleviate this, we started experimenting with JPA entity graphs, and they seem to help a lot with the N+1 SELECT problem. However, when you use entity graphs, Hibernate apparently applies the pagination in-memory. I can somewhat understand why it does that, but this behavior negates a lot (if not all) of the benefits of the entity graphs in many cases. When we didn't use entity graphs, we could load data without applying any WHERE restrictions (i.e. considering the whole table as the result set), no matter how many millions of rows the table had, as only a very limited amount of rows were actually fetched due to the pagination. Now that the pagination is done in-memory, Hibernate basically fetches the whole database table (plus all relationships defined in the entity graph), and then applies the pagination in-memory, throwing the rest of the rows away. Not good.
So the question is, is there an efficient way to apply both pagination and entity graphs with JPA (Hibernate)? If JPA does not offer a solution to this, Hibernate-specific extensions are also acceptable. If that's not possible either, what are the other alternatives? Using database Views? Views would be a bit cumbersome, as we support several database vendors. Creating all of the necessary views for different vendors would increase development effort quite a bit.
Another idea I've had would be to apply both the entity graphs and pagination as we currently do, and simply not trigger any queries if they would return too many rows. I already need to do COUNT queries to get the lazy-loading of rows to work properly in the UI.

I'm not sure I fully understand your problem, but we faced something similar: we have paged lists of entities that may contain data from multiple joined entities. Those lists might be sorted and filtered (some of those sorts/filters have to be applied in memory due to missing capabilities in the DBMS, but that's just a side note), and the paging should be applied afterwards.
Keeping all that data in memory doesn't work well so we took the following approach (there might be better/more standard ones):
1. Use a query to load the primary keys (simple longs in our case) of the main entities. Join only what is needed for sorting and filtering to make the query as simple as possible.
   In our case the query would actually load more data to apply sorts and filters in memory where necessary, but that data is released as soon as possible and only the primary keys are kept.
2. When displaying a specific page, we extract the corresponding primary keys for that page and use a second query to load everything that is to be displayed on that page. This second query might contain more joins and thus be more complex and slower than the one in step 1, but since we only load data for that page, the actual burden on the system is quite low.
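The two steps can be sketched in plain Java as follows. This is only an illustration under stated assumptions: KeyFirstPagingSketch, pageOfKeys, loadPage and fakeLoad are hypothetical names, and fakeLoad stands in for the second query, which in a real application would presumably be something like a JPQL "where e.id in :ids" query with the extra joins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeyFirstPagingSketch {

    /** Step 1 has produced the sorted, filtered primary keys; cut out one page of them. */
    static List<Long> pageOfKeys(List<Long> sortedKeys, int pageIndex, int pageSize) {
        int from = Math.min(pageIndex * pageSize, sortedKeys.size());
        int to = Math.min(from + pageSize, sortedKeys.size());
        return new ArrayList<>(sortedKeys.subList(from, to));
    }

    /** Step 2: load the full rows for one page of keys, preserving the key order. */
    static <T> List<T> loadPage(List<Long> keys, Function<List<Long>, Map<Long, T>> loader) {
        Map<Long, T> loaded = loader.apply(keys);
        List<T> page = new ArrayList<>();
        for (Long key : keys) {
            T row = loaded.get(key);
            if (row != null) { // the row may have been deleted since step 1
                page.add(row);
            }
        }
        return page;
    }

    /** Hypothetical stand-in for the second, join-heavy query. */
    static Map<Long, String> fakeLoad(List<Long> ids) {
        Map<Long, String> rows = new HashMap<>();
        for (Long id : ids) {
            rows.put(id, "row-" + id);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Keys in the order produced by the sorting of step 1 (assumed data).
        List<Long> sortedKeys = List.of(5L, 3L, 9L, 1L, 7L);

        List<Long> secondPage = pageOfKeys(sortedKeys, 1, 2);
        System.out.println(secondPage); // [9, 1]
        System.out.println(loadPage(secondPage, KeyFirstPagingSketch::fakeLoad)); // [row-9, row-1]
    }
}
```

Re-ordering the loaded rows by the key list matters because an "in" query gives no ordering guarantee of its own.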

Named Query Or Native Query or Query Which one is better in performance point of view?

Which one is better among the following (EJB 3 JPA)?
// Query
a) getEntityManager().createQuery("select o from User o");
// Named query, where findAllUser is defined at the entity level
b) getEntityManager().createNamedQuery("User.findAllUser");
// Native query
c) getEntityManager().createNativeQuery("SELECT * FROM TBLMUSER");
Please explain which approach is better in which case.

createQuery()
It should be used for dynamic query creation.
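// Example of a dynamic query
StringBuilder builder = new StringBuilder("select e from Employee e");
if (empName != null) {
    // Use a named parameter; a bare "?" is not a valid JPQL placeholder
    builder.append(" where e.name = :name");
}
Query query = getEntityManager().createQuery(builder.toString());
if (empName != null) {
    query.setParameter("name", empName);
}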
createNamedQuery()
It is like a constant variable which can be reused by name. You should use it in common database calls, such as "find all users", "find by id", etc.
createNativeQuery()
This creates a query that depends completely on the underlying database's SQL scripting language support. It is useful when a complex query is required and the JPQL syntax does not support it.
However, it can impact your application and require more work, if the underlying database is changed from one to another. An example case would be, if your development environment is in MySQL, and your production environment is using Oracle. Plus, the returned result binding can be complex if there is more than a single result.
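As a rough illustration of the createQuery() case with more than one optional filter, the sketch below assembles the JPQL string and its named parameters in plain Java. DynamicJpqlSketch, Built and employeeQuery are hypothetical names, and the department field is an assumption added only for the example; it is not part of the original snippet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DynamicJpqlSketch {

    /** A JPQL string together with the named parameters it expects. */
    static final class Built {
        final String jpql;
        final Map<String, Object> params;

        Built(String jpql, Map<String, Object> params) {
            this.jpql = jpql;
            this.params = params;
        }
    }

    /** Adds one condition per non-null filter; null filters are simply left out. */
    static Built employeeQuery(String empName, String department) {
        List<String> conditions = new ArrayList<>();
        Map<String, Object> params = new LinkedHashMap<>();
        if (empName != null) {
            conditions.add("e.name = :name");
            params.put("name", empName);
        }
        if (department != null) { // hypothetical field, for illustration only
            conditions.add("e.department = :department");
            params.put("department", department);
        }
        String jpql = "select e from Employee e";
        if (!conditions.isEmpty()) {
            jpql += " where " + String.join(" and ", conditions);
        }
        return new Built(jpql, params);
    }

    public static void main(String[] args) {
        Built built = employeeQuery("Ann", null);
        System.out.println(built.jpql);   // select e from Employee e where e.name = :name
        System.out.println(built.params); // {name=Ann}
    }
}
```

With JPA, the resulting string would be passed to createQuery() and each map entry bound with setParameter(name, value), which avoids concatenating values into the query text.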

For me, the better choice is obviously the first two, that is, JPQL queries. With the second, the entity manager compiles (and validates) the queries while loading the persistence unit, whereas the first only yields errors at execution time.
You also get support in some IDEs, and JPQL supports object notation (e.g. select b from EntityA a left join a.entityB b) and some other oddities introduced by object-relational mapping (such as collections, indexes, etc.).
On the other hand, use native queries as a last resort, for corner cases that JPQL does not cover (such as window functions, e.g. select id, row_number() over (partition by group_id) from table).

Native SQL is not necessarily faster than a Hibernate/JPA query. A Hibernate/JPA query is ultimately translated into SQL as well. In some cases Hibernate/JPA does not generate the most efficient statements, and then native SQL can be faster. But with native SQL your application loses portability from one database to another, so it is normally better to tune the Hibernate/JPA mapping and the HQL statement to generate more efficient SQL. On the other hand, with native SQL you miss out on the Hibernate cache, so in some cases native SQL can be slower than a Hibernate/JPA query.
As for performance, in most cases it is irrelevant whether you load all columns or only the needed ones. In database access, the time is lost searching for the row, not transferring the data into your application, so reading only the necessary columns usually gains little.

Simple Answer:
1) createQuery() - When you want your queries to be built at runtime.
2) createNamedQuery() - When you want to send common database calls like findBy<attribute>, findAll, etc.
3) createNativeQuery() - When you want your queries to be database vendor-specific. This brings a portability challenge.

Named queries are the same as queries. They are named only to make them reusable, and they can be declared in various places, e.g. in class mappings, configuration files, etc. (so you can change a query without changing the actual code).
Native queries are just native queries: you have to do everything that JPA queries do for you, e.g. binding and quoting values. JPA queries also use a DBMS-independent syntax (JPQL in your case), so changing the database system (say, from MySQL to PostgreSQL or H2) requires less work, since they do not (or not always) have to be rewritten, whereas native queries do.

Named Query:
All the required queries are written in one place related to that entity and are distinguished by name. We can use them by name, so there is no need to write the entire query each time.
For example:
@NamedQuery(name = "User_detailsbyId", query = "from UserDetails where UserId = :UserId")

IBM DB2 9.7 Error code 840, select statement list is too large

We are using Hibernate with IBM DB2 9.7. The database reports an error because the select list generated by Hibernate (which includes a lot of joins) is too large. The error code is 840. Can something be done to fix this? I know the generated select list is very long, but can Hibernate be set to split it into parts or something?
Edit: I reopened this since the problem seems to be a bit larger. So there is a JIRA issue (now rejected) at https://hibernate.onjira.com/browse/ANN-140.
So the problem is that with Hibernate Annotations, it is not possible to add a discriminator with the joined strategy. XML configuration, however, does support this.
Pavel nicely states the problem in the above link discussion like this:
"It would be nice to see how the problem with the multiple joins is faced when the
underlying DB has restriction on the number of joins one can execute in a single SQL?
For instance MySQL seems to allow only 31 joins. What happens if the class hierarchy
has more than 31 sub-classes?"
And the above is the very problem I am having. We are using annotations and there are quite a few subclasses, creating massive numbers of joins and breaking the DB2 statement.
Any comments on this? I could not find a direct solution either.

Hibernate has a few fetching strategies to optimize the select statements it generates, so that they can be as efficient as possible. The fetching strategy is declared in the mapping relationship to define how Hibernate fetches its related collections and entities.
Fetching Strategies
There are four fetching strategies:
fetch="join" = Disables lazy loading; always loads all the collections and entities.
fetch="select" (default) = Lazy-loads all the collections and entities.
batch-size="N" = Fetches up to N collections or entities, not records.
fetch="subselect" = Groups its collections into a sub-select statement.
For a detailed explanation, check the Hibernate documentation.
