Loading multiple entities by id efficiently in Hibernate

So, I'm getting a number of instances of a particular entity by id:
for (Integer songId : songGroup.getSongIds()) {
    session = HibernateUtil.getSession();
    Song song = (Song) session.get(Song.class, songId);
    processSong(song);
}
This generates a SQL query for each id, so it occurred to me that I should do this in one go, but I couldn't find a way to get multiple entities in one call except by running a query. So I wrote a query:
return (List) session.createCriteria(Song.class)
.add(Restrictions.in("id",ids)).list();
But if I enable second-level caching, doesn't that mean my old method would be able to return the objects from the second-level cache (if they had been requested before), while my query would always go to the database?
What's the correct way to do this?

What you're asking to do here is for Hibernate to do special case handling for your Criteria, which is kind of a lot to ask.
You'll have to do it yourself, but it's not hard. Using SessionFactory.getCache(), you can get a reference to the actual storage for cached objects. Do something like the following:
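List<Song> songs = new ArrayList<>();
List<Long> idsToQueryDatabaseFor = new ArrayList<>();
for (Long id : allRequiredIds) {
    if (!sessionFactory.getCache().containsEntity(Song.class, id)) {
        idsToQueryDatabaseFor.add(id);
    } else {
        songs.add((Song) session.get(Song.class, id)); // served from the second-level cache
    }
}
if (!idsToQueryDatabaseFor.isEmpty()) { // an empty IN () list is invalid SQL
    List<Song> fetchedSongs = session.createCriteria(Song.class)
            .add(Restrictions.in("id", idsToQueryDatabaseFor)).list();
    songs.addAll(fetchedSongs);
}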
The Songs that are in the cache are retrieved from there, and the ones that are not are pulled with a single select.
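The control flow of that split can be checked without Hibernate. In this sketch a plain `Map` stands in for the second-level cache and a hypothetical `batchLoader` function stands in for the single `IN` query; neither is a Hibernate API, and the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CacheAwareLoader {

    /**
     * Returns the entities for all requested ids: cache hits come from the map,
     * misses are fetched with ONE call to the batch loader (skipped when there are none).
     * The Map is a hypothetical stand-in for the second-level cache.
     */
    static <T> List<T> loadAll(List<Long> ids,
                               Map<Long, T> cache,
                               Function<List<Long>, List<T>> batchLoader) {
        List<T> result = new ArrayList<>();
        List<Long> misses = new ArrayList<>();
        for (Long id : ids) {
            T cached = cache.get(id);
            if (cached != null) {
                result.add(cached);   // served from the cache
            } else {
                misses.add(id);       // remembered for the single query
            }
        }
        if (!misses.isEmpty()) {      // never issue an empty IN () query
            result.addAll(batchLoader.apply(misses));
        }
        return result;
    }
}
```

Note that the result lists cache hits first and then the fetched entities, so re-sort by id if the original order matters.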

If you know that the IDs exist, you can use load(..) to create a proxy without actually hitting the DB:
The Javadoc of load(...) describes it as: "Return the persistent instance of the given entity class with the given identifier, obtaining the specified lock mode, assuming the instance exists."
List<Song> list = new ArrayList<>(ids.size());
for (Integer id : ids)
list.add(session.load(Song.class, id, LockOptions.NONE));
Once you access a non-identifier accessor, Hibernate will check the caches and fall back to the DB if needed, using batch fetching if configured.
If the ID doesn't exist, an ObjectNotFoundException will be thrown once the object is loaded. This might happen somewhere in your code where you wouldn't expect an exception, since in the end you're only calling a simple accessor. So either be 100% sure the ID exists, or at least force the ObjectNotFoundException early, where you'd expect it, e.g. right after populating the list.

There is a difference between the Hibernate second-level cache and the Hibernate query cache.
The following link explains it really well: http://www.javalobby.org/java/forums/t48846.html
In a nutshell, if you run the same query many times with the same parameters, you can reduce database hits by combining both.
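A toy model may help show why the combination matters: the query cache remembers which ids a query with given parameters returned, and the entity (second-level) cache holds the state for those ids. If only the query cache is enabled, each id still has to be loaded separately. This sketch mimics that idea with two `HashMap`s and a hit counter; it is not how Hibernate implements either cache, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TwoLevelCacheModel {
    private final Map<String, List<Long>> queryCache = new HashMap<>(); // query + params -> result ids
    private final Map<Long, String> entityCache = new HashMap<>();      // id -> entity state
    private int databaseHits = 0;

    List<String> run(String queryWithParams,
                     Function<String, Map<Long, String>> runQuery,
                     Function<Long, String> loadById) {
        List<Long> ids = queryCache.get(queryWithParams);
        if (ids == null) {                      // query cache miss: run the SQL once
            databaseHits++;
            Map<Long, String> rows = runQuery.apply(queryWithParams);
            ids = new ArrayList<>(rows.keySet());
            queryCache.put(queryWithParams, ids); // only the ids are remembered here...
            entityCache.putAll(rows);             // ...the state lives in the entity cache
        }
        List<String> entities = new ArrayList<>();
        for (Long id : ids) {
            String entity = entityCache.get(id);
            if (entity == null) {               // id known but state not cached: one lookup per id
                databaseHits++;
                entity = loadById.apply(id);
                entityCache.put(id, entity);
            }
            entities.add(entity);
        }
        return entities;
    }

    void evictEntity(Long id) { entityCache.remove(id); }

    int databaseHits() { return databaseHits; }
}
```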

Another thing you could do is sort the list of ids, identify runs of consecutive ids, and then fetch each run with a single range query. For example, given List<Long> ids, do the following (assuming that you have a Pair class in Java):
List<Pair> pairs = new LinkedList<Pair>();
List<Object> results = new LinkedList<Object>();
Collections.sort(ids);
Iterator<Long> it = ids.iterator();
if (it.hasNext()) {
    Long sequenceStart = it.next();   // the first id opens the first run
    Long previous = sequenceStart;
    while (it.hasNext()) {
        Long next = it.next();
        if (next > previous + 1) {    // gap found: close the current run
            pairs.add(new Pair(sequenceStart, previous));
            sequenceStart = next;
        }
        previous = next;
    }
    pairs.add(new Pair(sequenceStart, previous)); // close the last run
}
for (Pair pair : pairs) {
    Query query = session.createQuery("from Song s where s.id >= :start_id and s.id <= :end_id");
    query.setLong("start_id", pair.getStart());
    query.setLong("end_id", pair.getEnd());
    results.addAll((List<Object>) query.list());
}
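The range-splitting step is plain Java and can be checked on its own. This sketch returns `long[]{start, end}` pairs instead of the `Pair` class assumed above, and collects the ids into a `TreeSet` so that sorting and de-duplication happen in one step; each resulting pair would feed one range query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class IdRanges {

    /** Collapses ids into inclusive [start, end] runs of consecutive values. */
    static List<long[]> consecutiveRanges(Iterable<Long> ids) {
        TreeSet<Long> sorted = new TreeSet<>();   // sorts and drops duplicates
        for (Long id : ids) sorted.add(id);
        List<long[]> ranges = new ArrayList<>();
        Long start = null;
        long previous = 0;
        for (long id : sorted) {
            if (start == null) {
                start = id;                                // first id opens the first run
            } else if (id > previous + 1) {
                ranges.add(new long[] {start, previous});  // gap: close the current run
                start = id;
            }
            previous = id;
        }
        if (start != null) ranges.add(new long[] {start, previous}); // close the last run
        return ranges;
    }
}
```

Each range costs one query, so this only pays off when the ids are mostly contiguous; for scattered ids it degenerates to one query per id, and a single `IN` query is usually the better choice.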

Fetching each entity one by one in a loop can lead to N+1 query issues.
Therefore, it's much more efficient to fetch all entities at once and do the processing afterward.
Now, in your proposed solution you were using the legacy Hibernate Criteria API, but since it is deprecated and has been removed in Hibernate 6, it's better to use one of the following alternatives.
JPQL
You can use a JPQL query like the following one:
List<Song> songs = entityManager
.createQuery(
"select s " +
"from Song s " +
"where s.id in (:ids)", Song.class)
.setParameter("ids", songGroup.getSongIds())
.getResultList();
Criteria API
If you want to build the query dynamically, then you can use a Criteria API query:
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Song> query = builder.createQuery(Song.class);
ParameterExpression<List> ids = builder.parameter(List.class);
Root<Song> root = query
.from(Song.class);
query
.where(
root.get("id").in(
ids
)
);
List<Song> songs = entityManager
.createQuery(query)
.setParameter(ids, songGroup.getSongIds())
.getResultList();
Hibernate-specific multiLoad
List<Song> songs = entityManager
.unwrap(Session.class)
.byMultipleIds(Song.class)
.multiLoad(songGroup.getSongIds());
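Now, the JPQL and Criteria API queries can also benefit from the hibernate.query.in_clause_parameter_padding optimization, which pads the number of IN clause parameters up to the next power of two. Fewer distinct SQL strings are generated that way, which improves the hit rate of the SQL statement caching mechanism.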
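The arithmetic behind the padding is easy to sketch: rounding the parameter count up to a power of two means lists of 5 to 8 ids all share one SQL string. This standalone helper is not Hibernate's code; filling the extra slots by repeating the last value is how I understand the padding is bound, so treat that detail as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class InClausePadding {

    /** Smallest power of two >= n (0 and 1 are returned unchanged). */
    static int paddedSize(int n) {
        if (n <= 1) return n;
        return Integer.highestOneBit(n - 1) << 1;
    }

    /** Pads the bind values by repeating the last one, e.g. [10, 20, 30] -> [10, 20, 30, 30]. */
    static <T> List<T> pad(List<T> values) {
        List<T> padded = new ArrayList<>(values);
        int target = paddedSize(values.size());
        while (padded.size() < target) {
            padded.add(values.get(values.size() - 1));
        }
        return padded;
    }

    /** Renders the placeholder list that would appear in the SQL string. */
    static String placeholders(int count) {
        StringBuilder sb = new StringBuilder("IN (");
        for (int i = 0; i < count; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }
}
```

Without padding, every distinct list size produces its own statement; with padding, all sizes up to 1,024 map to at most 11 statements (1, 2, 4, ..., 1024 placeholders).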
For more details about loading multiple entities by their identifier, check out this article.

Related

How to use ScrollableResults for Hibernate Queries when joining many different entities

I am using Spring Boot endpoints to return results from database queries. It works fine when using getResultList() on the TypedQuery. However, I know I will have to manage very large data sets. I am looking into using ScrollableResults via Hibernate, but I cannot figure out how to actually reference the contents of each row.
StatelessSession session = ((Session) entityManager.getDelegate()).getSessionFactory().openStatelessSession();
criteriaQuery.multiselect(selections);
criteriaQuery.where(predicates.toArray(new Predicate[]{}));
Query<?> query = session.createQuery(criteriaQuery);
query.setMaxResults(5);
query.setFetchSize(1000);
query.setReadOnly(true);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
while(results.next()){
Object row = results.get();
}
results.close();
session.close();
I have tried results.get(0), results.get(0)[0], results.getLong(0), Object[] row vs Object row, etc., with and without toString() on all of the options. Nothing I do gets more out of the row than the Java object reference. I've tried casting as well and get a "cannot cast" error. Sometimes I get the error "query specifies a holder class". I'm not sure what that means, because my criteria query is built by joining one or more entities, where the entities and selected columns are not known beforehand, so I am not actually specifying a class. The entities and selections are specified by user input. Any thoughts? Thanks!
UPDATE:
I can do System.out.println(scroll.getType(0)); and in this case observe a long.
But when I try to save that long (.getLong(0)) I get the error, "query specifies a holder class". Or again the cannot cast error.
Got it figured out. queryDetails is a CriteriaQuery<Tuple>:
StatelessSession session = entityManagers.get("DatasourceName").unwrap(Session.class).getSessionFactory().openStatelessSession();
// The Stream wraps an open cursor, so close it (and the session) when done
try (Stream<Tuple> resultStream = session.createQuery(queryDetails)
        .setReadOnly(true)
        .setMaxResults(100)
        .setFetchSize(1000)
        .setCacheable(false)
        .getResultStream()) {
    Iterator<Tuple> itr = resultStream.iterator();
    while (itr.hasNext()) {
        // Get the next row; its columns are read with row.get(index) or row.get(alias)
        Tuple row = itr.next();
        Object firstColumn = row.get(0);
    }
} finally {
    session.close();
}
A CriteriaQuery that uses multiselect produces an Object[] or javax.persistence.Tuple as result type. Maybe you should try debugging to see the actual object type and from there you can work further.
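For that debugging step, a small helper can print what a row really is before you decide how to cast it. This is a sketch assuming the row arrives as a plain `Object` (for example from `ScrollableResults`); a `javax.persistence.Tuple` row would instead be read with `Tuple.get(int)` or `Tuple.get(int, Class)`. The class name is illustrative.

```java
import java.util.StringJoiner;

final class RowInspector {

    /** Describes a result row, whether it is a single value or an Object[] from a multiselect. */
    static String describe(Object row) {
        if (row == null) return "null";
        if (row instanceof Object[]) {
            StringJoiner joiner = new StringJoiner(", ", "Object[] {", "}");
            for (Object column : (Object[]) row) {
                // Simple class name plus value makes the needed cast obvious
                joiner.add(column == null ? "null" : column.getClass().getSimpleName() + "=" + column);
            }
            return joiner.toString();
        }
        return row.getClass().getName() + "=" + row;
    }
}
```

For example, `RowInspector.describe(new Object[] {42L, "Abbey Road"})` returns `Object[] {Long=42, String=Abbey Road}`, which tells you to cast the row to `Object[]` and its first column to `Long`.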
If you are processing and returning all rows anyway, there is no need to use the ScrollableResults API as you will have to create objects for all rows anyway. If your use case is to do some kind of aggregation, I would recommend you use an aggregate function instead and let the database do the aggregation.

Retrieve value of a DB column after I update it

Sorry in advance for the long post. I'm working with a Java web application that uses Spring (2.0, I know...) and JPA with the Hibernate implementation (Hibernate 4.1 and hibernate-jpa-2.0.jar). I'm having problems retrieving the value of a column from a DB table (MySQL 5) after I update it. This is my situation (simplified, but that's the core of it):
Table KcUser:
Id: Long (primary key)
Name: String
...
Contract_Id: Long (foreign key, references KcContract.Id)
Table KcContract:
Id: Long (primary key)
ColA
...
ColX
In my server I have something like this:
MyController {
    myService.doSomething();
}
MyService {
    private EntityManager myEntityManager;
    @Transactional(readOnly=true)
    public void doSomething() {
        List<Long> IDs = firstFetch(); // retrieves some user IDs by querying the KcContract table
        doUpdate(IDs); // updates a column on the KcUser rows that match the IDs retrieved by the previous query
        secondFetch(IDs); // finally retrieves the KcUser rows <-- here the returned rows contain the old value, not the new one I set in the previous method
    }
    @Transactional(readOnly=true)
    private List<Long> firstFetch() {
        List<Long> userIDs = myEntityManager.createQuery("select c.id from KcContract c").getResultList(); // not the actual query: the real one has some conditions in the where clause, but you get the idea
        return userIDs;
    }
    @Transactional(readOnly=false, propagation=Propagation.REQUIRES_NEW)
    private void doUpdate(List<Long> IDs) {
        Query hql = myEntityManager.createQuery("update KcUser t set t.name='newValue' WHERE t.contract.id IN (:list)").setParameter("list", IDs);
        int howMany = hql.executeUpdate();
        System.out.println("HOW MANY: " + howMany); // howMany is correct: the number of rows updated in the DB
        Query select = myEntityManager.createQuery("select t from KcUser t WHERE t.contract.id IN (:list)").setParameter("list", IDs);
        List<KcUser> users = select.getResultList();
        System.out.println("users: " + users.get(0).getName()); // correct, newValue!
    }
    private void secondFetch(List<Long> IDs) {
        List<KcUser> users = myEntityManager.createQuery("from KcUser t WHERE t.contract.id IN (:list)").setParameter("list", IDs).getResultList();
        for (KcUser u : users) {
            myEntityManager.refresh(u);
            String name = u.getName(); // still oldValue!
        }
    }
}
The strange thing is that if I comment out the call to the first method (myService.firstFetch()) and call the other two methods with a constant list of IDs, I get the correct new KcUser.name value in the secondFetch() method.
I'm not an expert in JPA and Hibernate, but I thought it might be a cache problem, so I've tried:
using myEntityManager.flush() after the update
clearing the cache with myEntityManager.clear() and myEntityManager.getEntityManagerFactory().evictAll();
clearing the cache with hibernate Session.clear()
using myEntityManager.refresh on KcUser entities
using native queries (myEntityManager.createNativeQuery("")), which to my understanding should not involve any cache
None of that worked: I always got the old KcUser.name value back in the secondFetch() method.
The only things that worked so far are:
making the firstFetch() method public and moving its call outside of myService.doSomething(), so doing something like this in MyController:
List<Long> IDs = myService.firstFetch();
myService.doSomething(IDs);
using a new EntityManager in secondFetch(), so doing something like this:
EntityManager newEntityManager = myEntityManager.getEntityManagerFactory().createEntityManager();
and using it to execute the subsequent query to fetch users from DB
Using either of the last two methods, the second select works fine and I get users with the updated value in the "name" column.
But I'd like to know what's actually happening and why none of the other things worked: if it's actually a cache problem, a simple .clear() or .refresh() should have worked, I think. Or maybe I'm totally wrong and it's not related to the cache at all, but then I'm a bit lost as to what might actually be happening.
I fear there might be something wrong in the way we are using hibernate / jpa which might bite us in the future.
Any idea please? Tell me if you need more details and thanks for your help.
Actions are performed in following order:
Read-only transaction A opens.
First fetch (transaction A)
Not-read-only transaction B opens
Update (transaction B)
Transaction B closes
Second fetch (transaction A)
Transaction A closes
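Transaction A is read-only, and (assuming MySQL's InnoDB engine with its default REPEATABLE READ isolation level) all consistent reads in a transaction see the snapshot established by the first read in that transaction. Here that first read is the first fetch, which happened before transaction B committed your update, so the second fetch still sees the old value. This is also why moving firstFetch() out of doSomething(), or using a new EntityManager (and therefore a new transaction) for the second fetch, made the new value visible.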
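Assuming InnoDB's rule that the snapshot is pinned by the first consistent read rather than by the transaction's start, a toy model makes the timing visible, including why skipping the first fetch helped. This sketch only mimics that rule with a list of committed versions; it is not how MySQL implements MVCC, and the class names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy "snapshot at first read" store; not MySQL's actual implementation. */
final class SnapshotStore {
    private final List<String> committedVersions = new ArrayList<>(); // index = version number

    SnapshotStore(String initialValue) { committedVersions.add(initialValue); }

    /** Another transaction commits a new value (transaction B in the answer). */
    void commit(String newValue) { committedVersions.add(newValue); }

    ReadTransaction begin() { return new ReadTransaction(); }

    final class ReadTransaction {
        private int snapshot = -1; // no snapshot until the first read

        String read() {
            if (snapshot < 0) {
                snapshot = committedVersions.size() - 1; // pin the latest committed version
            }
            return committedVersions.get(snapshot);
        }
    }
}
```

In the question's terms: `firstFetch()` is the first `read()` of transaction A, `doUpdate()` is the `commit()`, and `secondFetch()` is the second `read()`, which still returns the pinned version.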

The query in HibernateSearch does not sort by updated data

I'm using Hibernate Search 5.7.1.Final with Elasticsearch from Java.
I have two main types of objects in my model, Book and Journal, which inherit from the class Record and share some common properties: rank and title. The typical rank for a book is 1 and the typical value for a journal is 2, but sometimes it can be different.
I created some data with the default rank, but for one book I wanted to test whether updates to the rank value would be respected in the query results. I first set its value to 0 before saving:
...
Book book03 = new Book("Special Book", prus);
book03.setRank(0);
session.save(book03);
Then I obtain the object using Hibernate (not Hibernate Search) mechanisms, and update the value:
session.clear();
session.beginTransaction();
Criteria specialCriteria = session.createCriteria(Book.class);
Book special = (Book) specialCriteria.add(Restrictions.eq("title", "Special Book")).uniqueResult();
special.setRank(6);
session.saveOrUpdate(special);
session.getTransaction().commit();
Now I want to obtain all books and journals and sort them by rank.
I combine a query for books and a query for journals using a boolean query.
I don't know how to obtain all objects of a specific type without including any criteria, so I query for objects that have anything in the title.
session.clear();
sessionFactory.getCache().evictEntityRegions();
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.clear();
QueryBuilder bookQueryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(Book.class).get();
QueryBuilder journalQueryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(Journal.class).get();
QueryBuilder generalQueryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(Record.class).get();
org.apache.lucene.search.Query bookLuceneQuery
= bookQueryBuilder.keyword().wildcard().onField("title").matching("*").createQuery();
org.apache.lucene.search.Query journalLuceneQuery
= journalQueryBuilder.keyword().wildcard().onField("title").matching("*").createQuery();
org.apache.lucene.search.Query combinedLuceneQuery = generalQueryBuilder
.bool()
.should( bookLuceneQuery )
.should( journalLuceneQuery )
.createQuery();
FullTextQuery combinedQuery = fullTextSession.createFullTextQuery(combinedLuceneQuery, Record.class);
Sort sort = generalQueryBuilder.sort()
.byField("rank").asc()
.createSort();
combinedQuery.setSort(sort);
List result = combinedQuery.list();
System.out.println("\n\nSearch results for books and journals:");
for(Object object : result)
{
Record record = (Record) object;
System.out.println(record.getTitle());
System.out.println(record.getRank());
System.out.println();
}
The problem is that I obtain the following results:
Special Book
6
book01
1
book02
1
book04
1
journal01
2
journal02
2
It looks like the value was eventually updated to 6 (this value is displayed when printing the object), but the value 0 is used during sorting (that's why Special Book got to the top).
As you can see, I tried resetting the session so that nothing gets cached, but it didn't help.
Another hypothesis of mine is that rank is not the only thing affecting the sort order and that some relevance is also taken into account, because the problem does not occur when I don't use the boolean query: when I only ask for books, Special Book is the last in the result list. But I would prefer to use the combined query so that I can paginate the results.
By default Elasticsearch (and Hibernate Search, when it uses Elasticsearch) is near real-time, meaning there will be a slight delay (1 second by default) before your updates affect the results of search queries. The moment when updates are taken into account is called a "refresh".
Did you try adding a slight pause (e.g. 2s) after your update?
If it solves the issue, then the refresh delay was the cause. In a real-world situation the delay may not be a problem, because you will rarely perform an update and then immediately run a query that requires the index to be perfectly up to date.
If you do have such requirements though, you can set the property hibernate.search.default.elasticsearch.refresh_after_write to true. Be aware that this will impact performance negatively, though.
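In a test, a fixed 2 s pause can be replaced by polling until the refresh has happened. This is a plain-Java sketch, not a Hibernate Search API: it assumes the supplier re-runs your full-text query and that you can express "the index is up to date" as a check on its result (for the question, that Special Book is no longer the first hit).

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class AwaitIndex {

    /**
     * Re-runs a query until its result passes the check or the timeout expires.
     * Returns the last result either way, so the caller decides what a timeout means.
     */
    static <T> T awaitUntil(Supplier<T> query, Predicate<T> check,
                            Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        T result = query.get();
        while (!check.test(result) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(interval.toMillis()); // give the index time to refresh
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return result;
            }
            result = query.get();
        }
        return result;
    }
}
```

Polling keeps tests fast when the refresh happens early, while refresh_after_write remains the switch to use when production code itself needs read-after-write consistency.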

Update all objects in JPA entity

I'm trying to update all 4000 of my ProfileEntity objects, but I am getting the following exception:
javax.persistence.QueryTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
this is my code:
public synchronized static void setX4all()
{
em = EMF.get().createEntityManager();
Query query = em.createQuery("SELECT p FROM ProfileEntity p");
List<ProfileEntity> usersList = query.getResultList();
int a,b,x;
for (ProfileEntity profileEntity : usersList)
{
a = profileEntity.getA();
b = profileEntity.getB();
x = func(a,b);
profileEntity.setX(x);
em.getTransaction().begin();
em.persist(profileEntity);
em.getTransaction().commit();
}
em.close();
}
I'm guessing that querying all of the ProfileEntity records takes too long.
How should I do it?
I'm using Google App Engine so no UPDATE queries are possible.
Edited 18/10
In the last two days I tried:
using Backends, as Thanos Makris suggested, but got to a dead end. You can see my question here.
reading DataNucleus's suggestion on Map-Reduce, but I really got lost.
I'm looking for a different direction. Since I'm only going to do this update once, maybe I can update it manually, 200 objects or so at a time.
Is it possible to query for the first 200 objects, then the next 200 objects, and so on?
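That kind of windowed query is what JPA's `Query.setFirstResult(int)` and `Query.setMaxResults(int)` are for. The arithmetic for windows of 200 is shown below as plain Java; each pair would be passed to those two setters on a query with a stable `ORDER BY` (for example on the key) so that windows don't overlap. The helper name is illustrative, and whether offsets are cheap on the App Engine datastore is not something this sketch addresses.

```java
import java.util.ArrayList;
import java.util.List;

final class PageWindows {

    /** Splits `total` rows into {firstResult, maxResults} windows of at most `pageSize` rows. */
    static List<int[]> windows(int total, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        List<int[]> result = new ArrayList<>();
        for (int first = 0; first < total; first += pageSize) {
            result.add(new int[] {first, Math.min(pageSize, total - first)});
        }
        return result;
    }
}
```

For 4000 entities and a page size of 200 this yields 20 windows, from {0, 200} to {3800, 200}.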
Given your scenario, I would advise running a native update query:
Query query = em.createNativeQuery("update ProfileEntity pe set pe.X = 'x'");
query.executeUpdate();
Please note: here the query string is SQL, i.e. update **table_name** set ...
This will work better.
Change the update process to use something like Map-Reduce. This means all is done in datastore. The only problem is that appengine-mapreduce is not fully released yet (though you can easily build the jar yourself and use it in your GAE app - many others have done so).
If you want to set X for all objects, it's better to use an update statement (i.e. native SQL) through the JPA entity manager instead of fetching all objects and updating them one by one.
Maybe you should consider using the Task Queue API, which lets you execute tasks of up to 10 minutes. If you need to update so many entities that Task Queues don't fit, you could also consider using Backends.
Put the transaction outside of the loop:
em.getTransaction().begin();
for (ProfileEntity profileEntity : usersList) {
...
}
em.getTransaction().commit();
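If one transaction for all entities turns out to be too large, a middle ground is one transaction per batch. This sketch only does the chunking; the `Consumer` is a placeholder for code that would call `em.getTransaction().begin()`, update the entities of the batch, and `commit()`. The helper name and the batch size of 500 in the usage are arbitrary examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class Batches {

    /** Hands the list to `inOneTransaction` in consecutive chunks of at most `batchSize` items. */
    static <T> int forEachBatch(List<T> items, int batchSize, Consumer<List<T>> inOneTransaction) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so the batch stays valid even if the caller modifies `items` later
            inOneTransaction.accept(new ArrayList<>(items.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

With 4000 entities and a batch size of 500, this means 8 commits instead of 4000.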
Your class does not behave very well: JPA is not suited to bulk updates done this way. You are starting a lot of transactions in rapid sequence and producing a lot of load on the database. A better solution for your use case would be a single update query that sets the value on all the objects without loading them into the JVM first (depending on your object structure and laziness, you would otherwise load much more data than you think).
See hibernate reference:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-direct

Is there a way to scroll results with JPA/Hibernate?

I found a hint in TopLink:
Query query = em.createQuery("SELECT e FROM Employee e ORDER BY e.lastName ASC, e.firstName ASC");
query.setHint("eclipselink.cursor.scrollable", true);
ScrollableCursor scrollableCursor = (ScrollableCursor)query.getSingleResult();
List<Employee> emps = scrollableCursor.next(10);
Is there a JPA/Hibernate alternative?
To my knowledge, there is nothing standard in JPA for that.
With Hibernate, the closest alternative I'm aware of would be the Query / ScrollableResults APIs. From the documentation:
10.4.1.6. Scrollable iteration
If your JDBC driver supports scrollable ResultSets, the Query interface can be used to obtain a ScrollableResults object that allows flexible navigation of the query results.
Query q = sess.createQuery("select cat.name, cat from DomesticCat cat " +
"order by cat.name");
ScrollableResults cats = q.scroll();
if ( cats.first() ) {
// find the first name on each page of an alphabetical list of cats by name
firstNamesOfPages = new ArrayList();
do {
String name = cats.getString(0);
firstNamesOfPages.add(name);
}
while ( cats.scroll(PAGE_SIZE) );
// Now get the first page of cats
pageOfCats = new ArrayList();
cats.beforeFirst();
int i=0;
while( ( PAGE_SIZE > i++ ) && cats.next() ) pageOfCats.add( cats.get(1) );
}
cats.close();
Note that an open database connection and cursor is required for this functionality. Use setMaxResults()/setFirstResult() if you need offline pagination functionality.
Judging from the other answers JPA does not support scrolling directly, but if you use Hibernate as JPA implementation you can do
javax.persistence.Query query = entityManager.createQuery("select foo from bar");
org.hibernate.Query hquery = query.unwrap(org.hibernate.Query.class);
ScrollableResults results = hquery.scroll(ScrollMode.FORWARD_ONLY);
That accesses the underlying Hibernate api for the scrolling but you can use all the features of JPA querying. (At least for criteria queries the JPA api has some features that are not in the old Hibernate api.)
When processing a large number of entities in a large project whose code is based on List<E> instances, I had to write a very limited List implementation (supporting only Iterator) to browse a ScrollableResults without refactoring all the service implementations and method signatures that use List<E>.
This implementation is available in my IterableListScrollableResults.java Gist
It also regularly flushes Hibernate entities from the session. Here is one way to use it, for instance when exporting all non-archived entities from the DB to a text file with a for loop:
Criteria criteria = getCurrentSession().createCriteria(LargeVolumeEntity.class);
criteria.add(Restrictions.eq("archived", Boolean.FALSE));
criteria.setReadOnly(true);
criteria.setCacheable(false);
List<E> result = new IterableListScrollableResults<E>(getCurrentSession(),
criteria.scroll(ScrollMode.FORWARD_ONLY));
for(E entity : result) {
dumpEntity(file, entity);
}
I hope it helps.
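The idea behind such an adapter can be sketched in plain Java. This is not the Gist's code: `RowCursor` is a hypothetical stand-in for `ScrollableResults` (in Hibernate 5, `ScrollableResults.get()` returns an `Object[]`, so a real adapter would typically return `get()[0]` for an entity query), and the `clearSession` callback is where something like `session.clear()` would go. Only `iterator()` works; `get(int)` and `size()` throw, which is the "very limited" part.

```java
import java.util.AbstractList;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical minimal cursor; real code would delegate to ScrollableResults. */
interface RowCursor<E> {
    boolean next();   // advance to the next row; false when exhausted
    E current();      // the row at the current position
}

/** A List that supports only iteration, reading rows lazily from a forward-only cursor. */
final class IterableCursorList<E> extends AbstractList<E> {
    private final RowCursor<E> cursor;
    private final int clearEvery;
    private final Runnable clearSession;   // hypothetical hook, e.g. session::clear
    private boolean consumed = false;

    IterableCursorList(RowCursor<E> cursor, int clearEvery, Runnable clearSession) {
        if (clearEvery <= 0) throw new IllegalArgumentException("clearEvery must be positive");
        this.cursor = cursor;
        this.clearEvery = clearEvery;
        this.clearSession = clearSession;
    }

    @Override
    public Iterator<E> iterator() {
        if (consumed) throw new IllegalStateException("a forward-only cursor can only be iterated once");
        consumed = true;
        return new Iterator<E>() {
            private Boolean peeked;     // cached cursor.next() result; null = not advanced yet
            private int returned = 0;   // rows handed out so far

            @Override
            public boolean hasNext() {
                if (peeked == null) {
                    // The caller has processed the previous batch: clear before advancing
                    if (returned > 0 && returned % clearEvery == 0) clearSession.run();
                    peeked = cursor.next();
                }
                return peeked;
            }

            @Override
            public E next() {
                if (!hasNext()) throw new NoSuchElementException();
                peeked = null;
                returned++;
                return cursor.current();
            }
        };
    }

    // Random access and size would need the whole result set in memory: not supported.
    @Override public E get(int index) { throw new UnsupportedOperationException(); }
    @Override public int size() { throw new UnsupportedOperationException(); }
}
```

Clearing happens in `hasNext()`, just before the cursor advances, so the row the caller is currently working on is never detached mid-use.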
In JPA you can use query.setFirstResult and query.setMaxResults to page through the results.
Also using Spring Data would be an option. There you can specify the query and pass, as a parameter, a "PageRequest" in which you indicate the page size and the page number:
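Page<User> users = repository.findAll(new PageRequest(1, 20)); // page numbers are zero-based: this is the second page of 20 users
For this, your repository needs to extend PagingAndSortingRepository (in Spring Data 2.0 and later, create the request with PageRequest.of(1, 20) instead of the deprecated constructor).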
Just as another alternative for paging over the results.
Of course, underneath, it's using Hibernate, Toplink or whatever JPA implementation you configure.
