Duplicate records with hibernate joins and pagination - java

I'm using Hibernate 3.6.3.Final (I know it's pretty old, but that's how it is for now on the project I'm currently working on).
I have a problem combining joins and pagination: I get one record duplicated in the results, and the duplicate is produced by Hibernate.
This is the code I have:
public Page<T> findByCriteriaPaginated(PageParams params, Criteria countCriteria, Criteria listCriteria, String[] joins) {
    Page<T> page = new Page<T>(params);
    // count criteria
    countCriteria.setProjection(Projections.rowCount());
    page.setTotalCount(((Long) countCriteria.uniqueResult()).intValue());
    // fetch criteria
    listCriteria.setFirstResult(params.getFirstResultIdx());
    listCriteria.setMaxResults(params.getPageSize());
    if (params.getOrdering() != null && params.getOrdering().getSize() > 0) {
        for (Iterator<String> it = params.getOrdering().getKeyIterator(); it.hasNext();) {
            String key = it.next();
            if (params.getOrdering().isAscending(key)) {
                listCriteria.addOrder(Order.asc(key));
            } else {
                listCriteria.addOrder(Order.desc(key));
            }
        }
    }
    if (joins != null && joins.length > 0) {
        for (String s : joins) {
            listCriteria.setFetchMode(s, FetchMode.JOIN);
        }
    }
    page.setResults(listCriteria.list());
    return page;
}
When I take the generated query and run it directly on the DB server, I don't get the duplicate record. But when debugging my code, listCriteria.list() returns a result set with a duplicate. Also, when I comment out the two lines listCriteria.setFirstResult(params.getFirstResultIdx()); and
listCriteria.setMaxResults(params.getPageSize());, then listCriteria.list() has no duplicate and is fine.
So this indicates to me that there is some problem between the pagination and the rest of the criteria (the joins and the ordering).
Does anybody have an idea how to fix this? Is this a Hibernate bug? Would upgrading Hibernate to the latest version (5.2.9.Final) help? Are there any potential problems with such a large version upgrade?
Thank you for any kind of help.

Two things:
If you see duplicate rows in the same page, then you have an issue with your joins. Try logging the SQL query, then execute it manually. The best way to solve this issue is with criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
For your pagination: if getOrdering() is empty, you do not add any order by. However, to paginate correctly, you absolutely need an order by clause. Add listCriteria.addOrder(Order.asc("id")); to your code so the id serves as a tie-breaker of last resort.
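As a rough sketch, both suggestions combined would look like this inside the method above (assuming the same Page and PageParams types from the question):

// Collapse the duplicated root entities produced by the join fetch:
listCriteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
// Stable tie-breaker so page boundaries are deterministic:
listCriteria.addOrder(Order.asc("id"));
listCriteria.setFirstResult(params.getFirstResultIdx());
listCriteria.setMaxResults(params.getPageSize());

One caveat: DISTINCT_ROOT_ENTITY deduplicates in memory after the database has already applied the row limit, so a page can come back with fewer entries than the page size when a join multiplies rows.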

Related

Retrieve value of a DB column after I update it

Sorry in advance for the long post. I'm working with a Java web application which uses Spring (2.0, I know...) and JPA with a Hibernate implementation (Hibernate 4.1 and hibernate-jpa-2.0.jar). I'm having problems retrieving the value of a column from a DB table (MySQL 5) after I update it. This is my situation (simplified, but that's the core of it):
Table KcUser:
Id:Long (primary key)
Name:String
.
.
.
Contract_Id: Long (foreign key, references KcContract.Id)
Table KcContract:
Id: Long (primary Key)
ColA
.
.
ColX
In my server I have something like this:
MyController {
    myService.doSomething();
}

MyService {
    private EntityManager myEntityManager;

    @Transactional(readOnly=true)
    public void doSomething() {
        List<Long> IDs = firstFetch(); // retrieves some user IDs querying the KcContract table
        doUpdate(IDs); // updates a column on KcUser rows that match the IDs retrieved by the previous query
        secondFetch(IDs); // finally retrieves KcUser rows <-- here the returned rows contain the old value and not the new one I updated in the previous method
    }

    @Transactional(readOnly=true)
    private List<Long> firstFetch() {
        List<Long> userIDs = myEntityManager.createQuery("select c.id from KcContract c").getResultList(); // not the actual query, there are some conditions in the where clause, but you get the idea
        return userIDs;
    }

    @Transactional(readOnly=false, propagation=Propagation.REQUIRES_NEW)
    private void doUpdate(List<Long> IDs) {
        Query hql = myEntityManager.createQuery("update KcUser t set t.name='newValue' WHERE t.contract.id IN (:list)").setParameter("list", IDs);
        int howMany = hql.executeUpdate();
        System.out.println("HOW MANY: " + howMany); // howMany is correct, with the number of updated rows in DB

        Query select = myEntityManager.createQuery("select t from KcUser t WHERE t.contract.id IN (:list)").setParameter("list", IDs);
        List<KcUser> users = select.getResultList();
        System.out.println("users: " + users.get(0).getName()); // correct, newValue!
    }

    private void secondFetch(List<Long> IDs) {
        List<KcUser> users = myEntityManager.createQuery("from KcUser t WHERE t.contract.id IN (:list)").setParameter("list", IDs).getResultList();
        for (KcUser u : users) {
            myEntityManager.refresh(u);
            String name = u.getName(); // still oldValue!
        }
    }
}
The strange thing is that if I comment out the call to the first method (firstFetch()) and call the other two methods with a constant list of IDs, I get the correct new KcUser.name value in the secondFetch() method.
I'm not very expert with JPA and Hibernate, but I thought it might be a cache problem, so I've tried:
using myEntityManager.flush() after the update
clearing the cache with myEntityManager.clear() and myEntityManager.getEntityManagerFactory().evictAll();
clearing the cache with hibernate Session.clear()
using myEntityManager.refresh on KcUser entities
using native queries (myEntityManager.createNativeQuery("")), which to my understanding should not involve any cache
None of that worked, and I always got the old KcUser.name value back in the secondFetch() method.
The only things that worked so far are:
making the firstFetch() method public and moving its call outside of myService.doSomething(), so doing something like this in MyController:
List<Long> IDs = myService.firstFetch();
myService.doSomething(IDs);
using a new EntityManager in secondFetch(), so doing something like this:
EntityManager newEntityManager = myEntityManager.getEntityManagerFactory().createEntityManager();
and using it to execute the subsequent query to fetch users from DB
Using either of the last two methods, the second select works fine and I get users with the updated value in the "name" column.
But I'd like to know what's actually happening and why none of the other things worked: if it's actually a cache problem, a simple .clear() or .refresh() should have worked, I think. Or maybe I'm totally wrong and it's not related to the cache at all, but then I'm a bit lost as to what might actually be happening.
I fear there might be something wrong in the way we are using Hibernate/JPA, which might bite us in the future.
Any idea please? Tell me if you need more details and thanks for your help.
Actions are performed in following order:
Read-only transaction A opens.
First fetch (transaction A)
Non-read-only transaction B opens
Update (transaction B)
Transaction B closes
Second fetch (transaction A)
Transaction A closes
Transaction A is read-only: all queries in that transaction see only changes that were committed before the transaction began, and your update was committed after it started.
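A minimal sketch of the usual fix (hypothetical restructuring; note that Spring only applies @Transactional when a call crosses the proxy boundary, so the methods must be public and invoked from outside the bean, e.g. from the controller):

// In MyController: each service call runs in its own transaction,
// so the second fetch starts only after the update has committed.
List<Long> ids = myService.firstFetch();         // read-only tx, commits on return
myService.doUpdate(ids);                         // update tx, commits on return
List<KcUser> users = myService.secondFetch(ids); // fresh tx: sees 'newValue'

This is essentially why the asker's two workarounds (moving the firstFetch() call out of doSomething(), or using a brand-new EntityManager) both worked.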

Hibernate query fails with "this_.id must appear in GROUP BY" when using projection.count or projection.rowcount

Edited to provide more detailed information.
I'm building a web service using Spring 2.5.6 and Hibernate 4. I'm dynamically building a criteria query based on input from a client. We are using Spring 2.5.6 because it is legacy code, previous attempts to upgrade to later versions of Spring by updating the versions in the Maven POM file fail, and our deadline is too tight to provide time to fully transition the project to a later version of Spring.
The web service searches for observations made by sensors, using filters sent to the service by a client over SOAP. The users of this web service need to create filters that result in several thousand observations being returned. The service is taking so long to return a response that the users' clients are timing out. To resolve this performance issue, I first query the database for how many observations the users' filters would return, then split the work off into several threads of execution using a cached thread pool. Each thread queries the database for a portion of the results. Then, using a thread-safe queue from Java's java.util.concurrent package, each thread encodes the responses into the proper JAXB objects and adds these objects to the queue. Finally, the web service returns the entire response to the client.
I'm hoping this design will reduce response time (it obviously assumes the database will handle the multiple parallel queries just fine and that returning the results from the database in pieces along several connections is faster than one bulk return on a single connection). However, when attempting to get that initial count required before creating my threads, I get an error from the database.
I'm using Hibernate criteria queries and a Hibernate projection to get the count. The criteria is generated by the code below:
Criteria criteria = session.createCriteria(Observation.class);
if (!filter.isSetService())
{
    throw new JpaCriteriaFactoryException("Service required for ObservationFilter.");
}
criteria.createAlias("sensor", "sensor").add(Restrictions.eq("sensor.service", filter.getService()));
criteria = criteria.setMaxResults(filter.getMax());
criteria = criteria.setFirstResult(filter.getStart());
criteria = criteria.addOrder(Order.desc("id"));
if (filter.isSetOffering())
{
    // offerings will be implemented later
}
if (filter.isTemplate())
{
    criteria = criteria.add(Restrictions.eq("template", true));
}
else
{
    criteria = criteria.add(Restrictions.eq("template", false));
}
if (filter.isSetProcedures())
{
    criteria = criteria.add(Restrictions.in("sensor.uniqueId", filter.getProcedures()));
}
if (filter.isSetPhenomenons())
{
    criteria = criteria.createAlias("phenomenon", "phenom")
            .add(Restrictions.in("phenom.id", filter.getPhenomenons()));
}
if (filter.isSetTemporalFilter())
{
    criteria = criteria.add(createTemporalCriteria(filter.getTemporalFilter()));
}
if (filter.isSetGeospatialFilter())
{
    criteria = criteria.createAlias("featureOfInterest", "foi")
            .add(createGeospatialCriteria("foi.geometry",
                    filter.getGeospatialFilter(), geoFac));
}
if (filter.isSetScalarFilter())
{
    try
    {
        criteria = criteria.createAlias(RESULTS_ALIAS, RESULTS_ALIAS)
                .add(ScalarCriterionFactory.createScalarCriterion(filter.getScalarFilter(), RESULTS_ALIAS));
    }
    catch (ScalarCriterionFactoryException ex)
    {
        throw new JpaCriteriaFactoryException("Failed to build criterion for scalar filter!", ex);
    }
}
return criteria;
Then, to get the count of the results, rather than the results themselves, criteria.setProjection(Projections.rowCount()) is added. However, this results in the following exception:
org.postgresql.util.PSQLException: ERROR: column "this_.id" must appear in the GROUP BY clause or be used in an aggregate function
In Hibernate, I added the following settings:
<property name="hibernate.show_sql" value="true"/>
<property name="hibernate.format_sql" value="true"/>
<property name="hibernate.use_sql_comments" value="true"/>
and got the following output:
/* criteria query */ select
count(*) as y0_
from
Observation this_
inner join
Sensor sensor1_
on this_.sensor_id=sensor1_.id
where
sensor1_.service_id=?
and this_.template=?
and sensor1_.uniqueId in (
?
)
order by
this_.id desc limit ?
Using the exact same filter to generate a criteria, but without adding criteria.setProjection(Projections.rowCount()), I get exactly the results I'm expecting, so I do not believe the criteria is being created incorrectly. I cannot use criteria.list().size() because the whole point of this is to get the results back in parallel rather than serially.
Can someone please help me resolve this issue? If a better solution than my "threading" solution is available, I am also open to suggestions.
The problem is the statement:
criteria = criteria.addOrder(Order.desc("id"));
It should be removed, as the generated 'order by' clause is not required when setting the projection:
criteria.setProjection(Projections.rowCount());
Adding the 'group by' clause the database proposes isn't the way to go either, as it would result in a list where each row has an 'id' value and a row count of 1 (for the columns).
In the end the statement:
criteria.uniqueResult();
would throw an exception because multiple result values are returned.
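One way to apply that advice without losing the ordering for the real query is to build the filtered criteria twice and add ordering/paging only to the listing query. A sketch, where buildFilteredCriteria stands in for the filter-building code from the question (a hypothetical helper, not an actual method there):

// Count query: filters only, no order by, no paging.
Criteria countCriteria = buildFilteredCriteria(session, filter);
countCriteria.setProjection(Projections.rowCount());
long total = ((Number) countCriteria.uniqueResult()).longValue();

// Listing query: same filters, plus ordering and paging.
Criteria listCriteria = buildFilteredCriteria(session, filter);
listCriteria.addOrder(Order.desc("id"));
listCriteria.setFirstResult(filter.getStart());
listCriteria.setMaxResults(filter.getMax());
List<Observation> observations = listCriteria.list();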
I've identified the solution to this.
What I really needed was a Hibernate equivalent of SELECT count(*) FROM (SELECT ...). Per https://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch16.html#queryhql-subqueries, this is not allowed in HQL. Also, based on https://docs.jboss.org/hibernate/orm/4.2/manual/en-US/html/ch17.html#querycriteria-detachedqueries, it appears that legacy Hibernate Criteria does not support this either, since the way to create subqueries there is to use DetachedCriteria added via the Subqueries class. It does appear to be doable using the formal JPA CriteriaBuilder, per http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch12.html#querycriteria-from; however, due to the architecture currently employed in my service, I am unable to use this feature at this time.
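For reference, the count via the standard JPA CriteriaBuilder (a sketch, independent of whether the surrounding architecture allows it) looks roughly like this:

CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Long> cq = cb.createQuery(Long.class);
Root<Observation> root = cq.from(Observation.class);
cq.select(cb.count(root));
// ... apply the same predicates as the listing query here ...
Long total = entityManager.createQuery(cq).getSingleResult();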

Update all objects in JPA entity

I'm trying to update all 4000 of my objects in ProfileEntity, but I'm getting the following exception:
javax.persistence.QueryTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
This is my code:
public synchronized static void setX4all()
{
    em = EMF.get().createEntityManager();
    Query query = em.createQuery("SELECT p FROM ProfileEntity p");
    List<ProfileEntity> usersList = query.getResultList();
    int a, b, x;
    for (ProfileEntity profileEntity : usersList)
    {
        a = profileEntity.getA();
        b = profileEntity.getB();
        x = func(a, b);
        profileEntity.setX(x);
        em.getTransaction().begin();
        em.persist(profileEntity);
        em.getTransaction().commit();
    }
    em.close();
}
I'm guessing that I take too long to query all of the records from ProfileEntity.
How should I do it?
I'm using Google App Engine so no UPDATE queries are possible.
Edited 18/10
In these two days I tried:
using Backends as Thanos Makris suggested but got to a dead end. You can see my question here.
reading the DataNucleus suggestion on Map-Reduce, but I really got lost.
I'm looking for a different direction. Since I'm only going to do this update once, maybe I can update manually every 200 objects or so.
Is it possible to query for the first 200 objects, then the second 200 objects, and so on?
Given your scenario, I would advise running a native update query:
Query query = em.createNativeQuery("update ProfileEntity pe set pe.X = 'x'");
query.executeUpdate();
Please note: here the query string is SQL, i.e. update table_name set ....
This will work better.
Change the update process to use something like Map-Reduce. This means all work is done in the datastore. The only problem is that appengine-mapreduce is not fully released yet (though you can easily build the jar yourself and use it in your GAE app; many others have done so).
If you want to set X for all objects, it's better to use an update statement (i.e. native SQL) via the JPA entity manager instead of fetching all objects and updating them one by one.
Maybe you should consider using the Task Queue API, which enables you to execute tasks of up to 10 minutes. If you need to update so many entities that Task Queues do not fit, you could also consider using Backends.
Put the transaction outside of the loop:
em.getTransaction().begin();
for (ProfileEntity profileEntity : usersList) {
...
}
em.getTransaction().commit();
Your class does not behave very well: JPA is not suitable for bulk updates done this way. You are just starting a lot of transactions in rapid sequence and producing a lot of load on the database. A better solution for your use case would be a bulk update query that sets all the objects without loading them into the JVM first (depending on your object structure and laziness, you would otherwise load much more data than you think).
See the Hibernate reference:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-direct
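For the chunked one-off update the asker asked about (the first 200 objects, then the next 200, and so on), a rough sketch with JPA pagination could look like this (reusing the asker's func(a, b); ordering by id keeps the chunk boundaries stable between queries):

int pageSize = 200;
for (int first = 0; ; first += pageSize) {
    List<ProfileEntity> chunk = em
        .createQuery("SELECT p FROM ProfileEntity p ORDER BY p.id", ProfileEntity.class)
        .setFirstResult(first)
        .setMaxResults(pageSize)
        .getResultList();
    if (chunk.isEmpty()) break;
    em.getTransaction().begin();
    for (ProfileEntity p : chunk) {
        p.setX(func(p.getA(), p.getB())); // managed entities: changes flush on commit
    }
    em.getTransaction().commit();
    em.clear(); // detach the processed chunk so memory stays bounded
}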

Loading multiple entities by id efficiently in Hibernate

So, I'm getting a number of instances of a particular entity by id:
for (Integer songId : songGroup.getSongIds()) {
    session = HibernateUtil.getSession();
    Song song = (Song) session.get(Song.class, songId);
    processSong(song);
}
This generates one SQL query per id, so it occurred to me that I should do this in one call, but I couldn't find a way to get multiple entities in one call other than running a query. So I wrote a query:
return (List) session.createCriteria(Song.class)
.add(Restrictions.in("id",ids)).list();
But if I enable 2nd level caching, doesn't that mean that my old method would be able to return the objects from the 2nd level cache (if they had been requested before), while my query would always go to the database?
What's the correct way to do this?
What you're asking to do here is for Hibernate to do special case handling for your Criteria, which is kind of a lot to ask.
You'll have to do it yourself, but it's not hard. Using SessionFactory.getCache(), you can get a reference to the actual storage for cached objects. Do something like the following:
List<Long> idsToQueryDatabaseFor = new ArrayList<>();
List<Song> songs = new ArrayList<>();
for (Long id : allRequiredIds) {
    if (!sessionFactory.getCache().containsEntity(Song.class, id)) {
        idsToQueryDatabaseFor.add(id);
    } else {
        songs.add((Song) session.get(Song.class, id));
    }
}
List<Song> fetchedSongs = session.createCriteria(Song.class)
        .add(Restrictions.in("id", idsToQueryDatabaseFor)).list();
songs.addAll(fetchedSongs);
Then the Songs from the cache get retrieved from there, and the ones that are not get pulled with a single select.
If you know that the IDs exist, you can use load(..) to create a proxy without actually hitting the DB:
Return the persistent instance of the given entity class with the given identifier, obtaining the specified lock mode, assuming the instance exists.
List<Song> list = new ArrayList<>(ids.size());
for (Integer id : ids)
    list.add((Song) session.load(Song.class, id, LockOptions.NONE));
Once you access a non-identifier accessor, Hibernate will check the caches and fallback to DB if needed, using batch-fetching if configured.
If the ID doesn't exist, an ObjectNotFoundException will occur once the object is loaded. This might happen somewhere in your code where you wouldn't really expect an exception, since you're just using a simple accessor. So either be 100% sure the ID exists, or at least force the ObjectNotFoundException early, where you'd expect it, e.g. right after populating the list.
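For example, a minimal sketch (getTitle stands in for any non-identifier accessor of Song):

// Touch each proxy right after building the list, so a missing id
// fails fast here instead of deep inside later business logic.
for (Song song : list) {
    song.getTitle(); // throws ObjectNotFoundException if no row exists for that id
}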
There is a difference between the Hibernate second-level cache and the Hibernate query cache.
The following link explains it really well: http://www.javalobby.org/java/forums/t48846.html
In a nutshell,
If you are using the same query many times with the same parameters then you can reduce database hits using a combination of both.
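If you do go the query route, note that the query cache is off by default and has to be enabled both globally and per query. A minimal sketch:

// In the Hibernate configuration:
// <property name="hibernate.cache.use_query_cache">true</property>
List<Song> songs = session.createCriteria(Song.class)
        .add(Restrictions.in("id", ids))
        .setCacheable(true) // caches the matching ids; the entities themselves live in the 2nd level cache
        .list();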
Another thing you could do is sort the list of ids, identify subsequences of consecutive ids, and then fetch each subsequence in a single query. For example, given List<Long> ids, do the following (assuming that you have a Pair class in Java):
List<Pair> pairs = new LinkedList<Pair>();
List<Object> results = new LinkedList<Object>();
Collections.sort(ids);
Iterator<Long> it = ids.iterator();
Long previous = -1L;
Long sequence_start = -1L;
while (it.hasNext()) {
    Long next = it.next();
    if (next > previous + 1) {
        if (sequence_start >= 0L) { // skip the sentinel before the first run starts
            pairs.add(new Pair(sequence_start, previous));
        }
        sequence_start = next;
    }
    previous = next;
}
pairs.add(new Pair(sequence_start, previous));
for (Pair pair : pairs) {
    Query query = session.createQuery("from Person p where p.id >= :start_id and p.id <= :end_id");
    query.setLong("start_id", pair.getStart());
    query.setLong("end_id", pair.getEnd());
    results.addAll((List<Object>) query.list());
}
Fetching each entity one by one in a loop can lead to N+1 query issues.
Therefore, it's much more efficient to fetch all entities at once and do the processing afterward.
Now, in your proposed solution you were using the legacy Hibernate Criteria API, but since it has been deprecated since Hibernate 5.2 and will probably be removed in Hibernate 6, it's better to use one of the following alternatives.
JPQL
You can use a JPQL query like the following one:
List<Song> songs = entityManager
    .createQuery(
        "select s " +
        "from Song s " +
        "where s.id in (:ids)", Song.class)
    .setParameter("ids", songGroup.getSongIds())
    .getResultList();
Criteria API
If you want to build the query dynamically, then you can use a Criteria API query:
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Song> query = builder.createQuery(Song.class);
ParameterExpression<List> ids = builder.parameter(List.class);
Root<Song> root = query.from(Song.class);

query.where(root.get("id").in(ids));

List<Song> songs = entityManager
    .createQuery(query)
    .setParameter(ids, songGroup.getSongIds())
    .getResultList();
Hibernate-specific multiLoad
List<Song> songs = entityManager
    .unwrap(Session.class)
    .byMultipleIds(Song.class)
    .multiLoad(songGroup.getSongIds());
Now, both the JPQL and the Criteria API queries can benefit from the hibernate.query.in_clause_parameter_padding optimization as well, which improves the effectiveness of the SQL statement caching mechanism.
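The setting goes into the persistence configuration, e.g. in persistence.xml:

<property name="hibernate.query.in_clause_parameter_padding" value="true"/>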
For more details about loading multiple entities by their identifier, check out this article.

Hibernate ScrollableResults Do Not Return The Whole Set of Results

Some of the queries we run have 100,000+ results and it takes forever to load them and then send them to the client. So I'm using ScrollableResults to have a paged results feature. But we're topping out at roughly 50k results (never exactly the same number of results).
I'm on an Oracle9i database, using the Oracle 10 drivers and Hibernate is configured to use the Oracle9 dialect. I tried with the latest JDBC driver (ojdbc6.jar) and the problem was reproduced.
We also followed some advice and added an ordering clause, but the problem persisted.
Here is a code snippet that illustrates what we do:
final int pageSize = 50;
Criteria crit = sess.createCriteria(ABC.class);
crit.add(Restrictions.eq("property", value));
crit.setFetchSize(pageSize);
crit.addOrder(Order.asc("property"));
ScrollableResults sr = crit.scroll();
...
...
ArrayList page = new ArrayList(pageSize);
do {
    for (Object entry : page)
        sess.evict(entry); // to avoid having our memory just explode out of proportion
    page.clear();
    for (int i = 0; i < pageSize && !metLastRow; i++) {
        if (sr.next())
            page.add(sr.get(0));
        else
            metLastRow = true;
    }
    metLastRow = metLastRow || sr.isLast();
    sendToClient(page);
} while (!metLastRow);
So why does the result set tell me it's at the end when it should have so many more results?
Your code snippet is missing important pieces, like the definitions of resultSet and page. But I wonder anyway, shouldn't the line
if (resultSet.next())
be rather
if (sr.next())
?
As a side note, AFAIK cleaning up superfluous objects from the persistence context could be achieved simply by calling
session.flush();
session.clear();
instead of looping through the collection of objects to evict each one separately. (Of course, this requires that the query is executed in its own independent session.)
Update: OK, next round of guesses :-)
Can you actually check what rows are sent to the client and compare that against the result of the equivalent SQL query run directly against the DB? It would be good to know whether this code retrieves (and sends to the client) all rows up to a certain limit, or only some rows (like every 2nd) from the whole result set, or something else. That could shed some light on the root cause.
Another thing you could try is
crit.setFirstResult(0).setMaxResults(200000);
As I had the same issue with a large project whose code was based on List<E> instances,
I wrote a really limited List implementation, with only iterator support, to browse a ScrollableResults without refactoring all the service implementations and method prototypes.
This implementation is available in my IterableListScrollableResults.java Gist
It also regularly flushes Hibernate entities from the session. Here is a way to use it, for instance when exporting all non-archived entities from the DB as a text file with a for loop:
Criteria criteria = getCurrentSession().createCriteria(LargeVolumeEntity.class);
criteria.add(Restrictions.eq("archived", Boolean.FALSE));
criteria.setReadOnly(true);
criteria.setCacheable(false);
List<E> result = new IterableListScrollableResults<E>(getCurrentSession(),
        criteria.scroll(ScrollMode.FORWARD_ONLY));
for (E entity : result) {
    dumpEntity(file, entity);
}
With the hope that it may help.
