Should I include distinct in this JPQL query?

Background
I have seen multiple answers and questions here on SO and in many popular blogs about the necessity of the distinct keyword in JPQL JOIN FETCH queries and about the PASS_DISTINCT_THROUGH query hint.
For example, see these two questions
How does DISTINCT work when using JPA and Hibernate
Select DISTINCT on JPA
and these blog posts
The best way to use the JPQL DISTINCT keyword with JPA and Hibernate
The DISTINCT pass-through Hibernate Query Hint
Hibernate Tips: How To Apply DISTINCT to Your JPQL But Not Your SQL Query
What I am missing
Now my problem is that I cannot fully understand when exactly the distinct keyword must be included in the JPQL query. More specifically, I do not understand whether it depends on which method is used to execute the query (getResultList or getSingleResult).
The following is an example to clarify what I mean.
Everything I am writing from now on was tested on Ubuntu Linux 18.04, with Java 8, Hibernate 5.4.13 and an in-memory H2 database (version 1.4.200).
Suppose I have a Department entity which has a lazy bidirectional one to many relationship with a DepartmentDirector entity:
// Department.java
@Entity
public class Department {
    // ...
    private Set<DepartmentDirector> directors;

    @OneToMany(mappedBy = "department", fetch = FetchType.LAZY)
    public Set<DepartmentDirector> getDirectors() {
        return directors;
    }
    // ...
}

// DepartmentDirector.java
@Entity
public class DepartmentDirector {
    // ...
    private Department department;

    @ManyToOne
    @JoinColumn(name = "department_fk")
    public Department getDepartment() {
        return department;
    }
    // ...
}
Suppose that my database currently contains one department (department1) and two directors associated with it.
Now I want to retrieve the department by its uuid (the primary key), along with all its directors. This can be done with the following JOIN FETCH JPQL query:
String query = "select department from Department department left join fetch "
+ "department.directors where department.uuid = :uuid";
As the preceding query uses a join fetch with a child collection, I expected it to return two duplicated departments when issued: however this only happens when using the query with the getResultList method and not when using the getSingleResult method. This is somehow reasonable, but I have found that the Hibernate implementation of getSingleResult uses getResultList behind the curtains so I expected a NonUniqueResultException to be thrown.
I also briefly went through JPA 2.2 specification but no distinction in treating the duplicates between the two methods is mentioned, and every code sample concerning this issue uses the getResultList method.
Conclusion
In my example I found out that JOIN FETCH queries executed with getSingleResult do not suffer the duplicated entities problem explained in the resources I linked in the section Background.
If the above claim is correct, it would mean that the same JOIN FETCH query needs distinct if executed with getResultList, but does not need it when executed with getSingleResult.
I need someone to explain to me if this is expected or if I misunderstood something.
Appendix
Results of the two queries:
Query run with the getResultList method. I get two duplicated departments as expected (this was done just to test the behaviour of the query; getSingleResult should be used instead for this):
List<Department> resultList = entityManager.createQuery(query, Department.class)
.setParameter("uuid", department1.getUuid())
.getResultList();
assertThat(resultList).containsExactly(department1, department1); // passes
Query run with the getSingleResult method. I would expect the same duplicated departments to be retrieved, and thus a NonUniqueResultException to be thrown. Instead, a single department is retrieved and everything works fine:
Department singleResult = entityManager.createQuery(query, Department.class)
.setParameter("uuid", department1.getUuid())
.getSingleResult();
assertThat(singleResult).isEqualTo(department1); // passes

Interesting question.
First of all let me point out that getSingleResult() was meant for queries that due to their nature always return a single result (meaning: mostly aggregate queries like SELECT SUM(e.id) FROM Entity e). A query that you think, based on some business domain-specific rule, should return a single result, does not really qualify.
That being said, the JPA Spec states that getSingleResult() should throw NonUniqueResultException when the query returns more than one result:
The NonUniqueResultException is thrown by the persistence provider when Query.getSingleResult or TypedQuery.getSingleResult is invoked and there is more than one result from the query. This exception will not cause the current transaction, if one is active, to be marked for rollback.
However, looking at the Hibernate implementation:
@Override
public R getSingleResult() {
    try {
        final List<R> list = list();
        if ( list.size() == 0 ) {
            throw new NoResultException( "No entity found for query" );
        }
        return uniqueElement( list );
    }
    catch ( HibernateException e ) {
        if ( getProducer().getFactory().getSessionFactoryOptions().isJpaBootstrap() ) {
            throw getExceptionConverter().convert( e );
        }
        else {
            throw e;
        }
    }
}

public static <R> R uniqueElement(List<R> list) throws NonUniqueResultException {
    int size = list.size();
    if ( size == 0 ) {
        return null;
    }
    R first = list.get( 0 );
    for ( int i = 1; i < size; i++ ) {
        if ( list.get( i ) != first ) {
            throw new NonUniqueResultException( list.size() );
        }
    }
    return first;
}
it turns out Hibernate's interpretation of 'more than one result' seems to be 'more than one unique result'.
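To make the identity check concrete, here is a minimal stand-alone sketch that mimics the uniqueElement method quoted above without touching Hibernate. The class name UniqueElementDemo, the rejects helper and the use of IllegalStateException in place of NonUniqueResultException are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone demo; it does not use Hibernate at all.
class UniqueElementDemo {

    // Mirrors the identity check quoted above: every element must be
    // the very same instance (!=, not equals) as the first one.
    static <R> R uniqueElement(List<R> list) {
        if (list.isEmpty()) {
            return null;
        }
        R first = list.get(0);
        for (int i = 1; i < list.size(); i++) {
            if (list.get(i) != first) {
                // Stand-in for NonUniqueResultException.
                throw new IllegalStateException("result not unique: " + list.size());
            }
        }
        return first;
    }

    // Reports whether uniqueElement rejects the given list.
    static boolean rejects(List<?> list) {
        try {
            uniqueElement(list);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object department1 = new Object();
        // Two joined rows that resolve to the same managed instance: accepted.
        System.out.println(rejects(Arrays.asList(department1, department1))); // false
        // Two different instances: rejected.
        System.out.println(rejects(Arrays.asList(department1, new Object()))); // true
    }
}
```

Because both joined rows resolve to the same instance in the persistence context, the loop never finds a differing reference, which would explain why no exception is thrown in the scenario from the question.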
In fact, I tested your scenario with all JPA providers, and it turns out that:
Hibernate does indeed return duplicates from getResultList(), but does not throw the exception due to the peculiar way getSingleResult() is implemented
EclipseLink is the only one that does not suffer from the duplicate result bug in getResultList() and consequently, getSingleResult() does not throw an exception, either (to me, this behaviour is only logical, but as it turns out, it is all a matter of interpretation)
OpenJPA and DataNucleus both return duplicate results from getResultList() and throw an exception from getSingleResult()
TL;DR
I need someone to explain to me if this is expected or if I misunderstood something.
It really boils down to how you interpret the specification.

Related

Hibernate native SQL query - how to get distinct root entities with eagerly initialized one-to-many association

I have two entities Dept and Emp (real case changed and minimized). There is 1:n association between them, i.e. properties Dept.empList and Emp.dept exist with respective annotations. I want to get List<Dept> whose elements are distinct and have collection empList eagerly initialized, using native SQL query.
session.createSQLQuery("select {d.*}, {e.*} from dept d join emp e on d.id = e.dept_id")
.addEntity("d", Dept.class)
.addJoin("e", "d.empList")
//.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
.list();
This query returns List<Object[2]> with instances of Dept, Emp (in that order) in the array and field Dept.empList properly initialized. That's ok.
To get distinct Depts, I thought setting transformer DISTINCT_ROOT_ENTITY (uncommenting the line) would be enough. Unfortunately, the DistinctRootEntityResultTransformer is based on RootEntityResultTransformer which treats the last element in tuple as root entity (it's hardwired). Since the order of entities is determined by sequence of addEntity, addJoin calls, the transformer mistakenly treats Emp as root entity and returns list with all Emps of all Depts.
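The effect described above can be imitated without Hibernate. The following is only a sketch under the assumption stated in this question, namely that the root is taken from the last position of each tuple and that duplicates are then dropped by instance identity; RootEntityDemo and distinctAt are hypothetical names, not Hibernate API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model of "pick a root per tuple, then de-duplicate".
class RootEntityDemo {

    // Picks the last (or first) element of every tuple and keeps each instance once.
    static List<Object> distinctAt(List<Object[]> tuples, boolean takeLast) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Object> result = new ArrayList<>();
        for (Object[] tuple : tuples) {
            Object root = takeLast ? tuple[tuple.length - 1] : tuple[0];
            if (seen.add(root)) {
                result.add(root);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String dept = "dept-1";
        List<Object[]> tuples = new ArrayList<>();
        tuples.add(new Object[] {dept, "emp-1"});
        tuples.add(new Object[] {dept, "emp-2"});
        // Last element treated as root: every Emp comes back.
        System.out.println(distinctAt(tuples, true));  // [emp-1, emp-2]
        // First element treated as root: the single Dept comes back.
        System.out.println(distinctAt(tuples, false)); // [dept-1]
    }
}
```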
Is there any clean way to make Hibernate recognize Dept as the root entity, even though it is not last in the entity list?
Note 1: I tried to switch order to .addJoin("e", "d.empList").addEntity("d", Dept.class). Does not work since d.empList requires d defined. Fails on HibernateSystemException : Could not determine fetch owner somewhere in Hibernate internals (org.hibernate.loader.Loader).
Note 2: I tried to define order as .addEntity("e", Emp.class).addJoin("d", "e.dept"). This seemingly works but the association is actually filled only from the "many" side. Hence the collection Dept.empList is some uninitialized proxy until requested, which invokes explicit SQL query and thus does not utilize the join in my query.
Note 3: The custom transformer looking for hard-wired index works:
.setResultTransformer(new BasicTransformerAdapter() {
public Object transformTuple(Object[] tuple, String[] aliases) {
return tuple[0];
}
public List transformList(List list) {
return DistinctResultTransformer.INSTANCE.transformList( list );
}
})
though I'm hesitant to accept that such an easy task could have such a complicated solution.
Hibernate version: 3.6.10 (I know - legacy project :-) though I looked into source code of latest version and seems the key points don't differ).

Finally found https://stackoverflow.com/a/17210746/653539 - make duplicate call of .addEntity to force root entity on the end of list:
.addEntity("d", Dept.class)
.addJoin("e", "d.empList")
.addEntity("d", Dept.class)
It's still a workaround, but cleaner than mine and, based on 36 upvotes, it seems to be the idiomatic solution.

Spring Data JPA - simulate a "create + join" query for an existing collection

Let's say I have a List of entities:
List<SomeEntity> myEntities = new ArrayList<>();
SomeEntity.java:
@Entity
@Table(name = "entity_table")
public class SomeEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;
    private int score;

    public SomeEntity() {}

    public SomeEntity(long id, int score) {
        this.id = id;
        this.score = score;
    }
}
MyEntityRepository.java:
@Repository
public interface MyEntityRepository extends JpaRepository<SomeEntity, Long> {
    List<SomeEntity> findAllByScoreGreaterThan(int score);
}
So when I run:
myEntityRepository.findAllByScoreGreaterThan(10);
Then Hibernate will load all of the records in the table into memory for me.
There are millions of records, so I don't want that. Then, in order to intersect, I need to compare each record in the result set to my List.
In native MySQL, what I would have done in this situation is:
create a temporary table and insert into it the entities' ids from the List.
join this temporary table with the "entity_table", use the score filter and then only pull the entities that are relevant to me (the ones that were in the list in the first place).
This way I gain a big performance increase, avoid any OutOfMemoryErrors and have the machine of the database do most of the work.
Is there a way to achieve such an outcome with Spring Data JPA's query methods (with hibernate as the JPA provider)? I couldn't find in the documentation or in SO any such use case.

I understand you have a set of entity_table identifiers and you want to find each entity_table whose identifier is in that subset and whose score is greater than a given score.
So the obvious question is: how did you arrive at the initial subset of entity_tables and couldn't you just add the criteria of that query to your query that also checks for "score is greater than x"?
But if we ignore that, I think there are two possible solutions. If the list of some_entity identifiers is small (what exactly is "small" depends on your database), you could just use an IN clause and define your method as:
List<SomeEntity> findByScoreGreaterThanAndIdIn(int score, Set<Long> ids);
If the number of identifiers is too large to fit in an IN clause (or you're worried about the performance of using an IN clause) and you need to use a temporary table, the recipe would be:
Create an entity that maps to your temporary table. Create a Spring Data JPA repository for it:
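@Entity
@Table(name = "temp_table")
class TempEntity {
    @Id
    @Column(name = "id") // matches tt.id in the native query
    private Long entityId;
    protected TempEntity() {} // no-arg constructor required by JPA
    TempEntity(Long entityId) { this.entityId = entityId; }
}
interface TempEntityRepository extends JpaRepository<TempEntity, Long> { }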
Use its save method to save all the entity identifiers into the temporary table. As long as you enable insert batching this should perform all right -- how to enable differs per database and JPA provider, but for Hibernate at the very least set the hibernate.jdbc.batch_size Hibernate property to a sufficiently large value. Also flush() and clear() your entityManager regularly or all your temp table entities will accumulate in the persistence context and you'll still run out of memory. Something along the lines of:
int count = 0;
for (SomeEntity someEntity : myEntities) {
    tempEntityRepository.save(new TempEntity(someEntity.getId()));
    // Flush and clear after every 1000 saved entities, not just once.
    if (++count % 1000 == 0) {
        entityManager.flush();
        entityManager.clear();
    }
}
Add a find method to your SomeEntityRepository that runs a native query that does the select on entity_table and joins to the temp table:
@Query(value = "SELECT t.id, t.score FROM entity_table t INNER JOIN temp_table tt ON t.id = tt.id WHERE t.score > ?1", nativeQuery = true)
List<SomeEntity> findByScoreGreaterThan(int score);
Make sure you run both methods in the same transaction, so create a method in a @Service class that you annotate with @Transactional(propagation = Propagation.REQUIRES_NEW) that calls both repository methods in succession. Otherwise your temp table's contents will be gone by the time the SELECT query runs and you'll get zero results.
You might be able to avoid native queries by having your temp table entity have a @ManyToOne to SomeEntity since then you can join in JPQL; I'm just not sure if you'll be able to avoid actually loading the SomeEntity instances to insert them in that case (or if creating a new SomeEntity with just an ID would work). But since you say you already have a list of SomeEntity that's perhaps not a problem.
I need something similar myself, so will amend my answer as I get a working version of this.

You can:
1) Make a paginated native query via JPA (remember to add an order clause to it) and process a fixed amount of records
2) Use a StatelessSession (see the documentation)

NamedEntityGraph - JPA / Hibernate throwing org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags

We have a project where we need to lazily load collections of an entity, but in some cases we need them loaded eagerly. We have added a @NamedEntityGraph annotation to our entity. In our repository methods we add a "javax.persistence.loadgraph" hint to eagerly load 4 of the attributes defined in said annotation. When we invoke that query, Hibernate throws org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags.
Funnily, when I redefine all of those collections as eagerly fetched, Hibernate does fetch them eagerly with no MultipleBagFetchException.
Here is the distilled code.
Entity:
@Entity
@NamedEntityGraph(name = "Post.Full", attributeNodes = {
    @NamedAttributeNode("comments"),
    @NamedAttributeNode("plusoners"),
    @NamedAttributeNode("sharedWith")
})
public class Post {
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "postId")
    private List<Comment> comments;

    @ElementCollection
    @CollectionTable(name = "post_plusoners")
    private List<PostRelatedPerson> plusoners;

    @ElementCollection
    @CollectionTable(name = "post_shared_with")
    private List<PostRelatedPerson> sharedWith;
}
Query method (all crammed together to make it postable):
@Override
public Page<Post> findFullPosts(Specification<Post> spec, Pageable pageable) {
    CriteriaBuilder builder = entityManager.getCriteriaBuilder();
    CriteriaQuery<Post> query = builder.createQuery(Post.class);
    Root<Post> post = query.from(Post.class);
    Predicate postsPredicate = spec.toPredicate(post, query, builder);
    query.where(postsPredicate);
    // Graph name matches the @NamedEntityGraph declared on Post.
    EntityGraph<?> entityGraph = entityManager.createEntityGraph("Post.Full");
    TypedQuery<Post> typedQuery = entityManager.createQuery(query);
    typedQuery.setHint("javax.persistence.loadgraph", entityGraph);
    // Paging is applied to the TypedQuery, not to the CriteriaQuery.
    typedQuery.setFirstResult(pageable.getOffset());
    typedQuery.setMaxResults(pageable.getPageSize());
    Long total = QueryUtils.executeCountQuery(getPostCountQuery(spec));
    List<Post> resultList = total > pageable.getOffset() ? typedQuery.getResultList() : Collections.<Post>emptyList();
    return new PageImpl<Post>(resultList, pageable, total);
}
Any hints on why this works with eager fetches at the entity level, but not with dynamic entity graphs?

I'm betting the eager fetches you think were working were actually working incorrectly.
When you eager fetch more than one "bag" (an unordered collection allowing duplicates), the SQL used to perform the eager fetch (left outer join) will return multiple results for the joined associations, as explained by this SO answer. So while Hibernate does not throw the org.hibernate.loader.MultipleBagFetchException when you have more than one List eagerly fetched, it would not return accurate results for the reason given above.
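A rough way to see why the rows multiply is to model the join by hand. The sketch below is a simplified stand-alone model, not Hibernate's actual loading logic: it assumes one post with two comments and three plusoners, both collections non-empty, and the names MultipleBagDemo, joinRows, asBag and asSet are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model; no Hibernate involved.
class MultipleBagDemo {

    // Rows of "post LEFT JOIN comments LEFT JOIN plusoners" for a single post:
    // the database pairs every comment with every plusoner.
    static List<String[]> joinRows(List<String> comments, List<String> plusoners) {
        List<String[]> rows = new ArrayList<>();
        for (String comment : comments) {
            for (String plusoner : plusoners) {
                rows.add(new String[] {comment, plusoner});
            }
        }
        return rows;
    }

    // A bag has no key to tell repeats apart, so it keeps one entry per row.
    static List<String> asBag(List<String[]> rows, int column) {
        List<String> bag = new ArrayList<>();
        for (String[] row : rows) {
            bag.add(row[column]);
        }
        return bag;
    }

    // A Set collapses the repeated values.
    static Set<String> asSet(List<String[]> rows, int column) {
        return new LinkedHashSet<>(asBag(rows, column));
    }

    public static void main(String[] args) {
        List<String[]> rows = joinRows(Arrays.asList("c1", "c2"), Arrays.asList("p1", "p2", "p3"));
        System.out.println(rows.size());    // 6
        System.out.println(asBag(rows, 0)); // [c1, c1, c1, c2, c2, c2]
        System.out.println(asSet(rows, 0)); // [c1, c2]
    }
}
```

In this model the comments bag ends up with six entries instead of two, while the Set version keeps exactly two.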
However, when you give the query the entity graph hint, hibernate will (rightly) complain. Hibernate developer, Emmanuel Bernard, addresses the reasons for this exception to be thrown:
eager fetching is not the problem per se, using multiple joins in one SQL query is. It's not limited to the static fetching strategy; it has never been supported (property), because it's conceptually not possible.
Emmanuel goes on to say in a different JIRA comment that,
most uses of "non-indexed" List or raw Collection are erroneous and should semantically be Sets.
So, bottom line, in order to get the multiple eager fetching to work as you desire:
use a Set rather than a List
persist the List index using JPA 2's @OrderColumn annotation
if all else fails, fall back to Hibernate-specific fetch annotations (FetchMode.SELECT or FetchMode.SUBSELECT)
EDIT
related:
https://stackoverflow.com/a/17567590/225217
https://stackoverflow.com/a/24676806/225217

jpa criteria query duplicate values in fetched list

I'm observing what I think is an unexpected behaviour in JPA 2 when fetching a list attribute with a criteria query.
My query is as follows (an extract of it):
CriteriaBuilder b = em.getCriteriaBuilder();
CriteriaQuery<MainObject> c = b.createQuery(MainObject.class);
Root<MainObject> root = c.from(MainObject.class);
Join<MainObject, FirstFetch> firstFetch = (Join<MainObject, FirstFetch>) root.fetch(MainObject_.firstFetch);
firstFetch.fetch(FirstFetch_.secondFetch); //secondFetch is a list
c.select(root).distinct(true);
(So let's say I'm fetching a list as a property of the property of an object.)
The thing is when the query returns multiple results, secondFetch values are duplicated as many times as rows are returned. Each firstFetch should have just one secondFetch but has n instead.
The only peculiarity I see in this case is that all MainObjects happen to have the same FirstFetch instance.
So my guess is the join is being crossed, which is normal, but then JPA fails to assign its secondFetch object to each one of the firstFetchs.
Mappings shouldn't be too special, they're more or less like this
@Entity
@Table(name="mainobject")
public class MainObject {
    //...
    private FirstFetch firstFetch;

    @ManyToOne(fetch=FetchType.LAZY)
    @JoinColumn(name="mainObject_column")
    public FirstFetch getFirstFetch() {
        return firstFetch;
    }
}
and
@Entity
@Table(name="firstFetch")
public class FirstFetch {
    //...
    private List<SecondFetch> secondFetch;

    @OneToMany(mappedBy="secondFetch")
    public List<SecondFetch> getSecondFetch() {
        return secondFetch;
    }
}
and finally
@Entity
@Table(name="secondFetch")
public class SecondFetch {
    //....
    private FirstFetch firstFetch; //bidirectional

    @ManyToOne
    @JoinColumn(name="column")
    public FirstFetch getFirstFetch() {
        return firstFetch;
    }
}
I've been looking for some sort of distinct clause to apply to the fetch but there's none (it would have been a 'patch' anyway...)
If I change
List<SecondFetch>
to
Set<SecondFetch>
I get the expected result thanks to Sets' keys, so I do feel this is kind of a misbehaviour in JPA's lists.
I'm not an expert, though, so I could perfectly well be making some mistake in the mappings or query.
Any feedback is very welcome to help clear this up.
Thanks.

I had the exact same problem though I was using JPA criteria API to do the query.
After some research I found a solution which you already mentioned (but which was not available, since you're not using the criteria API): using distinct.
With JPA criteria it would look like this:
CriteriaQuery<FirstFetch> query = cb.createQuery(FirstFetch.class);
Root<FirstFetch> root = query.from(FirstFetch.class);
root.fetch(FirstFetch_.secondFetch, JoinType.LEFT);
query.distinct(true);
Without using query.distinct(true); the resultset was multiplied with the amount of objects in the secondFetch list.
Hibernate does have something like DISTINCT_ROOT_ENTITY, which sounds more adequate than just setting a query distinct. But I have not investigated this further. I am also using Hibernate as the JPA provider. Maybe setting the query distinct in JPA ends up using the same code as Hibernate's DISTINCT_ROOT_ENTITY would?

JPA eager fetch does not join

What exactly does JPA's fetch strategy control? I can't detect any difference between eager and lazy. In both cases JPA/Hibernate does not automatically join many-to-one relationships.
Example: Person has a single address. An address can belong to many people. The JPA annotated entity classes look like:
@Entity
public class Person {
    @Id
    public Integer id;
    public String name;
    @ManyToOne(fetch = FetchType.LAZY) // or FetchType.EAGER
    public Address address;
}

@Entity
public class Address {
    @Id
    public Integer id;
    public String name;
}
If I use the JPA query:
select p from Person p where ...
JPA/Hibernate generates one SQL query to select from Person table, and then a distinct address query for each person:
select ... from Person where ...
select ... from Address where id=1
select ... from Address where id=2
select ... from Address where id=3
This is very bad for large result sets. If there are 1000 people it generates 1001 queries (1 from Person and 1000 distinct from Address). I know this because I'm looking at MySQL's query log. It was my understanding that setting address's fetch type to eager will cause JPA/Hibernate to automatically query with a join. However, regardless of the fetch type, it still generates distinct queries for relationships.
Only when I explicitly tell it to join does it actually join:
select p, a from Person p left join p.address a where ...
Am I missing something here? I now have to hand code every query so that it left joins the many-to-one relationships. I'm using Hibernate's JPA implementation with MySQL.
Edit: It appears (see Hibernate FAQ here and here) that FetchType does not impact JPA queries. So in my case I have to explicitly tell it to join.

JPA doesn't provide any specification on mapping annotations to select fetch strategy. In general, related entities can be fetched in any one of the ways given below
SELECT => one query for root entities + one query for related mapped entity/collection of each root entity = (n+1) queries
SUBSELECT => one query for root entities + second query for related mapped entity/collection of all root entities retrieved in first query = 2 queries
JOIN => one query to fetch both root entities and all of their mapped entity/collection = 1 query
So SELECT and JOIN are two extremes and SUBSELECT falls in between. One can choose suitable strategy based on her/his domain model.
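The query counts listed above can be written down as a tiny stand-alone calculation. This is only an illustration of the arithmetic, assuming the worst case where every root entity triggers its own secondary query under SELECT; FetchStrategyDemo and queryCount are hypothetical names, not part of any JPA provider.

```java
// Hypothetical stand-alone illustration of the query counts per strategy.
class FetchStrategyDemo {

    enum Strategy { SELECT, SUBSELECT, JOIN }

    // Number of SQL statements needed to load n root entities plus their association.
    static int queryCount(Strategy strategy, int rootEntities) {
        switch (strategy) {
            case SELECT:
                return 1 + rootEntities; // one root query + one query per root
            case SUBSELECT:
                return 2;                // one root query + one query for all associations
            case JOIN:
                return 1;                // everything in a single joined query
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " -> " + queryCount(strategy, 1000));
        }
        // prints:
        // SELECT -> 1001
        // SUBSELECT -> 2
        // JOIN -> 1
    }
}
```

For the 1000 people from the question, SELECT gives the 1001 statements seen in the MySQL query log.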
By default SELECT is used by both JPA/EclipseLink and Hibernate. This can be overridden by using:
@Fetch(FetchMode.JOIN)
@Fetch(FetchMode.SUBSELECT)
in Hibernate. It also allows setting SELECT mode explicitly using @Fetch(FetchMode.SELECT), which can be tuned by using a batch size, e.g. @BatchSize(size=10).
Corresponding annotations in EclipseLink are:
@JoinFetch
@BatchFetch

"mxc" is right. fetchType just specifies when the relation should be resolved.
To optimize eager loading by using an outer join you have to add
@Fetch(FetchMode.JOIN)
to your field. This is a hibernate specific annotation.

The fetchType attribute controls whether the annotated field is fetched immediately when the primary entity is fetched. It does not necessarily dictate how the fetch statement is constructed, the actual sql implementation depends on the provider you are using toplink/hibernate etc.
If you set fetchType=EAGER, the annotated field is populated with its values at the same time as the other fields in the entity. So if you open an entity manager, retrieve your person objects and then close the entity manager, subsequently accessing person.address will not result in a lazy load exception being thrown.
If you set fetchType=LAZY, the field is only populated when it is accessed. If you have closed the entity manager by then, a lazy load exception will be thrown if you access person.address. To load the field you need to put the entity back into an entity manager's context with em.merge(), then do the field access and then close the entity manager.
You might want lazy loading when constructing a customer class with a collection for customer orders. If you retrieved every order for a customer when you wanted to get a customer list, this may be an expensive database operation when you are only looking for the customer name and contact details. Best to leave the db access till later.
For the second part of the question - how to get hibernate to generate optimised SQL?
Hibernate should allow you to provide hints as to how to construct the most efficient query but I suspect there is something wrong with your table construction. Is the relationship established in the tables? Hibernate may have decided that a simple query will be quicker than a join especially if indexes etc are missing.

Try with:
select p from Person p left join FETCH p.address a where...
It works for me in a similar case with JPA2/EclipseLink, but it seems this feature is present in JPA1 too.

If you use EclipseLink instead of Hibernate you can optimize your queries by "query hints". See this article from the Eclipse Wiki: EclipseLink/Examples/JPA/QueryOptimization.
There is a chapter about "Joined Reading".

to join you can do multiple things (using eclipselink)
in jpql you can do left join fetch
in named query you can specify query hint
in TypedQuery you can say something like
query.setHint("eclipselink.join-fetch", "e.projects.milestones");
there is also batch fetch hint
query.setHint("eclipselink.batch", "e.address");
see
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html

I had exactly this problem with the exception that the Person class had a embedded key class.
My own solution was to join them in the query AND remove
@Fetch(FetchMode.JOIN)
My embedded id class:
@Embeddable
public class MessageRecipientId implements Serializable {
    @ManyToOne(targetEntity = Message.class, fetch = FetchType.LAZY)
    @JoinColumn(name="messageId")
    private Message message;
    private String governmentId;

    public MessageRecipientId() {
    }

    public Message getMessage() {
        return message;
    }

    public void setMessage(Message message) {
        this.message = message;
    }

    public String getGovernmentId() {
        return governmentId;
    }

    public void setGovernmentId(String governmentId) {
        this.governmentId = governmentId;
    }

    public MessageRecipientId(Message message, GovernmentId governmentId) {
        this.message = message;
        this.governmentId = governmentId.getValue();
    }
}

Two things occur to me.
First, are you sure you mean ManyToOne for address? That means multiple people will have the same address. If it's edited for one of them, it'll be edited for all of them. Is that your intent? 99% of the time addresses are "private" (in the sense that they belong to only one person).
Secondly, do you have any other eager relationships on the Person entity? If I recall correctly, Hibernate can only handle one eager relationship on an entity but that is possibly outdated information.
I say that because your understanding of how this should work is essentially correct from where I'm sitting.
