Hibernate Search IndexedEmbedded Out Of Memory

I have two entities (EntityA and EntityB). EntityA contains a collection of EntityB annotated with @IndexedEmbedded, because when I search I want to query by EntityB fields as well and get EntityA back.
The relationship (simplified) is as such:
public class EntityA {
    private String name;
    @OneToMany
    @IndexedEmbedded
    private List<EntityB> children;
}
public class EntityB {
    @ManyToOne
    @ContainedIn
    private EntityA parent;
    private String childName;
}
My problem is that the "children" collection in EntityA can hold up to 100k items. This causes an OutOfMemoryError whether I use FullTextSession.index or the MassIndexer.
I could actually remove the @OneToMany mapping from EntityA, because when I need EntityB I usually run a query with filtering and pagination. But if I remove the @OneToMany, Hibernate Search will no longer index my EntityB.
Is there any way to make FullTextSession.index process the "children" in batches?

I think you should remove the one-to-many and many-to-one relationship and write a join query that fetches only as many results as you need; fetching 100k rows kills the memory.
Thanks,
Subhash
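A minimal sketch of what "fetch the results of desired size" could look like as a loop. Everything here is hypothetical scaffolding rather than Hibernate API: BatchWalker, fetchPage and afterPage are names made up for illustration. With JPA, fetchPage would typically wrap a query that calls setFirstResult(offset) and setMaxResults(limit) and has a stable ORDER BY.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

/** Walks a large result set one page at a time instead of loading it whole. */
final class BatchWalker {

    private BatchWalker() {}

    /**
     * @param pageSize  maximum number of rows requested per page
     * @param fetchPage hypothetical page loader: (offset, limit) -> rows
     * @param handleRow called once for every row
     * @param afterPage hook run after each page, e.g. to flush and clear a session
     * @return the total number of rows processed
     */
    static <T> int forEachPage(int pageSize,
                               BiFunction<Integer, Integer, List<T>> fetchPage,
                               Consumer<T> handleRow,
                               Runnable afterPage) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            page.forEach(handleRow);
            afterPage.run();
            offset += page.size();
            if (page.size() < pageSize) {
                break; // short page: this was the last one
            }
        }
        return offset;
    }
}
```

In a Hibernate Search setting, the afterPage hook is where one would usually call flushToIndexes() and clear() on the FullTextSession so that processed entities do not pile up in the session. Note that this only keeps memory flat for the rows you page through; it does not stop @IndexedEmbedded from loading the whole collection when EntityA itself is indexed.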

Depending on the queries you want to effectively execute, you might be able to reverse the indexing.
Instead of marking EntityA as @Indexed, you mark EntityB as @Indexed and mark its parent association with @IndexedEmbedded.
That way, you won't suffer from having to load the 100k elements.
On the query side, you will need to reverse your query as well to something like childName:Emmanuel AND parent.name:Hardy
What you will receive is EntityB instances but you can reach EntityA by simple navigation, or by using a ResultTransformer.
Note that if your link between EntityB and EntityA is lazy, you can ask Hibernate Search to adjust its fetching strategy:
Criteria criteria =
    s.createCriteria(EntityB.class).setFetchMode("parent", FetchMode.JOIN);
s.createFullTextQuery(luceneQuery).setCriteriaQuery(criteria);
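Because the reversed query returns EntityB instances, several hits can point at the same parent. The "simple navigation" step can be sketched as below, assuming plain stand-in classes instead of the real mapped entities; ParentCollector and distinctParents are hypothetical names, not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain stand-ins for the mapped entities; JPA and Hibernate Search annotations omitted.
class EntityA {
    final String name;
    EntityA(String name) { this.name = name; }
}

class EntityB {
    final EntityA parent;
    final String childName;
    EntityB(EntityA parent, String childName) {
        this.parent = parent;
        this.childName = childName;
    }
}

final class ParentCollector {

    private ParentCollector() {}

    /** Returns each parent once, in the order of its first matching child. */
    static List<EntityA> distinctParents(List<EntityB> hits) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        // Identity equality is enough here because, within one session,
        // the same row is represented by the same instance.
        Set<EntityA> parents = new LinkedHashSet<>();
        for (EntityB hit : hits) {
            if (hit.parent != null) {
                parents.add(hit.parent);
            }
        }
        return new ArrayList<>(parents);
    }
}
```

One consequence of this design: pagination on the full-text query counts EntityB hits, so a page of 20 hits may yield fewer than 20 distinct EntityA instances.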

Related

How to avoid creating linking table using @ElementCollection?

I have two entities, Person and Car. Instead of mapping them with a @OneToMany relationship, I decided to use @ElementCollection (storing car ids only) to put less load on my DB, since otherwise the cars are fetched whenever I fetch a Person.
But Hibernate keeps creating a new linking table like 'person_car_ids'. So my question is: how do I map the Person id in the 'car' table?
@Entity
public class Person implements Serializable {
    @Id
    private String id;
    @ElementCollection
    private Set<Long> carIds = new HashSet<>();
}

From the JPA Wiki:
An ElementCollection can be used to define a one-to-many relationship to an Embeddable object, or a Basic value (such as a collection of Strings). [...] The ElementCollection values are always stored in a separate table.
Nothing there says it avoids a separate table for your basic values; it only saves you from having to write a wrapper entity.
To your question:
I decided to use @ElementCollection (storing car ids only), so that I could have less load on my DB, additionally fetching cars whenever I fetch Person.
I'm not sure I understood correctly, but if you want to fetch cars only when you need them, instead of loading them whenever you fetch a Person entity, you can use lazy loading, which is the default for the @OneToMany annotation.

Should a large relationship be modeled in an entity in terms of readability and performance

I have an entity that has a relation to a very large table (> 100k rows). Now I am asking myself whether I should really express that relation in my Channel entity, because I would never call getProducts directly to load all products into memory. I would rather go through a product repository and query for a very specific subset for that channel.
I have only put it there so it's more readable from a client perspective.
@Entity
@Table(name = "Channel")
public class Channel {
    @OneToMany(fetch = FetchType.LAZY)
    @JoinColumn(name = "product_id")
    private List<Product> products;
    public void setProducts(List<Product> products) {
        this.products = products;
    }
    public List<Product> getProducts() {
        return this.products;
    }
}

Don't put that in an entity definition at all. What is often missed is that the name @OneToMany is very misleading, as there is no way to limit the number of items you load. In practice it can be used only when Many means Few :). The Hibernate Guru explains it better than I can.
Of course the join is executed lazily, meaning that as long as it is not used it does no harm. But why tempt fate? Writing the query in a repository gives you pagination, and you can narrow the results to the ones you truly need.
The other side of the equation is more useful. So if you have a @ManyToOne annotation on the child entity, leave it there. It is useful if you update the parent from the context of a child.

In short : no.
I think what you are asking may be opinion-based, but here is what I tend to do : remove all "non-navigable" relationships.
If you are using a generator to get JPA entities from a database model, the generator will create only bi-directional relationships. You need to remove all the relationships you will not use through the ORM mechanisms, for readability, maintainability and performance (in case someone has the idea of removing the "lazy" indication or calling a getter on the collection in a transaction).
For example, a User will be linked to a Language (many-to-one), but you don't want the (one-to-many) relationship between Language and User, even if at some point you need a report screen showing how many users speak which language.

What is the use of the Hibernate @LazyCollection annotation

I have two entities, Parent and Child, in a one-to-many relationship:
@Entity
public class Parent {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Integer id;
    private String name;
    @OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
    @IndexColumn(name = "index", base = 1)
    @Cascade(org.hibernate.annotations.CascadeType.ALL)
    @LazyCollection(LazyCollectionOption.EXTRA)
    private List<Child> childs = new ArrayList<Child>();
    // getter and setter
}
So what is the use of @LazyCollection(LazyCollectionOption.EXTRA) here, and when does it come into play? For which operations on the child list is it beneficial?

EXTRA = .size() and .contains() won't initialize the whole collection
TRUE = initialize the whole collection on first access
FALSE = Eager-Loading

There's actually no reason to use @LazyCollection.
The TRUE and FALSE values are not needed since the same behavior can be obtained with the JPA FetchType.LAZY or FetchType.EAGER.
The EXTRA value has no equivalent in JPA and was designed for very large collections. When you access an EXTRA lazy collection for the first time, the collection is not entirely loaded, as it's usually the case with any JPA collection.
Instead, each element is fetched one by one, using a secondary SELECT. This might sound like an optimization, but it's not because EXTRA lazy collections are prone to N+1 query issues.
Note that this only works for ordered collections, either List(s) that are annotated with @OrderColumn or Map(s). For bags (e.g. regular List(s) of entities that do not preserve any certain ordering), the @LazyCollection(LazyCollectionOption.EXTRA) behaves just like any other LAZY collection (the collection is fetched entirely upon being accessed for the first time).
If you have a very large collection, then you should not map it at all. Instead, you should map only the @ManyToOne side, and, instead of a parent-side collection, you should use a paginated JPQL query.
JPQL queries are much easier to tune because you can apply any filtering criteria, and you can paginate the result set.

To give you a hint, it's mainly for performance reasons, you can start reading the following links:
Second Level Cache
Hibernate Documentation

Filter JPA Entities without removing them from database

I have a database table "viewmodule" with an FK to itself (parent_id) to allow recursive structures.
CREATE TABLE viewmodule (
id,
type,
parent_id,
hide);
My Java application uses JPA/Hibernate to map the entities onto that table. We have a fixed entity hierarchy, which is resolved by a discriminator annotation that uses the "type" column of the table.
public class ViewModule implements Serializable {
    private long id;
    private String type;
    private ViewModule parent;
    private Boolean hide;
    @OneToMany(targetEntity = ViewModule.class, cascade = javax.persistence.CascadeType.ALL, mappedBy = "parent")
    @Cascade({ org.hibernate.annotations.CascadeType.ALL,
               org.hibernate.annotations.CascadeType.DELETE_ORPHAN })
    private Set<ViewModule> children;
    (...)
}
My task now is to load all elements from this table (the whole tree) but leave out the ones that have the field "hide" set to true.
It is an apparently simple filter mechanism. My first approach was to use the Hibernate Filter annotation, which works well on the first layer (all viewmodules with parent_id = null). But the filter does not work on the "children" relation. (In my real-life model, I have an inheritance structure for the different types of ViewModule.)
Therefore I've written a small function that recursively walks through the ViewModule object tree and removes the ViewModules that have hide=true from the children relation.
But, as all objects are still managed by the JPA/Hibernate EntityManager, every removal from a collection is directly executed as a delete in the database. So my filter function removes the entity from the database, and that is a bad thing.
I tried to use the "evict" method of the Hibernate session to detach the entities before filtering, but that leads to a LazyInitializationException.
So, to avoid cloning all of my objects, my question is: how can I solve this problem? Is there a way to detach the object so that all collections are initialized? Or is there a special Kung-Fu Chuck-Norris JPA annotation that can filter the collections?
Thanks in advance

Use a native query:
em.createNativeQuery("select * from viewmodule where hide = false", ViewModule.class).getResultList();
This works: Filter list contained in entity returned by jpa/hibernate query

Make a new collection and add only the elements that have hide=false. You won't be able to distribute that collection together with the object, so you'd have to return it from a separate method call. For example: dao.getVisibleItems(module)
Another thing - you can remove the Cascade.DELETE (i.e. list all cascades except delete) and the orphan removal, if you don't need them.
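The "new collection" idea above can be sketched as follows. This is only an illustration under assumptions: ViewModule here is a plain stand-in without mapping annotations, and VisibleItems is a hypothetical helper playing the role of dao.getVisibleItems(module).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the ViewModule entity; mapping annotations omitted.
class ViewModule {
    final long id;
    final Boolean hide; // nullable, like the field in the question
    final Set<ViewModule> children = new LinkedHashSet<>();

    ViewModule(long id, Boolean hide) {
        this.id = id;
        this.hide = hide;
    }
}

final class VisibleItems {

    private VisibleItems() {}

    /**
     * Copies the visible children into a new list. The entity's own collection
     * is only read, never modified, so no delete can be triggered by this call.
     */
    static List<ViewModule> getVisibleItems(ViewModule module) {
        List<ViewModule> visible = new ArrayList<>();
        for (ViewModule child : module.children) {
            // A null flag is treated as "not hidden" (an assumption of this sketch).
            if (!Boolean.TRUE.equals(child.hide)) {
                visible.add(child);
            }
        }
        return Collections.unmodifiableList(visible);
    }
}
```

Returning an unmodifiable copy makes it harder for callers to confuse the filtered view with the managed collection. For the whole tree, the caller would invoke the method again on each visible child.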

How can I retrieve the foreign key from a JPA ManyToOne mapping without hitting the target table?

I have the following two annotated classes that I use to build a graph:
@Entity
@Table(name = "Edge")
public class Edge
{
    /* some code omitted for brevity */
    @ManyToOne
    @JoinColumn(name = "ixNodeFrom", nullable = false)
    private Node _nodFrom;
    @ManyToOne
    @JoinColumn(name = "ixNodeTo", nullable = false)
    private Node _nodTo;
    /* some code omitted for brevity */
}
@Entity
@Table(name = "Node")
public class Node
{
    /* some code omitted for brevity */
    @OneToMany(mappedBy = "_nodTo")
    private Set<Edge> _rgInbound;
    @OneToMany(mappedBy = "_nodFrom")
    private Set<Edge> _rgOutbound;
    /* some code omitted for brevity */
}
Now, when I build the graph, I issue two queries to fetch all rows from either table and set up the child / parent references, for which I need the ids stored in the Edge table.
Because I have defined the relation between the two tables in JPA, accessing the edge object to get the two nodes' ids triggers two SQL statements per edge, when the JPA provider lazily* loads the associated nodes. Since I already have the node objects, and the ids have already been loaded from the edge table, I want to skip those queries, as they take an awfully long time for larger graphs.
I tried adding these lines to the Edge class, but then my JPA provider wants me to make one mapping read-only, and I can't seem to find a way how to do that:
@Column(name = "ixNodeTo")
private long _ixNodeTo;
@Column(name = "ixNodeFrom")
private long _ixNodeFrom;
I'm using EclipseLink and MySQL, if it matters.
* The default behaviour for @ManyToOne is actually eager loading; see Pascal's answer.

I got three good answers that were equally helpful, and by now none percolated to the top by public vote, so I'm merging them together here for a single comprehensive answer:
a) Change the query
You can load the whole graph at once by changing the query, thereby giving the JPA provider a chance to realize that it already has everything in memory and doesn't need to go back to the DB:
List<Node> nodes = em.createQuery(
"SELECT DISTINCT n FROM Node n LEFT JOIN FETCH n._rgOutbound")
.getResultList();
(via axtavt)
b) Use read-only fields for the FKs
Loading the FKs into their own fields, as described in the question, will also work if, as the JPA provider is demanding, the fields are declared to be readonly, which is done like this:
@Column(name = "ixNodeTo", insertable = false, updatable = false)
(via bravocharlie)
c) Use property access
If you are using property access instead of field access, the JPA provider also gets a chance to realize it already has the FK and doesn't need to fetch the referenced object. In short, property access means that you put the JPA annotations on the getter, thereby "promising" the JPA provider that your getter won't go and access the rest of the object. More details in this question. This will work for Hibernate, and for Eclipselink, it will work (assumed in the original answer, experimentally confirmed by me) with weaving enabled. (via Pascal Thivent)
Additionally, as Pascal points out in his answer, @ManyToOne, contrary to my original post, is not lazy-loading, but eager-loading by default, and changing that will require weaving as well.
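Once the foreign keys are available as plain values (option b), the graph can be wired up in memory without touching the associations. A minimal sketch, assuming plain stand-in classes: fromId and toId play the role of the read-only ixNodeFrom and ixNodeTo columns, and GraphWiring is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain stand-ins for the entities; mapping annotations omitted.
class Node {
    final long id;
    final List<Edge> inbound = new ArrayList<>();
    final List<Edge> outbound = new ArrayList<>();
    Node(long id) { this.id = id; }
}

class Edge {
    final long fromId; // would be the read-only ixNodeFrom column
    final long toId;   // would be the read-only ixNodeTo column
    Edge(long fromId, long toId) {
        this.fromId = fromId;
        this.toId = toId;
    }
}

final class GraphWiring {

    private GraphWiring() {}

    /** Links every edge to its endpoints by id lookup and returns the node index. */
    static Map<Long, Node> wire(List<Node> nodes, List<Edge> edges) {
        Map<Long, Node> byId = new HashMap<>();
        for (Node node : nodes) {
            byId.put(node.id, node);
        }
        for (Edge edge : edges) {
            Node from = byId.get(edge.fromId);
            Node to = byId.get(edge.toId);
            if (from == null || to == null) {
                throw new IllegalStateException(
                        "Edge refers to an unknown node: " + edge.fromId + " -> " + edge.toId);
            }
            from.outbound.add(edge);
            to.inbound.add(edge);
        }
        return byId;
    }
}
```

This keeps the work at two queries (all nodes, all edges) plus one pass over the edges, which is the behaviour the question was after.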

Have you tried
@Column(name = "ixNodeTo", insertable = false, updatable = false)

How can I retrieve the foreign key from a JPA ManyToOne mapping without hitting the target table?
In theory, a JPA provider should be able to not trigger a query when calling
someEdge.getNodeFrom().getId()
as it already has the id (as FK).
I'm 100% sure Hibernate can (assuming you're using property access). In the case of EclipseLink, I don't know (if it does, it will probably require weaving).
Because I have defined the relation between the two tables in JPA, accessing the edge object to get the two nodes' ids triggers two SQL statements per edge, when the JPA provider lazily loads the associated nodes. Since I already have the node objects, and the ids have already been loaded from the edge table, I want to skip those queries, as they take an awfully long time for larger graphs.
Note that @ManyToOne uses an EAGER strategy by default. If you want to make it LAZY, you have to declare it explicitly (but again, this will require weaving of your classes with EclipseLink).

I think you should try to optimize your query rather than change the mapping. For example, the following query fetches the whole graph at once (tested in Hibernate):
List<Node> nodes = em.createQuery(
"SELECT DISTINCT n FROM Node n LEFT JOIN FETCH n._rgOutbound")
.getResultList();

How about using getReference()?
For example:
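Node fkNode = em.getReference(Node.class, nodeFromId); // [1]
fkNode.getId();
[1] getReference takes the entity class and the primary key (here nodeFromId stands for the foreign-key value you already have). It can return a proxy, so this should not trigger a SQL query to retrieve the nodeFrom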
