I have some objects and two ways of storing them: in Hibernate Search (Lucene) indexes, or as Hibernate entities (PostgreSQL records).
I don't need full-text search per se, just searching by property values (exact match) and possibly by range.
Is there any performance comparison available for these two use cases?
The question is like comparing apples and pears. If you need persistent storage and the ability to query and retrieve entities, you need to use Hibernate Core.
Hibernate Search is primarily an extension to Hibernate Core that offers (full-text) search. Yes, it is possible to store the entity values in the index as well and retrieve them via projections. In that case, however, you are no longer dealing with entities but with object arrays, and you miss out on the life cycle guarantees Hibernate offers.
My guess would be that you want Hibernate Core (ORM) plus Criteria queries.
Related
In my current project we have multiple search pages where we fetch a lot of data from the database to be shown in a large table element in the UI. We're using JPA for data access (our provider is Hibernate). The data for most of the pages is gathered from multiple database tables, around 10 in many cases, including some aggregate data from OneToMany relationships (e.g. "number of associated entities of type X"). To improve performance, we use result set pagination with TypedQuery.setFirstResult() and TypedQuery.setMaxResults() to lazy-load additional rows from the database as the user scrolls the table. As the searches are very dynamic, we use the JPA CriteriaQuery API to build the queries. However, we currently suffer somewhat from the N+1 SELECT problem. It's actually pretty bad in some cases, as we might be iterating through 3 levels of nested OneToMany relationships, where on each level the data is lazy-loaded. We can't really declare those collections as eagerly loaded in the entity mappings, as we're only interested in them on some of our pages. For example, we might fetch data from the same table on several different pages, but show different data from that table, and from different associated tables, on each page.
To alleviate this, we started experimenting with JPA entity graphs, and they seem to help a lot with the N+1 SELECT problem. However, when you use entity graphs, Hibernate apparently applies the pagination in memory. I can somewhat understand why it does that, but this behavior negates a lot (if not all) of the benefits of entity graphs in many cases. Without entity graphs, we could load data without applying any WHERE restrictions (i.e. considering the whole table as the result set), no matter how many millions of rows the table had, because thanks to pagination only a very limited number of rows was actually fetched. Now that the pagination is done in memory, Hibernate basically fetches the whole database table (plus all relationships defined in the entity graph), applies the pagination in memory and throws the rest of the rows away. Not good.
So the question is: is there an efficient way to combine pagination and entity graphs with JPA (Hibernate)? If JPA does not offer a solution, Hibernate-specific extensions are also acceptable. If that's not possible either, what are the alternatives? Database views? Views would be a bit cumbersome, as we support several database vendors, and creating all of the necessary views for each vendor would increase development effort quite a bit.
Another idea I've had would be to apply both the entity graphs and pagination as we currently do, and simply not trigger any queries if they would return too many rows. I already need to do COUNT queries to get the lazy-loading of rows to work properly in the UI.
I'm not sure I fully understand your problem, but we faced something similar: we have paged lists of entities that may contain data from multiple joined entities. Those lists might be sorted and filtered (some of those sorts/filters have to be applied in memory due to missing capabilities in the DBMS, but that's just a side note), and the paging should be applied afterwards.
Keeping all that data in memory doesn't work well so we took the following approach (there might be better/more standard ones):
1. Use a query to load the primary keys (simple longs in our case) of the main entities. Join only what is needed for sorting and filtering, to keep the query as simple as possible.
   In our case the query actually loads more data so that sorts and filters can be applied in memory where necessary, but that data is released as soon as possible and only the primary keys are kept.
2. When displaying a specific page, we extract the primary keys for that page and use a second query to load everything that is to be displayed on it. This second query might contain more joins, and thus be more complex and slower than the one in step 1, but since we only load data for that page, the actual burden on the system is quite low.
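The two steps above can be sketched in plain Java. This is a minimal, in-memory model rather than JPA code: a `HashMap` stands in for the table, and the `Invoice` type and its fields are hypothetical. Step 1 filters and sorts on the light columns only and keeps the primary keys; step 2 slices one page of keys and loads the full rows for just those keys, preserving the step-1 order. With JPA, step 1 would presumably be a narrow `select i.id from Invoice i ...` query and step 2 a fetch-joined query restricted with `where i.id in :ids`, so the collection fetch no longer has to be combined with `setMaxResults()`.

```java
import java.util.*;
import java.util.stream.*;

public class TwoPhasePaging {
    // Hypothetical "heavy" row: everything the UI table shows for one entity.
    record Invoice(long id, String customer, int total, List<String> lines) {}

    // Stand-in for the database table, keyed by primary key.
    static final Map<Long, Invoice> TABLE = new HashMap<>();
    static {
        for (long i = 1; i <= 10; i++) {
            TABLE.put(i, new Invoice(i, "customer" + i, (int) (i * 10), List.of("line " + i)));
        }
    }

    // Step 1: filter and sort on the light columns only, keep just the primary keys.
    static List<Long> findMatchingIds(int minTotal) {
        return TABLE.values().stream()
                .filter(inv -> inv.total() >= minTotal)
                .sorted(Comparator.comparingInt(Invoice::total).reversed()
                        .thenComparingLong(Invoice::id))
                .map(Invoice::id)
                .collect(Collectors.toList());
    }

    // Step 2: take the keys of one page and load the full rows for those keys only,
    // keeping the order established in step 1.
    static List<Invoice> loadPage(List<Long> ids, int page, int pageSize) {
        int from = Math.min(page * pageSize, ids.size());
        int to = Math.min(from + pageSize, ids.size());
        return ids.subList(from, to).stream()
                .map(TABLE::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ids = findMatchingIds(30);   // totals 30..100
        System.out.println(ids);                 // [10, 9, 8, 7, 6, 5, 4, 3]
        List<Invoice> secondPage = loadPage(ids, 1, 3);
        System.out.println(secondPage.stream().map(Invoice::id).collect(Collectors.toList())); // [7, 6, 5]
    }
}
```

Keeping only the key list between requests is what keeps memory low: the heavy rows exist only for the page currently being displayed.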
I am currently working on a bilingual project, and for some reference tables I need to sort the data. Because the project is bilingual, the data comes in different languages (in my case English and French), and I would like to sort it all together so that, for example, Île comes before Inlet.
An ordinary ORDER BY puts Île at the end of the list. I eventually resorted to a native query that sorts the data with the database engine's own function (in Oracle, that means NLS_SORT).
But this ties me to a particular database engine and version: if I switch my database to PostgreSQL, for example, the application will break. I am looking for a pure JPA solution (if one exists) or any other solution.
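The ordering problem can be reproduced in plain Java, as an illustration rather than a JPA solution. `String.compareTo` compares UTF-16 code units, so `Île` (U+00CE) sorts after every unaccented capital letter, whereas a `java.text.Collator` for `Locale.FRENCH` compares base letters first and uses accents only to break ties. The `binarySort` and `frenchSort` helpers are illustrative names, and sorting in the application like this is presumably only practical for small reference tables that are loaded in full.

```java
import java.text.Collator;
import java.util.*;

public class BilingualSort {
    // Code-unit order (String.compareTo): accented capitals such as U+00CE sort after 'Z'.
    static List<String> binarySort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    // Locale-aware order: base letters decide first, accents only break ties.
    static List<String> frenchSort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Collator.getInstance(Locale.FRENCH));
        return copy;
    }

    public static void main(String[] args) {
        // The escape U+00CE is the capital I with circumflex.
        List<String> names = List.of("Inlet", "\u00CEle", "Island", "Zebra");
        System.out.println(binarySort(names)); // [Inlet, Island, Zebra, Île]
        System.out.println(frenchSort(names)); // [Île, Inlet, Island, Zebra]
    }
}
```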
To achieve this without defining a native JPA query, I can see only two ways:
Create a DB view that includes escaped/translated columns based on DB functions. That way, the database differences are confined to the CREATE VIEW statement. You can define a OneToOne relationship property pointing to the original entity.
Create an extra column that stores the escaped values, and sort by it. The application can perform the escaping/translation before storing the data in the DB, using JPA entity listeners or in the persist/merge methods.
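A minimal sketch of the second option, assuming the "escaping" means stripping accents and folding case. The hypothetical `sortKey` helper uses only `java.text.Normalizer`: it decomposes each character (NFD), drops the combining marks and lower-cases the result. In an entity it could be called from a `@PrePersist`/`@PreUpdate` callback to fill an extra column (say, a hypothetical `name_sort`), and queries would then use `ORDER BY name_sort`, which behaves the same on any database. Note that characters such as `œ` have no canonical decomposition, so a real implementation may need an extra mapping for them.

```java
import java.text.Normalizer;
import java.util.*;
import java.util.stream.*;

public class SortKeyColumn {
    // Hypothetical helper: decompose accented characters (NFD), drop the
    // combining marks (\p{M}) and lower-case, so a plain ORDER BY sorts correctly.
    static String sortKey(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Stand-in for "ORDER BY name_sort": sort by the precomputed key only.
    static List<String> orderBySortKey(List<String> names) {
        return names.stream()
                .sorted(Comparator.comparing(SortKeyColumn::sortKey))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String ile = "\u00CEle";
        System.out.println(sortKey(ile));                                   // ile
        System.out.println(orderBySortKey(List.of("Inlet", ile, "Zebra"))); // [Île, Inlet, Zebra]
    }
}
```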
Good luck!
I am using Hibernate Search, which is built on top of Lucene indexing. Once indexes are created for a database table, results are returned quickly.
My question is: once the indexes are created and we query for results, does Hibernate Search fetch the results from the original database table using the created indexes, or does it not need to hit the database at all?
Thanks!
Unless you use projections, the indexes are used only to identify the set of primary keys matching the query; these are then used to load the entities from the database.
There are many good reasons for this:
As you pointed out, we don't store all data in the index: a larger index is a slower index
Adding all needed metadata to the index would make indexing a very expensive operation
Value extraction from the index is not efficient at all: it's good at queries, no more
Relational databases are very good at loading data by primary key
If your DB isn't good enough, the second-level cache is excellent at loading by primary key
By loading from the DB we guarantee consistency, especially with async indexing
By loading from the DB, your entities participate in transactions and isolation
That said, if you don't need fully managed entities, you can use projections to load the fields you annotated with Store.YES. A common pattern is to show a preview of the matches using projections and then, when the user clicks for details, load the full entity for that result.
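To make the two paths concrete, here is a toy model in plain Java, not Hibernate Search code: the hypothetical `index` maps a term to hits carrying a primary key plus one field stored in the index, and `database` stands in for the table. A normal search goes index, then ids, then load by primary key; a projection-style search answers from the stored field alone and never touches `database`, which the `databaseLoads` counter makes visible.

```java
import java.util.*;
import java.util.stream.*;

public class IndexThenLoad {
    record Book(long id, String title, String body) {}
    // What the toy index keeps per match: the id plus one stored field.
    record IndexHit(long id, String storedTitle) {}

    static final Map<Long, Book> database = new HashMap<>();          // stand-in for the table
    static final Map<String, List<IndexHit>> index = new HashMap<>(); // term -> hits
    static int databaseLoads = 0;                                      // primary-key lookups so far

    static void save(Book book) {
        database.put(book.id(), book);
        Arrays.stream(book.body().toLowerCase(Locale.ROOT).split("\\s+"))
                .distinct()
                .forEach(term -> index.computeIfAbsent(term, t -> new ArrayList<>())
                        .add(new IndexHit(book.id(), book.title())));
    }

    // Default path: the index only identifies ids; the entities come from the database.
    static List<Book> search(String term) {
        return index.getOrDefault(term, List.of()).stream()
                .map(hit -> {
                    databaseLoads++;
                    return database.get(hit.id());
                })
                .collect(Collectors.toList());
    }

    // Projection path: answered from the stored field alone, no database access.
    static List<String> searchTitles(String term) {
        return index.getOrDefault(term, List.of()).stream()
                .map(IndexHit::storedTitle)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        save(new Book(1, "Lucene Basics", "index query lucene"));
        save(new Book(2, "SQL Basics", "query table"));
        System.out.println(searchTitles("query"));           // [Lucene Basics, SQL Basics]
        System.out.println(databaseLoads);                   // 0
        System.out.println(search("lucene").get(0).title()); // Lucene Basics
        System.out.println(databaseLoads);                   // 1
    }
}
```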
By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the corresponding Lucene index, as per the documentation.
Hence, subsequent searches will yield the data through the Lucene indexes only.
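What that automatic synchronization amounts to can be sketched with a toy model in plain Java, not the Hibernate Search API: every write goes through one hypothetical `Repository`, which updates the table and an inverted index in the same call. In this synchronous sketch a later search therefore never sees a stale entry; with asynchronous indexing there would be a short window where it could.

```java
import java.util.*;

public class SyncedIndex {
    // Hypothetical repository: the table and the index are always updated together.
    static class Repository {
        private final Map<Long, String> table = new HashMap<>();      // id -> name
        private final Map<String, Set<Long>> index = new HashMap<>(); // lower-cased name -> ids

        void save(long id, String name) {
            delete(id); // an update first removes the old index entry
            table.put(id, name);
            index.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(id);
        }

        void delete(long id) {
            String old = table.remove(id);
            if (old != null) {
                String key = old.toLowerCase(Locale.ROOT);
                Set<Long> ids = index.get(key);
                ids.remove(id);
                if (ids.isEmpty()) {
                    index.remove(key);
                }
            }
        }

        Set<Long> search(String term) {
            return index.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        }
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        repo.save(1, "Alex");
        repo.save(2, "Alex");
        repo.save(2, "Bob");                     // update: id 2 leaves the "alex" entry
        System.out.println(repo.search("alex")); // [1]
        repo.delete(1);
        System.out.println(repo.search("alex")); // []
    }
}
```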
Another Question explaining how Indexes work
In an app using a Wicket + Spring + JPA/Hibernate stack, I have an Inbox/Search page that needs quite complex search capabilities: records saved in a database are filtered using a myriad of filtering options. So far I've used the JPA Criteria API to build the database query, but it's getting quite messy. I was wondering whether Hibernate Search would be a good fit for this even though I don't really need any full-text search capabilities; from what I've read about it, I just feel that producing the query might be a bit easier.
Sorry, but Hibernate Search is based on Lucene. It is not just another query language.
Lucene does not search for entities in your database; it searches for attributes in the Lucene index.
Hibernate Search adds the functionality to connect the entities in your database to the Lucene index.
Hibernate Search and Lucene are great tools when you need advanced full-text search. But if you don't need it, they are only a lot of unnecessary work (and problems).
So, as long as you do not need Lucene, Hibernate Search does not fit your needs.
The primary use case for Hibernate Search is fulltext search. However, it can also be used to index/search simple attributes/criteria. Whether the syntax for writing the queries is simpler than a criteria query is a matter of taste.
If you are not using the fulltext search capabilities you have to consider that you are adding an additional step in your application. The search query will be run against the Lucene index which will return entity ids (unless projection is used). The matching entities will then be fetched from the database.
On the other hand, once you use Hibernate Search it is easy to "improve" your search by adding some fulltext search capabilities to some of your criteria (if possible).
Whether or not you are using Search, I think the key is to write some sort of framework which programmatically builds your queries - Search or Criteria queries.
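A minimal sketch of such a framework, under stated assumptions: each optional UI filter contributes a condition only when it is set, and the builder produces a JPQL string plus a map of named parameters that could then be passed to `EntityManager.createQuery(...)` and `Query.setParameter(...)`. The `InboxItem` entity, the alias `i` and the field names are hypothetical; a Criteria API or Hibernate Search implementation could sit behind the same `eq`/`between` methods.

```java
import java.util.*;

public class InboxQueryBuilder {
    private final List<String> conditions = new ArrayList<>();
    private final Map<String, Object> params = new LinkedHashMap<>();

    // Adds "i.<field> <op> :pN" only when a value was supplied.
    // Field names must come from code, never from user input; values are always bound as parameters.
    private InboxQueryBuilder add(String field, String op, Object value) {
        if (value != null) {
            String name = "p" + params.size();
            conditions.add("i." + field + " " + op + " :" + name);
            params.put(name, value);
        }
        return this;
    }

    InboxQueryBuilder eq(String field, Object value) {
        return add(field, "=", value);
    }

    // Range filter; either bound may be null (left open).
    InboxQueryBuilder between(String field, Object from, Object to) {
        add(field, ">=", from);
        return add(field, "<=", to);
    }

    String jpql() {
        String base = "select i from InboxItem i";
        return conditions.isEmpty() ? base : base + " where " + String.join(" and ", conditions);
    }

    Map<String, Object> parameters() {
        return Collections.unmodifiableMap(params);
    }

    public static void main(String[] args) {
        InboxQueryBuilder q = new InboxQueryBuilder()
                .eq("status", "OPEN")
                .eq("assignee", null)          // not set in the UI, so skipped
                .between("priority", 2, null); // open upper bound
        System.out.println(q.jpql());
        // select i from InboxItem i where i.status = :p0 and i.priority >= :p1
        System.out.println(q.parameters());   // {p0=OPEN, p1=2}
    }
}
```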
I am wondering about the following two options when one is not using SQL tables but an ORM-based DB (for example, when using Google App Engine, GAE).
Would the second option be less efficient?
Requirement:
There is an object that has a collection of similar items, and I need to store this object. For example, say the object is a tree with a collection of leaves.
Option 1:
Traditional SQL-type structure:
A table for the Tree (with TreeId as the identifier for a row in the table).
A table for the Leaves (each leaf has a TreeId; to show the leaves of a tree, I query all leaves whose TreeId is the Id of the tree).
Here, the Tree structure DOES NOT have a field with leaves.
Option 2:
ORM / GAE Tables:
Using the same example above, I have an object for Tree that has a collection (Set/List in Java/C++) of leaves.
I store and retrieve the Tree together with the leaves (as the leaves are implemented as a Set in the Tree object).
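The two layouts can be sketched as plain Java data models; the `Tree`/`Leaf` names come from the question, everything else is illustrative and not tied to GAE's actual storage format. In Option 1, leaves are separate records pointing to their tree and are found by filtering on `treeId`; in Option 2, the tree object owns its leaves, so in this model reading a tree always reads all of its leaves, and changing one leaf means writing the whole tree back.

```java
import java.util.*;
import java.util.stream.*;

public class TreeStorage {
    // Option 1: leaves are separate rows that reference their tree by id.
    record TreeRow(long treeId, String species) {} // no leaves field
    record LeafRow(long leafId, long treeId, String color) {}

    // Equivalent of "SELECT * FROM leaf WHERE tree_id = ?".
    static List<LeafRow> leavesOf(List<LeafRow> leafTable, long treeId) {
        return leafTable.stream()
                .filter(leaf -> leaf.treeId() == treeId)
                .collect(Collectors.toList());
    }

    // Option 2: the tree object owns its leaves and is stored/loaded as one unit.
    record Leaf(String color) {}
    record Tree(long treeId, String species, List<Leaf> leaves) {}

    public static void main(String[] args) {
        TreeRow oak = new TreeRow(10, "oak");
        List<LeafRow> leafTable = List.of(
                new LeafRow(1, 10, "green"), new LeafRow(2, 10, "red"), new LeafRow(3, 20, "green"));
        System.out.println(leavesOf(leafTable, oak.treeId()).size()); // 2

        Map<Long, Tree> store = new HashMap<>();
        store.put(10L, new Tree(10, "oak", List.of(new Leaf("green"), new Leaf("red"))));
        System.out.println(store.get(10L).leaves().size()); // 2, loaded together with the tree
    }
}
```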
My question is: will the second option be less efficient than the first?
If so, why? Are there other alternatives?
Thank you!
It would be better to use Hibernate (for Java) or another ORM framework on top of a relational database than an ORM DB:
1. ORM DBs are mostly amateurish.
2. No one appreciates ORM DB skills. You will be a much better specialist if you know PostgreSQL with an ORM framework than if you know just some ORM DB.
3. There are many standards in the world of RDBMSs; there are none for ORM DBs.
4. RDBMS support and community make this choice safer in the long term.
5. Efficiency is a tricky question. I'm almost 80% sure that finding a row with "name = 'Alex'" will be faster in an RDBMS than in an ORM DB, because the ORM DB will need to unpack the object for this operation.
PS: I understand my post is almost off-topic, but I think it contains some good things to think about.