In my current project we have multiple search pages in the system where we fetch a lot of data from the database to be shown in a large table element in the UI. We're using JPA for data access (our provider is Hibernate). The data for most of the pages is gathered from multiple database tables - around 10 in many cases - including some aggregate data from OneToMany relationships (e.g. "number of associated entities of type X"). In order to improve performance, we're using result set pagination with TypedQuery.setFirstResult() and TypedQuery.setMaxResults() to lazy-load additional rows from the database as the user scrolls the table. As the searches are very dynamic, we're using the JPA CriteriaQuery API to build the queries. However, we're currently somewhat suffering from the N+1 SELECT problem. It's pretty bad in some cases actually, as we might be iterating through 3 levels of nested OneToMany relationships, where on each level the data is lazy-loaded. We can't really declare those collections as eager loaded in the entity mappings, as we're only interested in them in some of our pages. I.e. we might fetch data from the same table in several different pages, but we're showing different data from the table and from different associated tables in different pages.
In order to alleviate this, we started experimenting with JPA entity graphs, and they seem to help a lot with the N+1 SELECT problem. However, when you use entity graphs, Hibernate apparently applies the pagination in-memory. I can somewhat understand why it does that, but this behavior negates a lot (if not all) of the benefits of the entity graphs in many cases. When we didn't use entity graphs, we could load data without applying any WHERE restrictions (i.e. considering the whole table as the result set), no matter how many millions of rows the table had, as only a very limited amount of rows were actually fetched due to the pagination. Now that the pagination is done in-memory, Hibernate basically fetches the whole database table (plus all relationships defined in the entity graph), and then applies the pagination in-memory, throwing the rest of the rows away. Not good.
So the question is, is there an efficient way to apply both pagination and entity graphs with JPA (Hibernate)? If JPA does not offer a solution to this, Hibernate-specific extensions are also acceptable. If that's not possible either, what are the other alternatives? Using database Views? Views would be a bit cumbersome, as we support several database vendors. Creating all of the necessary views for different vendors would increase development effort quite a bit.
Another idea I've had would be to apply both the entity graphs and pagination as we currently do, and simply not trigger any queries if they would return too many rows. I already need to do COUNT queries to get the lazy-loading of rows to work properly in the UI.
I'm not sure I fully understand your problem but we faced something similar: We have paged lists of entities that may contain data from multiple joined entities. Those lists might be sorted and filtered (some of those sorts/filters have to be applied in memory due missing capabilities in the dbms but that's just a side note) and the paging should be applied afterwards.
Keeping all that data in memory doesn't work well so we took the following approach (there might be better/more standard ones):
Use a query to load the primary keys (simple longs in our case) of the main entities. Join only what is needed for sorting and filtering to make the query as simple as possible.
In our case the query would actually load more data to apply sorts and filters in memory where necessary but that data is released asap and only the primary keys are kept.
When displaying a specific page we extract the corresponding primary keys for a page and use a second query to load everything that is to be displayed on that page. This second query might contain more joins and thus be more complex and slower than the one in step 1 but since we only load data for that page the actual burden on the system is quite low.
Related
I have a SAAS product, which is build by Spring MVC and Hibernate. Generally SAAS products allow user's to customize the product like adding extra fields to the table. So i want to give the flexibility to users, to create custom fields in the tables for themselves. Please provide all the viable solutions to achieve it. Thank you so much for your help.
I'm guessing your trying to back this to a Relational database. The primary problem is that relational databases store things in tables, and tables don't really handle free form data well.
So one solution is to use a document structure that is flexible, like XML (and perhaps ditch the database) but databases have features which are nice, so let's also consider the database-using approaches.
You could create a "custom field" table which would have columns (composite primary key) for
ExtendedTable
ColumnName
but you'd also have to store the data somewhere
(ExtendedKey)
DataItem
And now we get into the really nasty bits. How would you apply constraints to this data? I mean, what would the type be of a DataItem? A general solution would be quite complex (being a type of free form database). Hopefully you could limit the solution to solve only the problems you require solved.
Another approach is to use a single "extra" column that contains an XML record which embeds it's own "column and value" extensions, but if you wanted to display a table of the efficiently, you'd have to parse out every XML document in every field, which is not ideal.
Neither one of these approaches will work well with the existing SQL query language, so you'll then start building your own query language.
I suggest you go back and look at real data requirements, instead of sweeping them under the table with a "and anything else one might want" set of columns on your table.
Your requirement is best suited use case for NoSQL databases (like MongoDB).
Dynamically creating relational database tables & columns (modifying schemas) upon user requests in an application is not a best practice as these involve DDL operations, which are very powerful and in case if you don't handle them carefully, the whole application's database goes to the inconsistent state.
We have a legacy database where a single top level table has many relationships and sub-relationships. We usually don't need all or most of them and we set them to lazy load by default, and then use joins in HQL to pre-fetch the ones we're going to need in a particular part of the code.
We've got a module where we need quite a few of these. We don't want to get into N+1, but we've hit a massive performance snafu with this approach where one record has almost 4000 children, and they in turn have varying numbers of children. We have tried lazy-loading as many as we can without getting into N+1, but it appears that the cross-product that the join is producing is just unrealistically large.
Is there a better way to approach this problem? It seems like what is needed is a way to break this joined query into multiple queries, and then piece the hibernate models together in their relationships as a second step. Like if there was a way to do the HQL to load tables A, B, and C, but then make the load of C's sub-detail a second step that hibernate applies to the hierarchy by key.
We are migrating a whole application originally developed in Oracle Forms a few years back, to a Java (7) web based application with Hibernate (4.2.7.Final) and Hibernate Search (4.1.1.Final).
One of the requirements is: as users are using the new migrated version, they able to use the Oracle Forms version - so Hibernate Search indexes will be out of sync. Is it feasable to implement a servlet so that some PL-SQL accesses some link that updates the local indexes in the application server (AS)?
I thought of implementing a some sort clustering mechanism for hibernate, but as I read through the documentation I realised that as clustering may be a good option for scalabillity and performance, for maintaining legacy data in sync may be a bit overkill.
Does anyone have any idea of how to implement a service, accessible via servlet, to update local AS indexes in a given model entity with a given ID?
I don't know what exactly you mean by the clustering part, but anyways:
It seems like you are facing a similar problem like me. I am currently in the works of creating a Hibernate-Search adaption for JPA providers (that are not Hibernate-ORM, meaning EclipseLink, TopLink, etc.) and I am working on an automatic reindexing feature at the moment. Since JPA doesn't have a event system suitable for reindexation with Hibernate-Search I came up with the idea to use triggers on a database level to keep track of everything.
For a basic OneToOne relationship it's pretty straight forward and for other things like relation-tables or anything that is not stored in the main table of an entity it gets a bit trickier, but once you got a system for OneToOne relationships it's not that hard to get to that next step. Okay, Let's start:
Imagine two Entities: Place and Sorcerer in the Lord of the rings universe. In order to keep things simple let's just say they are in a (quite restrictive :D) 1:1 relationship with each other. Normally you end up with 2 tables named SORCERER and PLACE.
Now you have to create 3 triggers (one for CREATE, one for DELETE and one for UPDATE) on each Table (SORCERER and PLACE) that store information about what entity (only the id, for mapping tables there are always multiple ids) has changed and how (CREATE, UPDATE, DELETE) into special UPDATE tables. Let's call these PLACE_UPDATES and SORCERER_UPDATES.
In addition to the ID of the original Object that has changed and the event-type these will need an ID field that is needed to be UNIQUE among all UPDATE tables. This is needed because if you want to feed information from the Update tables to the Hibernate-Search index you have to make sure the events are in the right order or you will break your index. How such an UNIQUE ID can be created on your database should be easy to find on the internet/stackoverflow.
Okay. Now that you have set up the triggers correctly you will just have to find a way to access all the UPDATES tables in a feasible fashion (I do this via querying from multiple tables at once and sorting each query by our UNIQUE id field and then just comparing the first result of each query with the others) and then update my index.
This can be a bit tricky and you have to find the correct ways of dealing with the specific update event but it can be done (that's what I am currently working on).
If you're interested in that part, you can find it here:
https://github.com/Hotware/Hibernate-Search-JPA/blob/master/hibernate-search-db/src/main/java/com/github/hotware/hsearch/db/events/IndexUpdater.java
The link to the whole project is:
https://github.com/Hotware/Hibernate-Search-JPA/
This uses Hibernate-Search 5.0.0.
I hope this was of help (at least a little bit).
And about your remote indexing problem:
The update tables can easily be used as some kind of dump for events until you send them to the remote machine that is to be updated.
Facts
Database: PostgreSQL (latest)
Programming language: Java
Problem statement (simplified)
We have 2 tables - overview and details. There could be millions of rows in "overview" and each row of "overview" can have millions of rows associated with it in "details". The foreign key details.overview_id refers to overview.id. Most queries are of the general formSELECT * FROM details WHERE overview_id = xxx AND details.id > yyy AND details.id < zzz; If we have a single table for details, the queries will be too slow (although the queries on details are almost always on primary keys). More on the nature of DB activities: INSERT and UPDATE on overview happens infrequently. INSERT on details happen at a rapid pace, while UPDATE on the same table almost never happens and bulk DELETE happens sometimes.
What we already have
In the past we used raw SQL to partition the table "details" against each row in "overview". (In practice, we did not actually partition, instead we created new tables based on a template. These tables did not have any column called overview_id (saving storage space), instead we had a separate table that did the mapping between overview.id and the table-name of the specific partition table.) So, as you can understand, the partitions had to be generated on the fly as new rows were inserted in overview and partitions were dropped as rows were deleted from overview. All of this was managed inside the application. The application-database interaction has been blazing fast, but the application code is fairly complex, implying it is hard to maintain. Also, with raw SQL lying around everywhere, it is hard to scale the DB horizontally - we have to reinvent what most JPA providers have already done.
Current goal
Currently we are exploring options for a mechanism by which this partitioning can happen behind the scene - possibly by a JPA provider (I understand that this is not part of the JPA spec), so that we can focus on the application while the underlying framework/layer takes care of the scalability issues.
I looked at openJPA Slice and EclipseLink. Both of them provide partition (shard) management across hosts. We certainly need that. But we also need partition management within a single host. However, if there is a better or more elegant solution to this or if there is a totally different angle to look at this, I will be really glad to know about that.
I will appreciate any insight you can provide.
Thanks.
Prajesh
Have you looked into using Postgres's table partitioning?
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
Thank you all for your comments/answers till date. We decided to stick to what we already have (see the section named "what we already have"), with minor modifications.
I currently have hibernate set up in my project. It works well for most things. However today I needed to have a query return a couple hundred thousand rows from a table. It was ~2/3s of the total rows in the table. The problem is the query is taking ~7 minutes. Using straight JDBC and executing what I assumed was an identical query, it takes < 20 seconds. Because of this I assume I am doing something completely wrong. I'll list some code below.
DetachedCriteria criteria =DetachedCriteria.forlass(MyObject.class);
criteria.add(Restrictions.eq("booleanFlag", false));
List<MyObject> list = getHibernateTemplate().findByCriteria(criteria);
Any ideas on why it would be slow and/or what I could do to change it?
You have probably answered your own question already, use straight JDBC.
Hibernate is creating at best an instance of some Object for every row, or worse, multiple Object instances for each row. Hibernate has some really degenerate code generation and instantiation behavior that can be difficult to control, especially with large data sets, and even worse if you have any of the caching options enabled.
Hibernate is not suited for large results sets, and processing hundreds of thousands of rows as objects isn't very performance oriented either.
Raw JDBC is just that raw types for rows columns. Orders of magnitudes of less data.
I'm not sure hibernate is the right thing to use if you need to pull hundreds of thousands of records. The query execute time might be under 20 seconds but the fetch time will be huge and consume a lot of memory. After you get all those records, how do you output them? It's far more data than you could display to a user. Hibernate isn't really a good solution for doing data wharehouse style data crunching.
Probably you have several references to other classes in your MyObject class and in your mapping you set eager loading or something like that. It's very hard to find the issue using the code you wrote because it's OK.
Probably it will be better for you to use Hibernate Profiler - http://hibernateprofiler.com/ . It will show you all the problems with your mappings, configurations and queries.