I'm running Hibernate 4.1 with Javassist runtime instrumentation running on top of Hikari pool, on top of Oracle 12c. JDK is 1.7.
I have a query that runs pretty fast on the database and fetches about 500 entities in Hibernate. The query runtime, according to JProfiler, is quite small, about 11 ms, but in total Query.list runs for about 7 seconds.
I've tried removing all filters, and it shows that most of the time is spent in Javassist and other Hibernate-related reflection calls (like AbstractProxyHandler and such). I've read that reflection overhead should be pretty small, but here it seems far too large.
Could you please advise what could be the bottleneck here?
Make sure the object you are retrieving does not have sub-objects that are being fetched lazily as a SELECT instead of eagerly as a JOIN. This can result in the behavior known as the N+1 SELECT problem, where Hibernate ends up running one query to get the 500 objects from their table, then an additional query per object to get each child object. If you have 4 or 5 relationships that are being fetched as separate SELECT statements for each record, and you have 500 records, suddenly you're running around 2000 queries to get the List.
I would recommend turning on SQL logging for Hibernate to see which queries it is running. If it dumps out a really long list of SELECT queries when you fetch your list, look at your mapping file to see how your relationships are set up. Try adjusting them to fetch="join" and see if those queries go away and performance improves.
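For example (a hedged sketch with hypothetical Parent/Child entities; the real fix depends on your own mapping), the annotation equivalent of fetch="join" looks like this, and setting hibernate.show_sql=true and hibernate.format_sql=true in your configuration will dump the generated SQL:

import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import org.hibernate.annotations.Fetch;
import org.hibernate.annotations.FetchMode;

@Entity
public class Parent {

    @Id
    private Long id;

    // Fetched in the same SQL statement as the parent,
    // instead of one extra SELECT per parent row
    @OneToMany(mappedBy = "parent")
    @Fetch(FetchMode.JOIN)
    private Set<Child> children;
}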
Here are some possibly related Stack Overflow questions that may be able to provide more specific details.
Hibernate FetchMode SELECT vs JOIN
What is N+1 SELECT query issue?
Something else to note about profilers and similar tools: when tracking down a performance issue, a particular block of code will often show up as where the most time is being spent, and the common conclusion people jump to is that the block in question is slow. In your case, you are observing Hibernate's reflective code as where the time goes. It is very important to consider that this code might not actually be slow itself; it is where the time is spent because it is called very frequently due to algorithmic complexity. In many problems I have seen, serialization or reflection appears to be slow, when in reality that code is used to communicate with an external resource, and that resource is being called thousands of times when it should only be called a handful. Making thousands of queries to fetch your list will make your sampling show a lot of time in the code that processes those queries. Be careful not to mistake code that is called often due to a design or configuration issue for code that is slow. The problem very likely does not lie in Hibernate's use of reflection, since reflection is generally not slow on the order of seconds.
I have some doubts about a function that updates multiple entities, but does so one by one. Of course, there could be a latency problem if we are working with a remote DB, but aside from that, I am worried that we could get an OutOfMemoryException because of the number of entities we are updating in a single transaction. My code looks something like the one below.
EntityHeader entityHeader = entityHeaderService.findById(id);
for (EntityDetail entityDetail : entityHeader.getDetails()) {
    for (Entity entity : entityDetail.getEntities()) {
        entity.setState(true);
        entityService.update(entity);
    }
}
This is an example, and we also have a similar case in another method, but with inserts instead. These methods can update or insert up to 2k or 3k entities in one transaction. So my question is: should we start using batch operations, or is the number of entities not big enough to worry about? Also, would it perform better if done as a batch operation?
When optimizing, always ask yourself whether it is worth the time, e.g.:
Are these methods part of some batch that runs nightly, or are they called quite often?
Is the performance gain high enough, or is it negligible?
Anyway, ~3k entities in one transaction doesn't sound bad, but there are benefits to JDBC batching even at those numbers (and it is quite easy to achieve).
It's hard to tell when you should worry about an OutOfMemoryException, as it depends on how much memory you give the JVM and how big the entities you are updating are. Just to give you some numbers: I personally ran into memory trouble when I had to insert somewhere between 10 and 100 thousand rows in the same transaction with 4 GB of memory; I had to flush and clear the Hibernate session cache every once in a while.
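As a rough illustration (reusing the names from the question's snippet; the session variable and batch size are assumptions), the periodic flush/clear looks like this, ideally with hibernate.jdbc.batch_size set to a matching value:

int batchSize = 50;
int count = 0;
for (EntityDetail entityDetail : entityHeader.getDetails()) {
    for (Entity entity : entityDetail.getEntities()) {
        entity.setState(true);
        session.update(entity);
        if (++count % batchSize == 0) {
            session.flush(); // push pending statements to the DB as a JDBC batch
            session.clear(); // detach managed entities so the session stays small
        }
    }
}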
I have a query which has two IN clauses. The first IN clause takes around 125 values and the second around 21000 values. It's implemented using the JPA CriteriaBuilder.
The query itself executes very fast and returns results within seconds. The only problem is that entityManager.createQuery(CriteriaQuery) takes around 12-13 minutes to return.
I searched all over SO; all the threads relate to the performance of Query.getResultList. None of them discusses the performance of entityManager.createQuery(CriteriaQuery). If you have seen such behavior before, please let me know how to resolve it.
My JDK version is 1.7 and the javaee-api dependency version is 6.0. The application is deployed on JBoss EAP 6.4, but that's not the concern for now, as I am testing my code with JUnit, using an EntityManager connected to an actual Oracle database. If you require more information, kindly let me know.
A hybrid approach is to dynamically create a query and then save it as a named query in the entity manager factory.
At that point it becomes just like any other named query that may have been declared statically in metadata. While this may seem like a good compromise, it turns out to be useful in only a few specific cases. The main advantage it offers is if there are queries that are not known until runtime, but then reissued repeatedly. Once the dynamic query becomes a named query it will only bear the cost of processing once.
It is implementation-specific whether that cost is paid when the query is registered as a named query, or deferred until the first time it is executed.
A dynamic query can be turned into a named query by using the EntityManagerFactory.addNamedQuery() method.
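A minimal sketch of that approach (MyEntity and the query name are made up; also note that addNamedQuery() was added in JPA 2.1, while javaee-api 6.0 corresponds to JPA 2.0):

// Build the CriteriaQuery dynamically, once, at runtime
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<MyEntity> cq = cb.createQuery(MyEntity.class);
cq.select(cq.from(MyEntity.class));
TypedQuery<MyEntity> query = em.createQuery(cq);

// Register it so the processing cost is paid only once
em.getEntityManagerFactory().addNamedQuery("MyEntity.all", query);

// Subsequent executions reuse the registered query
List<MyEntity> results = em.createNamedQuery("MyEntity.all", MyEntity.class)
                           .getResultList();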
Keep us informed of the result, and good luck.
I observed that having a single query with 21 IN clauses (each with 1000 expressions), all combined with OR, made the query run slower. I tried another approach of executing every IN clause as a separate query, and these 21 individual queries performed better overall, as sketched below.
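A rough sketch of that approach (MyEntity and the helper method are hypothetical; the chunk size of 1000 matches Oracle's IN-list limit):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;

static List<MyEntity> findByIds(EntityManager em, List<Long> ids) {
    List<MyEntity> results = new ArrayList<>();
    // One query per chunk of at most 1000 ids, instead of one huge OR-combined query
    for (int i = 0; i < ids.size(); i += 1000) {
        List<Long> chunk = ids.subList(i, Math.min(i + 1000, ids.size()));
        results.addAll(em.createQuery(
                "select e from MyEntity e where e.id in :ids", MyEntity.class)
            .setParameter("ids", chunk)
            .getResultList());
    }
    return results;
}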
Another issue I observed was that the CriteriaBuilder query was slow when the result set was huge (something like 20K rows). I solved this by adding a query hint to my typed query:
typedQuery.setHint("org.hibernate.fetchSize", 5000);
Hope it will help others.
This code in Hibernate is not designed to bind large numbers of parameters:
for ( ImplicitParameterBinding implicitParameterBinding : parameterMetadata.implicitParameterBindings() ) {
    implicitParameterBinding.bind( jpaqlQuery );
}
Unfortunately, you need to find a different approach if you want to do something similar.
In my app I have a Case, and each Case can have 0 to 2 Claims. If a Case has 0 claims it runs pretty fast, with 1 claim it slows down, and with 2 it is awfully slow. Any idea how to make this faster? I didn't know if my Case and Claim were going back and forth causing infinite recursion, so I added a @JsonManagedReference and @JsonBackReference, but that doesn't seem to help much with speed. Any ideas? Here is my Case.java:
@Entity
public class Case {

    @OneToMany(mappedBy = "_case", fetch = FetchType.EAGER)
    @Fetch(FetchMode.JOIN)
    @JsonManagedReference(value = "case-claim")
    public Set<Claim> claims;
}
In Claim.java:
@Entity
public class Claim implements Cloneable {

    @ManyToOne(optional = true)
    @JoinColumn(name = "CASE_ID")
    @JsonBackReference(value = "case-claim")
    private Case _case;
}
Output with 0 claims:
https://gist.github.com/elmatt/2cafbe7ecb1fa0b7f6a8
Output with 2 claims:
https://gist.github.com/elmatt/b000bc28909453effc95
Your problem has nothing to do with the relationship between Case and Claim.
FYI: 300ms is not "pretty fast." Your problem is that you expect hibernate to magically and quickly deliver a complex object hierarchy to you, with no particular effort on your part. I view ORM as "The Big Lie" - it is super easy to use and works great on toy problems, but tends to fail miserably when you try to scale to interesting applications (like yours).
Don't abandon hibernate, but realize that you are going to need to work harder than you thought you would in order to make it work for you.
I happen to work in a similar data domain (post-adjudication healthcare claim analysis and processing). You should be able to select this kind of data in well under 10ms per claim (with all associated dimensions) using MySQL on modest hardware from a table with >1 billion claims and the DB hosted on a separate server from the app.
How do you get from where you are to where you should be?
1. Minimize the number of round-trips to the database by minimizing the number of separate queries that are executed.
2. Hand-craft your important queries to grab just the rows and joins that you actually need (see the sketch after this list).
3. Use explain plan on every query to make sure that it hits the tables in the right order and every step is appropriately supported by an index.
4. Consider partitioning your big tables and include the partition criteria in your queries to enable partition-pruning to focus the query on the proper data.
5. Be very hesitant to let hibernate manage your relationships between your entities. I generally do not let hibernate deal with any relationships.
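To illustrate point 2 (a hedged sketch with hypothetical Order/OrderLine entities and field names), one hand-written fetch join can replace a whole family of per-row SELECTs:

// Pull an Order and its lines in a single query, nothing more
List<Order> orders = em.createQuery(
        "select distinct o from Order o "
      + "left join fetch o.lines "
      + "where o.customerId = :customerId", Order.class)
    .setParameter("customerId", customerId)
    .getResultList();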
A few years ago, I worked on a product that is an iPhone app where the user walks through workflows (e.g., a nurse taking a patient's vitals) and each screen made a round-trip to the app server to execute the workflow step and get the data for the next screen. Think about how little data you can work with on an iPhone screen. Yet the DB portion of the round-trip generally took 2-5 seconds to execute. Everyone there took it for granted, because "That is how long it has always taken." I dug into the code and found that each step was pulling in a significant portion of the database (and then was not used by the business logic).
The only time they tweaked the default hibernate behavior was when they got an exception due to too many joins (yes, MySQL has a limit of 61 tables in one query).
The approach of creating your Java data model and simply ORM'ing it into the database generally works just fine on configuration data and the like, but tends to perform terribly for complex data models involving your transactional data. This is what is biting you now.
Your problem is totally fixable, and can be attacked incrementally - you don't have to tear apart the whole application to start making things better.
Can you enable Hibernate logging and provide the output? It should show the SQL queries being executed against your DB. Information about which DB you are using would also be useful. Once you have those, I would recommend profiling the queries to make sure your DB is set up appropriately. It sounds like a non-indexed query.
The size of the datasets would also help in targeting possible issues - the number of rows and so on.
I would also recommend timing the actual Hibernate call (this could be as crude as a log statement immediately before and after) versus the overall processing, to identify whether it really is Hibernate or some other processing. Without further information and context, that is not clear here.
Now that you've posted your queries, we can see what is happening. It looks like the structure of your entities is more complex than the code snippet originally posted; there are references to Person, Activities, HealthPlan and others in there.
As others have commented your query is triggering a very large select of a lot of data due to the nature of your model.
I recommend creating named queries for Claims and then loading them using the ID of the Case, as sketched below.
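Something along these lines (a sketch reusing the question's entity and field names; the query name is made up):

@Entity
@NamedQuery(name = "Claim.byCaseId",
            query = "select c from Claim c where c._case.id = :caseId")
public class Claim implements Cloneable {
    // ... fields as in the question ...
}

// Load the claims on demand instead of eagerly joining them into every Case load:
List<Claim> claims = em.createNamedQuery("Claim.byCaseId", Claim.class)
                       .setParameter("caseId", caseId)
                       .getResultList();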
You should also review your Hibernate model and switch to FetchType.LAZY; otherwise Hibernate will create large queries such as the one you have posted. The catch is that if you try to access a related entity outside of the transaction, you will get a LazyInitializationException. You will need to consider each use case and ensure you load the data you need. Two common mistakes with Hibernate are using FetchType.EAGER everywhere, or opening the transaction too early to avoid this. There is no single correct design approach, but I normally do the following:
JSP -> Controller -> [TX BOUNDARY] Service -> DAO
Your service method(s) should encapsulate the business logic needed to load the data you require before passing it back to the controller.
Again, per the other answer, I think you're expecting too much of Hibernate. It is a powerful tool but you need to understand how it works to get the best from it.
I have a server-side application written in Java, running on a Linux server.
I use Hibernate to open a session to the database, use native SQL to query it, and always close the session in a try/catch/finally block.
My server queries the DB via Hibernate at a very high frequency.
I set the max heap size to 3000 MB, but the process usually uses 2.7 GB of RAM; usage can decrease, but it shrinks more slowly than it grows. Sometimes it grows up to 3.6 GB of memory, more than the max heap size defined at startup.
When memory usage was at 3.6 GB, I tried to dump it with jmap and got a heap dump of only 1.3 GB.
I'm using Eclipse MAT to analyse it; here is the dominator tree from MAT.
I think Hibernate is the problem: I have very many org.apache.commons.collections.map.AbstractReferenceMap$ReferenceEntry entries like this. Maybe they cannot be disposed of by garbage collection, or only slowly.
How can I fix it?
You have 250k entries in your IN query list. Even a native query will bring the database to its knees. Oracle limits IN lists to 1000 entries for performance reasons, so you should do the same.
Giving it more RAM is not going to solve the problem; you need to limit your selects/updates to batches of at most 1000 entries, using pagination.
Streaming is an option as well, but for such a large result set, keyset pagination is usually the best option.
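A rough keyset-pagination sketch (the table and column names are made up; the idea is that each page resumes after the last id seen, so skipped rows are never scanned):

long lastId = 0;
List<Object[]> page;
do {
    page = session.createSQLQuery(
            "select id, state from my_table where id > :lastId order by id")
        .setParameter("lastId", lastId)
        .setMaxResults(1000) // at most 1000 rows per round-trip
        .list();
    for (Object[] row : page) {
        // ... process the row ...
        lastId = ((Number) row[0]).longValue();
    }
} while (!page.isEmpty());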
If you can do all the processing in the database, then you will not have to move 250k records from the DB to the app. There's a very good reason why many RDBMS offer advanced procedural languages (e.g. PL/SQL, T-SQL).
Note that even though the number of objects within the queryPlanCache can be configured and limited, having that many is probably not normal.
In our case, we were writing queries in HQL similar to this:
hql = String.format("from Entity where msisdn='%s'", msisdn);
This resulted in N different queries going into the queryPlanCache. When we changed this query to:
hql = "from Blacklist where msisnd = :msisdn";
...
query.setParameter("msisdn", msisdn);
the size of the queryPlanCache was dramatically reduced from 100 MB to almost 0. The second query is translated into a single PreparedStatement, resulting in just one object inside the cache.
Thank you Vlad Mihalcea for your link to the Hibernate issue; this is a bug in Hibernate that was fixed in version 3.6. I updated my Hibernate version from 3.3.2 to 3.6.10, kept the default values of "hibernate.query.plan_cache_max_soft_references" (2048) and "hibernate.query.plan_cache_max_strong_references" (128), and my problem is gone. No more high memory usage.
I currently have Hibernate set up in my project. It works well for most things. However, today I needed a query to return a couple hundred thousand rows from a table, roughly 2/3 of the total rows in the table. The problem is the query takes ~7 minutes. Using straight JDBC and executing what I assumed was an identical query, it takes < 20 seconds. Because of this, I assume I am doing something completely wrong. I'll list some code below.
DetachedCriteria criteria = DetachedCriteria.forClass(MyObject.class);
criteria.add(Restrictions.eq("booleanFlag", false));
List<MyObject> list = getHibernateTemplate().findByCriteria(criteria);
Any ideas on why it would be slow and/or what I could do to change it?
You have probably answered your own question already: use straight JDBC.
Hibernate is creating at best an instance of some Object for every row, or worse, multiple Object instances for each row. Hibernate has some really degenerate code generation and instantiation behavior that can be difficult to control, especially with large data sets, and even worse if you have any of the caching options enabled.
Hibernate is not suited for large results sets, and processing hundreds of thousands of rows as objects isn't very performance oriented either.
Raw JDBC is just that: raw types for row columns. Orders of magnitude less data.
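For contrast, a minimal raw-JDBC sketch of the same query (the dataSource, table and column names are assumed from the mapping); no entity instantiation, just primitive column reads:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection con = dataSource.getConnection();
     PreparedStatement ps = con.prepareStatement(
         "select id, boolean_flag from my_object where boolean_flag = ?")) {
    ps.setBoolean(1, false);
    ps.setFetchSize(1000); // stream rows in chunks instead of buffering everything
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            long id = rs.getLong("id");
            // ... process the row without materializing a MyObject ...
        }
    }
}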
I'm not sure Hibernate is the right thing to use if you need to pull hundreds of thousands of records. The query execution time might be under 20 seconds, but the fetch time will be huge and consume a lot of memory. After you get all those records, how do you output them? It's far more data than you could display to a user. Hibernate isn't really a good solution for data-warehouse-style data crunching.
You probably have several references to other classes in your MyObject class, and in your mapping you set eager loading or something like that. It's very hard to find the issue from the code you posted, because the code itself looks OK.
It would probably be better for you to use Hibernate Profiler - http://hibernateprofiler.com/ . It will show you all the problems with your mappings, configuration and queries.