I'm writing a server-side application in Java that runs on a Linux server.
I use Hibernate to open a session to the database, query it with native SQL, and always close the session in a try/catch/finally block.
My server queries the database through Hibernate at a very high frequency.
I set MaxHeapSize to 3000 MB, but the process usually uses about 2.7 GB of RAM; usage does go down, but much more slowly than it goes up. Sometimes it grows to 3.6 GB, more than the MaxHeapSize I defined at startup.
When memory usage was at 3.6 GB, I took a dump with jmap and got a heap dump of only 1.3 GB.
I'm using Eclipse MAT to analyse it; here is the dominator tree from MAT.
I think Hibernate is the problem: I have a huge number of org.apache.commons.collections.map.AbstractReferenceMap$ReferenceEntry instances like this. Perhaps they cannot be disposed of by garbage collection, or only very slowly.
How can I fix it?
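For context, the session handling described above follows roughly this pattern (just a sketch; the SQL, table, and class names are placeholders, not the actual code):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class UserDao {
    private final SessionFactory sessionFactory;

    public UserDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Open a session, run a native SQL query, always close the session.
    @SuppressWarnings("unchecked")
    public List<Object[]> findBySite(String site) {
        Session session = null;
        try {
            session = sessionFactory.openSession();
            return session.createSQLQuery("select * from users where site = :site")
                    .setParameter("site", site)
                    .list();
        } finally {
            if (session != null) {
                session.close();
            }
        }
    }
}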
You have 250k entries in your IN query list. Even a native query will bring the database to its knees. Oracle limits IN lists to 1,000 elements for performance reasons, so you should do the same.
Giving it more RAM is not going to solve the problem; you need to limit your selects/updates to batches of at most 1,000 entries by using pagination.
Streaming is an option as well, but for such a large result set, keyset pagination is usually the best choice.
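A minimal sketch of keyset pagination with a native query (the table and column names here are placeholders, not taken from your schema):

import java.util.List;
import org.hibernate.Session;

public class KeysetPager {
    // Pages through rows ordered by id, remembering the last id seen,
    // so each page is a cheap indexed range scan instead of an ever-growing OFFSET.
    @SuppressWarnings("unchecked")
    public List<Object[]> nextPage(Session session, long lastSeenId, int pageSize) {
        return session.createSQLQuery(
                "select id, payload from my_table where id > :lastId order by id")
            .setParameter("lastId", lastSeenId)
            .setMaxResults(pageSize)
            .list();
    }
}

Each call processes one batch; you then pass the largest id you processed as lastSeenId for the next batch.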
If you can do all the processing in the database, then you will not have to move 250k records from the DB to the app. There's a very good reason why many RDBMS offer advanced procedural languages (e.g. PL/SQL, T-SQL).
Note that even though the number of objects in the queryPlanCache can be configured and limited, having that many is probably not normal.
In our case we were writing HQL queries similar to this:
hql = String.format("from Entity where msisdn='%s'", msisdn);
This resulted in N different queries going to the queryPlanCache. When we changed this query to:
hql = "from Blacklist where msisnd = :msisdn";
...
query.setParameter("msisdn", msisdn);
the size of the queryPlanCache was dramatically reduced from 100 MB to almost 0. The second query is translated into a single PreparedStatement, so only one object ends up in the cache.
Thank you Vlad Mihalcea for your link to the Hibernate issue; this is a bug in Hibernate that was fixed in version 3.6. I upgraded my Hibernate from 3.3.2 to 3.6.10, kept the default values of "hibernate.query.plan_cache_max_soft_references" (2048) and "hibernate.query.plan_cache_max_strong_references" (128), and my problem is gone. No more high memory usage.
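For reference, a minimal sketch of setting those two limits explicitly when building the SessionFactory (the values shown are simply the defaults mentioned above):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateBootstrap {
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        // Cap the query plan cache explicitly.
        cfg.setProperty("hibernate.query.plan_cache_max_soft_references", "2048");
        cfg.setProperty("hibernate.query.plan_cache_max_strong_references", "128");
        return cfg.buildSessionFactory();
    }
}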
Related
I'm running Hibernate 4.1 with Javassist runtime instrumentation running on top of Hikari pool, on top of Oracle 12c. JDK is 1.7.
I have a query that runs pretty fast on the database and fetches about 500 entities in Hibernate. The query runtime, according to JProfiler, is quite small, about 11 ms, but in total Query.list runs for about 7 seconds.
I've tried removing all filters, and it shows that most of the time is spent in Javassist and other Hibernate-related reflection calls (like AbstractProxyHandler and such). I read that the reflection overhead should be fairly small, but here it seems excessive.
Could you please advise what could be the bottleneck here?
Make sure the object you are retrieving does not have sub-objects that are being fetched lazily as a SELECT instead of eagerly as a JOIN. This can result in a behavior known as SELECT N + 1, where Hibernate ends up running a query to get the 500 objects from their respective table, then an additional query for each object to get the child object. If you have 4 or 5 relationships that are being fetched as SELECT statements for each record, and you have 500 records, suddenly you're running around 2000 queries in order to get the List.
I would recommend turning on SQL logging for Hibernate to see which queries it's running. If it dumps out a really long list of SELECT queries when you're fetching your list, look at your mapping file to see how your relationships are set up. Try adjusting them to fetch="join" and see if those queries go away and the performance improves.
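As a sketch of the same idea applied per query rather than in the mapping file (the Parent entity and its "children" collection here are hypothetical), the Criteria API lets you force a joined fetch for one query:

import java.util.List;
import org.hibernate.FetchMode;
import org.hibernate.Session;

public class ParentDao {
    // Fetches parents and their children in one joined SELECT instead of N+1 queries.
    // "Parent" and "children" stand in for your own entity and association names.
    @SuppressWarnings("unchecked")
    public List<Parent> findAllWithChildren(Session session) {
        return session.createCriteria(Parent.class)
                .setFetchMode("children", FetchMode.JOIN)
                .list();
    }
}

With hibernate.show_sql=true you should see the long list of per-row SELECTs collapse into a single joined query.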
Here are some possibly related Stack Overflow questions that may be able to provide more specific details.
Hibernate FetchMode SELECT vs JOIN
What is N+1 SELECT query issue?
Something else to note about profilers and similar tools: when tracking down a performance issue, a particular block of code often shows up as the place where most of the time is being spent, and the common conclusion people jump to is that that block of code is slow. In your case, you observe Hibernate's reflective code as where the time goes. It is very important to consider that this code might not actually be slow itself; it may just be where the time accumulates because it is called very frequently due to algorithmic complexity. In many problems I have found that serialization or reflection appears to be slow, when the reality is that the code is used to communicate with an external resource, and that resource is being called thousands of times when it should only be called a handful. Making thousands of queries to fetch your list will make your sampling show a lot of time spent in the code that processes those queries. Be careful not to mistake code that is called often because of a design or configuration issue for code that is slow. The problem very likely does not lie in Hibernate's use of reflection, since reflection is generally not slow on the order of seconds.
I'm running queries in parallel against a MySQL database. Each query takes less than a second to execute and another half second to a second to fetch.
This is acceptable for me. But when I run 10 of these queries in parallel and then attempt another set in a different session, everything slows down and a single query can take 20-plus seconds.
My ORM is Hibernate and I'm using C3P0 with <property name="hibernate.c3p0.max_size">20</property>. I'm sending the queries in parallel using Java threads, but I don't think that's related, because the slowdown also happens when I run the queries in MySQL Workbench. So I'm assuming something is missing in my MySQL config, or the machine is simply not fast enough.
This is the query:
select *
from schema.table
where site = 'sitename'
  and (description like '% family %' or title like '% family %')
limit 100 offset 0;
How can I make this go faster when facing let's say 100 concurrent queries?
I'm guessing that this is slow because the WHERE clause is doing a wildcard text search on the description and title columns; a LIKE with a leading wildcard forces the database to scan the entire field on every record, and that's never going to scale.
Each of those 10 concurrent queries must read the 1 million rows to fulfill the query. If you have a bottleneck anywhere in the system - disk I/O, memory, CPU - you may not hit that bottleneck with a single query, but you do hit it with 10 concurrent queries. A monitoring or profiling tool can tell you which bottleneck you're hitting.
Most of the time, those bottlenecks (CPU, memory, disk) are too expensive to upgrade - especially if you need to scale to 100 concurrent queries. So it's better to optimize the query/ORM approach.
I'd consider using Hibernate's built-in free text capability here - it requires some additional configuration, but works MUCH better when looking for arbitrary strings in a textual field.
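One database-level alternative (not Hibernate's own free-text integration, but the same idea) is a MySQL FULLTEXT index queried through a native query. A minimal sketch, assuming a FULLTEXT index has already been created on (title, description):

import java.util.List;
import org.hibernate.Session;

public class SiteSearch {
    // Uses MATCH ... AGAINST on a FULLTEXT index instead of leading-wildcard LIKE.
    // Assumes: ALTER TABLE schema.table ADD FULLTEXT INDEX ft_title_desc (title, description);
    @SuppressWarnings("unchecked")
    public List<Object[]> search(Session session, String site, String term) {
        return session.createSQLQuery(
                "select * from schema.table where site = :site "
                + "and match(title, description) against (:term in natural language mode) "
                + "limit 100")
            .setParameter("site", site)
            .setParameter("term", term)
            .list();
    }
}

The index does the string matching, so concurrent queries no longer each scan the whole table.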
I would like to ask the experts what the recommendation is for fetching 3,000-5,000 records from an Oracle 11g database from a Java application (using JDBC). Our standard is to always invoke a stored procedure.
I did some research and found that a ref cursor makes multiple round trips to the database, depending on the JDBC fetch size. (Can somebody shed more light on the end-to-end flow of how the data is held in memory in Oracle and in the JVM when processing ref cursors?)
I was thinking collections are more efficient because the data is sent in one shot from the Oracle database to the caller (Java), using BULK COLLECT. With this approach we can avoid multiple network round trips from Java to the Oracle server. Is this a correct assumption?
Appreciate your help!
This is a much bigger topic than anyone is willing to commit to in a posting. Here's a link that discusses how Oracle manages read consistency; that entire page is probably a good read to get some idea of what's going on in the server. There's also an article here that discusses what happens when using collections. How would you return the collection to a JDBC client (not something I've ever tried)?
Essentially, there's a lot involved in performance, from how your database is configured to how your network is tuned to disk performance, to client performance, etc.
The short answer is that you need to try things. Retrieving 3-5k records isn't a lot, and it depends on how big the records are that you're bringing back across the network. If they are 20-byte records and your network (MTU?) size is 4k blocks, you can fit about 200 records in a block. At some point you run into the law of diminishing returns.
I use stored procedures as a matter of habit, but you don't need to. It would depend on the complexity of the query (the number of tables and the types of joins) and on whether someone like a DBA can go in and see what the query is doing.
Worrying about network trips is a little less critical, because there's only so much data you can stuff in a packet. There are going to be a number of network trips no matter what you use; it really depends on your use case how critical it is to get that down to a bare minimum.
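For what it's worth, a minimal sketch of consuming a ref cursor from JDBC with a larger fetch size (the procedure name and signature are hypothetical):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import oracle.jdbc.OracleTypes;

public class RefCursorExample {
    // Calls a hypothetical procedure: PROCEDURE get_records(p_cur OUT SYS_REFCURSOR)
    public static void readAll(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{ call get_records(?) }")) {
            cs.registerOutParameter(1, OracleTypes.CURSOR);
            cs.execute();
            try (ResultSet rs = (ResultSet) cs.getObject(1)) {
                rs.setFetchSize(500); // ~500 rows per round trip instead of the default 10
                while (rs.next()) {
                    // process each row
                }
            }
        }
    }
}

The fetch size controls how many rows come back per round trip, which is usually the main lever for this kind of retrieval.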
I need one bit of help from you guys regarding JDBC performance optimization. One of our POJOs uses JDBC to connect to an Oracle database and retrieve records. Basically the records are email addresses, based on which emails will be sent to the users. The problem here is performance. This process happens every weekend and the number of records is very large, around 100k.
The performance is very slow and it worries us a lot. Only 1,000 records seem to be fetched from the database every hour, which means it will take 100 hours for this process to complete (which is very bad). Please help me with this.
The database server and the Java process are on two different remote servers. We have used rs_email.setFetchSize(1000); hoping it would make a difference, but there was no change at all.
The same query executed directly on the server takes 0.35 seconds to complete. Any quick suggestions would be of great help to us.
Thanks,
Aamer.
First look at your queries. Analyze them. See if the SQL could be made more efficient (i.e., ask the database for what you want, not for what you don't want - it makes a big difference). Also check whether there are indexes on the fields in your WHERE and JOIN clauses. Indexes make a big difference, but they can't be just any indexes. They have to be good indexes (i.e., the fields that make up the index provide enough uniqueness for the database to retrieve things appropriately). Work with your DBA on this. Look for either high run time against the DB or for queries with high CPU usage (even if the queries run sub-second). These are the things that can kill your database.
Also, from a code perspective, check whether you are opening and closing your connections or re-using them. That can make a big difference too.
It would help to post your code, queries, table layouts, and any indexes you have.
Use log4jdbc to capture the real SQL used to fetch a single record. Then check the speed and execution plan of that SQL. You may need a proper index or even DB defragmentation.
Not sure about the Oracle driver, but I do know that the MySQL driver supports two different result retrieval modes: "stream" and "wait until you've got it all".
The streaming mode lets you start processing the results the moment you've got the first row back from the query, whereas the other mode retrieves the entire result set before you can start working on it. In cases where you deal with huge record sets, the latter often leads to memory exceptions or slow performance, because Java hits the "memory roof" and the garbage collector can't throw away "used" records the way it can in streaming mode.
The streaming mode doesn't let you navigate/scroll the result set the way the "normal"/"wait until you've got it all" mode does, though...
Anyway, not sure if this is of any help but it might be worth checking out.
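For reference, here's a minimal sketch of how streaming is typically enabled with the MySQL Connector/J driver (the table and column names are placeholders):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingExample {
    // MySQL Connector/J streams rows one at a time when the statement is
    // forward-only, read-only, and the fetch size is Integer.MIN_VALUE.
    public static void streamEmails(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery("select email from email_list")) {
                while (rs.next()) {
                    String email = rs.getString("email");
                    // process/send for this address without holding the whole result set in memory
                }
            }
        }
    }
}

The Oracle driver works differently (it prefetches in batches controlled by setFetchSize), so this applies only if MySQL-style streaming is what you're after.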
My answer to your question, in summary, is:
1. Check network
2. Check SQL
3. Check Java code.
It sounds very slow. The first thing to check would be whether you have a slow network. You can do this pretty quickly by just pinging the database server, or by running the database server on the same machine as your JVM. If it is not the network, get an explain plan for your SQL and ensure you are not doing table scans when you don't need to be. If it is not the network or the SQL, then it's time to check your Java code. Are you doing anything like blocking when you shouldn't be?
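If you want to pull the plan from the Java side, one possible sketch (Oracle syntax; the query itself is whatever you pass in):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ExplainPlanCheck {
    // Generates and prints the Oracle execution plan for a query.
    public static void explain(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("EXPLAIN PLAN FOR " + sql);
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // one line of the plan per row
                }
            }
        }
    }
}

A full table scan on the 100k-row email table would show up immediately in that output.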
I am writing a program that does a lot of writes to a Postgres database. In a typical scenario I would be writing say 100,000 rows to a table that's well normalized (three foreign integer keys, the combination of which is the primary key and the index of the table). I am using PreparedStatements and executeBatch(), yet I can only manage to push in say 100k rows in about 70 seconds on my laptop, when the embedded database we're replacing (which has the same foreign key constraints and indices) does it in 10.
I am new to JDBC and I don't expect it to beat a custom embedded DB, but I was hoping it to be only 2-3x slower, not 7x. Anything obvious that I might be missing? Does the order of the writes matter (e.g. if it's not the order of the index)? Things to look at to squeeze out a bit more speed?
This is an issue that I have had to deal with often on my current project. For our application, insert speed is a critical bottleneck. However, we have discovered that for the vast majority of database users, select speed is the chief bottleneck, so you will find more resources dealing with that issue.
So here are a few solutions that we have come up with:
First, all solutions involve using the Postgres COPY command. Using COPY to import data into Postgres is by far the quickest method available. However, the JDBC driver by default does not currently support COPY across the network socket. So, if you want to use it, you will need to do one of two workarounds:
A JDBC driver patched to support COPY, such as this one.
If the data you are inserting and the database are on the same physical machine, you can write the data out to a file on the filesystem and then use the COPY command to import the data in bulk.
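A minimal sketch of that second workaround (the table, columns, and file path are placeholders; it assumes the JVM and PostgreSQL run on the same machine and the database user is allowed to read the file):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class CopyLoadExample {
    // Writes the rows to a CSV file, then loads the whole file with COPY.
    public static void bulkLoad(Connection conn, List<int[]> rows) throws IOException, SQLException {
        Path csv = Paths.get("/tmp/measurements.csv"); // placeholder path on the DB host
        try (BufferedWriter out = Files.newBufferedWriter(csv, StandardCharsets.UTF_8)) {
            for (int[] r : rows) {
                out.write(r[0] + "," + r[1] + "," + r[2] + "\n");
            }
        }
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("COPY measurements (a_id, b_id, c_id) FROM '" + csv + "' WITH (FORMAT csv)");
        }
    }
}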
Other options for increasing speed are using JNI to hit the Postgres API so you can talk over the Unix socket, removing indexes, and the pg_bulkload project. However, in the end, if you don't implement COPY, you will always find the performance disappointing.
Check if your connection is set to autoCommit. If autoCommit is true and you have 100 items in the batch when you call executeBatch, it will issue 100 individual commits. That can be a lot slower than calling executeBatch() followed by a single explicit commit().
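A minimal sketch of that pattern (the table and column names are placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertExample {
    private static final int BATCH_SIZE = 1000;

    // Inserts all rows inside one transaction with a single commit at the end,
    // flushing the batch in chunks instead of committing per row.
    public static void insertAll(Connection conn, List<int[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        String sql = "insert into measurements (a_id, b_id, c_id) values (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (int[] r : rows) {
                ps.setInt(1, r[0]);
                ps.setInt(2, r[1]);
                ps.setInt(3, r[2]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send a chunk, still within the same transaction
                }
            }
            ps.executeBatch(); // send the remainder
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}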
I would avoid the temptation to drop indexes or foreign keys during the insert. It puts the table in an unusable state while your load is running, since nobody can query the table while the indexes are gone. Plus, it seems harmless enough, but what do you do when you try to re-enable the constraint and it fails because something you didn't expect to happen has happened? An RDBMS has integrity constraints for a reason, and disabling them even "for a little while" is dangerous.
You can obviously try to change the size of your batch to find the best size for your configuration, but I doubt that you will gain a factor of 3.
You could also try to tune your database structure. You might get better performance using a single field as a primary key rather than a composite PK. Depending on the level of integrity you need, you might save quite some time by deactivating integrity checks on your DB.
You might also change the database you are using. MySQL is supposed to be pretty good for high-speed simple inserts ... and I know there is a fork of MySQL around that tries to cut down functionality to get very high performance under highly concurrent access.
Good luck!
Try disabling the indexes and re-enabling them after the insert. Also, wrap the whole process in a transaction.