I'm using the Java Persistence API to develop a standalone application. Recently I noticed that memory usage keeps rising when I create objects from entity classes, as well as JPAController classes. It seems that the objects stay in memory, since the memory allocated to the application never decreases (e.g. 400 MB ---> create object ---> 450 MB ---> stays at 450 MB). Will this badly affect performance? Should I call the System.gc() method to remove these objects?
Generally, System.gc() is not guaranteed to perform a garbage collection. Ultimately it is up to the JVM to decide. See the javadoc.
Have you observed what happens when you approach the memory limits of the JVM; does garbage collection happen then? If not and you receive an OutOfMemoryError, you are either retaining something longer than you need to, or you actually need extra heap allocated to your VM.
In any case, I believe System.gc() shouldn't be used to solve such problems.
In my opinion, the approach to the problem should be different. Actually the call to System.gc() is not a guarantee that it will free any memory at all; please see When does System.gc() do anything
If you can measure a problem in your memory allocation, whether via jconsole, a post-mortem analysis of a JVM heap dump, or some other tool, then that is a different matter. By gathering this information you will know what remains in which memory regions, and can then take action to contain it.
The only way that this would negatively affect performance throughout the life of your program is if you want to keep these entities around forever but the size of the old generation in your heap is less than the 450 MB you mentioned. Assuming that you want to keep between 1 and 2 times that 450 MB around forever, with the default ratios of the JVM, setting a parameter such as -Xmx2g will probably be fine. There are many more parameters for finer-grained performance tuning, but that's probably all the complexity you need for now. If you want more detail on heap tuning and really want to get into performance, check out this doc on Garbage Collection Tuning by Oracle. Alternatively, something to watch over lunch is a great YouTube video on GC tuning by a guy named Gil Tene.
But calling System.gc() probably won't do anything useful.
I am working on an application that creates a lot of threads and relies heavily on String manipulation.
The application works for a good 24 hrs at a time and needs to be always very responsive.
I am trying to keep the creation of objects to a minimum. The application is doing well without any configuration at the moment.
But I was wondering, for my own knowledge, if there were any advantages (or disadvantages) in using a specific JVM configuration?
Please bear with me, I am pretty new to the subject of JVM/GC configuration:
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
Should I configure the heap?
Should I also configure the GC?
Should I keep the Garbage Collection to a minimum?
I started reading: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
Any tips on the subject would greatly be appreciated.
Thanks in advance,
Generally, the best initial advice concerning tweaking your JVM is: don't. Unless you are experiencing specific JVM-related problems with the default settings, leave them alone.
If you do need to fiddle around with the settings, I would recommend you set up a representative testcase and use an advanced profiler such as JProfiler.
Furthermore, you should really read the technical documentation regarding the HotSpot VM, specifically the Memory Management Whitepaper, all of which you may find here.
If it is working fine then you should not do anything.
If your application is CPU bound you should not create a lot of threads, because a lot of time will be wasted in context switching.
If your String manipulation happens purely in memory (i.e. it is CPU bound rather than I/O bound), then you should have only as many threads as are actually required.
The sizing formula from Java Concurrency in Practice is:
Nthreads = Ncpu * Ucpu * (1 + W/C)
Where Nthreads --> number of threads
Ncpu --> number of CPUs
Ucpu --> target CPU utilization (between 0 and 1)
W --> wait time
C --> compute time
So for CPU-bound operations it should be at most (number of CPUs + 1) threads.
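As a rough illustration, the formula can be turned directly into a pool size. This is only a sketch: the 0.75 utilization target and the wait/compute ratio below are made-up numbers that you would measure for your own workload.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int nCpu = Runtime.getRuntime().availableProcessors();
        double targetUtilization = 0.75;   // assumed target, tune for your app
        double waitOverCompute = 0.5;      // assumed W/C ratio, measure it in practice

        // Nthreads = Ncpu * Ucpu * (1 + W/C), rounded and clamped to at least 1
        int nThreads = (int) Math.max(1,
                Math.round(nCpu * targetUtilization * (1 + waitOverCompute)));

        // A fixed pool of that size; extra tasks queue up instead of spawning new threads.
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        System.out.println("Sized pool to " + nThreads + " threads");
        pool.shutdown();
    }
}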
Also, there are a lot of test cases for concurrent applications defined in Java Concurrency in Practice. You may want to check those out.
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
All the best options will be on by default. If you look at HotSpot VM Options you can see that quite a few start with -XX:+, which means they are on by default.
Should I configure the heap?
Possibly. But I would leave the default setting if you can.
Should I also configure the GC?
Possibly. But I would leave the default setting if you can.
Should I keep the Garbage Collection to a minimum?
Reducing the amount of garbage created takes effort. It provides some benefit up to a point. You have to decide what is the best use of your time and how much time to spend reducing the amount of garbage created.
I would always start with a memory profiler and find where you are creating the most garbage. Start from the top of the list rather than trying to tune everything as this ensures you will get the most benefit for the least amount of effort.
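For example (a sketch only, not based on your actual code), heavy String concatenation in a loop is a common source of avoidable garbage that a memory profiler will surface quickly; reusing a StringBuilder removes most of the intermediate String objects:

// Sketch: the concatenating version allocates a fresh String on every iteration;
// the StringBuilder version appends into one growing buffer instead.
public class LessGarbage {
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p;               // creates a new String each time
        }
        return result;
    }

    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);              // reuses one buffer
        }
        return sb.toString();
    }
}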
BTW: I am an advocate of low garbage and off heap programs where it makes sense to do so. I have written trading systems which can run for a day without even a minor GC and programs which can load/use 500+ GB of data in off heap memory. However, you have to be able to demonstrate or quantify how much difference it will make to the end users or your business to determine whether it is really worth it.
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
No.
Should I configure the heap?
No, apart from setting the heap size to something reasonable (with -Xmx and -Xms)
Should I also configure the GC?
No, unless you have a particular need for "low pause". The default throughput collector is the best option if you are currently meeting your "responsiveness" goals. If you are not meeting those goals then you should consider CMS or G1 ... but be aware that while they reduce pauses, they also reduce throughput.
Should I keep the Garbage Collection to a minimum?
No. That is not a sensible goal. Your aim is to maximize throughput, and minimizing GC won't necessarily achieve that. In a lot of cases, it is more efficient to generate garbage than to have the application do extra work to avoid generating garbage. (And as Peter Lawrey pointed out, you've also got the extra developer effort in writing and maintaining more complex code.)
I would advise you to use a profiler to see if your application is spending a lot of time (CPU time or elapsed time) in garbage collection relative to doing other productive work. If it isn't, or if the application is already running fast enough, then don't fiddle with the JVM options.
If you are worried that your application won't cope with increased load in the future, then tweaking the GC doesn't scale. A better option is to investigate scaling up your hardware and/or figuring out how to do the work on multiple machines. In addition, tuning the GC to improve performance with current load may actually result in worse performance when the load increases. (Consider the problem that arises with CMS when it can't keep up and is forced to do a full stop-the-world collection to recover.)
Finally, it is generally speaking a bad idea to have lots of threads. It is better to use a small number of worker threads (roughly equal to the number of processors/cores) and feed them work via concurrent queues, etc.
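A minimal sketch of that worker-pool pattern (the task type, the queue capacity, and the daemon setup here are assumptions, not from your application):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class WorkerPool {
    public static void main(String[] args) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors();
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1000);

        // A small, fixed set of worker threads draining a shared queue.
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        queue.take().run();   // block until work arrives
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        // Producers just enqueue work instead of spawning new threads.
        queue.offer(() -> System.out.println("processed on " + Thread.currentThread().getName()));
        Thread.sleep(500);   // give the daemon workers a moment before this demo exits
    }
}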
In the past, I have faced a similar server application: lots of String manipulation, lots of String creation, and it needed to always be very responsive. The app worked fine with the default configuration until it ran into high-stress situations. You need to enable -XX:+UseConcMarkSweepGC for low pauses, and fine-tune other parameters to ensure the app behaves the way you want. Here is the short list:
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=nn
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=nn
-XX:+DisableExplicitGC
We have a J2EE application running on JBoss and we want to monitor its memory usage. Currently we use the following code:
System.gc();
Runtime rt = Runtime.getRuntime();
long usedMB = (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024;
logger.information(this, "memory usage" + usedMB);
This code works fine, in the sense that it shows a memory curve which corresponds to reality: when we create a big XML file from a DB the curve goes up, and after the extraction is finished it goes down.
A consultant told us that calling gc() explicitly is wrong: "let the JVM decide when to run the GC". Basically his arguments were the same as those discussed here.
But I still don't understand:
how can I have my memory usage curve?
what is wrong with the explicit gc()? I don't care about the small performance issues which can happen with an explicit gc() and which I would estimate at 1-3%. What I need is a memory and thread monitor which helps me analyse our system at a customer site.
If you want to really look at what is going on in the VM memory you should use a good tool like VisualVM. This is Free Software and it's a great way to see what is going on.
Nothing is really "wrong" with explicit gc() calls. However, remember that when you call gc() you are "suggesting" that the garbage collector run. There is no guarantee that it will run at the exact time you run that command.
There are tools that let you monitor the VM's memory usage. The VM can expose memory statistics using JMX. You can also print GC statistics to see how the memory is performing over time.
Invoking System.gc() can harm the GC's performance because objects will be prematurely moved from the new to old generations, and weak references will be cleared prematurely. This can result in decreased memory efficiency, longer GC times, and decreased cache hits (for caches that use weak refs). I agree with your consultant: System.gc() is bad. I'd go as far as to disable it using the command line switch.
You can take a look at stagemonitor. It is an open source Java (web) application performance monitor. It captures response time metrics, JVM metrics, request details (including a call stack captured by the request profiler) and more. The overhead is very low.
Optionally, you can use the great time series database Graphite with it to store a long history of data points that you can look at with fancy dashboards.
Take a look at the project website to see screenshots, feature descriptions and documentation.
Note: I am the developer of stagemonitor
I would say that the consultant is right in the theory, and you are right in practice. As the saying goes:
In theory, theory and practice are the same. In practice, they are not.
The Java spec says that System.gc() suggests that garbage collection be run. In practice, it just spawns a thread and runs right away on the Sun JVM.
Although in theory you could be messing up some finely tuned JVM implementation of garbage collection, unless you are writing generic code intended to be deployed on any JVM out there, don't worry about it. If it works for you, do it.
Have you tried JMX?
http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
Peek into what is happening inside Tomcat through VisualVM.
http://www.skill-guru.com/blog/2010/10/05/increasing-permgen-size-in-your-server/
Take a look at the JVM args: http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp#DebuggingOptions
-XX:-PrintGC: Print messages at garbage collection. Manageable.
-XX:-PrintGCDetails: Print more details at garbage collection. Manageable. (Introduced in 1.4.0.)
-XX:-PrintGCTimeStamps: Print timestamps at garbage collection. Manageable. (Introduced in 1.4.0.)
-XX:-PrintTenuringDistribution: Print tenuring age information.
While you're not going to upset the JVM with explicit calls to System.gc(), they may not have the effect you are expecting. To really understand what's going on with memory in a JVM, read anything and everything that Brian Goetz writes.
Explicitly running System.gc() on a production system is a terrible idea. If the memory gets to any size at all, the entire system can freeze while a full GC is running. On a multi-gigabyte-sized server, this can easily be very noticeable, depending on how the jvm is configured, and how much headroom it has, etc etc - I've seen pauses of more than 30 seconds.
Another issue is that by explicitly calling GC you're not actually monitoring how the JVM is running the GC, you're actually altering it. Depending on how you've configured the JVM, it's going to garbage collect when appropriate, and usually incrementally (it doesn't just run a full GC when it runs out of memory). What you'll be printing out will be nothing like what the JVM would do on its own; for one thing, you'll probably see fewer automatic/incremental GCs as you'll be clearing the memory manually.
As Nick Holt's post points out, options to print GC activity already exist as JVM flags.
You could have a thread that just prints out free and total memory at reasonable intervals; this will show you actual memory usage.
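A minimal sketch of such a logging thread (the 10-second interval and the System.out output are placeholder assumptions; in a real application you would use your own logger):

// Sketch: a daemon thread that samples heap usage periodically, without calling System.gc().
public class MemoryLogger {
    public static void start() {
        Thread t = new Thread(() -> {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long usedMb = (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024;
                System.out.println("memory usage: " + usedMb + " MB");
                try {
                    Thread.sleep(10_000);    // sample every 10 seconds (arbitrary)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }
}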
If you'd like a nice way to do this from the command line, use jstat:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstat.html
It gives raw information at configurable intervals which is very useful for logging and graphing purposes.
If you use Java 1.5, you can look at ManagementFactory.getMemoryMXBean(), which gives you numbers on all kinds of memory: heap, non-heap, perm gen.
A good example can be found here:
http://www.freshblurbs.com/explaining-java-lang-outofmemoryerror-permgen-space
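A minimal sketch of reading those numbers via the standard java.lang.management API (nothing application-specific assumed):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryMXBeanExample {
    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage();

        // used = currently occupied, committed = reserved from the OS, max = upper bound (-1 if undefined)
        System.out.println("Heap used/committed/max (MB): "
                + heap.getUsed() / 1024 / 1024 + "/"
                + heap.getCommitted() / 1024 / 1024 + "/"
                + heap.getMax() / 1024 / 1024);
        System.out.println("Non-heap used (MB): " + nonHeap.getUsed() / 1024 / 1024);
    }
}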
If you use the JMX-provided history of GC runs you can use the same before/after numbers, you just don't have to force a GC.
You just need to keep in mind that those GC runs (typically one for the old and one for the new generation) are not at regular intervals, so you need to extract the start time as well for plotting (or you plot against a sequence number; for most practical purposes that would be enough).
For example, on the Oracle HotSpot VM with the parallel scavenge collector there is a JMX MBean called java.lang:type=GarbageCollector,name=PS Scavenge; it has an attribute LastGcInfo, which returns a CompositeData describing the last young-generation scavenger run. It is recorded with the duration, the absolute start time, and the memory usage before and after the collection.
Just use a timer to read that attribute. Whenever a new start time shows up you know that it describes a new GC event; you extract the memory information and keep polling for the next update. (Not sure if an AttributeChangeNotification can somehow be used.)
Tip: in your timer you might measure the distance to the last GC run, and if that is too long for the resolution of your plotting, you could invoke System.gc() conditionally. But I would not do that in an OLTP instance.
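A rough sketch of that polling approach (the MBean name "PS Scavenge" is collector-specific and the one-second interval is arbitrary; both are assumptions to adapt to your JVM):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class GcEventPoller {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Collector name depends on the GC in use; "PS Scavenge" is the parallel young-gen collector.
        ObjectName gcBean = new ObjectName("java.lang:type=GarbageCollector,name=PS Scavenge");

        long lastSeenStart = -1;
        while (true) {
            CompositeData info = (CompositeData) server.getAttribute(gcBean, "LastGcInfo");
            if (info != null) {
                long startTime = (Long) info.get("startTime");
                if (startTime != lastSeenStart) {          // a new GC event was recorded
                    lastSeenStart = startTime;
                    System.out.println("GC at " + startTime
                            + " ms, duration " + info.get("duration") + " ms");
                }
            }
            Thread.sleep(1000);                            // poll once per second
        }
    }
}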
As has been suggested, try VisualVM to get a basic view.
You can also use Eclipse MAT, to do a more detailed memory analysis.
It's OK to do a System.gc() as long as you don't depend on it for the correctness of your program.
The problem with System.gc() is that the JVM already automatically allocates time to the garbage collector based on memory usage.
However, if you are, for instance, working in a very memory-limited environment, like a mobile device, System.gc() allows you to manually allocate more time towards this garbage collection, but at the cost of CPU time (though, as you said, you aren't that concerned about the performance cost of GC).
Best practice would probably be to only use it where you might be doing large amounts of deallocation (like flushing a large array).
All considered, since you are simply concerned about memory usage, feel free to call gc, or, better yet, see if it makes much of a memory difference in your case, and then decide.
About System.gc()… I just read in Oracle's documentation the following sentence here
The performance effect of explicit garbage collections can be measured by disabling them using the flag -XX:+DisableExplicitGC, which causes the VM to ignore calls to System.gc().
If your VM vendor and version supports that flag you can run your code with and without it and compare Performance.
Also note that the previously quoted sentence is preceded by this one:
This can force a major collection to be done when it may not be necessary (for example, when a minor collection would suffice), and so in general should be avoided.
JavaMelody might be a solution for your need.
Developed for Java EE applications, this tool measures and builds reports about the real operation of your applications in any environment. It's free and open source, and easy to integrate into applications, with some history, no database or profiling required, and it's really lightweight.
I have a piece of code that loads a very big image into memory. So it seemed like a reasonable thing to call
System.gc();
before loading the image. From what I can tell it works with no issues.
Yesterday I decided to use a pretty useful piece of software called FindBugs that scans your code and reports back issues that might cause bugs, or generally not-advised strategies. The problem is that the piece of code I mentioned gets reported. The description is this:
... forces garbage collection; extremely dubious except in benchmarking code
And it goes on to elaborate :
Code explicitly invokes garbage collection. Except for specific use in benchmarking, this is very dubious. In the past, situations where people have explicitly invoked the garbage collector in routines such as close or finalize methods has led to huge performance black holes. Garbage collection can be expensive. Any situation that forces hundreds or thousands of garbage collections will bring the machine to a crawl.
So my question is: is it NOT OK to programmatically call the garbage collector in such a case? My code only calls it once, and the method it is in gets used rarely. And if it is not OK to call it, then what should you do in a case where you need as much memory as possible before doing a very memory-intensive operation and you need to free as much memory as possible prior to it?
Did you get any performance improvements with the System.gc()?
I don't think so, since you probably don't have a lot of objects that need to be collected before you load the image.
Usually modern garbage collectors know best when to run, so you shouldn't force a collection unless you have a really, really good reason to (for example a benchmarking application, as suggested by that plugin).
By the way: calling System.gc() recommends that the VM perform a "full" or "large" collection, which means that all threads are briefly stopped.
Otherwise it will probably only make "small" garbage collections, which don't stop all threads.
Run your program with -verbose:gc to see how many bytes are collected.
There is also lots of technical information on garbage collection here:
http://java.sun.com/developer/technicalArticles/Programming/GCPortal/
Typically the GC is smarter than you, so it's better to let it run whenever the runtime decides. If the runtime needs memory, it'll run the GC itself.
It's fine to call the garbage collector; you don't get any "problems" from it.
However, I doubt it will significantly boost performance, unless that call also deals with defragmenting (compacting) the allocated data, and I don't know whether it does.
What you should do in this case is profile the code. Run it several times, see what sort of results you get.
Typically, you should not interfere with the garbage collector. If it's necessary to free some memory before loading the image, then the garbage collector will do it for you.
Regardless, if you're only doing it once, it's probably not going to affect your performance drastically. Things done in loops are far more important.
You already got plenty of good advice, which I will try not to reiterate.
If you actually get problems with the GC, like full stops of your application for a second, do the following:
1. Check that there aren't any calls to System.gc().
2. Check out the various options for configuring the GC. There are tons of those around, and they are much more helpful than forcing GC.
3. Ensure that the large objects can be GC'ed as early as possible, i.e. set variables to null and/or let them fall out of scope. This helps!
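A tiny sketch of that last point (the buffer and image sizes here are made up for illustration):

// Sketch: drop the reference to a large object before the next big allocation,
// so the collector is free to reclaim it when it needs the space.
public class ReleaseEarly {
    public static void main(String[] args) {
        byte[] largeBuffer = new byte[64 * 1024 * 1024];   // pretend this held earlier results
        // ... use largeBuffer ...

        largeBuffer = null;            // now unreachable, so eligible for collection

        // The next allocation can trigger a GC on its own if the heap is tight.
        byte[] bigImage = new byte[128 * 1024 * 1024];
        System.out.println("loaded " + bigImage.length + " bytes");
    }
}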
If a memory allocation fails, a GC cycle is initiated and the allocation is tried again.
Generally, no. It isn't appropriate to call System.gc(). However, I have had a few cases where it made sense.
In the software I write, there is a built-in performance tracking layer. It is mostly used during automated testing, but can be used in the field for diagnostic purposes. Between tests or after specific runs we will call System.gc() a few times and then record the memory still present. It provides us with a basic memory footprint benchmark for watching memory consumption trend lines over time. While this could be done with some of the external JVM interfaces, it was easier to do it in place, and exact numbers were not required.
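A minimal sketch of that kind of footprint measurement (the number of GC passes and the settle delay are arbitrary assumptions):

// Sketch: encourage a couple of collections, let things settle, then record used memory.
public class FootprintBenchmark {
    public static long measureUsedBytes() throws InterruptedException {
        for (int i = 0; i < 3; i++) {     // a few passes; System.gc() is only a hint
            System.gc();
            Thread.sleep(100);            // give the collector a moment to settle
        }
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Approximate footprint: "
                + measureUsedBytes() / 1024 / 1024 + " MB");
    }
}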
On a much older system, we could have upwards of 72 separate JVMs (yes, 72; there was a good reason for it at the time of construction). In that system, leaving the heap to float freely on all 72 JVMs could yield excessive (well beyond physical memory) total memory consumption. System.gc() was called between heavy data exercises to try to keep each JVM's heap closer to its average size and keep the heap from growing. (Capping the heap could have been another direction, but then it would have required the implementers to configure more per site and be more aware of what was happening under the hood to get it right without having the system fail under load.)
I've read plenty of articles about tuning GC in Java and have often wondered how many people really use some of the more advanced features.
I've always avoided tuning where possible and concentrated on writing the simplest possible code I can (Brian Goetz's advice) - this seems to have worked well for me so far.
Are these tuning strategies resilient to change across VM versions or do they require constant re-assessment?
The one tuning that I have made use of is the -server flag.
Part of my current job is the care and feeding of a large Java application that was designed to run with a large amount of memory (currently around 8 GB), mostly due to ongoing calculations with lots of cached data. I did the initial deployment with the standard GC setup, mostly because there wasn't a simple way to simulate the production environment running at full tilt.
In stages, over the next few months, I've customized the GC settings. Generally, the biggest available knob seems to be adjusting the frequency and magnitude of incremental gc - the biggest improvement has been in trading off large periodic gc for smaller and more frequent ones. But we've definitely been able to see performance improvements.
I'm not going to post my specific settings because a) they're specific to our setup, and b) because I don't have them handy :). But in general, what I've found is
there's been a lot of work done in tuning the default GC settings, and almost always the defaults work better than any tweaks I would make. At least for me, the situations where GC tuning was actually worthwhile were extreme enough that it was unreasonable to attempt to simulate them, so I had to do it experimentally and incrementally.
Here's a good reference from a previous Stack Overflow discussion.
The vast majority of developers will never have to (or want to) tune GC. I have worked with people who have had to tune it and here is the advice:
Before you attempt to tune the garbage collector, make 100% sure you have verified, with a profiler, what is going on. Once you start tuning, make sure you verify, with a profiler, that it had a positive effect.
You should also revisit the changes with each version of the VM you run on (different VMs will have different tuning strategies).
I once helped someone with a GC issue that turned out to be them not closing JDBC result sets (or some issue like that). This caused memory to never be freed (their code held onto the result sets for some reason). Fixing that issue made the program go from 20 minutes down to something like 30 seconds or a couple of minutes. Memory usage went way down as well.
I have to say that I haven't had the need myself to use tuning very much. But I work closely with people who write code where latency is critical: they make much use of such tuning - specifying which GC algorithm to use, max pause times, survivor ratios etc.
I guess the answer therefore is: if latency is critical to an application, you might need to look at tuning your GC
I would say the most common thing to tune is the maximum memory size. Most of the other memory options have sensible defaults and are often over-tuned, IMHO, i.e. set when it really doesn't make much difference. I often see people set lots of options when half of them are the defaults anyway. ;)
Using a profiler is the most useful way to improve GC behaviour (by reducing the number of objects created)
I have but not recently. The application that I was working on was real-time rendering of a video stream constructed of individual motion JPEG images. At the time (circa JDK 1.2 and 1.3), the -Xincgc setting would switch the client garbage collector from more of a big bang cleanup to a mode where a bit of garbage was cleaned up regularly. As a result, the distribution of frame latencies was much lower, giving the impression of a smoother video (instead of 1-2-3-pause, 1-2-3-pause).
I haven't looked at that code in quite a long time but I strongly suspect that, with the modern garbage collection algorithms, -Xincgc would actually decrease performance.
In today's world, I would say that standard optimization skepticism should always apply: profile profile profile. Are you sure that the bottleneck is really the garbage collector...?
In short, yes, it's very useful for tuning any serious Java application. We've often found that in production scenarios it's the difference between a stable app and a completely unpredictable app. It's certainly not the first thing I do but once you have an application working and can apply real load to it, it's one of the first things to investigate at that point.