One day my application got stuck for about 5 minutes. I believe it happened because of ParNew GC. I don't have GC logs, but our internal tool shows that ParNew consumed ~35% CPU at that time. Now I wonder how to prevent that in the future.
The application runs with JDK 1.8 and a 2.5G heap. The GC options are -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode.
I know I can use -XX:MaxGCPauseMillis and -XX:GCTimeRatio. What else would you propose to prevent ParNew from stopping the application for a few minutes?
GC is optimized not to eat your CPU time; being stuck for 5 minutes is far from normal. Either you released millions of objects, or, more likely, you allocated almost (if not all of) your heap space, forcing the GC to reclaim every piece of memory it could find. I would also check for memory leaks.
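Since you don't have GC logs yet, the first thing I'd do is switch them on so the next incident can actually be diagnosed. A minimal sketch for JDK 8, keeping your collector and heap size; the log path and jar name are just placeholders:

    java -Xmx2560m \
         -XX:+UseConcMarkSweepGC \
         -XX:+PrintGCDetails \
         -XX:+PrintGCDateStamps \
         -XX:+PrintGCApplicationStoppedTime \
         -Xloggc:/var/log/myapp/gc.log \
         -jar myapp.jar

PrintGCApplicationStoppedTime is useful here because it records total stop-the-world time, not just the GC phases themselves, so you can see where those minutes actually went.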
Related
I have a Java process which does a lot of JSON processing and therefore creates a bunch of objects that are frequently collected by the Garbage Collector.
Typically, the GCs are minor collections and last no longer than 500 ms to 1 second at most.
However, during peak processing times the minor GCs run 40x longer; we are talking about GCs that last 40-60 seconds, and sometimes over a minute.
In a 5-minute period this process can spend 70% of its time doing GC.
My initial troubleshooting led me to believe the machine was swapping, which it indeed was, so I turned swapping off, but I am still getting these long minor GCs.
So I have turned off swapping and ensured the VM the process runs on is not overloaded, by which I mean there is plenty of free memory on the host and free CPU cores.
What else should I be looking at?
What else could cause a JVM to Minor GC for such a long duration of time?
I have a Java enterprise application that has been consuming more memory over the past few days. Even though GC is running and we have adequate parameters set (ConcMarkSweepGC), it is not freeing all of the memory.
When I attached JProfiler, I observed that whenever GC runs it only clears a small fraction: say the application was consuming 9 GB, only around 1 to 1.2 GB gets cleared. Yet if I click the "Run GC" button in JProfiler, it clears at least 6-7 GB of the 9 GB occupied.
I was trying to understand what extra work JProfiler's GC does compared to the regular GC executed by the application.
Here are a few of the relevant details:
- App server: Wildfly 9
- Java version: Java 8
- OS: Windows 2012 - 64Bit
Any help with this would be appreciated. Thanks in advance.
The behaviour varies between different GC algorithms, but in principle a GC of the Old Space is not supposed to clear all unused memory every time it runs. In the New Space we see a copying, parallel GC to combat memory fragmentation, but the Old Space is supposed to be significantly larger, and running such a GC on it would result in a long stop-the-world pause. You selected ConcMarkSweepGC, which is a concurrent collector that won't attempt a full stop-the-world GC cycle as long as there is enough free memory. You probably initiated a full stop-the-world GC of the Old Space with JProfiler.
If you want to understand this in detail, read about the different GC algorithms in the JVM. There are quite a few of them, and they are designed with different goals in mind.
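If you want to confirm that a full stop-the-world collection reclaims the same memory as JProfiler's "Run GC" button (which, as far as I know, simply requests a full GC much like System.gc() does - that part is my assumption), you can trigger one from the command line and watch the heap in JProfiler at the same time. A sketch, with <pid> standing for the Wildfly process id:

    jcmd <pid> GC.run            # asks the JVM to run a full GC, like System.gc()
    jmap -histo:live <pid>       # ':live' also forces a full GC before printing the histogram

If these reclaim the same 6-7 GB, the memory was simply garbage that CMS had not yet decided it needed to collect.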
I am observing some strange behaviors in my java application. Minor GC pause times are very stable, between 2 to 4 milliseconds, and spaced several seconds apart (ranging from around 4 seconds to minutes depending on busyness). I have noticed that before the first full GC, minor collection pause times can spike to several hundred milliseconds, sometimes breaching the seconds mark. However, after the first full collection, these spikes go away and minor collection pause times do not spike anymore and remain rock steady between 2-4 milliseconds. These spikes do not appear to correlate with tenured heap usage.
I'm not sure how to diagnose the issue. Obviously something changed from the full collection, or else the spikes would continue to happen. I'm looking for some ideas on how to resolve it.
Some details:
I am using the -server flag. The throughput parallel collector is used.
Heap size is 1.5G, and the default ratio between the young and tenured generations is used. Survivor ratios remain at default. I'm not sure how relevant these are to this investigation, as the same behavior shows up despite further tweaking.
On startup, I make several DB calls. Most of the information can be GC'd away (and is, upon a full collection). Some instances of my application will GC it away while others will not.
What I've tried/thought about:
Full Perm Gen? I think the parallel collector handles this fine and does not need more flags, unlike CMS.
Manually triggering a full GC after startup. I will be trying this, hopefully making the problem go away based on observations. However, this is only a temporary solution as I still don't understand why there is even an issue.
First, I wanted to ask for more information in a comment, but commenting needs 50 rep, which I don't have, so I'm asking here.
This is too little information to work with.
To diagnose the issue you can switch on GC logging and post the behaviour that you notice. Also, you can use jstat to view heap space usage live while the application is running: http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html
To learn how to turn on GC logging, you can read here: http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
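For example, a minimal jstat invocation (with <pid> standing for your JVM's process id) that prints generation usage and accumulated GC time once per second:

    # S0/S1 = survivor spaces, E = eden, O = old gen,
    # YGC/YGCT = young GC count/time, FGC/FGCT = full GC count/time
    jstat -gcutil <pid> 1000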
Our JBoss 3.2.6 application server is having some performance issues, and after turning on verbose GC logging and analyzing the logs with GCViewer we've noticed that after a while (7 to 35 hours after a server restart) the GC goes crazy. Initially the GC works fine, doing a GC every hour or so, but at a certain point it starts performing full GCs every minute. As this only happens in our production environment, we have not been able to try turning off explicit GCs (-XX:-DisableExplicitGC) or modifying the RMI GC interval yet, but since the problem only appears after several hours it does not seem to be caused by the known RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet, but it does not seem to be hitting the max heap limit at all. Before the GC goes crazy it is GC-ing just fine, and even while the GC is going crazy the heap doesn't get above 2GB (24GB max).
Besides RMI are there any other ways explicit GC can be triggered? (I checked our code and no calls to System.gc() are being made)
Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before this happens (off the top of my head, around 98% of time spent on GC while recovering less than 2% of the heap).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?
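If you want to look for a leak while the server is in this state, one option is to take a heap dump of the running process and open it in a heap analyzer (Eclipse MAT, VisualVM, etc.). A sketch, with <pid> standing for the JBoss process id and the output path being just an example:

    # ':live' triggers a full GC first so only reachable objects end up in the dump
    jmap -dump:live,format=b,file=/tmp/jboss-heap.hprof <pid>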
You almost certainly have a memory leak, and if you let the application server continue to run it will eventually crash with an OutOfMemoryError. You need to use a memory analysis tool - one example would be VisualVM - and determine the source of the problem. Usually memory leaks are caused by static or global objects that never release the object references they store.
Good luck!
Update:
Rereading your question, it sounds like things are fine and then suddenly you get into this situation where GC is working much harder to reclaim space. That sounds like there is some specific operation that consumes (and doesn't release) a large amount of heap.
Perhaps, as @Tim suggests, your heap requirements are just at the threshold of the max heap size, but in my experience you'd need to be pretty lucky to hit that exactly. At any rate, some analysis should determine whether it is a leak or you just need to increase the size of the heap.
Apart from the more likely event of a memory leak in your application, there could be 1-2 other reasons for this.
On a Solaris environment, I once had such an issue when I allocated almost all of the available 4GB of physical memory to the JVM, leaving only around 200-300MB for the operating system. This led to the VM process suddenly swapping to disk whenever the OS came under increased load. The solution was not to exceed 3.2GB. A real corner case, but maybe it's the same issue as yours?
The reason this led to increased GC activity is that heavy swapping slows down the JVM's memory management, which caused many short-lived objects to escape the survivor space and end up in the tenured space, which in turn filled up much more quickly.
I recommend that you take a stack dump when this happens.
More often than not I have seen this happen with a thread population explosion.
Anyway, look at the stack dump file and see what's running. You could easily set up some cron jobs or monitoring scripts to run jstack periodically.
You can also compare the size of the stack dump. If it grows really big, you have something that's making lots of threads.
If it doesn't get bigger you can at least see which objects (call stacks) are running.
You can use VisualVM or some fancy JMX crap later if that doesn't work, but first start with jstack, as it's easy to use.
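As a sketch of the kind of cron job meant above (the schedule, user, output path, and the pgrep pattern are all placeholders you would adapt to your setup):

    # /etc/cron.d/jstack-dump -- capture a timestamped stack dump every 5 minutes
    */5 * * * * jboss jstack $(pgrep -f 'org.jboss.Main') > /var/log/jstack/$(date +\%Y\%m\%d-\%H\%M).txt

Note that in a crontab the % character has to be escaped as \%, and jstack should run as the same user as the target JVM.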
Occasionally, somewhere between once every 2 days and once every 2 weeks, my application crashes in a seemingly random location in the code with: java.lang.OutOfMemoryError: GC overhead limit exceeded. If I google this error I come to this SO question, which led me to this piece of Sun documentation which explains:
The parallel collector will throw an OutOfMemoryError if too much time is
being spent in garbage collection: if more than 98% of the total time is
spent in garbage collection and less than 2% of the heap is recovered, an
OutOfMemoryError will be thrown. This feature is designed to prevent
applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this
feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the
command line.
Which tells me that my application is apparently spending 98% of the total time in garbage collection while recovering less than 2% of the heap.
But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?
I'm trying to determine the best approach to actually solving this issue, rather than just using -XX:-UseGCOverheadLimit, but I feel the need to better understand the issue I'm solving.
Well, you're using too much memory - and from the sound of it, it's probably because of a slow memory leak.
You can try increasing the heap size with -Xmx, which would help if this isn't a memory leak but rather a sign that your app actually needs a lot of heap and the setting you currently have is slightly too low. If it is a memory leak, this will just postpone the inevitable.
To investigate if it is a memory leak, instruct the VM to dump heap on OOM using the -XX:+HeapDumpOnOutOfMemoryError switch, and then analyze the heap dump to see if there are more objects of some kind than there should be. http://blogs.oracle.com/alanb/entry/heap_dumps_are_back_with is a pretty good place to start.
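For example, a minimal pair of flags (the dump path is just a placeholder; without it the dump lands in the process's working directory):

    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/var/dumps/myapp.hprof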
Edit: As fate would have it, I happened to run into this problem myself just a day after this question was asked, in a batch-style app. This was not caused by a memory leak, and increasing heap size didn't help, either. What I did was actually to decrease heap size (from 1GB to 256MB) to make full GCs faster (though somewhat more frequent). YMMV, but it's worth a shot.
Edit 2: Not all problems were solved by the smaller heap... the next step was enabling the G1 garbage collector, which seems to do a better job than CMS.
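If you want to try the same thing, G1 is enabled with a single flag; the pause goal below is only an illustrative value, not something I tuned for this particular app:

    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=200   # optional soft pause-time target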
The >98% would be measured over the same period in which less than 2% of memory is recovered.
It's quite possible that there is no fixed period for this. For instance, the OOM check might be done after every 1,000,000 object liveness checks; the time that takes would be machine-dependent.
You most likely can't "solve" your problem by adding -XX:-UseGCOverheadLimit. The most likely result is that your application will slow to a crawl, use a bit more memory, and then hit the point where the GC simply does not recover any memory anymore. Instead, fix your memory leaks and then (if still needed) increase your heap size.
But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?
The simple answer is that it is not specified. However, in practice the heuristic "works", so it cannot be either of the two extreme interpretations that you posited.
If you really want to find out the interval over which the measurements are made, you could always read the OpenJDK 6 or 7 source code. But I wouldn't bother, because it wouldn't help you solve your problem.
The "best" approach is to do some reading on tuning (starting with the Oracle / Sun pages), and then carefully "twiddle the tuning knobs". It is not very scientific, but the problem space (accurately predicting application + GC performance) is "too hard" given the tools that are currently available.