How to prevent the garbage collector from slowing down my application

Let's say I've got an application which has a memory leak. At some point the GC will try very hard to free memory and will slow down my application. I know that if the JVM parameter -XX:+UseGCOverheadLimit is set (it is enabled by default in HotSpot), the JVM will throw an OutOfMemoryError:
if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered.
However, this is not good enough for me, because my application will become very slow well before these numbers are hit. The GC will hog the CPU for some time before the OutOfMemoryError is thrown. My goal is to recognize very early that there will most likely be a problem and then throw the OutOfMemoryError. After that I have some kind of recovery strategy.
I've now found two additional parameters, GCTimeLimit and GCHeapFreeLimit. With them it is possible to tweak the two quoted constants (98% and 2%).
I've run some tests of my own, using a small piece of code which produces a memory leak, and played with those settings. But I'm not really sure how to find the correct tradeoff. My hope is that someone else had the same problem and came up with a reasonable solution, or maybe there are some other GC switches which I don't know about yet.
I'm feeling a little lost, since I'm not really an expert on this topic and it seems that there are a lot of things to consider.

If you are using the Sun/Oracle JVM, this page seems to be a pretty complete GC-tuning primer.

You can use java.lang.management.MemoryUsage to determine the used memory and the total memory available. As usage approaches the tunable GC threshold, you can throw your error.
Of course doing this is a little ridiculous. If the issue is that you need more memory then increase the heap size. The more likely issue is that you're not releasing memory gracefully when you're done with it.
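A minimal sketch of that check, assuming the standard java.lang.management API. The class name HeapGuard and the 0.90 limit are made up for illustration; what to do when the limit is crossed (throw, shed load, restart) is left to the caller.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how full the heap is right now.
final class HeapGuard {
    // Assumed threshold for illustration; tune it for your application.
    static final double LIMIT = 0.90;

    // Pure helper: used/max, falling back to used/committed when max is undefined (-1).
    static double fraction(long used, long max, long committed) {
        long denominator = max > 0 ? max : committed;
        return (double) used / denominator;
    }

    // Reads the current heap usage from the platform MemoryMXBean.
    static double heapFraction() {
        MemoryUsage u = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return fraction(u.getUsed(), u.getMax(), u.getCommitted());
    }

    static boolean isNearLimit() {
        return heapFraction() > LIMIT;
    }
}
```

Note that a single reading includes garbage that has not been collected yet, so a high value on its own does not prove a leak; it is more meaningful right after a full GC.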

Side-step the JVM heap and use something like Terracotta's BigMemory, which uses direct (off-heap) memory management to grow beyond the reach of the garbage collector.

Related

How do you determine the CMSInitiatingOccupancyFraction for large Java applications?

We are using a database (Apache Geode) that is written in Java. Our servers have 64g of RAM, so we set our Java heap (Xms and Xmx) at about 62g of RAM.
Most Java recommendations I've seen for situations like this are to use the CMS garbage collector and to set the CMSInitiatingOccupancyFraction to somewhere around 68% (give or take a little, but not much).
But my question is: Why can't we set garbage collection to start at 95% instead of 68%? It seems maybe wasteful to run Java in such a fashion that you can never use more than 68% of your heap without causing non-stop Garbage Collections.
It's bugging us because we are at a stage where our database is doing non-stop garbage collections, and it's hard to justify more RAM when really the JVM has like 18 gigs free. :)
Thanks in advance for any advice.

I think the answers in the comments, plus some things I've managed to figure out, pretty much answer my question.
It seems one of the concerns is fragmentation. You can have 20% of your heap free (80% used), and yet your heap can be badly fragmented to where it's difficult to find the large contiguous blocks necessary for operation. If you start running GC's earlier, like at 70%, you keep fragmentation a little more under control by reclaiming unused objects and creating bigger holes for new objects.
For a non-compacting GC like CMS, there's no 100% guarantee that you'll never become too fragmented to allocate a block of memory, but if you collect early and often you can make that happen as infrequently as possible.

Garbage Collector going crazy after a few hours

Our JBoss 3.2.6 application server is having some performance issues, and after turning on verbose GC logging and analyzing the logs with GCViewer we've noticed that after a while (7 to 35 hours after a server restart) the GC goes crazy. Initially the GC works fine, running about once an hour, but at a certain point it starts performing full GCs every minute. As this only happens in our production environment, we have not yet been able to try turning off explicit GCs (-XX:+DisableExplicitGC) or modifying the RMI GC interval, but as this happens after a few hours it does not seem to be caused by the known RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet but it does not seem to be hitting the max heap limitations at all. Before the GC goes crazy it is GC-ing just fine but when the GC goes crazy the heap doesn't get above 2GB (24GB max).
Besides RMI are there any other ways explicit GC can be triggered? (I checked our code and no calls to System.gc() are being made)

Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before this happens (98% CPU time spent on GC off the top of my head).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?

You almost certainly have a memory leak, and if you let the application server continue to run it will eventually crash with an OutOfMemoryError. You need to use a memory analysis tool (one example would be VisualVM) and determine the source of the problem. Usually memory leaks are caused by static or global objects that never release the object references they store.
Good luck!
Update:
Rereading your question it sounds like things are fine and then suddenly you get in this situation where GC is working much harder to reclaim space. That sounds like there is some specific operation that occurs that consumes (and doesn't release) a large amount of heap.
Perhaps, as @Tim suggests, your heap requirements are just at the threshold of max heap size, but in my experience, you'd need to be pretty lucky to hit that exactly. At any rate, some analysis should determine whether it is a leak or you just need to increase the size of the heap.

Apart from the more likely event of a memory leak in your application, there could be 1-2 other reasons for this.
In a Solaris environment, I once had such an issue when I allocated almost all of the available 4GB of physical memory to the JVM, leaving only around 200-300MB to the operating system. This led to the VM process suddenly swapping to disk whenever the OS had some increased load. The solution was not to exceed 3.2GB. A real corner case, but maybe it's the same issue as yours?
The reason this led to increased GC activity is that heavy swapping slows down the JVM's memory management, which led to many short-lived objects escaping the survivor space and ending up in the tenured space, which in turn filled up much more quickly.

I recommend when this happens that you do a stack dump.
More often than not I have seen this happen with a thread population explosion.
Anyway, look at the stack dump file and see what's running. You could easily set up some cron jobs or monitoring scripts to run jstack periodically.
You can also compare the sizes of the stack dumps. If they grow really big, you have something that's making lots of threads.
If they don't get bigger, you can at least see which threads (call stacks) are running.
You can use VisualVM or some fancy JMX crap later if that doesn't work, but first start with jstack as it's easy to use.
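If running jstack from outside is inconvenient, roughly the same information can be collected from inside the process. This is only a sketch using standard JDK calls (ThreadMXBean and Thread.getAllStackTraces); the class name ThreadWatch is made up, and the output format is not the same as jstack's.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;

// Hypothetical helper: in-process thread count and a simple stack dump.
final class ThreadWatch {
    // Number of live threads; log this periodically to spot a thread explosion.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    // Builds a plain-text dump of every live thread and its current call stack.
    static String dumpStacks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A steadily rising liveThreads() value between samples points at the thread explosion described above; a flat value means the dump is mainly useful for seeing what the threads are doing.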

Detecting memory-leak programmatically

If, on purpose, I create an application that crunches data while suffering from memory-leaks, I can notice that the memory as reported by, say:
Runtime.getRuntime().freeMemory()
starts oscillating between 1 and 2 MB of free memory.
The application then enters a loop that goes like this: GC, processing some data, GC, etc. But because the GC happens so often, the application basically isn't doing much else anymore. Even the GUI takes ages to respond (and, no, I'm not talking about EDT issues here, it's really the VM basically stuck in some endless GC'ing mode).
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
Note that I'm not talking about out-of-memory errors nor about detecting the memory leak itself.
I'm talking about detecting that an application is running so low on memory that it is basically calling the GC all the time, leaving hardly any time to do something else (in my hypothetical example: crunching data).
Would it work, for example, to repeatedly read how much memory is available during, say, one minute, and see that if the number has been "oscillating" between different values all below, say, 4 MB, conclude that there's been some leak and that the application has become unusable?

And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
I don't think so. You can find out roughly how much heap memory is free at any given instant, but AFAIK you cannot reliably determine when you are running out of memory. (Sure, you can do things like scraping the GC log files, or trying to pick patterns in the free memory oscillations. But these are likely to be unreliable and fragile in the face of JVM changes.)
However, there is another (and IMO better) approach.
In recent versions of Hotspot (version 1.6 and later, I believe), you can tune the JVM / GC so that it will give up and throw an OOME sooner. Specifically, the JVM can be configured to check that:
the ratio of free heap to total heap is greater than a given threshold after a full GC, and/or
the time spent running the GC is less than a certain percentage of the total.
The relevant JVM parameters are "UseGCOverheadLimit", "GCTimeLimit" and "GCHeapFreeLimit". Unfortunately, Hotspot's tuning parameters are not well documented on the public web, but these ones are all listed here.
Assuming that you want your application to do the sensible thing ... give up when it doesn't have enough memory to run properly anymore ... then just launch the JVM with a smaller "GCTimeLimit" or a larger "GCHeapFreeLimit" than the defaults.
EDIT
I've discovered that the MemoryPoolMXBean API allows you to look at the peak usage of individual memory pools (heaps), and set thresholds. However, I've never tried this, and the APIs have lots of hints that suggest that not all JVMs implement the full API. So, I would still recommend the HotSpot tuning option approach (see above) over this one.
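For reference, the MemoryPoolMXBean threshold idea could look roughly like the sketch below. It uses only standard java.lang.management calls, but the class name PoolThresholds and the choice of a fraction of each pool's maximum are assumptions for illustration. Pools that do not support a collection usage threshold, or have no defined maximum, are skipped.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: arms post-GC usage thresholds on heap pools.
final class PoolThresholds {
    // Sets the collection usage threshold to fraction * max on each eligible
    // heap pool and returns the names of the pools that were configured.
    static List<String> arm(double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        List<String> armed = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            if (!pool.isCollectionUsageThresholdSupported()) continue;
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) continue; // invalid pool or undefined max
            pool.setCollectionUsageThreshold((long) (usage.getMax() * fraction));
            armed.add(pool.getName());
        }
        return armed;
    }

    // True if any armed heap pool was still above its threshold after its last GC.
    static boolean anyExceeded() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()
                    && pool.isCollectionUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }
}
```

The collection usage threshold is compared against usage measured after a garbage collection, which makes it a better leak signal than an instantaneous reading. Which pools exist and which support thresholds depends on the JVM and the collector in use, so the returned list may be empty.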

You can use getHeapMemoryUsage.

I see two attack vectors.
Either monitor your memory consumption.
When you more or less constantly use lots of the available memory it is very likely that you have a memory leak (or are just using too much memory). The vm will constantly try to free some memory without much success => constant high memory usage.
You need to distinguish that from a large zigzag pattern, which happens often without being an indicator of a memory problem. Basically you use more and more memory, but when the GC finds time to do its job it finds lots of garbage to take out, so everything is fine.
The other attack vector is to monitor how often the GC runs and how successful it is. If it runs often with only small gains in memory, it is likely you have a problem.
I don't know if you can access this kind of information directly from your program. But if nothing else, I think you can specify parameters on startup which make the GC log information to a file, which in turn could be parsed.
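GC counts and times can in fact be read in-process through the standard GarbageCollectorMXBean interface. The sketch below estimates the share of wall-clock time spent in GC between two calls; the class name GcOverhead is made up, and the value is only approximate because getCollectionTime() is an accumulated estimate and some collectors also report concurrent work.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: approximate GC overhead between successive samples.
final class GcOverhead {
    private long lastGcMillis = totalGcMillis();
    private long lastWallNanos = System.nanoTime();

    // Sum of accumulated collection time over all collectors, in milliseconds.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) sum += t;
        }
        return sum;
    }

    // Reported GC time divided by elapsed wall-clock time since the previous call.
    double sample() {
        long gcNow = totalGcMillis();
        long wallNow = System.nanoTime();
        double elapsedMillis = (wallNow - lastWallNanos) / 1_000_000.0;
        double share = elapsedMillis > 0 ? (gcNow - lastGcMillis) / elapsedMillis : 0.0;
        lastGcMillis = gcNow;
        lastWallNanos = wallNow;
        return share;
    }
}
```

Calling sample() every few seconds from a monitoring thread and alerting when the value stays high for several consecutive samples is one way to detect the "GC runs often with only small gains" situation without parsing log files.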

What you could do is spawn a thread that wakes up periodically and calculates the amount of used memory and records the result. Then you can do regression analysis on the result to estimate the rate of memory growth in your application. If you know the rate of growth, and the maximum amount of memory, you can predict (with some confidence) when your application will run out of memory.
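The regression step can be sketched as an ordinary least-squares fit over (time, used memory) samples. The class and method names below are made up for illustration, and the prediction assumes growth stays linear, which a real application may not honor.

```java
// Hypothetical helper: linear trend of heap usage over time.
final class LeakTrend {
    // Currently used heap in bytes, as seen by Runtime.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Least-squares slope of y over x, e.g. bytes per millisecond.
    static double slope(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("sample arrays differ in length");
        }
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double denominator = n * sxx - sx * sx;
        return denominator == 0 ? 0.0 : (n * sxy - sx * sy) / denominator;
    }

    // Estimated time until usedNow reaches max at the given growth rate;
    // infinite when usage is flat or shrinking.
    static double timeUntilFull(double usedNow, double max, double slope) {
        return slope > 0 ? (max - usedNow) / slope : Double.POSITIVE_INFINITY;
    }
}
```

Sampling right after a GC, or fitting only the local minima of the sawtooth, gives a cleaner trend, because raw readings include garbage that has not been collected yet.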

You can pass arguments to your Java virtual machine that give you GC diagnostics, such as:
-verbose:gc  This flag turns on the logging of GC information. Available in all JVMs.
-XX:+PrintGCTimeStamps  Prints the times at which the GCs happen relative to the start of the application.
If you capture that output in a file, your application can periodically read that file and parse it to know when a GC has happened. That way you can work out the average time between GCs.

I think the JVM does exactly this for you and throws java.lang.OutOfMemoryError: GC overhead limit exceeded. So if you catch OutOfMemoryError and check for that message then you have what you want, don't you?
See this question for more details
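A sketch of that idea follows. The message text "GC overhead limit exceeded" is what HotSpot uses, but it is not part of any specification, so treat the string match as an assumption; the class and method names are made up.

```java
// Hypothetical helper: recognizes HotSpot's GC-overhead OutOfMemoryError.
final class OverheadDetector {
    // True only for an OutOfMemoryError whose message mentions the overhead limit.
    static boolean isGcOverhead(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().contains("GC overhead limit exceeded");
    }

    // Runs the task; returns false if it died from GC overhead, rethrows other OOMEs.
    static boolean runGuarded(Runnable task) {
        try {
            task.run();
            return true;
        } catch (OutOfMemoryError e) {
            if (isGcOverhead(e)) {
                return false; // caller starts its recovery strategy here
            }
            throw e;
        }
    }
}
```

Keep in mind that the error can surface in any thread that happens to allocate, not necessarily the one wrapped in runGuarded, and that the JVM may have very little free memory left when the handler runs.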

I've been using Plumbr for memory leak detection and it's been a great experience, though the licence is very expensive: http://plumbr.eu/

Duration of Excessive GC Time in "java.lang.OutOfMemoryError: GC overhead limit exceeded"

Occasionally, somewhere between once every 2 days and once every 2 weeks, my application crashes in a seemingly random location in the code with: java.lang.OutOfMemoryError: GC overhead limit exceeded. If I google this error I come to this SO question, and that led me to this piece of Sun documentation which explains:
The parallel collector will throw an OutOfMemoryError if too much time is
being spent in garbage collection: if more than 98% of the total time is
spent in garbage collection and less than 2% of the heap is recovered, an
OutOfMemoryError will be thrown. This feature is designed to prevent
applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this
feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the
command line.
Which tells me that my application is apparently spending 98% of the total time in garbage collection to recover only 2% of the heap.
But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?
I'm trying to determine a best approach to actually solving this issue rather than just using -XX:-UseGCOverheadLimit but I feel a need to better understand the issue I'm solving.

I'm trying to determine a best approach to actually solving this issue rather than just using -XX:-UseGCOverheadLimit but I feel a need to better understand the issue I'm solving.
Well, you're using too much memory - and from the sound of it, it's probably because of a slow memory leak.
You can try increasing the heap size with -Xmx, which would help if this isn't a memory leak but a sign that your app actually needs a lot of heap and the setting you currently have is slightly too low. If it is a memory leak, this'll just postpone the inevitable.
To investigate if it is a memory leak, instruct the VM to dump heap on OOM using the -XX:+HeapDumpOnOutOfMemoryError switch, and then analyze the heap dump to see if there are more objects of some kind than there should be. http://blogs.oracle.com/alanb/entry/heap_dumps_are_back_with is a pretty good place to start.
Edit: As fate would have it, I happened to run into this problem myself just a day after this question was asked, in a batch-style app. This was not caused by a memory leak, and increasing heap size didn't help, either. What I did was actually to decrease heap size (from 1GB to 256MB) to make full GCs faster (though somewhat more frequent). YMMV, but it's worth a shot.
Edit 2: Not all problems solved by smaller heap... next step was enabling the G1 garbage collector which seems to do a better job than CMS.

The >98% would be measured over the same period in which less than 2% of memory is recovered.
It's quite possible that there is no fixed period for this. For instance, the OOM check might be done after every 1,000,000 object liveness checks, and the time that takes would be machine-dependent.
You most likely can't "solve" your problem by adding -XX:-UseGCOverheadLimit. The most likely result is that your application will slow to a crawl, use a bit more memory, and then hit the point where the GC simply does not recover any memory anymore. Instead, fix your memory leaks and then (if still needed) increase your heap size.

But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?
The simple answer is that it is not specified. However, in practice the heuristic "works", so it cannot be either of the two extreme interpretations that you posited.
If you really wanted to find out the interval over which the measurements are made, you could always read the OpenJDK 6 or 7 source code. But I wouldn't bother because it wouldn't help you solve your problem.
The "best" approach is to do some reading on tuning (starting with the Oracle / Sun pages), and then carefully "twiddle the tuning knobs". It is not very scientific, but the problem space (accurately predicting application + GC performance) is "too hard" given the tools that are currently available.

What does fragmented memory look like?

I have a mobile application that is suffering from slow-down over time. My hunch (in part fed by this article) is that this is due to fragmentation of memory slowing the app down, but I'm not sure. Here's a pretty graph of the app's memory use over time:
fraggle rock http://kupio.com/image-dump/fragmented.png
The 4 peaks on the graph are 4 executions of the exact same task in the app. I start the task, it allocates a bunch of memory, it sits for a bit (the flat line on top) and then I stop the task. At that point it calls System.gc(); and the memory gets cleaned up.
As can be seen, each of the 4 runs of the exact same task takes longer to execute than the one before. The low points in the graph all return to the same level, so there do not seem to be any memory leaks between task runs.
What I want to know is, is memory fragmentation a feasible explanation or should I look elsewhere first, bearing in mind that I've already done a lot of looking? The low-points on the graph are relatively low so my assumption is that in this state the memory would not be very fragmented since there can't be a lot of small memory holes to be causing problems.
I don't know how the j2me memory allocator works though, so I really don't know. Can anyone advise? Has anyone else had problems with this and recognises the memory profile of the app?

If you've got a little bit of time, you could test your theory by re-using the memory by using Memory Pool techniques: each run of the task uses the 'same' chunks of memory by getting them from the pool and returning them at release time.
If you're still seeing the degrading performance after doing this investigation, it's not memory fragmentation causing the problem. Let us all know your results and we can help troubleshoot further.
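A memory pool for this kind of experiment can be very small. The sketch below is written against standard Java SE for readability (generics, ArrayDeque); on a J2ME/CLDC device the same idea would need a Vector or a plain array instead. The class name BufferPool and the fixed buffer size are assumptions for illustration.

```java
import java.util.ArrayDeque;

// Hypothetical fixed-size buffer pool: reuses byte arrays instead of reallocating them.
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    // Pre-allocates 'count' buffers of 'bufferSize' bytes each.
    BufferPool(int bufferSize, int count) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < count; i++) {
            free.push(new byte[bufferSize]);
        }
    }

    // Hands out a pooled buffer, allocating a new one only if the pool is empty.
    byte[] acquire() {
        byte[] buffer = free.poll();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    // Returns a buffer to the pool; buffers of the wrong size are ignored.
    void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) {
            free.push(buffer);
        }
    }
}
```

If each task run acquires its buffers at the start and releases them at the end, every run works on the same chunks of memory, so a slow-down that persists cannot be blamed on fragmentation from those allocations.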

Memory fragmentation would account for it... what is not clear is whether the app's use of memory is causing paging. This would also slow things down, and could cause the same symptoms.

If the problem really is memory fragmentation, there is not much you can do about it.
But before you give up in despair, try running your app with an execution profiler to see if it is spending a lot of time executing in an unexpected place. It is possible that the slow-down is actually due to a problem in your algorithms, and has nothing to do with memory fragmentation. As people have already said, J2ME garbage collectors should not suffer from fragmentation issues.

Consider looking at garbage collection statistics. You should have a lot more on the last run than the first, if your theory is to hold. Another thought might be that something else eats your memory so your application has less.
In other words, profiler time :)

What OS are you running this on? I have some experience with Windows CE5 (or Windows Mobile) devices. CE5's operating system level memory architecture is quite broken and will fail soon for memory intensive applications. Your graph does not have any scales, but every process only gets 32MB of address space on CE5. The VM and shared libraries will take their fair share of that as well, leaving you with quite little left.
The only way around this is to re-use the memory you allocated instead of giving it back to the collector and re-allocating later. This is, of course, much more low-level programming than you would usually want to do in Java, but on this platform you might be out of luck.
