How much time should be spent doing garbage collection - java

I have an application that is responsible for archiving old applications. It processes a large number of applications in a single run, so it needs to run for days at a time.
When my company developed this they did a fair bit of performance testing on it and seemed to get decent numbers, but I have been running an archive for a customer recently and it seems to be running really slowly, with the performance degrading further the longer it runs.
There does not appear to be a memory leak: I have been monitoring it with jconsole and there is still plenty of memory available, which does not appear to be shrinking.
I have noticed, however, that the survivor space and tenured generation of the heap fill up very quickly until a garbage collection comes along and clears them out, which seems to be happening rather frequently. I am not sure whether that could be a source of the apparent slowdown.
The application has now been running for 7 days 3 hours, and according to jconsole it has spent 6 hours performing copy garbage collection (772,611 collections) and 12 hours 25 minutes on mark-sweep compactions (145,940 collections).
This seems like a large amount of time to spend on garbage collection, and I am wondering whether anyone has looked into something like this before and knows if it is normal.
Edits
Local processing seems to be slow. For instance, I am looking at one part in the logs that took 5 seconds to extract some XML from a SOAP envelope using XPath, which it then appends to a string buffer along with a root tag; that's all it does. I haven't profiled it yet, as this is running in production; I would either have to pull the data down over the net or set up a large test base in our dev environment, which I may end up having to do.
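For context, a minimal sketch of the kind of extraction described above (class names, the root tag, and the XPath expression are hypothetical; the real code isn't shown here):

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    public class EnvelopeExtractor {
        // Pulls the first element out of the SOAP Body and wraps it in a root tag.
        public String extractPayload(java.io.InputStream soapEnvelope) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(soapEnvelope);
            XPath xpath = XPathFactory.newInstance().newXPath();
            // hypothetical expression; namespaces ignored for brevity
            Node payload = (Node) xpath.evaluate("//*[local-name()='Body']/*[1]",
                    doc, XPathConstants.NODE);
            StringBuffer buffer = new StringBuffer();
            buffer.append("<archiveRecord>");          // hypothetical root tag
            buffer.append(nodeToString(payload));      // the extracted XML
            buffer.append("</archiveRecord>");
            return buffer.toString();
        }

        private String nodeToString(Node node) throws Exception {
            java.io.StringWriter writer = new java.io.StringWriter();
            javax.xml.transform.TransformerFactory.newInstance().newTransformer()
                    .transform(new javax.xml.transform.dom.DOMSource(node),
                               new javax.xml.transform.stream.StreamResult(writer));
            return writer.toString();
        }
    }

Parsing and re-serialising an envelope this way does allocate a lot of short-lived objects, but even so it would not normally take anywhere near 5 seconds for a single message.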
Running Java HotSpot Client VM version 10.0-b23
Really I just need high throughput. I haven't configured any specific garbage collection parameters, so it would be running whatever the defaults are. I'm not sure how to find out which collectors are in use.
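In case it helps, one way to see which collectors the running JVM is actually using, and how much time they have consumed, is the standard GarbageCollectorMXBean API; a small sketch:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcReport {
        public static void main(String[] args) {
            // Lists each collector by name with its cumulative count and time,
            // the same numbers jconsole displays.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName()
                        + ": collections=" + gc.getCollectionCount()
                        + ", time(ms)=" + gc.getCollectionTime());
            }
        }
    }

On a client-class VM with no GC flags, the defaults are typically the serial copy collector for the young generation and mark-sweep-compact for the old generation, which matches the collector names reported above.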
Fix
Ended up getting a profiler going on it. It turned out the cause of the slowdown was some code that was constantly trimming lines off a status box that outputs logging statements, which was pretty badly done. I should have figured the garbage collection was a symptom of constantly copying the status text around in memory, rather than an actual cause.
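Purely as an illustration of that pattern (the original code isn't posted), trimming a status box by rebuilding its entire text on every log line makes several full copies of the status text per message, which shows up as heavy copy and mark-sweep GC activity rather than as a leak:

    public class StatusBoxExample {
        // Illustrative only: each call copies the whole status text several times.
        static void appendStatus(javax.swing.JTextArea statusBox, String line, int maxLines) {
            String text = statusBox.getText() + line + "\n";   // full copy of all the text
            String[] lines = text.split("\n");                 // split into yet more copies
            if (lines.length > maxLines) {
                StringBuilder trimmed = new StringBuilder();
                for (int i = lines.length - maxLines; i < lines.length; i++) {
                    trimmed.append(lines[i]).append('\n');     // rebuilt once again
                }
                text = trimmed.toString();
            }
            statusBox.setText(text);                           // and the component copies it too
        }
    }

A bounded approach (for example, removing only the leading lines from the component's Document, or logging to a file instead of a text box) avoids the repeated copying.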
Cheers Guys.

According to your numbers, total garbage collection time was about 18 hours out of 7 days of execution time (roughly 18.5 of 171 hours, or about 10%). That's slightly elevated, but even if you managed to get it down to 0%, you'd only save about 10% of the execution time ... so if you're looking for substantial savings, you'd do better to look into the other 90%, for instance with a profiler.

Without proper profiling, this is a guessing game. As an anecdote, though, a few years ago a web app I was involved with at the time suddenly slowed down (response time) by a factor of 10 after a JDK upgrade. We ended up chasing it down to an explicit GC invocation added by a genius who was no longer with the company.

There is a balance you try to maintain between JVM heap footprint and GC time. Another question is whether the heap (and its generations) is under-allocated in a way that forces too-frequent GCing. When deploying multi-tenant JVMs on these systems, I've tried to keep total GC time under 5%, along with aggressive heap shrinkage to keep the footprint low (again, multi-tenant). The heap and its generations will almost always fill up to whatever size they are given, precisely to avoid frequent GCing. Remove the -Xms parameter to see a more realistic steady state (if the application has any idle time).
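For illustration only (the right values depend entirely on the application), the sizing and shrinkage behaviour is driven by HotSpot flags along these lines:

    # Hypothetical example -- the numbers are placeholders, not recommendations.
    # Omitting -Xms lets the heap shrink back during idle periods; the free-ratio
    # flags control how eagerly it grows and shrinks; archiver.jar is a stand-in name.
    java -Xmx1024m -XX:NewRatio=3 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -jar archiver.jar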
+1 to the suggestion on profiling though; it may be something in the code rather than GC.

Related

Is there any reason for GC to "slow" current execution even though GC logs show it doesn't?

Here is a description of my Java program:
It is an optimization algorithm that runs for hours in a Thread and that is often interrupted by the service to query things about the current optimization.
The optimization routine is a succession of simple operations (let us call them "moves") that are repeated over and over. Sometimes the optimization procedure slows down a lot: a move generally takes less than 100ms, but sometimes it takes several seconds, even minutes.
Of course I thought about GC freezing the execution while performing a full GC, so I did some experiments. According to the GC logs, minor collections happen very regularly and take about 10ms, while full GCs are very rare and take no more than 2 seconds. I also ran some analysis with GC log analysis tools, and the throughput is more than 95%. As an example, on a 7-minute test run there were only 2 to 3 seconds lost to GC according to the GC log.
One should conclude that GC is not the problem here. However, I observed that trying other GC algorithms has quite a big impact on how long the "worst" moves take (it can double or triple the execution time of "moves" when things start to slow down). Therefore GC seems to have an impact, even though the GC logs tell me GC is not taking time.
Moreover, I also observe that "moves" tend to slow down in sequence: when one move slows down, the following one has a greater chance of slowing down as well, even though they are not related in any way and these operations are fully independent. It just feels like the whole system slows down at times, especially when the service is called a lot.
I read some things about memory fragmentation of the heap that can lead to major slowdowns in applications, and as my moves create and destroy a lot of objects, that could explain what I am experiencing here. However, I don't see why the GC logs wouldn't show this.
I am really reaching the limits of my knowledge and experience here and I would take any intuition or clue you may have to investigate further.
Another explanation is that the cause of the slow-down is outside of the Java application.
It could be due to competition for CPU cycles with other processes.
It could be due to competition for memory (resulting in paging).
If you are running in a virtual machine, the resource competition could be between different VMs.
Or, it could be some unexpected behavior in your application:
Are you using WeakReference or SoftReference objects?
Do you use finalize() methods?
Have you tried profiling the application?
I suggest you try using OS-provided performance tools to see if there is other activity on the system that correlates with the pauses. Also, look for slowdowns/pauses in the system itself.
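As a rough example (exact tools depend on the operating system), on Linux something like the following can show whether the slow moves line up with paging or CPU contention:

    # system-wide activity sampled every 5 seconds; watch the si/so (swap) and
    # us/sy/id (CPU) columns while the application is in one of its slow phases
    vmstat 5

    # per-process view to spot other processes competing for the CPU
    top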

PSYoungGen Pause Jitter

I am observing some strange behavior in my Java application. Minor GC pause times are very stable, between 2 and 4 milliseconds, and spaced several seconds apart (ranging from around 4 seconds to minutes depending on load). I have noticed that before the first full GC, minor collection pause times can spike to several hundred milliseconds, sometimes breaching the one-second mark. However, after the first full collection these spikes go away, and minor collection pause times no longer spike and remain rock steady at 2-4 milliseconds. These spikes do not appear to correlate with tenured heap usage.
I'm not sure how to diagnose the issue. Obviously something changed from the full collection, or else the spikes would continue to happen. I'm looking for some ideas on how to resolve it.
Some details:
I am using the -server flag. The throughput parallel collector is used.
Heap size is at 1.5G, and the default ratio is used between the young and tenured generations. Survivor ratios remain at default. I'm not sure how relevant these are to this investigation, as the same behavior is shown despite more tweaking.
On startup, I make several DB calls. Most of the information can be GC'd away (and does upon a full collection). Some instances of my application will GC while others will not.
What I've tried/thought about:
Full Perm Gen? I think the parallel collector handles this fine and does not need more flags, unlike CMS.
Manually triggering a full GC after startup. I will be trying this, hopefully making the problem go away based on observations. However, this is only a temporary solution as I still don't understand why there is even an issue.
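For what it's worth, the "manual full GC after startup" experiment is only a hint to the VM; a trivial sketch:

    public class StartupGcHint {
        // Hypothetical: request a full collection once the startup DB calls are done.
        // System.gc() is only a request, and with -XX:+DisableExplicitGC it is a no-op.
        static void afterStartup() {
            // ... DB warm-up work happens before this point ...
            System.gc();
        }
    }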
First, I wanted more information on this, but since commenting needs 50 rep, which I don't have, I am asking here.
This is too little info to work with.
To diagnose the issue you can switch on the GC logs and post the behaviour that you notice. Also, you can use jstat to view the heap space usage live while the application is running: http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html
To turn on GC logs you can read here: http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
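For instance, with the classic (pre-Java-9) HotSpot options and a placeholder jar name:

    # verbose GC logging to a file
    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log -jar app.jar

    # live heap and GC utilisation of a running JVM, sampled every 1000 ms
    jstat -gcutil <pid> 1000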

Should you cap the java heap size if you don't need that much?

Let's assume I have plenty of memory on a production Unix box. And I have a Java backend application that does not use all that much heap. Initial tests show that it seems fine with 100MB.
However, when uncapped, memory usage grows to 1GB and beyond.
I probably wouldn't care if it wasn't for the fact that every now and then the processing stream that the application is part of seems to choke. One possible (very vague) explanation is that the culprit is the mentioned Java application.
Question: Could it be that leaving the heap unnecessarily high defers garbage collection for so long that, when it finally kicks in, it has "so much to do" that it visibly impacts performance?
I should probably mention that we are still running Java 1.4 (pretty old system).
If you don't need it, cap it. Yes, you are correct: giving too much heap space to a Java program 'may' cause the garbage collector threads to run for a longer period of time. What is 'too much' depends on the requirements of your program. I have no hard data to back this up, but I have seen this happen in production-level Java-based servers in the past. Java 1.7 (the latest version) may not present the same issues as Java 1.4.
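A minimal sketch of capping it, assuming the 100MB figure from the initial tests (the exact cap should be confirmed under realistic load, and backend.jar is a stand-in name):

    java -Xms64m -Xmx256m -jar backend.jar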
You are correct that GC time grows with the size of the heap: a bigger heap means more work for GC. But even with a heap of several GBs you should see Full GC cycles take somewhere around 2-3s. Do you see such "chokes", or are your chokes much longer?
In general, it is tolerable to have GC time <5% of total application run time.
Furthermore, it is hard to blame GC without evidence; it would be helpful if you could show us some GC logs.

Garbage Collector going crazy after a few hours

Our JBoss 3.2.6 application server is having some performance issues. After turning on verbose GC logging and analyzing the logs with GCViewer, we've noticed that after a while (7 to 35 hours after a server restart) the GC goes crazy. It seems that initially the GC works fine, doing a GC every hour or so, but at a certain point it starts performing full GCs every minute. As this only happens in our production environment, we have not been able to try turning off explicit GCs (-XX:+DisableExplicitGC) or modifying the RMI GC interval yet, but as this happens only after a few hours it does not seem to be caused by the known RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet but it does not seem to be hitting the max heap limitations at all. Before the GC goes crazy it is GC-ing just fine but when the GC goes crazy the heap doesn't get above 2GB (24GB max).
Besides RMI, are there any other ways explicit GC can be triggered? (I checked our code and no calls to System.gc() are being made.)
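For reference, the explicit-GC and RMI-interval settings mentioned above are normally controlled with options like these on a Sun/HotSpot JVM (values are illustrative; the interval is in milliseconds):

    # make the JVM ignore System.gc() calls entirely
    -XX:+DisableExplicitGC

    # or stretch the RMI distributed-GC interval (shown here as 24 hours)
    -Dsun.rmi.dgc.client.gcInterval=86400000
    -Dsun.rmi.dgc.server.gcInterval=86400000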
Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before this happens (98% CPU time spent on GC off the top of my head).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?
You almost certainly have a memory leak, and if you let the application server continue to run it will eventually crash with an OutOfMemoryError. You need to use a memory analysis tool - one example would be VisualVM - and determine the source of the problem. Usually memory leaks are caused by static or global objects that never release the object references they store.
Good luck!
Update:
Rereading your question it sounds like things are fine and then suddenly you get in this situation where GC is working much harder to reclaim space. That sounds like there is some specific operation that occurs that consumes (and doesn't release) a large amount of heap.
Perhaps, as @Tim suggests, your heap requirements are just at the threshold of the max heap size, but in my experience you'd need to be pretty lucky to hit that exactly. At any rate, some analysis should determine whether it is a leak or you just need to increase the size of the heap.
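If it does turn out to be a leak, a heap dump is usually the quickest way to see what is holding references. Assuming a reasonably recent Sun/Oracle JDK on the box, something like:

    # dump the live heap of the running server (<pid> is a placeholder) and open
    # the file in VisualVM or Eclipse MAT
    jmap -dump:live,format=b,file=heap.hprof <pid>

    # or have the JVM write a dump automatically if it ever runs out of memory
    -XX:+HeapDumpOnOutOfMemoryError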
Apart from the more likely event of a memory leak in your application, there could be 1-2 other reasons for this.
In a Solaris environment, I once had such an issue when I allocated almost all of the available 4GB of physical memory to the JVM, leaving only around 200-300MB to the operating system. This led to the JVM process suddenly swapping to disk whenever the OS came under increased load. The solution was not to exceed 3.2GB. A real corner case, but maybe it's the same issue as yours?
The reason this led to increased GC activity is that heavy swapping slows down the JVM's memory management, which caused many short-lived objects to escape the survivor space and end up in the tenured space, which in turn filled up much more quickly.
I recommend that when this happens you take a stack dump.
More often than not, I have seen this happen with a thread population explosion.
Anyway, look at the stack dump file and see what's running. You could easily set up some cron jobs or monitoring scripts to run jstack periodically.
You can also compare the size of the stack dumps. If they grow really big, you have something that's making lots of threads.
If it doesn't get bigger you can at least see which objects (call stacks) are running.
You can use VisualVM or some fancy JMX crap later if that doesn't work, but first start with jstack as it's easy to use.
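For example, a crude capture loop (the pid and path are placeholders, and jstack needs a JDK rather than a bare JRE):

    # capture a thread dump every 60 seconds; compare the sizes and contents later
    while true; do
        jstack 12345 >> /var/tmp/threaddumps.log
        sleep 60
    done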

Java Garbage Collection- CPU Spikes - Longer connection establishment times

We have a pool of servers that sits behind a load balancer. The machines in this pool do garbage collection every 6 seconds on average, and each collection takes almost half a second. We also see a CPU spike during garbage collection.
The client machines see the average time to make a connection to the server spike by almost 10% during the day.
Theory: the CPU is busy doing GC, and that's why it cannot allocate a connection faster.
Is it a valid theory?
JVM: IBM
GC algorithm: gencon
Nursery: 5 GB
Heap size: 18 GB
I'd say with that many allocations all bets are off; it could absolutely get worse over time. If you are doing GC every 6 seconds all day long, that seems problematic.
Do you have access to that code? Can it be rewritten to reuse objects and be more intelligent about allocation? I've done a few embedded systems, and the trick is to NEVER call new once the system is up and running (quite doable if you have control over the entire system).
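A toy illustration of the reuse idea (entirely hypothetical; a real pool also has to deal with resetting object state and with sizing, and ArrayDeque needs Java 6+):

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class BufferPool {
        private final Deque<byte[]> free = new ArrayDeque<byte[]>();
        private final int bufferSize;

        public BufferPool(int count, int bufferSize) {
            this.bufferSize = bufferSize;
            for (int i = 0; i < count; i++) {
                free.push(new byte[bufferSize]);   // allocate everything up front
            }
        }

        public synchronized byte[] acquire() {
            // fall back to allocation only if the pool is exhausted
            return free.isEmpty() ? new byte[bufferSize] : free.pop();
        }

        public synchronized void release(byte[] buffer) {
            free.push(buffer);                     // hand back for reuse instead of discarding
        }
    }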
If you don't have access to the code, check into some of the GC tuning options available (including the selection of the garbage collector used)--both distributed with the JDK and 3rd party options. You may be able to improve performance with a few command-line modifications.
It's possible I guess.
Given garbage collection is such an intensive process, is there any reason for it to occur every 6 seconds? I'm not familiar with the IBM JVM or the particular collection algorithm you are using so I can't really comment on those. However, there are some good tuning documents provided by Sun (now offered by Oracle) that discuss the different types of collectors and when you would use them. See this link for some ideas.
One way to prove your theory could be to add some code that logs the time a connection was requested and the time when it was actually allocated. If the GC-related CPU spikes coincide with longer connection allocation times, that would support your theory. Your problem will then become how to get around it.
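A minimal sketch of that logging (class and method names are hypothetical; adapt it to whatever connection mechanism the clients actually use):

    import java.net.Socket;

    public class TimedConnect {
        public static Socket connect(String host, int port) throws java.io.IOException {
            long requested = System.currentTimeMillis();
            Socket socket = new Socket(host, port);   // the actual connection establishment
            long allocated = System.currentTimeMillis();
            System.out.println("connection to " + host + ":" + port
                    + " took " + (allocated - requested) + " ms");
            return socket;
        }
    }

Timestamps from this log can then be lined up against the verbose GC output to see whether the slow connections coincide with collections.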
