Debugging a strange memory leak - Java/Tomcat

I'm experiencing a very odd problem with a Java application running under Tomcat.
We updated the production code with fresh code produced in a one-week sprint. The application had been running for months without hiccups, but with this new code our Linux servers start swapping after some time.
The very strange thing is that, looking at memory usage in VisualVM, it never exceeds the maximum heap size and the JVM never throws an OutOfMemoryError; the machine just starts swapping and the JVM keeps running even after that.
So it seems memory is leaking from somewhere, presumably the new code, but oddly not inside the JVM heap. Any ideas on how to debug this?
Thanks!

Swapping is not a conclusive indicator of a leak; it results from low physical memory. Use vmstat on Linux to check swap usage. Try using a different machine and experiment with configurations: swap size, physical memory size, address space.
If you are confident that the problem is in your program try this:
1. Estimate the median and peak memory that your program should use. You must be able to account for all deviations from these figures. If you cannot, proceed to step 3.
2. Assuming you did step 1 correctly and were able to account for all deviations, you can rule out a leak (sorry about such vague suggestions, but debugging is only as good as the detective). You should now focus on GC tuning. First, enable GC logging (example flags are sketched just after this list). See whether your heap is actually full and where the GC spends most of its time collecting. This is a good starting point for optimisation. Try adjusting GC options: experiment with collection algorithms, max/min heap sizes, generation ratios, etc. Only experiment once you have ruled out a leak (step 1).
3. Assuming you did step 1 correctly and were not able to account for all deviations, you can assume you have a leak somewhere. Use a memory profiler to see which objects contribute most to heap growth. Leave the profiler running for an extended period: have your program handle the requests it routinely expects, then leave it relatively isolated after that. If the memory level keeps growing, you may have a leak; if not, it is probably not a memory leak. Can you pinpoint the part of your program that may be creating those objects? If yes, try sending several requests that only target that part and see whether that reproduces the problem deterministically. If not, repeat this step. If yes, use divide and conquer and reapply this step until you find the class/method that is the culprit. It can also be a certain combination of parts (individually they may look innocent, but together they may form a brilliant crime syndicate).
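As a hedged illustration for step 2 (the exact flags depend on the JVM vendor and version; these are the common HotSpot ones), GC logging can be enabled with startup options such as:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log   (Java 8 and earlier)
-Xlog:gc*:file=/tmp/gc.log   (Java 9 and later, unified logging)
The log shows each collection, the heap occupancy before and after it, and the pause time, which is usually enough to tell a genuinely full heap from a tuning problem.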
Hope this helps; if not, please leave a comment on my post.
All the very best on your exercise!

I would suggest you look into creating heap dumps without using jvisualvm. For Unix-based Oracle JVMs this is normally done with the JDK's jmap tool (note that sending signal 3 to the JVM with kill produces a thread dump rather than a heap dump, which can also be useful).
For full details see http://www.startux.de/index.php/java/45-java-heap-dumpyvComment45
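As a hedged example (assuming a HotSpot JDK with its standard tools on the PATH, and <pid> being the Tomcat process id), a heap dump can be taken on demand with jmap:
jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>
The resulting .hprof file can then be opened in VisualVM or the Eclipse Memory Analyzer.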
You can then see if the pattern changes.
If you do not get any ideas from this, it might be because you are storing a substring of a very large original string (which, in older JVMs, keeps the whole original character array alive), or because you are holding on to operating system resources such as open database connections.
Have you checked that your connection pool looks healthy?

If you aren't already, I'd recommend using VisualVM version 1.3.2 and all the plug-ins. It's a big jump up from earlier versions.
What happens to the perm gen space?
What are the memory settings you're using? Min and max, of course, but what about perm space size?

Related

Why does running the garbage collector sometimes increase reserved RAM in Java?

We have a Java 8 web application running on a Tomcat 8.5.47 server. We have only 20-60 user sessions at a time, but users upload files of up to 600 MB to the server much of the time. We also use Hibernate and c3p0 to manage database connections.
We monitored the server for several days and saw that the Java reserved RAM sometimes increased suddenly and the garbage collector did not release it. How can we manage this? Is there any way to release reserved RAM and prevent Tomcat from increasing its RAM usage, and to decrease the used RAM shown in Task Manager?
These are our settings:
-XX:MaxPermSize=1g -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxHeapFreeRatio=15 -XX:MinHeapFreeRatio=5 -XX:-UseGCOverheadLimit -Xmn1g -XX:+UseCompressedOops -Xms10g -Xmx56g
Here is an image of the profiler when this happened:
And here is an image of the profiler and Task Manager after two hours:
P.S. We use JProfiler; the green colour shows reserved RAM and the blue colour shows used RAM. In the second box you can track GC activity, the third is for classes, the fourth shows thread activity, and the last shows CPU activity.
Thank you all for your answers.
These types of questions are never easy, mainly because to get them "right" the person asking needs a basic understanding of how an OS treats and deals with memory, and of the fact that there are different types of memory (at least resident, committed and reserved). I am by far not versatile enough to get this entirely right either, but I keep learning and getting better at it. They mean very different things, and some of them are usually irrelevant (I find reserved to be such). You are using Windows, so this is, imho, a must-watch to begin with.
After you watch that, you need to move to the JVM world and how a JVM process manages memory. The heap is managed by a garbage collector, so to shrink unused heap the GC needs to be able to do that. And while G1 could do that before JDK 12, it was never very eager to. Since JDK 12 there is JEP 346, which returns memory back, i.e. it un-commits memory. Be sure to read when that happens, though. Also notice that other collectors like Shenandoah and/or ZGC do it much more often.
Of course, since you disable the GC overhead limit (-XX:-UseGCOverheadLimit), you get a huge spike in CPU (GC threads running like crazy to free space) and of course everything slows down. If I were you, I would enable that limit again, let the GC fail, and analyse the GC logs to understand what is going on. 56 GB of heap is a huge number for 20-60 users (this surely looks like a leak?). Note that without GC logs it might be impossible to offer a solution.
P.S. Look at the first screen you shared and notice how there are two colours there: green and blue. I don't know what tool that is, but it looks like green is "reserved memory" and blue is "used". It would be great if you said exactly what those are.
Java 8 does not readily return allocated RAM back to the OS even if the JVM no longer needs it. For that feature you need to move to a newer JDK. This is the JEP for it: https://openjdk.java.net/jeps/346. It says it was delivered in version 12, so JDKs from version 12 onward should have that feature.
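For reference, the behaviour from JEP 346 is driven by a flag along these lines (a sketch assuming JDK 12+ with G1; check the JEP text for the exact conditions and defaults):
-XX:G1PeriodicGCInterval=300000
This asks G1 to trigger a periodic GC cycle roughly every five minutes when the JVM is otherwise idle, after which unused committed heap can be returned to the OS.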
The only way to prevent reserved memory from growing is to decrease the -Xmx value. Since you are setting it to 56g, I assume you are OK with Tomcat consuming up to 56 GB of memory. If you think that is too much, just decrease that number.

Tomcat garbage collection not thorough

I'm really out of my comfort zone when I have to monitor memory, but I'm the only one here and I'm left clueless: I have a Java 8 application (a CMS) running on a Tomcat application server, and we're in some trouble. After a while the server crashes.
After some research, I found out it is memory related, so I attached visualVM on my environment, and started monitoring.
I see that the memory is slowly filling up. Garbage collection does its job, but not thoroughly: it always leaves some more memory in the used heap. When I trigger a manual 'Perform Garbage Collection' in VisualVM, the collection is performed much better (see screenshots).
It takes several hours, but the used heap grows larger and larger again after each garbage collection. The moment I manually perform GC again, the minimum used heap is back to 'normal'.
I have noticed that the heap fills up with byte[] instances; those take up most of the space. Could someone help me out with this?
I see that the memory is slowly filling up. Garbage collection does its job, but not thoroughly. It always leaves some more memory in the used heap. When I do a manual 'Perform Garbage Collection' in VisualVM, the garbage collection is performed much better.
A full GC gets triggered when the JVM feels it is necessary, because it is costly: for example, it is stop-the-world for the parallel GC, and similarly two sub-phases of the concurrent mark-sweep collector are stop-the-world. It also depends on various factors such as the -Xms and -Xmx parameters (see the JVM heap parameters). So you should not worry unless and until you get an OutOfMemoryError; the JVM will trigger a full GC when necessary.
As for the server crash, I can think of two problems:
Memory leak. In that case the memory footprint keeps increasing even after each GC.
Maybe you are building up a cache without an eviction policy, and it is nearly full.
If neither applies, I see a case for increasing the heap; give that a try.
I've had a few problems like this before. One was our app's fault, one was the app server's fault, and one I wasn't able to figure out but was able to mitigate.
In each case I used JProfiler to watch memory usage on a local server and ran a variety of happy-path and exception tests to try to figure out what was causing the problem. Doing this testing wasn't a quick and easy process - on average I spent about a week each time.
In the first case (our app's fault), I found that we were not closing SQL connections for a web service when exceptions were thrown. Testing the happy paths showed no problems, but when I started testing exceptions I could exhaust the server's memory with about 100 consecutive exceptions. Adding code to manually clean up resources in the exception handler solved the problem.
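To illustrate the shape of that fix (a minimal sketch with hypothetical names, assuming Java 7+ and a javax.sql.DataSource, not the actual code from that project), try-with-resources closes the connection on both the happy path and the exception path:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class CustomerDao {
    private final DataSource dataSource;

    public CustomerDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public String findName(long id) throws SQLException {
        // Connection, statement and result set are closed automatically,
        // even if executeQuery() or getString() throws.
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement("SELECT name FROM customer WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}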
In the second case (WebSphere's fault), I verified that our app was closing all resources correctly, but the problem persisted. So I started reading through WebSphere documentation and found that it was a known issue with JAX-WS clients. Luckily there was a patch to WebSphere which fixed the problem.
In the third case (couldn't determine the cause), I was unable to find any reason why it was happening. So the problem was mitigated by increasing JVM memory allocation to an amount where the OOM exceptions would take greater than 1 week to happen, and configuring the servers to restart every weekend.
There might be some simple technical workarounds to mitigate the problem; like: simply adding more memory to the JVM and/or the underlying machine.
Or, if you can really prove that running System.gc() manually helps (as the comments indicate, most people think it will not), you could automate that step.
If any of that is good enough to prevent those crashes; you are buying yourself more time to work on a real solution.
Beyond that, a meta, non-technical perspective. Actually there are two options:
You turn to your manager and tell him that you will need anything between 2, 4 or 8 weeks to learn this stuff, so that you can identify the root cause and get it fixed.
You turn to your manager and tell him that he should look for an external consulting service to come in, identify the root cause and help fix it.
In other words: your team / product is in need for some "expert" knowledge. Either you invest in building that knowledge internally; or you have to buy it from somewhere.

Detecting memory-leak programmatically

If, on purpose, I create an application that crunches data while suffering from memory-leaks, I can notice that the memory as reported by, say:
Runtime.getRuntime().freeMemory()
starts oscillating between 1 and 2 MB of free memory.
The application then enters a loop that goes like this: GC, process some data, GC, etc., but because the GC happens so often the application basically isn't doing much else any more. Even the GUI takes ages to respond (and no, I'm not talking about EDT issues here; the VM really is basically stuck in some endless GC'ing mode).
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
Note that I'm not talking about out-of-memory errors nor about detecting the memory leak itself.
I'm talking about detecting that an application is running so low on memory that it is basically calling the GC all the time, leaving hardly any time to do something else (in my hypothetical example: crunching data).
Would it work, for example, to repeatedly read how much memory is available during, say, one minute, and see that if the number has been "oscillating" between different values all below, say, 4 MB, conclude that there's been some leak and that the application has become unusable?
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
I don't think so. You can find out roughly how much heap memory is free at any given instant, but AFAIK you cannot reliably determine when you are running out of memory. (Sure, you can do things like scraping the GC log files, or trying to pick patterns in the free memory oscillations. But these are likely to be unreliable and fragile in the face of JVM changes.)
However, there is another (and IMO better) approach.
In recent versions of Hotspot (version 1.6 and later, I believe), you can tune the JVM / GC so that it will give up and throw an OOME sooner. Specifically, the JVM can be configured to check that:
the ratio of free heap to total heap is greater than a given threshold after a full GC, and/or
the time spent running the GC is less than a certain percentage of the total.
The relevant JVM parameters are "UseGCOverheadLimit", "GCTimeLimit" and "GCHeapFreeLimit". Unfortunately, Hotspot's tuning parameters are not well documented on the public web, but these ones are all listed here.
Assuming that you want your application to do the sensible thing ... give up when it no longer has enough memory to run properly ... then just launch the JVM with a lower GCTimeLimit (and/or a higher GCHeapFreeLimit) than the defaults.
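As a hedged sketch of such a launch (HotSpot-specific; the defaults are, as far as I recall, GCTimeLimit=98 and GCHeapFreeLimit=2, and app.jar is just a placeholder for your application):
java -XX:+UseGCOverheadLimit -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=10 -jar app.jar
With these values the JVM throws the "GC overhead limit exceeded" OutOfMemoryError once more than 90% of the time is going into GC while less than 10% of the heap is being recovered.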
EDIT
I've discovered that the MemoryPoolMXBean API allows you to look at the peak usage of individual memory pools (heaps), and set thresholds. However, I've never tried this, and the APIs have lots of hints that suggest that not all JVMs implement the full API. So, I would still recommend the HotSpot tuning option approach (see above) over this one.
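Untested, but a rough sketch of what the MemoryPoolMXBean route could look like (the 90% threshold is an arbitrary choice for illustration):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapThresholdMonitor {
    public static void installThresholds() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Only heap pools are interesting here; skip code cache, metaspace, etc.
            if (pool.getType() != MemoryType.HEAP || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            // Flag the pool once more than 90% of it is still used *after* a GC.
            MemoryUsage usage = pool.getCollectionUsage();
            if (usage != null && usage.getMax() > 0) {
                pool.setCollectionUsageThreshold((long) (usage.getMax() * 0.9));
            }
        }
    }

    public static boolean lowOnMemory() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()
                    && pool.isCollectionUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }
}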
You can use getHeapMemoryUsage.
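That is MemoryMXBean.getHeapMemoryUsage() from java.lang.management; a minimal usage sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsagePrinter {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // used <= committed <= max; max can be -1 if it is undefined for this JVM
        System.out.printf("heap used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
    }
}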
I see two attack vectors.
Either monitor your memory consumption.
When you are more or less constantly using most of the available memory, it is very likely that you have a memory leak (or are just using too much memory). The VM will constantly try to free some memory without much success => constantly high memory usage.
You need to distinguish that from a large zigzag pattern, which often happens without indicating a memory problem. Basically you use more and more memory, but when the GC finds time to do its job it finds lots of garbage to take out, so everything is fine.
The other attack vector is to monitor how often and what kind of success the gc runs. If it runs often with only small gains in memory, it is likely you have a problem.
I don't know if you can access this kind of information directly from your program. But if nothing else, I think you can specify parameters on startup that make the GC log its information to a file, which in turn could be parsed.
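For what it's worth, the standard java.lang.management API does expose per-collector counts and accumulated collection time from inside the program, so a rough check is possible without parsing log files (a sketch, not a complete detector):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivitySampler {
    private long lastGcTimeMs;

    // Call once per fixed interval; returns the fraction of that interval spent in GC.
    // Values close to 1.0 over several samples suggest the JVM is mostly collecting.
    public double sample(long intervalMs) {
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();  // -1 if the collector does not report it
            if (t > 0) {
                gcTimeMs += t;
            }
        }
        double fraction = (gcTimeMs - lastGcTimeMs) / (double) intervalMs;
        lastGcTimeMs = gcTimeMs;
        return fraction;
    }
}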
What you could do is spawn a thread that wakes up periodically and calculates the amount of used memory and records the result. Then you can do regression analysis on the result to estimate the rate of memory growth in your application. If you know the rate of growth, and the maximum amount of memory, you can predict (with some confidence) when your application will run out of memory.
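A minimal sketch of that idea (the class name, the one-minute interval and the end-point-based slope are all placeholders; a real version would fit a regression line as suggested above):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MemoryTrendMonitor {
    private final Deque<Long> samples = new ArrayDeque<>();

    public void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "memory-trend-monitor");
                    t.setDaemon(true);  // do not keep the JVM alive just for this
                    return t;
                });
        scheduler.scheduleAtFixedRate(this::record, 1, 1, TimeUnit.MINUTES);
    }

    private synchronized void record() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        samples.addLast(used);
        if (samples.size() > 60) {          // keep roughly the last hour
            samples.removeFirst();
        }
    }

    // Crude growth estimate in bytes per sample interval, using only the end points.
    public synchronized long growthPerInterval() {
        if (samples.size() < 2) {
            return 0;
        }
        return (samples.peekLast() - samples.peekFirst()) / (samples.size() - 1);
    }
}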
You can pass arguments to your Java virtual machine that give you GC diagnostics, such as:
-verbose:gc  Turns on the logging of GC information. Available in all JVMs.
-XX:+PrintGCTimeStamps  Prints the times at which the GCs happen, relative to the start of the application.
If you capture that output in a file, your application can periodically read and parse the file to know when GCs have happened, so you can work out the average time between GCs.
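A rough sketch of that parsing step, assuming the Java 8 style log produced by the flags above, where each GC line begins with the seconds-since-start timestamp (e.g. "12.345: [GC ..."); the exact format varies across JVM versions, so treat the regex as an assumption:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogStats {
    // Leading "<seconds>.<millis>: [" timestamp written by -XX:+PrintGCTimeStamps.
    private static final Pattern GC_LINE = Pattern.compile("^(\\d+\\.\\d+): \\[");

    public static double averageSecondsBetweenGcs(String logFile) throws IOException {
        List<Double> timestamps = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(logFile))) {
            Matcher m = GC_LINE.matcher(line);
            if (m.find()) {
                timestamps.add(Double.parseDouble(m.group(1)));
            }
        }
        if (timestamps.size() < 2) {
            return Double.NaN;
        }
        double span = timestamps.get(timestamps.size() - 1) - timestamps.get(0);
        return span / (timestamps.size() - 1);
    }
}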
I think the JVM does exactly this for you and throws java.lang.OutOfMemoryError: GC overhead limit exceeded. So if you catch OutOfMemoryError and check for that message then you have what you want, don't you?
See this question for more details
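A small sketch of that check (whether reacting to it in-process is wise is another matter, and the message text is HotSpot-specific); crunchData() is a hypothetical placeholder for the real workload:

public class OomAwareRunner {
    public static void main(String[] args) {
        try {
            crunchData();
        } catch (OutOfMemoryError e) {
            if ("GC overhead limit exceeded".equals(e.getMessage())) {
                // The JVM spent almost all of its time collecting while recovering
                // very little heap: log it, drop caches, or shut down gracefully.
                System.err.println("Effectively out of memory, shutting down");
            }
            throw e;
        }
    }

    private static void crunchData() {
        // hypothetical placeholder for the real data-crunching work
    }
}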
I've been using Plumbr for memory leak detection and it's been a great experience, though the licence is very expensive: http://plumbr.eu/

Eclipse not releasing memory in Java process on Linux

My Linux server needs to be able to handle 30+ Eclipse instances for developers. I did a quick test running 10 Eclipse instances. The Java process associated with each Eclipse instance starts at around 200 MB RSS and increases to around 550 MB as more projects are loaded.
But the Java process doesn't seem to release memory after closing/deleting all projects within the Eclipse instances; I still see it using over 550 MB RSS.
How can I change Eclipse or Java settings so that the memory footprint is reduced when developers close projects or are idle for a while?
Thanks
You may want to experiment with these (and other) JVM tuning options to make the JVM less reluctant to return memory to the OS:
-XX:MaxHeapFreeRatio Maximum percentage of heap free after GC to avoid shrinking. Default is 70.
-XX:MinHeapFreeRatio Minimum percentage of heap free after GC to avoid expansion. Default is 40.
However, I suspect that you won't see the eclipse process shrink to anywhere near its initial size, since eclipse is a huge, complex application that probably lazy-loads (but does not unload, once used) a lot of classes and associated data structures.
I've never seen Java release memory.
I don't think you will get any value out of trying to get it to release memory with Eclipse; I've watched that little memory counter for YEARS and never once seen the allocated memory drop.
You might try one of these.
After each session, exit the JVM and restart.
Set your -Xmx lower.
Separate your instances into categories with high -Xmx and low -Xmx and let the user determine which one he wants.
As a side-thought, if it really mattered to you, you MIGHT be able to run multiple eclipse instances under one VM. It would probably be WAY too much work (man-weeks to man-years), but if you could get it right you could reduce overhead by like 150-200mb/instance. The disadvantage would be that a VM crash (Pretty rare these days) would kill everyone.
Testing this theory would be a matter of calling eclipse's main from within an existing JVM and trying to get it to display somewhere useful. The rest of the man-year is spent trying to figure out where they used evil static variables or singletons and changing them to something else.
Switch the Java to use the G1 garbage collector with the HeapFreeRatio parameters. Use these options in eclipse.ini:
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-XX:MinHeapFreeRatio=5
-XX:MaxHeapFreeRatio=25
Now, when Eclipse eats up more than 1 GB of RAM for a complicated operation and drops back to 300 MB after garbage collection, the memory will be released back to the operating system.
I would suggest checking on garbage collection; setting the right options, or even forcing GC periodically, might increase the time until Eclipse's memory usage grows high.
Following link might be useful http://www.eclipsezone.com/eclipse/forums/t93757.html

Strategies for the diagnosis of Java memory issues

I've been tasked with debugging a Java (J2SE) application which after some period of activity begins to throw OutOfMemory exceptions. I am new to Java, but have programming experience. I'm interested in getting your opinions on what a good approach to diagnosing a problem like this might be?
Thus far I've employed JConsole to get a picture of what's going on. I have a hunch that there are objects which are not being released properly and are therefore not being cleaned up during garbage collection.
Are there any tools I might use to get a picture of the object ecosystem? Where would you start?
I'd start with a proper Java profiler. JConsole is free, but it's nowhere near as full featured as the ones that cost money. I used JProfiler, and it was well worth the money. See https://stackoverflow.com/questions/14762/please-recommend-a-java-profiler for more options and opinions.
Try the Eclipse Memory Analyzer, or any other tool that can process a Java heap dump, and then run your app with the flag that generates a heap dump when you run out of memory.
Then analyze the heap dump and look for suspiciously high object counts.
See this article for more information on the heap dump.
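For reference (assuming a HotSpot/OpenJDK JVM), the flag meant here is usually the following, optionally with a directory for the dump file:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps
The resulting .hprof file is what the Eclipse Memory Analyzer reads.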
EDIT: Also, please note that your app may just legitimately require more memory than you initially thought. You might try increasing the java minimum and maximum memory allocation to something significantly larger first and see if your application runs indefinitely or simply gets slightly further.
The latest version of the Sun JDK includes VisualVM which is essentially the Netbeans profiler by itself. It works really well.
http://www.yourkit.com/download/index.jsp is the only tool you'll need.
You can take snapshots at (1) app start time, and (2) after running app for N amount of time, then comparing the snapshots to see where memory gets allocated. It will also take a snapshot on OutOfMemoryError so you can compare this snapshot with (1).
For instance, in the latest project I had to troubleshoot, OutOfMemoryError exceptions were being thrown, and after firing up YourKit I realised that most memory was in fact being allocated to some Ehcache "LFU" class. The point being that we had specified loads of a certain POJO to be cached in memory, but had not specified enough -Xms and -Xmx (starting and maximum JVM memory allocation).
I've also used Linux's vmstat (e.g. some Linux platforms just don't have enough swap enabled, or don't allocate contiguous blocks of memory), and then there's jstat (bundled with the JDK).
UPDATE see https://stackoverflow.com/questions/14762/please-recommend-a-java-profiler
You can also add an UncaughtExceptionHandler to your application's threads. This will catch 'uncaught' exceptions, like an OutOfMemoryError, and you will at least have an idea where the exception was thrown. Usually that is not where the problem is, just where the 'new' could not be satisfied. As a rule I always add an UncaughtExceptionHandler to a Thread, if nothing else to add logging.
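A minimal sketch of wiring that up with the standard Thread.UncaughtExceptionHandler API (the logging and the startApplication() entry point are just illustrative):

public class Main {
    public static void main(String[] args) {
        // Applies to every thread that does not install its own handler.
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            System.err.println("Uncaught in " + thread.getName());
            throwable.printStackTrace();
            // An OutOfMemoryError surfacing here tells you which thread's
            // allocation failed, not necessarily who is holding the memory.
        });

        startApplication();  // hypothetical entry point of the real app
    }

    private static void startApplication() {
        // placeholder
    }
}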
