My JBoss AS 5 application is suffering from weird heap behavior during "peak hour" load. Heap usage grows constantly until there is so little free space left that the server fails to process new requests.
This would be normal if we were talking about a typical leak. However, I've ruled that out by simply lowering the load right before usage crosses the threshold where new requests can no longer be processed, and then, after a cool-down period, starting another "peak hour". The application (VM) is then able to process the high load as if nothing had happened.
How do I approach this problem? I've tried trial and error (heap sizes, GC settings, proprietary profiling/timing logs) and I've tried hooking up JProfiler, but it hasn't shown me anything of value.
Is there a good way to find out which objects (classes) are constantly growing over time? I'm talking about roughly 200 MB/hour.
You should be able to see heap growth using any of the tools below:
Utilities like JVisualVM and jstat (very little overhead)
Profilers like JProfiler (considerable overhead)
Both types of tool show which objects are taking the most heap space, and if you monitor heap growth continuously you can also find out which objects are growing over time.
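For example, jstat (shipped with the JDK) can print GC and heap utilization for a given process id every 5 seconds:

jstat -gcutil <pid> 5000

Watching the O (old generation utilization) column across full GCs is a cheap way to see whether the old generation keeps growing.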
If you still think these tools are not showing valuable data, go for the old-fashioned way:
Take heap dumps regularly at a fixed interval (one way to take them programmatically is sketched below). Choose the interval so that you get enough dumps to analyze while keeping their number, and the overhead, minimal.
Feed those dumps to heap analyzer tools and look at the trend for objects growing over time.
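A minimal sketch of taking a dump from inside the JVM via the HotSpot diagnostic MXBean (the file name pattern is just an example; the same thing can be done externally with jmap -dump:live,format=b,file=<file> <pid>):

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void dump() throws Exception {
        // Obtain the HotSpot-specific diagnostic bean (HotSpot JVMs only)
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live objects (this forces a full GC first)
        diag.dumpHeap("heap-" + System.currentTimeMillis() + ".hprof", true);
    }
}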
I hope this helps.
Related
I found that one of my Spring Boot project's memory (RAM consumption) is increasing day by day. When I uploaded the jar file to the AWS server, it was taking 582 MB of RAM (max allocated RAM is 1500 MB), but the RAM usage increases by 50 MB to 100 MB each day, and today, after 5 days, it's taking 835 MB. Right now the project has 100-150 users with normal usage of the REST APIs.
Because of this increase in RAM, the application went down a couple of times with the following error (found in the logs):
Exception in thread "http-nio-3384-ClientPoller" java.lang.OutOfMemoryError: Java heap space
So to resolve this, I found that by using a Java heap dump I can find the objects/classes that are taking up the memory. Using jmap on the command line, I created a heap dump and uploaded it to Heap Hero and the Eclipse Memory Analyzer Tool. In both of them I found the following:
1. Total wasted memory is 64.69 MB (73%) (check below screenshot)
2. Out of this, 34.06 MB is taken by byte[] arrays and LinkedHashMap[] (check below screenshot), which I have never used anywhere in my project. I searched for them in my project but didn't find anything.
3. The following two large objects take 32 MB and 20 MB respectively:
1. Java Static io.netty.buffer.ByteBufUtil.DEFAULT_ALLOCATOR
2. Java Static com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.connectionFinalizerPhantomRefs
So I tried to find this netty.buffer in my project, but I couldn't find anything that matched netty or buffer.
Now my question is: how can I fix this memory leak, or how can I find the exact objects/classes/variables consuming the memory, so that I can reduce the heap usage?
I know a few experts will ask for the source code or something similar, but I believe that from the heap dump we can find the memory leak or the live objects that are in memory. I am looking for that option, or anything else that reduces this heap usage!
I am working on this issue for the past 3 weeks. Any help would be appreciated.
Thank you!
Start by enabling the JVM native memory tracker to get an idea of which part of memory is increasing, by adding the flag -XX:NativeMemoryTracking=summary. There is some performance overhead according to the documentation (5-10%), but if this isn't an issue I would recommend running the JVM with this flag enabled even in production.
Then you can check the values using jcmd <PID> VM.native_memory (there's a good writeup in this answer: Java native memory usage)
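If NMT is enabled, you can also take a baseline and later print a diff against it, which shows directly which category of native memory is growing:

jcmd <PID> VM.native_memory baseline
jcmd <PID> VM.native_memory summary.diff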
If there is indeed a big chunk of native memory allocated, it's likely this is allocated by Netty.
How do you run your application in AWS? If it's running in a Docker image, you might have stumbled upon this issue: What would cause a java process to greatly exceed the Xmx or Xss limit?
If this is the case, you may need to set the environment variable MALLOC_ARENA_MAX if your application is using native memory (which Netty does) and running on a server with a large number of cores. It's perfectly possible that the JVM allocates this memory for Netty but doesn't see any reason to release it, so it will appear to only continue to grow.
If you want to control how much native memory can be allocated by Netty, you can use the JVM flag -XX:MaxDirectMemorySize for this (I believe the default is the same value as Xmx) and lower it in case your application doesn't require that much direct memory.
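If you suspect direct buffers, the standard BufferPoolMXBean lets you watch direct (and mapped) buffer usage from inside the application; a minimal sketch:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // One bean per pool, typically "direct" and "mapped"
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

Note that Netty's pooled allocator may do its own accounting on top of what these beans report.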
JVM memory tuning is a complex process, and it becomes even more complex when native memory is involved. As the linked answer shows, it's not as easy as simply setting the Xms and Xmx flags and expecting that no more memory will be used.
A heap dump alone is not enough to detect memory leaks.
You need to look at the difference between two consecutive heap snapshots, both taken right after calling the GC.
Or you need a profiling tool that can give the "generations" count for each class.
Then you should look only at your domain objects (not generic objects like byte arrays or strings, etc.) that survived the GC and passed from the old snapshot to the new one.
Or, if using the profiling tool, look for old domain objects that are still alive and growing across many generations.
Objects that live for many generations and keep growing are still referenced somewhere, and the GC is not able to reclaim them. However, living for many generations alone is not enough to indicate a leak, because cached or static objects may stay around for many generations. The other important factor is that they keep growing.
Once you have detected which objects are being leaked, you can use a heap dump to analyze those objects and find who references them. (See the commands below for one way to take such snapshots.)
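For the snapshots, the JDK's jmap histogram is often enough; the :live option forces a full GC before counting, which matches the "taken after calling the GC" requirement above:

jmap -histo:live <pid> > snapshot-1.txt
(wait while the suspected leak grows)
jmap -histo:live <pid> > snapshot-2.txt

Diffing the per-class counts between the two files then shows which classes survive collections and keep growing.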
My WebLogic server was configured with 16 GB of heap space, but it was 90% used within 1 hour of production usage once most of the users had started work. I observed several stuck threads whenever this happened.
I captured a heap dump when the heap was approximately 10% free. How do I inspect the heap dump to find the memory leak, or the process/code causing this issue?
I have tried to understand the memory leak by running tools like jmap and Eclipse MAT but, maybe due to lack of experience, I couldn't understand what these tools were trying to show. What should I look out for, and how?
I have both the before/after GC heap dump to analyze.
I have reviewed the thread dumps; there were no threads waiting to lock objects, and the threads were similar to the ones shown below, stuck for no obvious reason.
According to your heap dump, your biggest memory issue is the int arrays; indeed, they take nearly 70% of your heap (yes, sort by the Size column instead).
Select it in your heap dump, right click and select Show in Instances View.
Then browse the biggest objects, and for each of them right click and select Show Nearest GC Root to see which object still holds a hard reference to the int array, preventing it from being eligible for GC.
This could help you find your memory leak, assuming that it is a memory leak.
See below an example of Nearest GC Root allowing a leak to be identified, one that I added intentionally to my program just to show the idea. As you can see in the screenshot, I have an int array that cannot be eligible for GC because it is stored in a HashMap called leak in my class Application, so I know that my memory issue could be due to this particular HashMap, especially if I have many other objects leading to it.
NB: Be patient when you try to identify a leak, as it is not always obvious. The ideal situation is when you have one huge object taking the whole heap, but obviously that is not your case; nothing here is really obvious, which is why I propose investigating the int arrays first. Don't forget that it could also be many small int arrays, thousands of them, with the same Nearest GC Root.
Another trick: if you have JProfiler, you can simply follow this wonderful tutorial to find your leak.
Response Update:
One simple way to better identify the root cause of the memory leak is to take at least 2 heap dumps and then compare them using a tool like jhat with the syntax
jhat -J-Xmx2G -baseline ${path-to-the-first-heap-dump} ${path-to-the-second-heap-dump}
It will launch a small HTTP server on port 7000, so:
Launch http://localhost:7000/
Then click on Show instance counts for all classes (including platform)
You will then see the list of Classes ordered by total amount of new instances created. You can then use VisualVM to do what I described in the first part of my answer to find the root cause of your memory leak.
You can also use jhat itself:
Select one of the top classes, then for each of them
click on one "Reference to this Object",
then click on Exclude weak refs.
You will then see the GC root of each instance, as in the next screenshot:
Another way is to use the Eclipse Memory Analyzer, also called MAT.
Open the second snapshot with it.
Select the Histogram view.
Then for each of the top classes, right click and
choose Merge Shortest Paths To GC Roots / Exclude All references.
You will then see something like the next screenshot:
The JDK's "jmap -histo" command will dump object counts/bytes for all classes to a text file. If you capture/compare a few of these dumps over time, you will see which ones grow continually -- your memory leak. The overhead of -histo is much lower than that of capturing a full heap dump.
Comparing just a few dumps (like the python script detailed here) seems like too small of a sample, so I wrote an open-source tool (here) that runs this jmap -histo command in the background (at an interval). It has a live display and tracks the % of time that the byte count for each class is on the rise.
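For illustration, here is a minimal sketch (class and file names are hypothetical) of that diffing idea: parse the #bytes column of two jmap -histo outputs and print the classes that grew the most between them:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class HistoDiff {
    // Parse the "#bytes" column of a "jmap -histo" snapshot into a map
    static Map<String, Long> parse(Path file) throws IOException {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            // Data rows look like: "   1:      21712      3419392  [C"
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 4 && cols[0].endsWith(":")) {
                bytesByClass.put(cols[3], Long.parseLong(cols[2]));
            }
        }
        return bytesByClass;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> before = parse(Paths.get(args[0]));
        Map<String, Long> after = parse(Paths.get(args[1]));
        after.entrySet().stream()
                // keep only classes whose byte count grew between the snapshots
                .filter(e -> e.getValue() > before.getOrDefault(e.getKey(), 0L))
                .sorted((a, b) -> Long.compare(
                        b.getValue() - before.getOrDefault(b.getKey(), 0L),
                        a.getValue() - before.getOrDefault(a.getKey(), 0L)))
                .limit(20)
                .forEach(e -> System.out.printf("%-60s +%d bytes%n", e.getKey(),
                        e.getValue() - before.getOrDefault(e.getKey(), 0L)));
    }
}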
It seems you probably have a memory leak. Your best approach is to use Java Mission Control with Flight Recorder to get the leaking class and method.
You should set up your WebLogic managed server with the following parameters:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=8999
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
When you set this up, follow the instructions here to detect the leak.
Hope it helps!
I am one of the developers of the tool called Plumbr. Among other things, we do automatic analysis of heap contents in case of excessive memory usage. You may find it useful.
Per your comments: you have Java 7 with a 16 GB heap and no GC algorithm explicitly specified, so the default for Java 7 applies: the throughput (Parallel) collector, which is not suitable for most web apps, as it leads to long GC pauses on big heaps.
Switch to the ConcurrentMarkSweep (CMS) collector (-XX:+UseConcMarkSweepGC); this way the GC will not wait until your memory fills up, and will try its best to collect garbage incrementally, so that you have fewer stop-the-world pauses.
Did you try the YourKit profiler? It's not free, but you can evaluate it for 30 days. If your dump contains all objects (not only live ones), you will be able to check the roots for them as well, because it could be that you don't have a memory leak but simply too big a memory footprint. It would also be great to enable GC logs and count how many full GC pauses you have:
grep "Full GC" jvm_gc.log | wc -l
In an ideal world it should be 0 :)
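To produce jvm_gc.log in the first place, start the JVM with something like (pre-Java 9 logging syntax):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:jvm_gc.log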
By the way, this whole article could be helpful for you.
I am looking into how to use JConsole to detect memory leaks.
I see that in the Memory Pool MBeans I can define a UsageThreshold for my Tenured Generation.
So if my application exceeds this threshold, the heap memory becomes red in the Memory tab.
Question: How does this help? I mean, how am I supposed to use this setting to analyze my memory, and how am I supposed to figure out this value?
In my opinion, the UsageThreshold parameter is not the most helpful one for detecting memory leaks (but if someone knows some tricks with it, please do share). In my experience, that parameter is more helpful for visually understanding whether my application is getting way too near my max heap size and I'm in danger of an OutOfMemoryError.
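For what it's worth, the threshold doesn't have to be set by hand in JConsole; a minimal sketch of setting it programmatically on the heap pools (the 80% figure is just an example):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class UsageThresholds {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    // flag the pool (e.g. in JConsole) once it is 80% full
                    pool.setUsageThreshold((long) (max * 0.8));
                }
            }
        }
    }
}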
Still, regarding using JConsole to search for memory leaks, I don't think there's a silver bullet for the process. But what I usually do is the following:
If a memory leak exists, it means that the leaking objects won't get collected, and hence your Tenured Generation won't fully recover after any number of GCs.
With the application running, I connect JConsole and try to spot a leak by observing the Memory tab. If, after several computations in my application and after various GCs have occurred (including pressing the Perform GC button, which results in a full GC), the memory never goes back down to, or at least near, the value it started at, there's a good chance that something is leaking. When the leak is big, you can even see a "staircase" pattern in your memory graph.
Keep in mind that if your application has long-running computations which may consume memory, this analysis must be done carefully. You must understand when those processes have finished. For example, run just one of those computations and track the total evolution of memory before, during and afterwards.
Also, I suggest you try VisualVM instead, because it also allows you to create heap dumps, which you can use to understand which objects are still in memory and to explore the reference graph to understand why they are not being collected.
You can use jmap to see the histogram and/or to create heap dumps, and study your memory consumption with tools like Eclipse MAT or YourKit.
JConsole is used more for monitoring and running MBeans and less for analysis; in my experience JVisualVM is better for that, since you can also use it for sampling your code and seeing which methods are consuming CPU.
Our JBoss 3.2.6 application server is having some performance issues, and after turning on verbose GC logging and analyzing the logs with GCViewer, we've noticed that after a while (7 to 35 hours after a server restart) the GC goes crazy. It seems that initially the GC works fine, doing a GC every hour or so, but at a certain point it starts performing full GCs every minute. As this only happens in our production environment, we have not been able to try turning off explicit GCs (-XX:+DisableExplicitGC) or modifying the RMI GC interval yet, but as this only happens after several hours, it does not seem to be caused by the known RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet, but we do not seem to be hitting the max heap limit at all. Before the GC goes crazy it is GC-ing just fine, and even when the GC goes crazy the heap doesn't get above 2 GB (the max is 24 GB).
Besides RMI, are there any other ways an explicit GC can be triggered? (I checked our code, and no calls to System.gc() are being made.)
Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before this happens (98% of CPU time spent on GC, off the top of my head).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?
You almost certainly have a memory leak, and if you let the application server continue to run it will eventually crash with an OutOfMemoryError. You need to use a memory analysis tool - one example would be VisualVM - and determine the source of the problem. Usually memory leaks are caused by some static or global objects that never release the object references they store.
Good luck!
Update:
Rereading your question, it sounds like things are fine and then you suddenly get into this situation where the GC has to work much harder to reclaim space. That sounds like there is some specific operation that consumes (and doesn't release) a large amount of heap.
Perhaps, as #Tim suggests, your heap requirements are just at the threshold of the max heap size, but in my experience you'd need to be pretty lucky to hit that exactly. At any rate, some analysis should determine whether it is a leak or you just need to increase the size of the heap.
Apart from the more likely event of a memory leak in your application, there could be 1-2 other reasons for this.
On a Solaris environment, I once had such an issue when I allocated almost all of the available 4 GB of physical memory to the JVM, leaving only around 200-300 MB to the operating system. This led to the VM process suddenly swapping to disk whenever the OS came under increased load. The solution was not to exceed 3.2 GB. A real corner case, but maybe it's the same issue as yours?
The reason this led to increased GC activity is that heavy swapping slows down the JVM's memory management, which caused many short-lived objects to escape the survivor space and end up in the tenured space, which in turn filled up much more quickly.
I recommend that you take a stack dump when this happens.
More often than not, I have seen this happen with a thread population explosion.
In any case, look at the stack dump file and see what's running. You could easily set up some cron jobs or monitoring scripts to run jstack periodically.
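For example, something like this run from cron or a shell loop (the output path is just an example):

jstack -l <pid> > /tmp/stack-$(date +%s).txt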
You can also compare the sizes of the stack dumps. If they grow really big, you have something that's making lots of threads.
If they don't get bigger, you can at least see which objects (call stacks) are running.
You can use VisualVM or some fancy JMX crap later if that doesn't work, but first start with jstack, as it's easy to use.
If, on purpose, I create an application that crunches data while suffering from memory leaks, I can notice that the memory as reported by, say:
Runtime.getRuntime().freeMemory()
starts oscillating between 1 and 2 MB of free memory.
The application then enters a loop that goes like this: GC, process some data, GC, etc., but because the GC happens so often, the application basically isn't doing much else anymore. Even the GUI takes ages to respond (and no, I'm not talking about EDT issues here; it's really the VM stuck in some endless GC'ing mode).
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
Note that I'm not talking about out-of-memory errors, nor about detecting the memory leak itself.
I'm talking about detecting that an application is running so low on memory that it is basically calling the GC all the time, leaving hardly any time to do something else (in my hypothetical example: crunching data).
Would it work, for example, to repeatedly read how much memory is available during, say, one minute, and if the number has been oscillating between different values all below, say, 4 MB, to conclude that there's been some leak and that the application has become unusable?
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
I don't think so. You can find out roughly how much heap memory is free at any given instant, but AFAIK you cannot reliably determine when you are running out of memory. (Sure, you can do things like scraping the GC log files, or trying to pick patterns in the free memory oscillations. But these are likely to be unreliable and fragile in the face of JVM changes.)
However, there is another (and IMO better) approach.
In recent versions of Hotspot (version 1.6 and later, I believe), you can tune the JVM / GC so that it will give up and throw an OOME sooner. Specifically, the JVM can be configured to check that:
the ratio of free heap to total heap is greater than a given threshold after a full GC, and/or
the time spent running the GC is less than a certain percentage of the total.
The relevant JVM parameters are "UseGCOverheadLimit", "GCTimeLimit" and "GCHeapFreeLimit". Unfortunately, Hotspot's tuning parameters are not well documented on the public web, but these ones are all listed here.
Assuming that you want your application to do the sensible thing and give up when it doesn't have enough memory to run properly anymore, just launch the JVM with a lower "GCTimeLimit" and/or a higher "GCHeapFreeLimit" than the defaults.
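For example, with the HotSpot defaults of (I believe) GCTimeLimit=98 and GCHeapFreeLimit=2, something like this makes the JVM give up noticeably earlier:

java -XX:+UseGCOverheadLimit -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5 MyApp

With these values, the JVM throws "GC overhead limit exceeded" once more than 90% of total time is spent in GC and full GCs recover less than 5% of the heap.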
EDIT
I've discovered that the MemoryPoolMXBean API allows you to look at the peak usage of individual memory pools (heaps), and to set thresholds. However, I've never tried this, and the API has lots of hints suggesting that not all JVMs implement it fully. So I would still recommend the HotSpot tuning option approach (see above) over this one.
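For completeness, here is a sketch of that MemoryPoolMXBean approach (untested by me, as said above; the 80% fraction and the message are just examples). It asks to be notified when a heap pool is still above the threshold right after a collection, which is exactly the "the GC can no longer free enough memory" condition:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import javax.management.NotificationEmitter;

public class LowMemoryDetector {
    public static void install() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isCollectionUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    // fire when the pool is still >80% full right after a GC
                    pool.setCollectionUsageThreshold((long) (max * 0.8));
                }
            }
        }
        // On HotSpot, the MemoryMXBean is also a NotificationEmitter
        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener((notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                System.err.println("Nearly out of memory even after GC");
            }
        }, null, null);
    }
}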
You can use MemoryMXBean.getHeapMemoryUsage().
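A minimal example:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // max can be -1 if it is undefined for this JVM
        System.out.printf("used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
    }
}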
I see two attack vectors.
Either monitor your memory consumption.
When you more or less constantly use most of the available memory, it is very likely that you have a memory leak (or are just using too much memory). The VM will constantly try to free some memory without much success => constant high memory usage.
You need to distinguish that from a large zigzag pattern, which often happens without being an indicator of a memory problem. Basically you use more and more memory, but when the GC finds time to do its job it finds lots of garbage to throw out, so everything is fine.
The other attack vector is to monitor how often the GC runs and with what success. If it runs often with only small gains in memory, it is likely you have a problem.
I don't know if you can access this kind of information directly from your program. But if nothing else, I think you can specify parameters on startup which make the GC log its information to a file, which in turn could be parsed.
What you could do is spawn a thread that wakes up periodically and calculates the amount of used memory and records the result. Then you can do regression analysis on the result to estimate the rate of memory growth in your application. If you know the rate of growth, and the maximum amount of memory, you can predict (with some confidence) when your application will run out of memory.
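A minimal sketch of that idea, assuming a 10-second sampling period (both the period and the least-squares slope estimate are just illustrative choices):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MemoryTrend {
    // Each sample is {timestamp millis, used bytes}
    private final List<long[]> samples = Collections.synchronizedList(new ArrayList<>());

    public void start() {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(() -> {
            Runtime rt = Runtime.getRuntime();
            samples.add(new long[]{System.currentTimeMillis(),
                    rt.totalMemory() - rt.freeMemory()});
        }, 0, 10, TimeUnit.SECONDS);
    }

    // Least-squares slope of used memory over time, in bytes per millisecond
    public double growthRate() {
        synchronized (samples) {
            int n = samples.size();
            if (n < 2) return 0;
            double meanT = 0, meanM = 0;
            for (long[] s : samples) { meanT += s[0]; meanM += s[1]; }
            meanT /= n; meanM /= n;
            double num = 0, den = 0;
            for (long[] s : samples) {
                num += (s[0] - meanT) * (s[1] - meanM);
                den += (s[0] - meanT) * (s[0] - meanT);
            }
            return den == 0 ? 0 : num / den;
        }
    }
}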
You can pass arguments to your Java virtual machine that give you GC diagnostics, such as:
-verbose:gc: turns on the logging of GC information. Available in all JVMs.
-XX:+PrintGCTimeStamps: prints the times at which the GCs happen, relative to the start of the application.
If you capture that output in a file, your application can periodically read and parse it to know when GCs have happened, and work out the average time between GCs.
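A minimal parsing sketch, assuming the classic "12.345: [GC ..." line format produced by the flags above (the file name is just an example):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcIntervals {
    public static void main(String[] args) throws Exception {
        // Matches lines such as "12.345: [GC ..." or "98.765: [Full GC ..."
        Pattern p = Pattern.compile("^(\\d+\\.\\d+): \\[(?:Full )?GC");
        List<Double> stamps = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("gc.log"))) {
            Matcher m = p.matcher(line);
            if (m.find()) stamps.add(Double.parseDouble(m.group(1)));
        }
        if (stamps.size() > 1) {
            double span = stamps.get(stamps.size() - 1) - stamps.get(0);
            System.out.printf("average seconds between GCs: %.2f%n",
                    span / (stamps.size() - 1));
        }
    }
}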
I think the JVM does exactly this for you and throws java.lang.OutOfMemoryError: GC overhead limit exceeded. So if you catch OutOfMemoryError and check for that message, then you have what you want, don't you?
See this question for more details
I've been using Plumbr for memory leak detection and it's been a great experience, though the license is very expensive: http://plumbr.eu/