What is the possible reason for constant OOM errors?
On our development environment we have 3 nodes; the VM on the master node is initialized with Xms=3GB, Xmx=3GB. There are also about 30 proxies and 30 endpoints defined. Developers load their changes (CAR files) constantly during the day without restarting Carbon. A few times a day it freezes. Maybe constant configuration changes kill Carbon? On the preproduction environment Carbon works flawlessly :/
I did a heap dump and the 'leak suspects report' result is:
One instance of "org.apache.axis2.context.ConfigurationContext" loaded
by "axis2" occupies 661 590 744 (79,50%) bytes. The memory is
accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
According to the heap dump, most of the retained heap is occupied by ConfigurationContext instances. So this OOM issue seems to occur due to the heavy configuration load in your development system, possibly due to large CApps being deployed frequently by a lot of developers. Since that does not happen in your production environment, that ESB never goes OOM.
Thanks & Regards,
Ravindra.
Freezing happens mostly due to full GC cycles happening in the system. You can change your GC settings in order to get a proper GC configuration.
For a typical deployment WSO2 suggests the GC configuration below: -Xms512m -Xmx2048m -XX:MaxPermSize=1024m. Since you are using 3 GB of RAM you have to adjust that based on your scenario. You can refer to https://docs.wso2.com/display/CLUSTER44x/Production+Deployment+Guidelines for more information regarding deployments.
Since you are using Xms=3GB and Xmx=3GB, there is a possibility that the JVM will wait until the whole 3 GB gets filled and then run full GC cycles. Therefore it is better to adjust the Xms value to about 1 GB, which encourages minor GC cycles that clean up unnecessary objects rather than waiting for a full GC.
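For example, a rough sketch of adjusted settings for the setup described above (the exact values are an assumption and should be validated against your own GC logs):
-Xms1024m -Xmx3072m -XX:MaxPermSize=1024m -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log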
Related
I have a spring-boot app that I suspect might have a memory leak. Over time the memory consumption seems to increase, taking around 500M of memory until I restart the application. After a fresh restart it takes something like 150M. The spring-boot app should be a pretty stateless REST app, and there shouldn't be any objects left around after a request is completed. I would expect the garbage collector to take care of this.
Currently in production the spring-boot app seems to use 343M of memory (RSS). I took a heap dump of the application and analysed it. According to the analysis the heap dump is only 31M in size. So where does the missing 300M lie? How is the heap dump correlated with the actual memory the application is using? And how could I profile the memory consumption beyond the heap dump? If the memory used is not in the heap, then where is it? How do I discover what is consuming the memory of the spring-boot application?
So where does the missing 300M lie?
A lot of research has gone into this, especially in trying to tune the parameters that control the non-heap. One result of this research is the memory calculator (binary).
You see, in Docker environments with a hard limit on the available amount of memory, the JVM will crash when it tries to allocate more memory than is available. Even with all the research, the memory calculator still has a slack-option called "head-room" - usually set to 5 to 10% of total available memory in case the JVM decides to grab some more memory anyway (e.g. during intensive garbage collection).
Apart from "head-room", the memory calculator needs 4 additional input-parameters to calculate the Java options that control memory usage.
total-memory - a minimum of 384 MB for a Spring Boot application; start with 512 MB.
loaded-class-count - for a recent Spring Boot application, about 19,000. This seems to grow with each Spring version. Note that this is a maximum: setting too low a value will result in all kinds of weird behavior (sometimes an "OutOfMemory: non-heap" exception is thrown, but not always).
thread-count - 40 for a "normal usage" Spring Boot web application.
jvm-options - see the two parameters below.
The "Algorithm" section mentions additional parameters that can be tuned, of which I found two worth the effort to investigate per application and specify:
-Xss set to 256kb. Unless your application has really deep stacks (recursion), going from 1 MB to 256kb per thread saves a lot of memory.
-XX:ReservedCodeCacheSize set to 64MB. Peak "CodeCache" usage is often during application startup, going from 192 MB to 64 MB saves a lot of memory which can be used as heap. Applications that have a lot of active code during runtime (e.g. a web-application with a lot of endpoints) may need more "CodeCache". If "CodeCache" is too low, your application will use a lot of CPU without doing much (this can also manifest during startup: if "CodeCache" is too low, your application can take a very long time to startup). "CodeCache" is reported by the JVM as a non-heap memory region, it should not be hard to measure.
The output of the memory calculator is a bunch of Java options that all have an effect on what memory the JVM uses. If you really want to know where "the missing 300M" is, study and research each of these options in addition to the "Java Buildpack Memory Calculator v3" rationale.
# Memory calculator 4.2.0
$ ./java-buildpack-memory-calculator --total-memory 512M --loaded-class-count 19000 --thread-count 40 --head-room 5 --jvm-options "-Xss256k -XX:ReservedCodeCacheSize=64M"
-XX:MaxDirectMemorySize=10M -XX:MaxMetaspaceSize=121289K -Xmx290768K
# Combined JVM options to keep your total application memory usage under 512 MB:
-Xss256k -XX:ReservedCodeCacheSize=64M -XX:MaxDirectMemorySize=10M -XX:MaxMetaspaceSize=121289K -Xmx290768K
Besides heap, you have thread stacks, meta space, JIT code cache, native shared libraries and the off-heap store (direct allocations).
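One way to see how much each of these regions actually uses is the JVM's Native Memory Tracking; this is a HotSpot feature (assuming a reasonably recent JVM, JDK 8 or later) and it adds a small overhead:
# start the app with NMT enabled
java -XX:NativeMemoryTracking=summary -jar springboot-example.jar
# then ask the running JVM for a breakdown by region (heap, thread stacks, code cache, metaspace, ...)
jcmd <PID> VM.native_memory summary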
I would start with thread stacks: how many threads does your application spawn at peak? Each thread is likely to allocate 1MB for its stack by default, depending on Java version, platform, etc. With (say) 300 active threads (idle or not), you'll allocate 300MB of stack memory.
Consider making all your thread pools fixed-size (or at least provide reasonable upper bounds). Even if this proves not to be the root cause of what you observed, it makes the app's behaviour more deterministic and will help you better isolate the problem.
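A minimal sketch of what a bounded pool can look like (the pool size of 40 and the queue capacity are illustrative assumptions, not measured values):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPools {
    // At most 40 threads and 1000 queued tasks, so thread stacks and
    // queued work cannot grow without limit.
    public static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(
                40, 40,                          // core and max pool size
                60, TimeUnit.SECONDS,            // idle timeout for threads above the core size
                new ArrayBlockingQueue<Runnable>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure instead of unbounded growth
    }
}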
We can view the memory consumption of a Spring Boot app in this way.
Build the Spring Boot app as a .jar file and execute it using java -jar springboot-example.jar
Now open a terminal, type jconsole and hit Enter.
Note: before opening jconsole you need to run the .jar file.
The jconsole window will open and the application you just started will appear in the Local Process section.
Select springboot-example.jar and click the Connect button.
It will then show a prompt; choose the Insecure connection option.
Finally you can see the Overview (heap memory, threads, ...).
You can use "JProfiler" https://www.ej-technologies.com/products/jprofiler/overview.html
remotely or locally to monitor running java app memory usage.
You can use YourKit with IntelliJ IDEA, if that is your IDE, to troubleshoot memory-related issues in your Spring Boot app. I have used this before and it provides good insight into applications.
https://www.yourkit.com/docs/java/help/idea.jsp
Interesting article about memory profiling: https://www.baeldung.com/java-profilers
My webapp in prod (Linux, Tomcat 7.0.22) takes 390 MB of memory, whereas the same app takes about 106 MB in the local environment. I am continuing to investigate heap issues using Eclipse MAT, but I don't understand why the same app would consume such drastically different amounts of memory in prod. Any clues will be helpful.
Roughly, a Java application always uses about as much memory as you allocate to it with -Xmx. In production the heap expands to that limit until the garbage collector kicks in. However, that is quite a simplification; the rules for when the JVM expands or shrinks the heap space are a little more complex.
The JIT will produce more and larger machine code in production.
So if you just start your application locally and warm it up with a few clicks, there is not much object garbage being produced to keep the heap expanding. And there is no need to JIT-compile the methods because nobody uses them frequently.
That difference is trivial, and a few hundred megabytes is still reasonably small for a running application. There doesn't seem to be a problem here.
Our JBoss 3.2.6 application server is having some performance issues, and after turning on verbose GC logging and analyzing these logs with GCViewer we've noticed that after a while (7 to 35 hours after a server restart) the GC goes crazy. It seems that initially the GC is working fine and doing a GC every hour or so, but at a certain point it starts going crazy and performing full GCs every minute. As this only happens in our production environment, we have not been able to try turning off explicit GCs (-XX:+DisableExplicitGC) or modifying the RMI GC interval yet, but as this happens after a few hours it does not seem to be caused by the known RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet but it does not seem to be hitting the max heap limitations at all. Before the GC goes crazy it is GC-ing just fine but when the GC goes crazy the heap doesn't get above 2GB (24GB max).
Besides RMI are there any other ways explicit GC can be triggered? (I checked our code and no calls to System.gc() are being made)
Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before this happens (98% CPU time spent on GC off the top of my head).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?
You almost certainly have a memory leak, and if you let the application server continue to run it will eventually crash with an OutOfMemoryError. You need to use a memory analysis tool - one example would be VisualVM - and determine what the source of the problem is. Usually memory leaks are caused by some static or global objects that never release the object references they store.
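A typical (purely hypothetical) shape of such a leak, just to illustrate what to look for in the analysis tool:

import java.util.ArrayList;
import java.util.List;

public class RequestLog {
    // A static collection that is only ever added to: every entry stays
    // reachable from the class, so the GC can never reclaim it and the
    // old generation slowly fills up.
    private static final List<String> ENTRIES = new ArrayList<String>();

    public static void record(String entry) {
        ENTRIES.add(entry); // entries are never removed -> classic leak
    }
}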
Good luck!
Update:
Rereading your question, it sounds like things are fine and then suddenly you get into this situation where GC is working much harder to reclaim space. That sounds like there is some specific operation that occurs which consumes (and doesn't release) a large amount of heap.
Perhaps, as #Tim suggests, your heap requirements are just at the threshold of the max heap size, but in my experience you'd need to be pretty lucky to hit that exactly. At any rate, some analysis should determine whether it is a leak or you just need to increase the size of the heap.
Apart from the more likely event of a memory leak in your application, there could be 1-2 other reasons for this.
On a Solaris environment, I once had such an issue when I allocated almost all of the available 4 GB of physical memory to the JVM, leaving only around 200-300 MB to the operating system. This led to the VM process suddenly swapping to disk whenever the OS was under increased load. The solution was not to exceed 3.2 GB. A real corner case, but maybe it's the same issue as yours?
The reason this led to increased GC activity is that heavy swapping slows down the JVM's memory management, which caused many short-lived objects to escape the survivor space and end up in the tenured space, which in turn filled up much more quickly.
I recommend that you take a stack dump when this happens.
More often than not, I have seen this happen with a thread population explosion.
Anyway, look at the stack dump file and see what's running. You could easily set up some cron jobs or monitoring scripts to run jstack periodically.
You can also compare the sizes of the stack dumps. If they grow really big, you have something that's making lots of threads.
If they don't get bigger, you can at least see which objects (call stacks) are running.
You can use VisualVM or some fancy JMX crap later if that doesn't work, but first start with jstack as it's easy to use.
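For example, a periodic capture could be as simple as something like this (the output path and file naming are just assumptions):
jstack -l <PID> > /tmp/threaddump-$(date +%Y%m%d-%H%M%S).txt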
I am looking for appropriate settings to configure the JVM for a web application. I have read about the old/young/perm generations, but I have trouble putting those parameters to their best use for this configuration.
Out of the 4 GB, around 3 GB are used for a cache (applicative cache using EhCache), so I'm looking for the best set up considering that. FYI, the cache is static during the lifetime of the application (loaded from disk, never expires), but heavily used.
I have profiled my application already, and I have performed optimization regarding the DB queries, the application's architecture, the cache size, etc... I am just looking for JVM configuration advices here. I have measured 99% throughput for the Garbage Collector, and 6-8s pauses when the Full GC runs (approximately once every 1/2h).
Here are the current JVM parameters:
-XX:+UseParallelGC -XX:+AggressiveHeap -Xms2048m -Xmx4096m
-XX:NewSize=64m -XX:PermSize=64m -XX:MaxPermSize=512m
-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log
Those parameters may be completely off because they have been written a long time ago... Before the application became that big.
I am using Java 1.5 64 bits.
Do you see any possible improvements?
Edit: the machine has 4 cores.
-XX:+UseParallelOldGC should speed up the full GCs on a multi-core machine.
You could also profile with different NewRatio values. Your cached objects will live in the tenured generation so profile it with -XX:NewRatio=7 and then again with some higher and lower values.
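For example, one profiling run could use something like the following (the NewRatio value and the log file name are assumptions to compare against your GC logs, not a recommendation):
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=7 -Xms2048m -Xmx4096m -verbose:gc -XX:+PrintGCDetails -Xloggc:gc-newratio7.log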
You may not be able to accurately replicate realistic use during profiling, so make sure you monitor GC when it is in real life use and then you can make minor changes (e.g. to survivor space etc) and see what effect they have.
Old advice was not to use AggressiveHeap together with Xms and Xmx; I am not sure if that is still true.
Edit: Please let us know which OS/hardware platform you are deployed on.
Full collections every 30 mins indicates the old generation is quite full. A high value for newRatio will give it more space at the expense of the young gen. Can you give the JVM more than 4g or are you limited to that?
It would also be useful to know what your goals / non functional requirements are. Do you want to avoid these 6 / 7 second pauses at the risk of lower throughput or are those pauses an acceptable compromise for highest possible throughput?
If you want to minimise the pauses, try the CMS collector by removing both
-XX:+UseParallelGC -XX:+UseParallelOldGC
and adding
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Profile with that with various NewRatio values and see how you get on.
One downside of the CMS collector is that unlike the parallel old and serial collectors, it doesn't compact the old generation. If the old generation gets too fragmented and a minor collection needs to promote a lot of objects to the old gen at once, a full serial collection may be invoked which could mean a long pause. (I've seen this once in prod but with the IBM JVM which went out of memory instead of invoking a compacting collection!)
This might not be a problem for you - it depends on the nature of the application - but you can insure against it by restarting nightly or weekly.
I would use Java 6 update 30 or 7 update 2, 64-bit as they are much more efficient. e.g. they use 32-bit references by default.
I would also configure Ehcache to use direct memory or a memory mapped file if possible. This should minimise the impact on GC.
Using these options it's possible to almost eliminate your heap footprint. e.g. I have an app which uses up to 180 GB of memory-mapped files on a machine with 16 GB of memory, and the heap size is 6 MB. A full GC takes up to 11 ms when triggered manually, not that it ever GCs. ;)
If you want a simple example, here is one where I map an 8 TB file into memory and update it: http://vanillajava.blogspot.com/2011/12/using-memory-mapped-file-for-huge.html
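A minimal sketch of the idea (the file name and the 1 GB region size are illustrative assumptions; the post above covers the tricks needed for really huge files):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedExample {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
        FileChannel channel = file.getChannel();
        try {
            // Map 1 GB of the file outside the Java heap; the OS pages it in
            // and out, so the mapped data adds almost nothing to GC work.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 1L << 30);
            buffer.putLong(0, System.currentTimeMillis()); // update in place
        } finally {
            channel.close();
            file.close();
        }
    }
}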
I hope you just removed -server to keep the post short; otherwise you should enable it right away. Apart from the slightly longer startup time (which really isn't an issue for a web application that should run for days) I don't see any reason to use anything but C2. That could give some nice performance improvements in general. Anyway, back to the topic:
Sadly the best thing I can think of won't work with your ancient JVM. The G1 garbage collector was basically designed to reduce latency. Not only does it try to reduce pauses in general, it also offers some tuning parameters to set pause goals and intervals. See this page.
There is an experimental backport to java6 though I doubt it's kept up to date. And nobody is wasting any time on optimizing GC or anything else for Java 1.5 anymore I fear.
PS: There would also be IBM's JVM and obviously Azul Systems (OK, that wasn't a serious proposition ;) ), but those are out of the question... just wanted to mention them.
I've got a Java webapp running on one tomcat instance. During peak times the webapp serves around 30 pages per second and normally around 15.
My environment is:
O/S: SUSE Linux Enterprise Server 10 (x86_64)
RAM: 16GB
server: Tomcat 6.0.20
JVM: Java HotSpot(TM) 64-Bit Server VM 1.6.0_14
JVM options:
CATALINA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m
-XX:+UseParallelGC
-Djava.awt.headless=true
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
JAVA_OPTS="-server"
After a couple of days of uptime the Full GC starts occurring more frequently and it becomes a serious problem for the application's availability. After a Tomcat restart the problem goes away but, of course, returns after 5 to 10 or 30 days (not consistent).
The Full GC log before and after a restart is at http://pastebin.com/raw.php?i=4NtkNXmi
It shows a log before the restart at 6.6 days uptime where the app was suffering because Full GC needed 2.5 seconds and was happening every ~6 secs.
Then it shows a log just after the restart where Full GC only happened every 5-10 minutes.
I've got two dumps using jmap -dump:format=b,file=dump.hprof PID when the Full GCs were occurring (I'm not sure whether I took them exactly while a Full GC was occurring or between two Full GCs) and opened them in http://www.eclipse.org/mat/ but didn't get anything useful in Leak Suspects:
60MB: 1 instance of "org.hibernate.impl.SessionFactoryImpl" (I use hibernate with ehcache)
80MB: 1,024 instances of "org.apache.tomcat.util.threads.ThreadWithAttributes" (these are probably the 1024 workers of tomcat)
45MB: 37 instances of "net.sf.ehcache.store.compound.impl.MemoryOnlyStore" (these should be my ~37 cache regions in ehcache)
Note that I never get an OutOfMemoryError.
Any ideas on where should I look next?
When we had this issue we eventually tracked it down to the young generation being too small. Although we had given it plenty of RAM, the young generation wasn't given its fair share.
This meant that small garbage collections happened more frequently and caused some young objects to be moved into the tenured generation, meaning more large garbage collections as well.
Try using the -XX:NewRatio with a fairly low value (say 2 or 3) and see if this helps.
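For instance, added to the CATALINA_OPTS from your question (NewRatio=2 is only a starting point to profile, not a measured recommendation):
-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+UseParallelGC -XX:NewRatio=2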
More info can be found here.
I've switched from -Xmx1024m to -Xmx2048m and the problem went away. I now have 100 days of uptime.
Besides tuning the various JVM options, I would also suggest upgrading to a newer release of the VM, because later versions have a much better tuned garbage collector (even without trying the new experimental one).
Besides that, even if it's (partially) true that assigning more RAM to the JVM can increase the time required to perform a GC, there is a tradeoff point between using the whole 16 GB of memory and increasing your memory occupation, so to start you can try doubling all the values:
-Xms1024m -Xmx2048m -XX:PermSize=256m -XX:MaxPermSize=512m
Regards
Massimo
What might be happening in your case is that you have a lot of objects that live a little longer than the NewGen life cycle. If the survivor space is too small, they go straight to the OldGen. -XX:+PrintTenuringDistribution could provide some insight. Your NewGen is large enough, so try decreasing SurvivorRatio.
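As a sketch, one diagnostic run could add something like the following (SurvivorRatio=6 is just an example of a value lower than the default of 8):
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=6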
Also, jconsole will probably provide more visual insight into what happens with your memory; try it.