Openfire websocket memory leak - java

My Openfire chat server reaches high CPU and RAM usage.
Memory usage keeps growing over time.
The Openfire process is at 4579 MB RAM (41.3%) right now; five minutes from now it will be at 4600 MB (41.5%). It just keeps climbing.
By the next morning it reaches ~100% RAM usage.
I only use the group chat (room) feature.
Concurrency is ~600 during the day, ~300 at night.
The connection timeout is 30 minutes.
Openfire version: 4.2.1
What's my problem? How can I resolve it?
Thanks!

Openfire is written in Java, and Java's memory management does not work the way your analysis assumes. A Java process's memory usage can be expected to grow over time, up until it has used a significant amount of the memory available to it. Only then will a memory cleanup (a garbage collection) take place.
From what you wrote, it is not clear that you are actually experiencing a memory leak. Garbage collections can be infrequent: 5 minutes is nowhere near enough time to observe one.
To get some kind of indication, observe your process for a number of days. If you see a steady increase in memory usage while usage patterns remain unchanged, then you can carefully assume that a memory leak might be present. If you get to that point, you'll need specialized tooling to inspect the state of the Java heap.
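As a starting point, polling the server's heap over JMX (or simply running jstat -gcutil <pid> 60s against the Openfire process) will give you that multi-day trend. Below is a minimal, hedged sketch of the JMX approach; the host, port, one-minute interval, and plain console output are arbitrary choices, and it assumes the server was started with remote JMX enabled (for example -Dcom.sun.management.jmxremote.port=9999 with authentication and SSL disabled for the test).

    // Minimal sketch: poll a remote JVM's heap usage so the long-term trend becomes visible.
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapTrend {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"); // placeholder host/port
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
                Thread.sleep(60_000L); // one sample per minute; run for days, then plot
            }
        }
    }

What matters is not the sawtooth itself but whether the low points after full collections keep creeping upward across days.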

Related

Analyze a growing heap during load

My JBoss AS5 application is suffering from weird heap behavior during "peak hour" load. The heap grows constantly until there is so little free space left to work with that it fails to process new requests.
This would be normal if it were a typical leak, but I've ruled that out: if I lower the load just before it crosses the threshold where it can no longer process new requests and then, after a cool-down period, start another "peak hour", the application (VM) handles the high load as if nothing had happened.
How do I approach this problem? I've tried trial and error (heap sizes, GC settings, proprietary profiling/timing logs) and I've tried hooking up JProfiler, but no data of value has come out of it.
Is there a good way to find out which objects (classes) are constantly growing over time? I'm talking about roughly 200 MB/hour.
You should be able to see heap growth using any of the tools below:
Utilities like JVisualVM or jstat (very little overhead)
Profilers like JProfiler (considerable overhead)
Both types of tool show which objects are taking up the most heap space, and if you monitor the heap continuously you can also find out which objects are growing over time.
If you still think these tools are not showing valuable data, go the old-fashioned way:
Take heap dumps at regular intervals. Choose the interval so that you get enough dumps to analyze, but no more than necessary (see the sketch below).
Feed those dumps into a heap analyzer tool and look at the trend of which objects grow over time.
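A minimal sketch of automating those dumps from inside the JVM via the HotSpot diagnostic MXBean; the dump directory and 30-minute interval are placeholders, and running jmap -dump:live,format=b,file=<file> <pid> from cron is the equivalent external approach.

    // Minimal sketch: write a heap dump at a fixed interval for later analysis in a heap analyzer.
    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PeriodicHeapDumps {
        public static void start() throws Exception {
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    String file = "/tmp/heap-" + System.currentTimeMillis() + ".hprof"; // placeholder path
                    diag.dumpHeap(file, true); // live=true dumps only reachable objects (triggers a GC)
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }, 0, 30, TimeUnit.MINUTES);
        }
    }

Comparing two or three dumps taken hours apart in a heap analyzer usually makes the growing class obvious.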
I hope this helps.

Garbage Collector going crazy after a few hours

Our JBoss 3.2.6 application server is having some performance issues. After turning on verbose GC logging and analyzing the logs with GCViewer, we've noticed that after a while (7 to 35 hours after a server restart) the GC goes crazy. Initially the GC works fine, doing a collection every hour or so, but at a certain point it starts performing full GCs every minute. As this only happens in our production environment, we have not yet been able to try turning off explicit GCs (-XX:+DisableExplicitGC) or modifying the RMI GC interval, but since it only starts after a few hours it does not seem to be caused by the known RMI GC issues.
Any ideas?
Update:
I'm not able to post the GCViewer output just yet, but it does not seem to be hitting the max heap limit at all. Before the GC goes crazy it is collecting just fine, and even when the GC goes crazy the heap doesn't get above 2 GB (24 GB max).
Besides RMI are there any other ways explicit GC can be triggered? (I checked our code and no calls to System.gc() are being made)
Is your heap filling up? Sometimes the VM will get stuck in a 'GC loop' when it can free up just enough memory to prevent a real OutOfMemoryError but not enough to actually keep the application running steadily.
Normally this would trigger an "OutOfMemoryError: GC overhead limit exceeded", but there is a certain threshold that must be crossed before that happens (something like 98% of CPU time spent on GC, off the top of my head).
Have you tried enlarging heap size? Have you inspected your code / used a profiler to detect memory leaks?
You almost certainly have a memory leak, and if you let the application server continue to run it will eventually crash with an OutOfMemoryError. You need to use a memory analysis tool - one example would be VisualVM - and determine the source of the problem. Memory leaks are usually caused by static or global objects that never release the object references they store, as in the sketch below.
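Purely as an illustration of that pattern (the class and sizes here are made up): a static collection that is only ever added to keeps every entry strongly reachable, so the garbage collector can never reclaim them and the tenured generation slowly fills up.

    import java.util.ArrayList;
    import java.util.List;

    public class LeakyCache {
        // Static root: everything added here stays reachable for the lifetime of the class loader.
        private static final List<byte[]> CACHE = new ArrayList<byte[]>();

        public static void handleRequest() {
            CACHE.add(new byte[1024 * 1024]); // 1 MB per request, never removed
        }
    }

In a heap dump of such an application, the static field shows up as the dominator of the growing memory, which is exactly what VisualVM or a heap analyzer will point you at.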
Good luck!
Update:
Rereading your question, it sounds like things are fine and then you suddenly get into a situation where GC is working much harder to reclaim space. That sounds like some specific operation occurs that consumes (and doesn't release) a large amount of heap.
Perhaps, as @Tim suggests, your heap requirements are just at the threshold of the max heap size, but in my experience you'd need to be pretty lucky to hit that exactly. At any rate, some analysis should determine whether it is a leak or whether you just need to increase the size of the heap.
Apart from the more likely case of a memory leak in your application, there could be one or two other reasons for this.
On a Solaris environment, I once had such an issue when I allocated almost all of the available 4 GB of physical memory to the JVM, leaving only around 200-300 MB to the operating system. This led to the JVM process suddenly swapping to disk whenever the OS came under increased load. The solution was not to exceed 3.2 GB. A real corner case, but maybe it's the same issue as yours?
The reason this led to increased GC activity is that heavy swapping slows down the JVM's memory management, which caused many short-lived objects to escape the survivor space and end up in the tenured space, which in turn filled up much more quickly.
I recommend taking a thread (stack) dump when this happens.
More often than not, I have seen this happen with a thread population explosion.
Either way, look at the stack dump and see what's running. You could easily set up some cron jobs or monitoring scripts to run jstack periodically.
You can also compare the sizes of the stack dumps. If they grow really big, you have something that's creating lots of threads.
If they don't get bigger, you can at least see which threads (and call stacks) are active.
You can use VisualVM or some fancy JMX stuff later if that doesn't work, but start with jstack, as it's easy to use.
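If you prefer to capture the same information from inside the JVM, a minimal sketch using ThreadMXBean produces a jstack-style snapshot; the class and method names here are illustrative.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.Date;

    public class ThreadSnapshot {
        public static void dump() {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            ThreadInfo[] infos = threads.dumpAllThreads(true, true); // include monitors and synchronizers
            System.out.println("=== " + new Date() + " live threads: " + infos.length + " ===");
            for (ThreadInfo info : infos) {
                System.out.print(info); // ThreadInfo.toString() includes a (truncated) stack trace
            }
        }
    }

Comparing the thread count and the most common stack frames between snapshots quickly shows whether threads are piling up and where they are stuck.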

How much time should be spent doing garbage collection

I have an application that is responsible for archiving old applications. It archives a large number of applications at a time, so it needs to run for days at a time.
When my company developed it they did a fair bit of performance testing and seemed to get decent numbers, but I have been running an archive for a customer recently and it seems to be running really slowly, with performance degrading further the longer it runs.
There does not appear to be a memory leak: I have been monitoring the application with JConsole, there is still plenty of memory available, and it does not appear to be shrinking.
I have noticed, however, that the survivor space and tenured generation of the heap can fill up very quickly until a garbage collection comes along and clears them out, and this seems to happen rather frequently. I am not sure whether that could be a source of the apparent slowdown.
The application has now been running for 7 days and 3 hours, and according to JConsole it has spent 6 hours performing copy garbage collections (772,611 collections) and 12 hours and 25 minutes on mark-sweep compactions (145,940 collections).
This seems like a large amount of time to spend on garbage collection, and I am wondering whether anyone has looked into something like this before and knows whether it is normal or not.
Edits
Local processing seems to be slow. For instance, I am looking at one part of the logs where it took 5 seconds to extract some XML from a SOAP envelope using XPath and append it to a string buffer along with a root tag; that's all it does. I haven't profiled it yet, since this is running in production: I would either have to pull the data down over the network or set up a large test base in our dev environment, which I may end up having to do.
Running Java HotSpot Client VM version 10.0-b23
I really just need high throughput. I haven't configured any specific garbage collection parameters, so it is running whatever the defaults are. I'm not sure how to find out which collectors are in use.
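One way to check which collectors the JVM actually picked (a hedged sketch; java -XX:+PrintCommandLineFlags -version also shows the ergonomically chosen options):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class WhichCollectors {
        public static void main(String[] args) {
            // Lists the collectors the running JVM selected, with the same counts and
            // accumulated times that JConsole reports.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }

On the HotSpot Client VM the names are typically Copy and MarkSweepCompact, which matches the "copy" and "mark-sweep" figures JConsole is showing you.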
Fix
I ended up getting a profiler going on it, and it turned out the cause of the slowdown was some (pretty badly written) code that was constantly trimming lines off a status box to which logging statements were written. I should have figured that the garbage collection was a symptom of constantly copying the status text around in memory rather than an actual cause.
Cheers Guys.
According to your numbers, total garbage collection time was about 18 hours out of roughly 7 days of execution time. At about 10% of total execution time that is slightly elevated, but even if you managed to get it down to 0% you would only save 10% of the execution time... so if you're looking for substantial savings, you would do better to look into the other 90%, for instance with a profiler.
Without proper profiling, this is a guessing game. As an anecdote, though: a few years ago a web app I was involved with suddenly slowed down (in response time) by a factor of 10 after a JDK upgrade. We ended up chasing it down to an explicit GC invocation added by a genius who was no longer with the company.
There is a balance to maintain between JVM heap footprint and GC time. Another question might be whether your heap (and its generations) is under-allocated in a way that forces overly frequent GCing. When deploying multi-tenant JVMs on such systems, I've tried to keep total GC time under 5%, along with aggressive heap shrinkage to keep the footprint low (again, multi-tenant). The heap and its generations will almost ALWAYS fill up to whatever size they are set, precisely to avoid frequent GCing. Remove the -Xms parameter to see a more realistic steady state (if the application has any idle time).
+1 to the suggestion on profiling, though; it may be something unrelated to GC, in the code itself.

Java Garbage Collection - CPU Spikes - Longer connection establishment times

We have a pool of servers that sits behind a load balancer. The machines in this pool perform a garbage collection every 6 seconds on average, and each collection takes almost half a second. We also see a CPU spike during garbage collection.
The client machines see a spike of almost 10% in the average time to establish a connection to the server over the course of a day.
Theory: CPU is busy doing GC and that's why it cannot allocate a connection faster.
Is it a valid theory?
JVM: IBM
GC algorithm: gencon
Nursery: 5 GB
Heap size: 18 GB
I'd say with that many allocations all bets are off; it could absolutely get worse over time. If you are doing a GC every 6 seconds all day long, that seems problematic.
Do you have access to that code? Can it be rewritten to reuse objects and be more intelligent about allocation? I've done a few embedded systems, and the trick is to NEVER call new once the system is up and running (quite doable if you have control over the entire system).
If you don't have access to the code, look into the GC tuning options available (including the choice of garbage collector), both those distributed with the JDK and third-party options. You may be able to improve performance with a few command-line modifications.
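On the IBM JVM specifically, the collector is chosen with -Xgcpolicy (the question is already running gencon) and the nursery is sized with -Xmn, so the kind of command-line experiment meant here might look like the following; this is only an illustrative starting point, not a recommendation, and the effect has to be measured.

    java -Xmx18g -Xmn5g -Xgcpolicy:optavgpause ...

optavgpause trades some throughput for shorter, more even pauses, while resizing the nursery changes how often the frequent young collections occur.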
It's possible, I guess.
Given that garbage collection is such an intensive process, is there any reason for it to occur every 6 seconds? I'm not familiar with the IBM JVM or the particular collection algorithm you are using, so I can't really comment on those. However, there are some good tuning documents provided by Sun (now offered by Oracle) that discuss the different types of collectors and when you would use them. See this link for some ideas.
One way to prove your theory would be to add some code that logs the time a connection was requested and the time it was actually established. If the GC-related CPU spikes coincide with the longer connection times, that would support your theory. Your problem will then become how to get around it.
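A minimal sketch of the logging suggested above, on the client side: time how long the connect takes and log the timestamps so they can be lined up against the server's GC log. The host, port, connect timeout, and 100 ms threshold are placeholders.

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class TimedConnect {
        public static Socket connect(String host, int port) throws Exception {
            long requested = System.currentTimeMillis();
            Socket socket = new Socket();
            socket.connect(new InetSocketAddress(host, port), 5000); // 5 s connect timeout
            long established = System.currentTimeMillis();
            if (established - requested > 100) { // only log the slow ones
                System.err.println("connect to " + host + ":" + port + " took "
                        + (established - requested) + " ms (requested at " + requested + ")");
            }
            return socket;
        }
    }

If the slow connects cluster around the timestamps of the half-second collections, the theory holds; if they are spread evenly, look elsewhere (network, accept backlog, and so on).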

Can Sun JVM handle gigantic heap sizes without problems, and how?

I have heard several people claim that you cannot scale the JVM heap size up. I've heard claims that the practical limit is 4 gigabytes (an IBM consultant told me that), 10 gigabytes, 32 gigabytes, and so on... I simply cannot believe any of those numbers and have been wondering about the issue for a while.
So, I have a three-part question I hope someone with experience can answer:
Given the following case, how would you tune the heap and GC settings?
Would there be noticeable hiccups (JVM pauses, etc.) that end users would notice?
Should this really still work? I think it should.
The case:
64-bit platform
64 cores
64 gigabytes of memory
The application server is client-facing (i.e. a JBoss/Tomcat web application server); complete JVM pauses would probably be noticed by end users
Sun JVM, probably 1.5
To prove I am not asking you guys to do my homework, this is what I came up with:
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:-EliminateZeroing -Xmn768m -Xmx55000m
CMS should reduce the amount of pause time, although it comes with overhead. The other CMS settings seem to default automatically based on the number of CPUs, so they look sane to me. The rest I added are extras that might help or hurt performance in general, and they should probably be tested.
Definitely.
I think it's going to be difficult for anybody to give you anything more than general advice, without having further knowledge of your application.
What I would suggest is that you use VisualGC (or the VisualGC plugin for VisualVM) to actually look at what the garbage collection is doing when your app is running. Once you have a greater understanding of how the GC is working alongside your application, it'll be far easier to tune it.
#1. Given the following case how would you tune the heap and GC settings?
First, having 64 gigabytes of memory doesn't mean you have to use it all for one JVM. Actually, it rather means you can run many of them. Then, it is impossible to answer your question without access to your machine and application to measure and analyse things (knowing what your application does isn't enough). And no, I'm not asking for access to your environment :)
#2. Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
The goal of tuning is to find a good compromise between the frequency and duration of (major) GCs. With a ~55 GB heap, GCs won't be frequent but will certainly take noticeable time (the bigger the heap, the longer the major GC). Using a parallel or concurrent garbage collector will help on a multiprocessor system but won't entirely solve the issue. Why do you need ~55 GB (that is mega ultra huge for a webapp, in my opinion)? That's my question. I'd rather run many clustered JVMs to handle the load if required (at some point the database will become the bottleneck anyway with a data-oriented application).
#3. Should this really still work? I think it should.
Hmm... I'm not sure I get the question. What is "this"? Instantiating a JVM with a big heap? Yes, that should work. Is it equivalent to running several JVMs? No, certainly not.
PS: 4G is the maximum theoretical heap limit for the 32-bit JVM running on a 64-bit operating system (see Why can't I get a larger heap with the 32-bit JVM?)
PPS: On 64-bit VMs, you have 64 bits of addressability to work with resulting in a maximum Java heap size limited only by the amount of physical memory and swap space your system provides. (see How large a heap can I create using a 64-bit VM?)
Obviously heap size is not unlimited, and the larger the heap, the more time your JVM will eventually spend on GC. Though it is possible to set the heap size quite high on a 64-bit JVM, I still don't think it is really practical. The better approach is to run several JVMs with the same parameters, i.e. a cluster of JBoss/Tomcat nodes on the same physical machine; you will get better throughput.
EDIT: Your GC behaviour also depends on the taxonomy of your heap. If you have a lot of short-lived objects, and each request to the server creates a lot of them, then your GC will collect a lot of garbage very often, which on a large heap results in longer pauses. If you have very many long-lived objects (e.g. you cache most of your data in memory) and the number of short-lived objects is not that big, then having a bigger heap size is fine.
As Chris Rice already wrote, I wouldn't expect any obvious problems with the GC for heap sizes up to 32-64 GB, although there may of course be some aspect of your application logic that causes problems.
Not directly related to GC, but I would still recommend performing a realistic load test on your production system. I used to work on a project where we had a similar setup (a relatively large, clustered JBoss/Tomcat installation serving a public web application) and, without exaggeration, JBoss does not behave very well under high load or with a high number of concurrent calls if you are using EJBs. JBoss spends a lot of time in synchronized blocks when accessing and managing the EJB instance pools, and if you opt for a cluster it will even wait for intra-cluster network communication inside those synchronized blocks. Be especially aware of poorly performing state replication if you are using SFSBs.
Just to add some more switches I would use by default: -Xms55g can help to reduce the ramp-up time because it frees the JVM from having to check whether it can fall back to the initial size, and it also allows better initial internal sizing of the memory areas.
Additionally, we have had good experiences with NewSize giving you a large young generation to get rid of short-term garbage: -XX:NewSize=1g. Most webapps create a lot of short-lived garbage that never survives request processing, so you can even make it bigger. With -Xms55g the VM reserves a large chunk up front already; maybe downsizing it can help.
-Xincgc helps to clean the young generation incrementally and return the CPU to the user threads more often.
-XX:CMSInitiatingOccupancyFraction=70: if you really do fill all that memory, this starts the CMS garbage collection earlier.
-XX:+CMSIncrementalMode puts CMS into incremental mode so it returns the CPU to the user threads more often.
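Putting those switches together, an illustrative launch line might look like this (the -Xmx value is rounded up from the question's own -Xmx55000m, and on HotSpot -Xincgc is essentially shorthand for the two CMS incremental flags, so it is omitted here; every flag should be measured rather than copied):

    java -Xms55g -Xmx55g -XX:NewSize=1g \
         -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
         -XX:CMSInitiatingOccupancyFraction=70 ...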
Attach to the process with jstat -gc -h 10 <pid> 1s and watch the GC working.
Will you really fill up all that memory? I assume that with 64 CPUs for request processing you might even be able to work with less memory. What do you store in there?
Depending on your GC pause analysis, you may wish to enable incremental mode, whereby a long pause can be spread out over a period of time.
I have found that memory architecture plays a part at large memory sizes. Applications in general don't perform as well if they use more than one memory bank. The JVM appears to suffer as well, especially the GC, which has to sweep the whole memory.
If your application doesn't fit into one memory bank, it has to pull in memory that is not local to a processor while using memory that is local to another processor.
On Linux you can run numactl --hardware to see the layout of processors and memory banks.
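If the NUMA layout does turn out to matter, HotSpot also offers -XX:+UseNUMA (used together with the parallel collector) to make young-generation allocation processor-local; treat this line as a hedged example rather than a recommendation, since its benefit depends on the hardware and collector in use.

    java -XX:+UseParallelGC -XX:+UseNUMA ...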
