Direct buffer memory OutOfMemoryError after updating to wildfly 18 - java

After updating the environment from WildFly 13 to WildFly 18.0.1, we started seeing the following OutOfMemoryError:
A channel event listener threw an exception: java.lang.OutOfMemoryError: Direct buffer memory
at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
at org.jboss.xnio#3.7.3.Final//org.xnio.BufferAllocator$2.allocate(BufferAllocator.java:57)
at org.jboss.xnio#3.7.3.Final//org.xnio.BufferAllocator$2.allocate(BufferAllocator.java:55)
at org.jboss.xnio#3.7.3.Final//org.xnio.ByteBufferSlicePool.allocateSlices(ByteBufferSlicePool.java:162)
at org.jboss.xnio#3.7.3.Final//org.xnio.ByteBufferSlicePool.allocate(ByteBufferSlicePool.java:149)
at io.undertow.core#2.0.27.Final//io.undertow.server.XnioByteBufferPool.allocate(XnioByteBufferPool.java:53)
at io.undertow.core#2.0.27.Final//io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:147)
at io.undertow.core#2.0.27.Final//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
at io.undertow.core#2.0.27.Final//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
at org.jboss.xnio#3.7.3.Final//org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.jboss.xnio#3.7.3.Final//org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
at org.jboss.xnio.nio#3.7.3.Final//org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
at org.jboss.xnio.nio#3.7.3.Final//org.xnio.nio.WorkerThread.run(WorkerThread.java:591)
Nothing was changed on the application side. I looked at the buffer pools, and it seems that some resources are not being freed. I triggered several manual GCs, but almost nothing happened. (Uptime: 2 h)
Before, in the old configuration, it looked like this (uptime >250 h):
I did a lot of research, and the closest thing I could find is this post here on SO. However, that case involved WebSockets, and there are no WebSockets in use here.
I read several (good) articles (1,2,3,4,5,6) and watched this video about the topic.
I tried the following things, but none of them had any effect:
The OutOfMemoryError occurred at 5 GB, since MaxDirectMemorySize defaults to the heap size and the heap is 5 GB => I reduced -XX:MaxDirectMemorySize to 512m and then to 64m, but then the OOM just occurs sooner
I set -Djdk.nio.maxCachedBufferSize=262144
I checked the number of IO workers: 96 (6 CPUs * 16), which seems reasonable. The system usually has short-lived threads (the largest pool size was 13), so I guess it cannot be the number of workers
I switched back to ParallelGC, since this was the default in Java 8. Now a manual GC frees at least 10 MB; with G1 nothing happens at all. Still, neither GC can really clean up
I removed the <websockets> element from the WildFly configuration, just to be sure
I tried to emulate it locally but failed.
I analyzed the heap using Eclipse MAT and JXRay, but it just points to some internal WildFly classes
I reverted Java back to version 8 and the system still shows the same behavior, so WildFly is the most probable suspect
In Eclipse MAT one can also find these 1544 objects; they all have the same size
The only thing that did work was to deactivate the direct byte buffers in WildFly completely:
/subsystem=io/buffer-pool=default:write-attribute(name=direct-buffers,value=false)
However, from what I read, this has a performance drawback?
So does anyone know what the problem is? Any hints for additional settings/tweaks? Or is there a known WildFly or JVM bug related to this?
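For reference, the buffer pool numbers above can be read straight from the JVM via the BufferPoolMXBean. A small watcher along these lines (simplified sketch, not the exact tool I used) is enough to see the "direct" pool grow without ever shrinking:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectPoolWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            // "direct" is the pool backing ByteBuffer.allocateDirect(), "mapped" is for mapped files
            List<BufferPoolMXBean> pools =
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
            for (BufferPoolMXBean pool : pools) {
                System.out.printf("%-8s count=%d used=%d capacity=%d%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(10000);
        }
    }
}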
Update 1: Regarding the IO threads - maybe the concept is not 100% clear to me. On the one hand there is the ioThreads value of the worker,
and on the other hand there are the actual threads and thread pools.
From the definition one could think that ioThreads threads (in my case 12) are created per worker? But still the number of threads/workers seems quite low in my case...
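For reference, these values live on the default worker of the io subsystem and can be read and changed from the CLI (attribute names may differ slightly between WildFly versions, so double-check against your model):

/subsystem=io/worker=default:read-resource(include-runtime=true)
/subsystem=io/worker=default:write-attribute(name=io-threads,value=12)
/subsystem=io/worker=default:write-attribute(name=task-max-threads,value=96)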
Update 2: I downgraded Java and it still shows the same behavior, so I suspect WildFly to be the cause of the problem.

Probably it's an XNIO problem. Look at this issue: https://issues.redhat.com/browse/JBEAP-728

After lots of analyzing, profiling, etc., I came to the following conclusions:
The cause of the OOM is WildFly version 18.0.1. It also exists in 19.1.0 (I did not test 20 or 21)
I was able to trigger the OOM fairly quickly by setting -XX:MaxDirectMemorySize to values like 512m or lower (see the load-loop sketch after this list). I think many people don't experience the problem since by default this value equals the -Xmx value, which can be quite big. The problem occurs when using the REST API of our application
As Evgeny indicated, XNIO is a strong candidate, since profiling narrowed the problem down to (or near) that area...
I didn't have the time to investigate further, so I tried WildFly 22, and there it is working. This version uses the latest XNIO package (3.8.4)
The direct memory remains quite low in WildFly 22, around 10 MB. One can see the buffer count rising and falling, which wasn't the case before
So the final fix is to update to WildFly version 22.0.1 (or higher)
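For reference, this is roughly the kind of load loop I used to provoke the error quickly while running with -XX:MaxDirectMemorySize=512m. It is only a sketch; the endpoint URL is a placeholder for one of our REST resources:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestHammer {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint - point this at any REST resource of the deployed application
        URL url = new URL("http://localhost:8080/myapp/rest/ping");
        for (int i = 0; i < 1000000; i++) {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("GET");
            try (InputStream in = con.getInputStream()) {
                while (in.read() != -1) {
                    // drain the response so the connection can be reused
                }
            }
            con.disconnect();
        }
    }
}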

Related

Why does Orbeon create such a large org.orbeon.oxf.cache.MemoryCacheImpl object?

I'm running an old version of Orbeon Forms, 3.9.0.rc1.201103220245 CE to be precise. Lately I've regularly seen the JVM running it throw OutOfMemoryError: Java Heap Space errors. I analyzed a heap dump and found that an instance of org.orbeon.oxf.cache.MemoryCacheImpl had by far the largest retained size. I've researched, but did not find any bug report associated with it. One thing I found was a post on the Orbeon mailing list that may be related: http://discuss.orbeon.com/Unusually-large-number-of-blocking-threads-td2338846.html . Does anyone know if this issue is fixed in later versions of Orbeon?
Thanks,
-Michiel
Without spending significant time analyzing the heap dump, it is very hard to say anything conclusive. Orbeon Forms maintains a number of caches, MemoryCacheImpl being one of them, and to run efficiently it needs a fair amount of memory.
In some cases, that might be "too much" memory. If it is a problem, it could be caused by a bug (memory leak), a problem with the form or application running on top of Orbeon Forms, or something that could be done more efficiently in Orbeon Forms. At this point, there are no known open issues in the "leak" category, but a number of things could be done more efficiently. Quite a few things have already been improved since 3.9, but there is still work to do, and this is one of the items on the Orbeon Forms roadmap.
So, I would recommend you try to upgrade to the latest stable version (4.5 as of this writing) and see if you're still seeing the same problem. If you are, check whether there is anything in your app/forms that could be causing this, and see if you can increase the heap size or reduce the cache sizes.

How to handle thousands of threads in Java without using the new java.util.concurrent package

I have a situation in which I need to create thousands of instances of a class from a third-party API. Each new instance creates a new thread. I start getting OutOfMemoryError once there are more than 1000 threads, but my application requires creating 30,000 instances. Each instance is active all the time. The application is deployed on a 64-bit Linux box with 8 GB RAM and only 2 GB available to my application.
The way the third party library works, I cannot use the new Executor framework or thread pooling.
So how can I solve this problem?
Note that using a thread pool is not an option. All threads are running all the time to capture events.
The memory size on the Linux box is not in my control, but if I had the choice to have 25 GB available to my application on a 32 GB system, would that solve my problem, or would the JVM still choke?
Are there optimal Java settings for the above scenario?
The system uses Oracle Java 1.6 64 bit.
I concur with Ryan's Answer. But the problem is worse than his analysis suggests.
Hotspot JVMs have a hard-wired minimum stack size - 128k for Java 6 and 160k for Java 7.
That means that even if you set the stack size to the smallest possible value, you'd need roughly twice your allocated space (30,000 threads x 128 KB is about 3.7 GB, against the 2 GB you have) ... just for thread stacks.
In addition, having 30k native threads is liable to cause problems on some operating systems.
I put it to you that your task is impossible. You need to find an alternative design that does not require you to have 30k threads simultaneously. Alternatively, you need a much larger machine to run the application.
Reference: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-June/003867.html
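To make the arithmetic concrete, the best you can do is request a tiny stack per thread via the four-argument Thread constructor (or globally with -Xss). A sketch, assuming a plain idle worker; even then the VM rounds the stack up to its minimum, and on a box like the one described this will most likely die with "unable to create new native thread" long before it reaches 30k:

public class TinyStackThreads {
    public static void main(String[] args) {
        Runnable idle = new Runnable() {
            public void run() {
                try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException e) { }
            }
        };
        // Ask for a 64 KB stack per thread; HotSpot silently rounds this up to its
        // hard-wired minimum (128 KB on Java 6, 160 KB on Java 7), so the real
        // footprint is still at least 30,000 x 128 KB, i.e. roughly 3.7 GB.
        for (int i = 0; i < 30000; i++) {
            new Thread(null, idle, "worker-" + i, 64 * 1024).start();
        }
    }
}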
I'd say give up now and figure out another way to do it. The default stack size is 512K. At 30k threads, that's 15G in stack space alone. To fit into 2G, you'd need to cut it down to less than 64K stacks, and that leaves you with zero memory for the heap, including all the Thread objects, or the JVM itself.
And that's just the most obvious problem you're likely to run into when running that many simultaneous threads in one JVM.
I think we are missing lots of details, but would a distributed platform work? Each individual instance would manage a range of your class instances. Those platforms could run on different PCs or virtual machines and communicate with each other.
I had the same problem with an SNMP provider that required a thread for each outstanding get (I wanted to have tens of thousands of outstanding gets going on at once). Now that NIO exists I'd just rewrite the library myself if I had to do this again.
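If I were redoing it today, the rewrite would be built around a single selector thread instead of one thread per outstanding request. Roughly like this (a schematic sketch, not the SNMP library's actual API; the peer address is a placeholder):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectorSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        // One non-blocking channel per outstanding request, all multiplexed on a single thread.
        for (int i = 0; i < 10000; i++) {
            DatagramChannel ch = DatagramChannel.open();
            ch.configureBlocking(false);
            ch.connect(new InetSocketAddress("192.0.2.1", 161)); // placeholder peer
            ch.register(selector, SelectionKey.OP_READ);
        }
        ByteBuffer buf = ByteBuffer.allocate(1500);
        while (true) {
            selector.select();
            for (SelectionKey key : selector.selectedKeys()) {
                DatagramChannel ready = (DatagramChannel) key.channel();
                buf.clear();
                ready.read(buf); // decode and dispatch the response here
            }
            selector.selectedKeys().clear();
        }
    }
}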
You cannot solve it in "Java code" or configuration. Windows chokes at around 2-3000 threads in my experience (this may have changed in later versions). When I was doing this, I surprisingly found that Linux supported even fewer threads (around 1000).
When the system stops supplying threads, "Out of Memory" is the exception you should expect to see, so I'm sure that's it; I started getting this exception long before I actually ran out of memory. Perhaps you could hack Linux somehow to support more, but I have no idea how.
Using the concurrent package will not help here. If you could switch over to "Green" threads it might, but that might take recompiling the JVM (it would be nice if it was available as a command line switch, but I really don't think it is).

java.net.SocketException: Cannot allocate memory (not Mac)

I have a Java app deployed on Tomcat 7 running on Ubuntu 10.04. There's been an issue when opening a server socket, which I haven't been able to reproduce so far:
java.net.SocketException: Cannot allocate memory
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(Unknown Source)
at java.net.ServerSocket.bind(Unknown Source)
at org.subethamail.smtp.server.SMTPServer.createServerSocket(SMTPServer.java:338)
at org.subethamail.smtp.server.SMTPServer.start(SMTPServer.java:291)
All I've been able to find out is that this happens on some specific version of MacOS which is not relevant for me, and also on OpenJDK, which is not relevant either (I'm using Oracle JRE 1.7.0_17). Another possible reason is a virtualization environment, but in my case this happens on a hardware box.
So, the question is: has anyone ever faced the same problem, and what could be a possible solution?
Update
There's also been this: Tomcat consumed almost all of the heap, approximately 700 MB; that was caused by a memory leak in my code.
But as far as I understand, the exception refers to a socket buffer at the system level, so it doesn't seem to be related to the Java heap. However, this is the only explanation I've got so far, and in my opinion it is not very convincing.
Update 2
Eventually we were able to reproduce the issue several times, so this was not about memory leaks. I considered authbind as a possible source of the problem when I faced it for the first time, but unfortunately I didn't pay much attention to it. When I got another hardware box affected by the problem, I tried to bind a non-privileged port and succeeded, while attempts to bind privileged ports led to exceptions. So, eventually I replaced authbind with iptables.
Basically, fady taher's answer points to authbind, but Danny Thomas's answer provides very interesting information about the connection between forking and "Cannot allocate memory". We actually also use ProcessBuilder to run bash scripts, so there is a good chance the problem is caused by that.
Sounds like you have insufficient physical memory or swap - on the systems affected, check memory and swap.
Does your application happen to execute external commands? fork/exec could be contributing. If that's the case, you might consider allowing memory overcommit:
http://bryanmarty.com/blog/2012/01/14/forking-jvm/
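If it does, the call typically looks something like this (purely illustrative; the script path is a placeholder). Each such call briefly forks the whole, already large, JVM process before exec'ing the script, which is where the transient memory demand comes from:

import java.io.IOException;

public class ScriptRunner {
    public static void run(String script) throws IOException, InterruptedException {
        // fork() duplicates the parent's address space (copy-on-write) before exec,
        // so a big JVM can hit "Cannot allocate memory" here unless overcommit is allowed.
        Process p = new ProcessBuilder("/bin/bash", script)
                .inheritIO()
                .start();
        p.waitFor();
    }
}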
Please also check the following items:
Run a memory test to eliminate bad memory blocks
Run a disk check on the swap partition (or its equivalent on Mac OS)
Check user resource limits (ulimit)
I don't know if it helps, but check:
Memory error when trying to change Apache Tomcat port from 8080 to 80
You can try extending the Java heap space; check whether that helps.
If the project you are developing was created with another Java version, that issue might happen.
Try adjusting the memory allocated to Tomcat (the -Xmx parameter in catalina.sh), i.e. increase its maximum heap size. If that doesn't solve it, you have to find the memory leaks in your code; one such tool is JavaMelody. Use it to find the memory leak and solve the issue.

Application in Tomcat is not responding

We are trying to access an application on a Tomcat instance that is on a different host, but it is not loading even though Tomcat is running. It had been running fine for the past 3 months. We restarted Tomcat and now it is working fine.
But we were not able to zero in on what happened.
Any idea how to trace / what might have caused this?
The CPU usage was normal and the Tomcat memory was 1205640.
The memory settings of Tomcat are 1024-2048 (min-max).
We are using tomcat 7.
Help much appreciated....thanks in advance.....cheers!!
...also - not sure on Windows - you may be running out of file descriptors. This typically happens when streams are not properly closed in finally blocks.
In addition, check with netstat if you have a lot of sockets remaining open or accumulating in wait state.
Less likely, the application is creating threads and never releasing them.
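If unclosed streams turn out to be the culprit, the usual fix is try-with-resources (Java 7+), or closing in a finally block on Java 6. A minimal sketch:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SafeRead {
    static String firstLine(String path) throws IOException {
        // try-with-resources releases the underlying file descriptor even if
        // readLine() throws; without it the descriptor leaks until GC/finalization.
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }
}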
The application is leaking something (memory, file descriptors, sockets, threads,...) and running over a limit.
There are different ways to track this. A profiler may help, or more simply, take JVM heap dumps at regular intervals and check what is accumulating. The excellent Eclipse MAT will help you analyze the dumps.
Memory leak problems are not uncommon. If your Tomcat instance was running for three months and the contained application suddenly became unresponsive, maybe that was the case. One solution (if your resources allow you to do so) could be monitoring that Tomcat instance through JMX using jconsole to see how it behaves.

Permgen out of memory

Running Tomcat for an enterprise-level app. I've been getting "PermGen out of memory" messages.
I am running this on:
Windows 2008 R2 server,
Java 1.6_43,
Running Tomcat as a service.
No multiple deployments. The service is started and the app runs. Eventually I get PermGen errors.
I can delay the errors by increasing the perm size; however, I'd like to actually fix the problem. The vendor is disowning the issue. I don't know if it is a memory leak, as the vendor simply says it "runs fine with JRockit". Of course, that would have been nice to have in the documentation, like 3 months ago. Plus, some posts suggest that JRockit just expands the perm space to fit, up to 4 GB if you have the memory (not sure that is accurate...).
Anyway, I see some posts for a potential fix in Java 1.5 with the options
"-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
However, these seem to have been deprecated in Java 1.6, and now the only GC that seems to be available is "-XX:+UseG1GC".
The best link I could find, anywhere, is:
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#G1Options
Does anyone know if the new G1 garbage collector includes the perm space? Or am I missing an option or two in the new Java 6 GC settings that maybe I am not understanding?
Any help appreciated!
I wouldn't just increase the permgen space, as this error is usually a sign of something wrong in the software/setup. Is there a specific webapp that causes this? Without more info, I can only give basic advice.
1) Use the memory leak detector (Tomcat 6+) called Find Leaks
2) Turn off auto-deployment
3) Move JDBC drivers and logging software to the Java classpath instead of Tomcat's, per this blog entry
In earlier versions of Sun Java 1.6, the CMSPermGenSweepingEnabled option is functional only if UseConcMarkSweepGC is also set. See these answers:
CMSPermGenSweepingEnabled vs CMSClassUnloadingEnabled
What does JVM flag CMSClassUnloadingEnabled actually do?
I don't know if it's functional in later versions of 1.6 though.
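So if you do want to experiment with those flags on 1.6, the combination would have to look something like the following (worth verifying against your exact update level, since behavior changed between updates):

-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled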
A common cause for these errors/bugs in the past was dynamic class generation, particularly for libraries and frameworks that created dynamic proxies or used aspects. Subtle misuse of Spring and Hibernate (or more specifically cglib and/or aspectj) were common culprits. The underlying issue was that new dynamic classes were getting created on every request, eventually exhausting permgen space. The CMSPermGenSweepingEnabled option was a common workaround/fix. Recent versions of those frameworks no longer have the problem.
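As a schematic illustration of that failure mode (not taken from any particular framework): a factory that builds a fresh cglib proxy per request with the class cache disabled defines a brand-new class on every call and steadily fills PermGen:

import java.lang.reflect.Method;
import net.sf.cglib.proxy.Enhancer;
import net.sf.cglib.proxy.MethodInterceptor;
import net.sf.cglib.proxy.MethodProxy;

public class LeakyProxyFactory {
    // Called once per request in the buggy pattern: every call generates and
    // loads a new proxy class, because the Enhancer cache is bypassed.
    static Object proxyFor(final Object target) {
        Enhancer enhancer = new Enhancer();
        enhancer.setSuperclass(target.getClass());
        enhancer.setUseCache(false); // the crucial mistake
        enhancer.setCallback(new MethodInterceptor() {
            public Object intercept(Object obj, Method method, Object[] args,
                                    MethodProxy proxy) throws Throwable {
                return proxy.invoke(target, args);
            }
        });
        return enhancer.create();
    }
}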
