How to handle OutOfMemoryError in multi threading? [duplicate]


This question already has answers here:
What is an OutOfMemoryError and how do I debug and fix it
(4 answers)
Closed 5 years ago.
We use heavy multithreading in a Swing application for extensive calculations. From time to time the application runs into an OOME and cannot create any more native threads. I understand that the application has to be aware of this and that hitting the limit points to bad design, but it cannot be avoided 100%. The problem is that in such a case the JVM is completely lost: it cannot handle the error and the system behaves unpredictably. Usually we log every memory error and restart the application via -XX:OnOutOfMemoryError="kill -9 %p", but this does not work here for an obvious reason. It is also a bit frustrating that the JVM no longer has any control. So what would be a good way to deal with this kind of problem?
PS: I am not looking for a solution like raising the system's process limits or reducing the thread stack size via -Xss. I am looking for a general approach to handling this.

The JVM has perfect control over OutOfMemoryErrors and handles them gracefully; what does not handle them gracefully is your program. You can catch and handle an OutOfMemoryError the same way as any other error; most programs just never do.
To solve your problem, first try to pinpoint the root of those memory errors, for example by logging them or by using performance/memory analysis tools. Forcing a core dump in these cases can also be useful, since it lets you analyze the state at the moment the error happened.
In the end, redesigning the application to limit the amount of memory it uses will be necessary to avoid OOM errors. This can be done either by testing how many threads the program can handle gracefully and then enforcing that limit, or by checking free memory before creating a new thread. Architectural changes might also help, but you posted no details about the internals, so I can't give any advice here.
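Both suggestions (a hard thread limit and a free-memory check before accepting work) can be sketched roughly as follows. This is only an illustration: the class name BoundedWorkers and the three limits are made-up values, not something from the question. Note also that the heap check says nothing about native memory, so it is the thread cap, not the check, that guards against "unable to create new native thread".

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedWorkers {
    // Hypothetical limits; measure what your machine can actually sustain.
    static final int MAX_THREADS = 8;
    static final int MAX_QUEUED_TASKS = 1_000;
    static final long MIN_FREE_BYTES = 16L * 1024 * 1024;

    /** A pool that never grows beyond MAX_THREADS and never queues without bound. */
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(MAX_QUEUED_TASKS),
                // When the queue is full, the submitting thread runs the task itself,
                // which slows producers down instead of piling up more work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Rough estimate of the heap this JVM could still hand out. */
    static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    /** Accepts the task only if the heap estimate is above the threshold. */
    static boolean submitIfMemoryAllows(ThreadPoolExecutor pool, Runnable task) {
        if (availableHeapBytes() < MIN_FREE_BYTES) {
            return false; // caller decides whether to retry later or report an error
        }
        pool.execute(task);
        return true;
    }
}
```

The free-memory figure is only an estimate, because garbage that has not yet been collected counts as used, so treat a rejected task as "try again later" rather than as a hard failure.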

Related

Java - How is memory leak in Java harmful? How can it possibly get used for a bad cause?

Creating a memory leak with Java
I was going through the "interview" question above. After reading its answers I ended up with a few questions of my own.
Let's assume there is already a memory leak in the code.
How is that harmful? How can the data fall into the wrong hands?
I am pretty sure that System.read(); (or something like that) is not going to read the data from the memory leak. Is that even possible?
Please help with some reference/code/documents.

Memory leaks are a really broad topic. To be honest, I've voted to close your question (as too broad), but I'll still try to give you a sense of what is behind this problem.
Suppose you create an in-memory session for every user connected to your web service but never discard the session after some time, either because you forgot or because of bad application design. That causes a memory leak.
Or suppose you don't close your open files or sockets.
Or consider that somewhere you save a reference to all the intermediate data structures produced by your process. In this case there is no way for the garbage collector to free the allocated memory.
Memory leaks mostly show up in long-running applications, because in the short run a leak has little chance of causing an OutOfMemoryError. In the long run things change: there are applications that run for months or even years.
There are many situations in which a memory leak can happen. Many frameworks, libraries and even languages try to protect programmers from these "bad" situations, but I personally think it is the programmer's experience that makes the difference.
For example, the try-with-resources statement in Java is a language feature designed to help programmers in such situations (it helps you not to forget).
So when you design objects that must release some resource at the end of their life, implement the java.lang.AutoCloseable interface and add the appropriate close() method. Look at how many classes now implement AutoCloseable; that shows how important memory (leak) and resource handling is.
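As a minimal sketch of that advice, the class below implements AutoCloseable and is used in a try-with-resources block. TrackedResource and its counter are invented for the illustration; a real class would release a file handle, socket or similar in close().

```java
final class TrackedResource implements AutoCloseable {
    // Counts instances that were opened but not yet closed (illustration only).
    private static int openCount = 0;
    private final String name;

    TrackedResource(String name) {
        this.name = name;
        openCount++;
    }

    String read() {
        return "data from " + name;
    }

    @Override
    public void close() {
        openCount--; // a real implementation would release the underlying resource here
    }

    static int openCount() {
        return openCount;
    }
}

final class TryWithResourcesDemo {
    static String readOnce(String name) {
        try (TrackedResource resource = new TrackedResource(name)) {
            return resource.read();
        } // close() is called here automatically, even if read() throws
    }
}
```

After readOnce returns, the open counter is back where it started, which is exactly the guarantee that is easy to forget when close() is called by hand.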
I would also suggest to study the difference between Java stack and heap memory management.
I once dealt with a Tomcat instance that hung a server every three months. After some time the server had to be restarted every three weeks, and eventually every day.
It turned out that "someone" had written a for loop instead of adding a WHERE clause to the SQL query.
So there are programmers who do this as a full-time job, who are experts in this kind of investigation and are able to find and fix memory leaks.

java first encounter with heap space error server data logger

I built my first Java program which is built on top of the Interactive Brokers Java API. That may or may not be important. I just extended the main API classes with a couple new classes.
The program is making data queries to a remote server. When the server responds, I log the received data to a local MySQL data base. Once the program finishes logging the data, the program will make the next data request.
I am having a problem after leaving the program running for some time, after making a couple hundred server requests. I will see this error, then the program doesn't continue to execute:
java.lang.OutOfMemoryError: Java heap space
I did some research, and from what I read, I conclude that the program is creating many new variables and not destroying old, worthless ones. Since I am using NetBeans for development, I used the NetBeans profiler to check whether this was the case.
After running the program for quite some time, more and more of the memory is used up by Byte. So my theory seems to hold.
I don't really know where to go from here. There is no reference to a class or specific variable, just a variable type. How can I pinpoint where the problem is coming from?
UPDATE
I corrected a specific problem that was mentioned by BigMike in the comments. Previously, I was creating many Statements in the JDBC MySQL Connector API and calling .execute() to execute them, but I wasn't closing the statements with .close().
I made sure to add the statement.close() call after each execution, and the program runs much better now. Judging by the RAM usage of the program, this seems to have solved the problem. I am also not seeing the Java heap space error anymore, which is nice.
Thanks!
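For later readers, the fix described in the update can also be written with try-with-resources, so that close() cannot be forgotten. The following is only a sketch: the class name TickLogger, the table ticks and its columns are invented, since the question does not show the actual schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class TickLogger {
    // Hypothetical table and columns; adapt to the real schema.
    private static final String INSERT_SQL =
            "INSERT INTO ticks (symbol, price) VALUES (?, ?)";

    /** Inserts one row and always closes the statement, even if the insert fails. */
    static int logTick(Connection conn, String symbol, double price) throws SQLException {
        try (PreparedStatement statement = conn.prepareStatement(INSERT_SQL)) {
            statement.setString(1, symbol);
            statement.setDouble(2, price);
            return statement.executeUpdate();
        } // statement.close() runs here
    }
}
```

The connection itself is left open on purpose, on the assumption that the logger reuses one connection between requests.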

It's very hard to say what might be wrong from that alone.
It might have to do with Streams that you are opening that aren't being closed when you no longer need them.
Double check methods that allocate resources (reading from files, database, etc), especially if they read data into streams, and make sure you close those streams in a finally clause.
Apart from that, you can try and profile what methods are being called more often, etc, to try and narrow down the problem to a specific part of your code.
I found a site with a reasonable explanation of how Garbage Collection works, and what can cause OutOfMemoryErrors:
http://www.kdgregory.com/index.php?page=java.outOfMemory
If you read through that, there's a specific reference to high allocation of Object[] and byte[], that might point you in the right direction.

Generally speaking, this comes about for one of two reasons:
There is a memory leak in the application, such that the application fails to release items for garbage collection, leading to the JVM running out of memory over time.
The application attempted a one-off operation that would require more memory than is available, leading to the JVM running out of memory due to the operation.
Since your output seems to indicate that the bulk of the memory is consumed by literally a million plus small byte arrays, my guess is that #1 is probably the culprit; however, to verify this, restart your application and watch its memory consumption over time. It will bounce up and down, but really you only need to watch the trend of consumption. If the consumption average continues to climb over time, you have a memory leak.
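If you prefer numbers to eyeballing a graph, a rough way to watch the trend is to sample the used heap at regular intervals and fit a line through the samples. The sketch below assumes you collect the samples yourself (for example once a minute); the class name HeapTrend is made up.

```java
final class HeapTrend {
    /** Heap currently in use, including garbage that has not been collected yet. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Least-squares slope of the samples, in bytes per sampling interval. */
    static double slope(long[] samples) {
        int n = samples.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long sample : samples) {
            meanY += sample;
        }
        meanY /= n;
        double numerator = 0;
        double denominator = 0;
        for (int i = 0; i < n; i++) {
            numerator += (i - meanX) * (samples[i] - meanY);
            denominator += (i - meanX) * (i - meanX);
        }
        return denominator == 0 ? 0 : numerator / denominator;
    }
}
```

A slope that stays clearly positive over many hours points to a leak; a slope near zero with a sawtooth on top is normal garbage collector behaviour.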
To solve this issue, you typically need the source code, and need to find the parts of the code where the troubling objects are being created, used, and then "stored" far beyond the last time that they will ever be used. The solution is to correct the code to no longer store them. HashMaps, Lists, and other Collections are often accomplices in memory leak problems.
If you lack the source code, you can attempt to measure the trend of the memory consumption, and schedule shutdowns and restarts of the application to effectively "reset the clock" such that you choose your downtime instead of watching the application choose it for you.
If it is a one-off operation (not likely considering your data) then you won't see an upward trend in memory consumption until the triggering event occurs. In such a case, with access to the source code, you should protect your application from processing data that grows very far outside of normal operating parameters. For example, reading a message from the network typically takes only a few KB, but in exceptional circumstances a client might transmit forever. In such a case, kill the message processing and throw the message away with an error if you exceed a maximum message size limit of 10 MB.
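A size guard of that kind might look roughly like this. The 10 MB figure comes from the paragraph above; the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedReader {
    static final int MAX_MESSAGE_BYTES = 10 * 1024 * 1024; // 10 MB limit

    /** Reads the stream to its end, but gives up as soon as maxBytes would be exceeded. */
    static byte[] readMessage(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            if (out.size() + read > maxBytes) {
                // Discard what was read so far instead of buffering without bound.
                throw new IOException("message exceeds " + maxBytes + " bytes; discarded");
            }
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }
}
```

The check happens before each write, so the buffer never grows past the limit no matter how long the client keeps sending.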
Without access to the source code in the latter scenario, the only hope is to identify the incoming upset, hunt down the source of the errant transmission, and attempt to manipulate it to prevent the overload of output.
The variations on how to approach these techniques are vast, but now you have a few ideas.

Can the JVM recover from an OutOfMemoryError without a restart

Can the JVM recover from an OutOfMemoryError without a restart if it gets a chance to run the GC before more object allocation requests come in?
Do the various JVM implementations differ in this aspect?
My question is about the JVM recovering and not the user program trying to recover by catching the error. In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it? Or can I let it run if further requests seem to work without a problem.

It may work, but it is generally a bad idea. There is no guarantee that your application will succeed in recovering, or that it will know if it has not succeeded. For example:
There really may be not enough memory to do the requested tasks, even after taking recovery steps like releasing block of reserved memory. In this situation, your application may get stuck in a loop where it repeatedly appears to recover and then runs out of memory again.
The OOME may be thrown on any thread. If an application thread or library is not designed to cope with it, this might leave some long-lived data structure in an incomplete or inconsistent state.
If threads die as a result of the OOME, the application may need to restart them as part of the OOME recovery. At the very least, this makes the application more complicated.
Suppose that a thread synchronizes with other threads using notify/wait or some higher level mechanism. If that thread dies from an OOME, other threads may be left waiting for notifies (etc) that never come ... for example. Designing for this could make the application significantly more complicated.
In summary, designing, implementing and testing an application to recover from OOMEs can be difficult, especially if the application (or the framework in which it runs, or any of the libraries it uses) is multi-threaded. It is a better idea to treat OOME as a fatal error.
See also my answer to a related question:
EDIT - in response to this followup question:
In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it?
No you don't have to restart. But it is probably wise to, especially if you don't have a good / automated way of checking that the service is running correctly.
The JVM will recover just fine. But the application server and the application itself may or may not recover, depending on how well they are designed to cope with this situation. (My experience is that some app servers are not designed to cope with this, and that designing and implementing a complicated application to recover from OOMEs is hard, and testing it properly is even harder.)
EDIT 2
In response to this comment:
"other threads may be left waiting for notifies (etc) that never come" Really? Wouldn't the killed thread unwind its stacks, releasing resources as it goes, including held locks?
Yes really! Consider this:
Thread #1 runs this:
    synchronized (lock) {
        while (!someCondition) {
            lock.wait();
        }
    }
    // ...
Thread #2 runs this:
    synchronized (lock) {
        // do something
        lock.notify();
    }
If Thread #1 is waiting on the notify, and Thread #2 gets an OOME in the // do something section, then Thread #2 won't make the notify() call, and Thread #1 may get stuck forever waiting for a notification that won't ever occur. Sure, Thread #2 is guaranteed to release the mutex on the lock object ... but that is not sufficient!
If not, the code run by the thread is not exception safe, which is a more general problem.
"Exception safe" is not a term I've heard of (though I know what you mean). Java programs are not normally designed to be resilient to unexpected exceptions. Indeed, in a scenario like the above, it is likely to be somewhere between hard and impossible to make the application exception safe.
You'd need some mechanism whereby the failure of Thread #2 (due to the OOME) gets turned into an inter-thread communication failure notification to Thread #1. Erlang does this ... but not Java. The reason they can do this in Erlang is that Erlang processes communicate using strict CSP-like primitives; i.e. there is no sharing of data structures!
(Note that you could get the above problem for just about any unexpected exception ... not just Error exceptions. There are certain kinds of Java code where attempting to recover from an unexpected exception is likely to end badly.)
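For a hand-off you are free to restructure, one way to narrow (not close) this gap is to replace the bare wait/notify with a future that is completed in both the success and the failure case. This is only a sketch with made-up names, and it carries its own risk: after a real OOME, even creating the thread or completing the future may fail, so the waiting side should still use a timeout.

```java
import java.util.concurrent.CompletableFuture;

final class FailureAwareHandoff {
    /** Runs the work on a new thread and reports either the result or the failure. */
    static CompletableFuture<String> startProducer(Runnable work) {
        CompletableFuture<String> result = new CompletableFuture<>();
        Thread producer = new Thread(() -> {
            try {
                work.run();
                result.complete("done");
            } catch (Throwable failure) {
                // Includes Errors such as OutOfMemoryError: wake the waiter
                // with the failure instead of leaving it blocked forever.
                result.completeExceptionally(failure);
            }
        });
        producer.start();
        return result;
    }
}
```

A consumer would call get(timeout, unit) on the returned future and treat both an ExecutionException and a TimeoutException as "the producer is gone".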

The JVM will run the GC when it's on edge of the OutOfMemoryError. If the GC didn't help at all, then the JVM will throw OOME.
You can, however, catch it and take an alternative path if necessary. Objects allocated inside the try block become eligible for garbage collection once nothing references them any more after the catch.
Since the OOME is "just" an Error which you could just catch, I would expect the different JVM implementations to behave the same. I can at least confirm from experience that the above is true for the Sun JVM.
See also:
Catching java.lang.OutOfMemoryError
Is it possible to catch out of memory exception in java?
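A minimal sketch of catching the error and taking an alternative path, with an invented class name. On typical HotSpot setups, a request for a long array of Integer.MAX_VALUE elements fails immediately, which makes the behaviour easy to try out.

```java
final class CatchOome {
    /** Returns the array, or null if the JVM could not provide the memory. */
    static long[] tryAllocate(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            // The failed allocation left nothing behind, so the JVM can carry on.
            // The caller is expected to fall back to a smaller request.
            return null;
        }
    }

    public static void main(String[] args) {
        long[] huge = tryAllocate(Integer.MAX_VALUE);
        System.out.println(huge == null ? "allocation refused" : "allocation succeeded");
        long[] small = tryAllocate(16);
        System.out.println("small array length: " + small.length);
    }
}
```

This works well for a single oversized request; it is much less reliable when the heap is full of small live objects, because then the next allocation on any thread may fail as well.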

I'd say it depends partly on what caused the OutOfMemoryError. If the JVM truly is running low on memory, it might be a good idea to restart it, and with more memory if possible (or a more efficient app). However, I've seen a fair amount of OOMEs that were caused by allocating 2GB arrays and such. In that case, if it's something like a J2EE web app, the effects of the error should be constrained to that particular app, and a JVM-wide restart wouldn't do any good.

Can it recover? Possibly. Any well-written JVM is only going to throw an OOME after it's tried everything it can to reclaim enough memory to do what you tell it to do. There's a very good chance that this means you can't recover. But...
It depends on a lot of things. For example if the garbage collector isn't a copying collector, the "out of memory" condition may actually be "no chunk big enough left to allocate". The very act of unwinding the stack may have objects cleaned up in a later GC round that leave open chunks big enough for your purposes. In that situation you may be able to restart. It's probably worth at least retrying once as a result. But...
You probably don't want to rely on this. If you're getting an OOME with any regularity, you'd better look over your server and find out what's going on and why. Maybe you have to clean up your code (you could be leaking or making too many temporary objects). Maybe you have to raise your memory ceiling when invoking the JVM. Treat the OOME, even if it's recoverable, as a sign that something bad has hit the fan somewhere in your code and act accordingly. Maybe your server doesn't have to come down NOWNOWNOWNOWNOW, but you will have to fix something before you get into deeper trouble.

You can increase your odds of recovering from this scenario, although it's not recommended that you try. What you do is pre-allocate some fixed amount of memory on startup that's dedicated to doing your recovery work; when you catch the OOM, you null out that pre-allocated reference, and you're more likely to have some memory to use in your recovery sequence.
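A rough sketch of that reserve trick follows; the 1 MiB size and all names are made up, and there is no guarantee that the freed block is enough for whatever the recovery code needs.

```java
final class EmergencyReserve {
    // Hypothetical reserve size; allocated once when the class is loaded.
    private static volatile byte[] reserve = new byte[1024 * 1024];

    /** Drops the reserve so the garbage collector can reclaim it. Returns true the first time. */
    static boolean release() {
        boolean hadReserve = reserve != null;
        reserve = null;
        return hadReserve;
    }

    /** Runs the task; on an OutOfMemoryError, frees the reserve and then runs the recovery code. */
    static void runGuarded(Runnable task, Runnable recovery) {
        try {
            task.run();
        } catch (OutOfMemoryError e) {
            release();
            recovery.run(); // keep this small: log, alert, shut down cleanly
        }
    }
}
```

The reserve is a one-shot resource in this sketch; once it has been released, the process should normally be on its way to a restart.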
I don't know about different JVM implementations.

Any sane JVM will throw an OutOfMemoryError only if there is nothing the garbage collector can do. However, if you catch the OutOfMemoryError far enough up the call stack, it is quite likely that whatever caused it has itself become unreachable and been garbage collected (unless the problem is not in the current thread).
For frameworks that run other code, such as application servers, attempting to continue in the face of an OOME generally makes sense (as long as the framework can reasonably release the third-party code). Otherwise, in the general case, recovery should probably consist of bailing out and telling the user why, rather than trying to go on as if nothing happened.
To answer your newly updated question: There is no reason to think you need to shut down the server if all is working well. My experience with JBoss is that as long as the OOME didn't affect a deployment, things work fine. Sometimes JBoss runs out of permgen space if you do a lot of hot deployment. Then the situation is indeed hopeless, and an immediate restart (which will have to be forced with a kill) is inevitable.
Of course each app server (and deployment scenario) will vary and it is really something learned from experience in each case.

You cannot fully recover a JVM that has had an OutOfMemoryError. At least with the Oracle JVM you can add -XX:OnOutOfMemoryError="cmd args;cmd args" and take recovery actions, like killing the JVM or sending the event somewhere.
Reference: https://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Trying to cause java.lang.OutOfMemoryError

I am trying to reproduce a java.lang.OutOfMemoryError in JBoss 4 that one of our clients got, presumably after running the J2EE applications for days or weeks.
I am trying to find a way to make the webapp throw a java.lang.OutOfMemoryError in a matter of minutes (instead of days/weeks).
One thing that comes to mind is to write a Selenium script and have it bombard the webapps.
One other thing that we can do is to reduce JVM heap size, but we would prefer not to do this, as we want to see the limit of our system.
Any suggestions?
ps: I don't have access to the source code, as we just provide a hosting service (of course I could decompile the class files...)

If you don't have access to the source code of the J2EE app in question, the options that come to mind are:
Reduce the amount of RAM available to the JVM. You've already identified this one and said you don't want to do it.
Create a J2EE app (it could probably just be a JSP) and configure it to run within the same JVM as the target app, and have that app allocate a ridiculous amount of memory. That will reduce the amount of memory available to the target app, hopefully such that it fails in the way you're trying to force.
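The memory-hungry helper app from the second option could be as small as one class that a JSP or servlet in the same JVM calls. The following is only a sketch with invented names; the chunk count and size are up to you, and the static list is what keeps the memory from being collected.

```java
import java.util.ArrayList;
import java.util.List;

final class MemoryHog {
    // Static, so the chunks stay reachable for as long as the class is loaded.
    private static final List<byte[]> HELD = new ArrayList<>();

    /** Allocates and keeps the given number of chunks; returns the total bytes now held. */
    static synchronized long hold(int chunks, int chunkBytes) {
        for (int i = 0; i < chunks; i++) {
            HELD.add(new byte[chunkBytes]);
        }
        return heldBytes();
    }

    static synchronized long heldBytes() {
        long total = 0;
        for (byte[] chunk : HELD) {
            total += chunk.length;
        }
        return total;
    }

    /** Gives the memory back once the experiment is over. */
    static synchronized void releaseAll() {
        HELD.clear();
    }
}
```

Growing the hog in steps rather than all at once makes it easier to see at which point the target app starts to fail.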

Try using profiling tools to investigate the memory leak. It is also worth examining the logs and any memory dumps taken after the OOM happened. IMHO, reducing the memory is not the best way to investigate, because you can run into issues that have nothing to do with the real production problem.

Do both, but in a controlled fashion :
Reduce the available memory to the absolute minimum (using -Xms1M -Xmx2M, as an example, but I fear your app won't even load with such limitations)
Do controlled "nuclear irradiation": run Selenium scripts against each of the known working URLs before attacking the presumed guilty one.
Finally, unleash the power that shall not be raised : start VisualVM and any other monitoring software you can think of (DB execution is a usual suspect).

If you are using Sun Java 6, you may want to consider attaching to the application with jvisualvm in the JDK. This will allow you to do in-place profiling without needing to alter anything in your scenario, and may possibly immediately reveal the culprit.

If you don't have the source, decompile it, at least if you think the terms of use allow this and you live in a free country. You can use:
Java Decompiler or JAD.

In addition to all the others I must say that even if you can reproduce an OutOfMemory error, and find out where it occurred, you probably haven't found out anything worth knowing.
The trouble is that an OOM occurs when an allocation can not take place. The real problem however is not that allocation, but the fact that other allocations, in other parts of the code, have not been de-allocated (de-referenced and garbage collected). The failed allocation here might have nothing to do with the source of the trouble (no pun intended).
This problem is larger in your case as it might take weeks before trouble starts, suggesting either a sparsely used application, or an abnormal code path, or a relatively HUGE amount of memory in relation to what would be necessary if the code was OK.
It might be a good idea to ask around why this amount of memory is configured for JBoss and not something different. If it's recommended by the supplier, then maybe they already know about the leak and require this to mitigate the effects of the bug.
For this kind of error it really pays to have some idea of the code path in which the problem occurs, so you can do targeted tests. And test with a profiler so you can see at run time which objects (Lists, Maps and such) are growing without shrinking.
That would give you a chance to decompile the correct classes and see what's wrong with them. (Closing or cleaning in a try block and not a finally block perhaps).
In any case, good luck. I think I'd prefer to find a needle in a haystack. When you find the needle you at least know you have found it:)

The root of the problem is most likely a memory leak in the webapp that the client is running. In order to track it down, you need to run the app with a representative workload with memory profiling enabled. Take some snapshots, and then use the profiler to compare the snapshots to see where objects are leaking. While source-code would be ideal, you should be able to at least figure out where the leaking objects are being allocated. Then you need to track down the cause.
However, if your customer won't release binaries so that you can run an identical system to what he is running, you are kind of stuck, and you'll need to get the customer to do the profiling and leak detection himself.
BTW - there is not a lot of point causing the webapp to throw an OutOfMemoryError. It won't tell you why it is happening, and without understanding "why" you cannot do much about it.
EDIT
There is no point in "measuring the limits" if the root cause of the memory leak is in the client's code. Assuming that you are providing a servlet hosting service, the best thing to do is to provide the client with instructions on how to debug memory leaks ... and step out of the way. And if they have a support contract that requires you to (in effect) debug their code, they ought to provide you with the source code to do your job.

Handling Errors (e.g. OutOfMemoryError) within servers

What is the best practice when dealing with Errors within a server application?
In particular, how do you think an application should handle errors like OutOfMemoryError?
I'm particularly interested in Java applications running within Tomcat, but I think that is a more general problem.
The reason I'm asking is because I am reviewing a web application that frequently throws OOME, but usually it just logs them and then proceeds with execution. That results, obviously, in more OOMEs.
While that is certainly bad practice, in my opinion, I'm not entirely sure that stopping the Server would be the best solution.

There is not much you can do to fix OutOfMemoryError except to clean up the code and adjust JVM memory (but if you have a leak somewhere it's just a bandaid)
If you don't have access to the source code and/or are not willing to fix it, an external solution is to use some sort of watchdog program that monitors the Java application and restarts it automatically when it detects OOMEs. Here is a link to one such program.
Of course it assumes that the application will survive restarts.

The application shouldn't handle OOM at all - that should be the server's responsibility.
Next step: Check if memory settings are appropriate. If they aren't, fix them; if they are, fix the application. :)

Well, if you get an OOME, the best approach is to release as many resources (especially cached ones) as possible. Restarting the web app (if it is the web app's fault) or the web server itself (if something else in the server is responsible) will recover from this state. On the development side, though, it is good to profile the app and see what is taking up the space; sometimes resources are attached to a class variable and hence never collected, sometimes it is something else. In the past we had problems where Tomcat wouldn't release the classes of previous versions of an app when you replaced it with a newer version. We partly solved the problem by nulling out class variables or refactoring so as not to use them at all, but some leaks still remained.

An OutOfMemoryError is by no means always unrecoverable - it may well be the result of a single bad request, and depending on the app's structure it may just abandon processing the request and continue processing others without any problems.
So if your architecture supports it, catch the Error at a point where you have a chance to stop doing what caused it and continue doing something else - for an app server, this would be at the point that dispatches requests to individual app instances.
Of course, you should also make sure that this does not go unnoticed and a real fix can be implemented as soon as possible, so the app should log the error AND send out some sort of warning (e.g. email, but preferably something harder to ignore or get lost). If something goes wrong during that, then shutting down is the only sensible thing left to do.
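Catching at the dispatch point could look roughly like the sketch below. The names are invented, the alert callback stands in for whatever logging and paging mechanism you use, and the rethrow models the "shutting down is the only sensible thing left" case.

```java
import java.util.List;
import java.util.function.Consumer;

final class RequestDispatcher {
    /**
     * Runs each request in turn. An OutOfMemoryError abandons only the request that
     * caused it; the alert callback is expected to log and notify someone.
     * Returns the number of requests that completed normally.
     */
    static int dispatchAll(List<Runnable> requests, Consumer<Throwable> alert) {
        int completed = 0;
        for (Runnable request : requests) {
            try {
                request.run();
                completed++;
            } catch (OutOfMemoryError e) {
                try {
                    alert.accept(e);
                } catch (Throwable alertFailure) {
                    // Even reporting failed: stop dispatching and let the caller shut down.
                    throw e;
                }
            }
        }
        return completed;
    }
}
```

A real server would run requests on worker threads rather than in a loop, but the placement of the catch is the same: around one unit of work, not around the whole server.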

@Michael Borgwardt, you can't recover from an OutOfMemoryError in Java. Other errors might not stop the application, but an OutOfMemoryError literally hangs applications.

In our application, which deals heavily with documents, we do catch OOM errors, because a single bad request can cause an OOM and we don't want to bring down the application because of it. We catch the OOM and log it.
I'm not sure if this is best practice, but it seems to be working.

I'm not an expert in such things, but I'll take a chance to give my vague opinion on this problem.
Generally, I think there are two main ways:
The server is stopped.
The server degrades gracefully: it reduces throughput and memory consumption but stays alive. For this, the application must have an appropriate architecture, I think.

According to the javadoc about a java.lang.Error:
An Error is a subclass of Throwable that indicates serious problems that a reasonable application should not try to catch. Most such errors are abnormal conditions. The ThreadDeath error, though a "normal" condition, is also a subclass of Error because most applications should not try to catch it.
A method is not required to declare in its throws clause any subclasses of Error that might be thrown during the execution of the method but not caught, since these errors are abnormal conditions that should never occur.
So, the best practice when dealing with subclasses of Error is to fix the problem that is causing them, not to "handle" them. As it's clearly stated, they should never occur.
In the case of an OutOfMemoryError, maybe you have a process that consumes lots of memory (e.g. generating reports) and your JVM is not well sized, maybe you have a memory leak somewhere in your application, etc. Whatever it is, find the problem and fix it, don't handle it.

I strongly disagree with the notion that you should never handle an OutOfMemoryError.
Yes, it tends to be unrecoverable most of the time. However, one of my servers got one a few days ago, and the server was still mostly working for more than an hour and a half. Nobody complained, so I didn't notice until my monitoring software reported a failure an hour and a half after the first OutOfMemoryError. I need to know as soon as possible when there is an OutOfMemoryError on my server. I need to handle it so that I can set up a notification and know to restart my server ASAP.
I'm still trying to figure out how to get Tomcat to do something when it gets an Error. error-page doesn't seem to be working for it.
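Independent of Tomcat's own configuration, one JVM-level hook that might help is a default uncaught exception handler. The sketch below uses invented names, and it has a real limitation: it only fires for errors that escape a thread entirely, so an OutOfMemoryError that the container catches inside a request thread will not reach it.

```java
import java.util.function.Consumer;

final class OomeNotifier {
    /** Installs a JVM-wide handler that forwards uncaught OutOfMemoryErrors to the notifier. */
    static void install(Consumer<OutOfMemoryError> notifier) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (error instanceof OutOfMemoryError) {
                // Keep the notifier cheap; memory is scarce at this point.
                notifier.accept((OutOfMemoryError) error);
            } else {
                System.err.println("Uncaught exception in thread " + thread.getName());
                error.printStackTrace();
            }
        });
    }
}
```

The notifier could, for example, write a marker file or touch a monitoring endpoint that an external watchdog checks.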
