How to deal with memory leaks from external library - java

I have a small Java application running a set of computationally heavy tasks. For processing the tasks, I use an external library which does most of the computation via native methods and some C code. Unfortunately, after solving one task, the library suffers from heavy memory leaks and can therefore only solve one task per application execution.
The library's developers know about the memory problem, but it is not fixed yet and may never be (it has something to do with the Java garbage collector not working properly with the native interface). Since there is no alternative to this particular library, I am looking for options to solve the tasks through sequential application executions.
Currently, I have a bash wrapper script which gets a list of tasks to execute and, for each task, calls the application with just that single task.
Since tasks often need the results of previous tasks, this involves serializing and deserializing execution results to files. This does not seem like good practice to me, also because the user has basically no way to interact with the program's control flow.
Does anybody have an idea how I can do this sequential task execution inside one single Java application? I guess this would involve starting a new JVM for each task execution, hopefully transferring only the task result, and not the memory leaks, from the new JVM to my application.
Edit providing further information:
Changing the root of the problem: unfortunately, the library is not open source, and I have access neither to the native methods nor to the Java interface API.
New processes / JVMs: are these the same thing in this context? I do not have much experience with the Java Process API or with starting new JVMs. My assumption is that this would involve starting a separate Java program, with its own main method, using ProcessBuilder.start()?
Exchange of data: it is only a couple of kilobytes, so performance is not an issue. Still, a solution without files would be preferable, but if I understand correctly, memory-mapped files also use local files. Sockets, on the other hand, do sound promising.

Funnily enough, I've faced the same issue. By definition, when you are stuck with a faulty library that you must use but cannot upgrade, you have to accept that nothing you do will be best practice or nice.
The solution we came up with was to isolate calls to the library in its own process. This process was a child of a master process: the master process contained the good code and the child the bad. We were then able to keep track of the number of invocations of the child process and tear it down once it reached a certain number. We knew that we could get away with X invocations before the child process was corrupt.
Because of the nature of our problem, bringing up a fresh process enabled us to have another X invocations before repeating.
Any state was returned to the master process on a successful invocation. Any state gathered during an unsuccessful invocation was discarded and we started again.
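The invocation-counting scheme from this answer can be sketched without the real library. This is a minimal sketch under stated assumptions. `Worker` is a hypothetical stand-in for a handle to the child process, and `maxInvocations` plays the role of "X invocations". The pool counts calls and throws the worker away once the limit is reached. It also discards the worker, along with any state from that call, when a call fails.

```java
import java.util.function.Supplier;

/** Hypothetical handle to a child process that hosts the leaky library. */
interface Worker extends AutoCloseable {
    String invoke(String task);   // run one task inside the child
    @Override void close();       // tear the child down, releasing its leaked memory
}

final class RecyclingWorkerPool {
    private final Supplier<Worker> factory;
    private final int maxInvocations;
    private Worker current;
    private int invocations;
    private int workersCreated;

    RecyclingWorkerPool(Supplier<Worker> factory, int maxInvocations) {
        this.factory = factory;
        this.maxInvocations = maxInvocations;
    }

    String run(String task) {
        // Retire the child once it has used up its "safe" number of invocations.
        if (current != null && invocations >= maxInvocations) {
            retire();
        }
        if (current == null) {
            current = factory.get();
            invocations = 0;
            workersCreated++;
        }
        try {
            String result = current.invoke(task);
            invocations++;
            return result;    // state reaches the master only on success
        } catch (RuntimeException e) {
            retire();         // discard the child and whatever the failed call left behind
            throw e;
        }
    }

    int workersCreated() {
        return workersCreated;
    }

    private void retire() {
        current.close();
        current = null;
    }
}
```

In a real setup, the factory would start a child JVM and `invoke` would talk to it. Keeping that behind an interface makes the recycling policy testable with a fake worker.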
Again, none of the above is "nice" but it worked for us.
For what it's worth, if I did this again, I'd use Akka and remote actors which would make all the sub-process, remoting etc far simpler.

That depends. Do you have the source code of this external library, i.e. can you recompile it? The easiest approach is obviously to fix the leak at its root. This might, however, be impractical. If the library, as you say, is implemented via native methods and some C code, I do not think that the problem has anything to do with the Java garbage collector not working properly. Native methods and C code do not normally store their data on the JVM's heap, so that data is not garbage collected; it is the job of the library to clean up after itself.
If the leak is indeed in the bit of Java code that the library exposes, then there is a way. Memory leaks in Java occur when you forget about references; consider the following example:
class Foo {
    private ExpensiveObject eo;

    Foo(ExpensiveObject eo) {
        this.eo = eo;
    }
}
The ExpensiveObject stays alive (at least) as long as the Foo instance that references it. If you (or your library) do not isolate instance life-cycles well enough, you get into trouble. If you have no chance to refactor, you can, however, use reflection to clean up the biggest mess from elsewhere in your code:
import java.lang.reflect.Field;

void release(Foo foo) throws ReflectiveOperationException {
    Field f = Foo.class.getDeclaredField("eo"); // the private field holding the reference
    f.setAccessible(true);                      // bypass the private modifier
    f.set(foo, null);                           // drop the reference so the object can be collected
}
This should however be considered a last-resort as it is quite a hack.
Alternatively, a better approach is normally to fork another instance of a JVM to do the dirty work. It seems like you are doing something similar already. By forking a JVM, you isolate the use of memory on a process level. Once the process dies, all memory is released by the OS. The problem with this approach is normally platform compatibility but as you already use a native library, this does not worsen your situation.
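This addresses the ProcessBuilder question from the edit: yes, a "new process" here means a new JVM started with ProcessBuilder. This is a minimal sketch under assumptions. The task input and the result are small strings, so they are passed as a command-line argument and read back from the child's standard output, and no files are involved. `ChildJvmRunner`, the `child` argument and the doubling "computation" are hypothetical placeholders for the real library call. The sketch reuses the parent's java executable and class path, so it assumes the classes are on a normal class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;

final class ChildJvmRunner {

    /** Entry point inside the child JVM: runs one task, prints the result, exits. */
    public static void main(String[] args) {
        if (args.length == 2 && args[0].equals("child")) {
            // Placeholder for the leaky native library call.
            long result = Long.parseLong(args[1]) * 2;
            System.out.println(result);
        }
    }

    /** Runs one task in a fresh JVM; whatever the child leaks is freed when it exits. */
    static String runInChildJvm(String taskInput) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        ProcessBuilder pb = new ProcessBuilder(
                javaBin, "-cp", System.getProperty("java.class.path"),
                ChildJvmRunner.class.getName(), "child", taskInput);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // child errors show up in our console
        try {
            Process process = pb.start();
            String output = readAll(process.getInputStream());
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("child JVM exited with code " + exitCode);
            }
            return output.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the child JVM", e);
        }
    }

    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this in place, the bash wrapper's loop becomes a Java loop that calls `runInChildJvm` once per task. It can feed each result into the next task without writing intermediate files.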
You say that you currently use files to communicate between these different processes. Why do you need to store data in a file? Consider using sockets or memory-mapped files (NIO) instead, if performance matters here.
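Here is a minimal sketch of the socket option, under some assumptions. Both processes run on the same machine, and the result is a short string sent as one UTF-8 line. The parent opens a ServerSocket on the loopback interface with port 0, so the OS picks a free port. The child would learn that port somehow, for example as a hypothetical command-line argument. To keep the example self-contained, the "child" side is simulated by a thread. `ResultChannel`, `sendResult` and `roundTrip` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ResultChannel {

    /** Child side: connect to the parent and send one result line. */
    static void sendResult(int port, String result) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(result);
            out.write('\n');
            out.flush();
        }
    }

    /** Parent side: listen on a free loopback port, let the simulated child send, read one line. */
    static String roundTrip(final String payload) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            server.setSoTimeout(5_000); // don't block forever if the child never connects
            final int port = server.getLocalPort();
            Thread child = new Thread(() -> {
                try {
                    sendResult(port, payload);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            child.start();
            try (Socket socket = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```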

Related

How to Assign thread to a particular core [duplicate]

Does anybody know of a way to lock down individual threads within a Java process to specific CPU cores (on Linux)? I've done this in C, but can't find how to do this in Java. My instincts are that this will require a JNI call, but I was hoping someone here might have some insight or might have done it before.
Thanks!
You can't do this in pure Java. But if you really need it, you can use JNI to call native code that does the job. These are good places to start:
http://ovatman.blogspot.com/2010/02/using-java-jni-to-set-thread-affinity.html
http://blog.toadhead.net/index.php/2011/01/22/cputhread-affinity-in-java/
UPD: After some thinking, I've decided to create my own class for this: ThreadAffinity.java. It's JNA-based and very simple, so if you want to use it in production, you should maybe spend some time making it more stable, but for benchmarking and testing it works well as is.
UPD 2: There is another library for working with thread affinity in Java. It uses the same method as noted previously, but has a different interface.
I know it's been a while, but if anyone comes across this thread, here's how I solved this problem. I wrote a script that would do the following:
"jstack -l "
Take the results and find the "nid" values of the threads I want to lock down to specific cores.
Pin those threads to cores with taskset.
You might want to take a look at https://github.com/peter-lawrey/Java-Thread-Affinity/blob/master/src/test/java/com/higherfrequencytrading/affinity/AffinityLockBindMain.java
IMO, this will not be possible unless you use native calls. The JVM is supposed to be platform independent, so any system calls made to achieve this will not result in portable code.
It's not possible (at least with plain Java).
You can use thread pools to limit the number of threads (and therefore cores) used for different types of work, but there is no way to specify which core to use.
There is even a (small) possibility that your Java runtime doesn't support native threads on your OS or hardware. In that case, green threads are used, and only one core will be used for the whole JVM.
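This minimal sketch illustrates the thread-pool point above. A fixed pool caps how many threads, and therefore at most how many cores, a kind of work can occupy at once. The OS scheduler still decides which cores those threads actually run on. The pool size of 2 in the usage below is an arbitrary example value.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BoundedWork {
    /** Runs `tasks` jobs on a pool of `poolSize` threads and returns the distinct thread names used. */
    static Set<String> runOnPool(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> threadsUsed.add(Thread.currentThread().getName()));
            }
        } finally {
            pool.shutdown(); // accept no new work; queued jobs still run
        }
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return threadsUsed; // never more than poolSize names, whatever the core count
    }
}
```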

What to consider when writing a java program that is supposed to run 'forever'

I have to write a program that is meant to run 'forever', meaning that it won't terminate regularly. Up until now, I have always written programs that would run and be terminated at the end of the day. The program has to do some synchronizations, pause for n minutes and then sync again.
AFAIK there should be no problem with my current implementation, and it should theoretically run just fine, but I lack any real-world experience.
So are there any 'patterns' or best practices for writing very robust and resource-efficient Java programs that have a very long runtime? What problems could arise after, for example, a month or a year of runtime?
Some background :
Java : 1.7 but compiled down to 1.5
OS : Windows (exact version is not certain yet)
Thanks in advance
Just a brain dump of all the things I've had to keep in mind when writing this kind of app.
Avoid Memory Leaks
I had an app that runs once at midday, every day, and in it I had a FileWriter. I wasn't closing it properly, and then we started wondering why our virtual machine was melting down after a few weeks. Memory leaks can come from almost anything, one of the most common examples being that you don't dereference an object appropriately, for example by using a class's field as temporary storage. Often the class persists, and so does the reference. This leaves you with objects sitting in memory and doing nothing.
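The FileWriter mistake above is easy to avoid with try-with-resources (Java 7+), which closes the writer even if writing throws. The question says the code is compiled down to 1.5, where this syntax isn't available; there, a try/finally that calls close() does the same job. This is a minimal sketch, and `DailyLog` and `appendLine` are illustrative names.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DailyLog {
    /** Appends one line; the writer (and its file handle) is closed when the block exits. */
    static void appendLine(String path, String line) {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path, true))) {
            out.write(line);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```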
Use the right kind of Scheduler
I used a java Timer in that app, and later, when another app was changing the system clock, I learnt that it's better to use a ScheduledThreadPoolExecutor. So if you plan on keeping it completely Java-based, I would strongly recommend using that over a Timer, for all of the reasons detailed in this question.
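Here is a minimal sketch of the ScheduledThreadPoolExecutor approach for the "sync, pause n minutes, sync again" loop from the question. scheduleWithFixedDelay waits the full delay after each run finishes, so a slow sync never overlaps the next one. One caveat: if a scheduled task lets an exception escape, the executor silently suppresses all later runs. The wrapper therefore catches failures, counts them, and reports them, which is also a natural spot for the error notification discussed under Error handling. The sync step is passed in as a Runnable because the real sync logic isn't shown.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SyncScheduler {
    private final ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);
    private final Runnable syncStep;
    final AtomicInteger completedRuns = new AtomicInteger();
    final AtomicInteger failedRuns = new AtomicInteger();

    SyncScheduler(Runnable syncStep) {
        this.syncStep = syncStep;
    }

    void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                syncStep.run();
                completedRuns.incrementAndGet();
            } catch (RuntimeException e) {
                // An exception escaping here would cancel every future run,
                // so record/report it and let the schedule continue.
                failedRuns.incrementAndGet();
                System.err.println("sync failed, will retry: " + e.getMessage());
            }
        }, 0, delay, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

In production, the delay would be minutes, e.g. `start(15, TimeUnit.MINUTES)`.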
Be mindful of memory usage and your environment
If your app loads large amounts of data every day, and you have other apps running on the same server, you may want to be careful about the timing. For example, if three of the apps run their scheduled operations at midday, running yours at any other time would probably be a smart move. Be mindful of the environment in which you're executing your code.
Error handling
You probably want your app to let you know when something has gone wrong, without the app breaking down. If it's running at a certain time every few hours, people are probably depending on it, so I would add a function to your Java code that sends you an email detailing the nature of the exception.
Make it configurable
Again, if it needs to run at various points in the day, you don't want to have to pull the thing down for a few hours to make minor changes to your code. Instead, move those settings into a java Properties file, or into an XML config (or really, whatever). The advantage is that you can update your program and get it up and running again before anyone really notices the difference.
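Here is a minimal sketch of reading such settings from a java Properties file. The keys `sync.intervalMinutes` and `sync.target` are hypothetical, and defaults are used for anything missing. Re-reading the file before each cycle, rather than once at start-up, is one way to pick up changes without restarting the process.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SyncConfig {
    final long intervalMinutes;
    final String target;

    private SyncConfig(long intervalMinutes, String target) {
        this.intervalMinutes = intervalMinutes;
        this.target = target;
    }

    /** Parses settings, falling back to defaults for missing keys (key names are illustrative). */
    static SyncConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long interval = Long.parseLong(props.getProperty("sync.intervalMinutes", "15").trim());
        String target = props.getProperty("sync.target", "localhost");
        return new SyncConfig(interval, target);
    }
}
```

In the real program, the Reader would come from something like `Files.newBufferedReader(Paths.get("sync.properties"))` inside a try-with-resources block.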
Be afraid of the static keyword
That bad boy will make objects persist, even when you destroy their parent reference. It is the mother of all memory leaks if you are not careful with it. It's fine for constants, and things that you know don't need to change and need to exist within the project to run well, but if you're using it for random values inside a project, you're going to quickly wonder why your app is crashing every few hours rather than syncing.
Props to #X86 for reminding me of that one.
Memory leaks are likely to be the biggest problem. Ensure that no long-term references are held after an iteration of your logic. Even a relatively small object retained forever on every iteration will exhaust the memory eventually (and, worse, it's going to be harder to detect during testing if the growth rate is 1GB/month). One approach that may help is using the snapshot functionality of profilers: take a snapshot during the pause, let the sync run a few times, and take another snapshot. Comparing these should show the delta between the synchronizations, which should hopefully be zero.
Cache maintenance is another issue. The overall size of a cache needs to be strictly limited (whereas in short-running programs you can often get away without a limit, because everything seen will be small enough not to cause problems). Equally, it's more important to do cache invalidation properly: broadly speaking, everything that gets cached will become stale at some point while your program is still running, and you need to be able to detect this and take appropriate action. This can be tricky depending on where the golden source of the cached data is.
The last thing I'll mention is exception handling. For short-running processes, it's often enough to simply let the process die when an exception is encountered, so the issue can be dealt with and the app rerun. With a long-running process you'll likely need to be more defensive than this. Consider running parts of your program in threads, which can be restarted* if/when they fail. You may need a supervisor-type module, which checks that everything else is still heartbeating and reboots it if not. If appropriate to your structure, this is anecdotally a lot easier to achieve with actor-style libraries than with Java's standard executors. And if it's at all possible, you may want to have hooks (perhaps exposed over JMX/MBeans) that let you modify the behaviour somewhat, to allow a short-term hack/workaround to be effected without having to bring the process down. This does require quite some foresight to predict exactly what's going to go wrong in several months, though...
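This is a minimal sketch of a strictly size-limited cache. It uses LinkedHashMap in access order and overrides removeEldestEntry to evict the least recently used entry once a chosen `maxEntries` limit is exceeded. It bounds size only; staleness, the invalidation problem above, still needs its own policy, such as storing a timestamp per entry. The class is not thread-safe, so wrap it with Collections.synchronizedMap if several threads share it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so get() marks an entry as recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```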
*or rather, the job can be restarted in another thread
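Here is a minimal sketch of the "restart the job in another thread" idea. A small supervisor submits a job to an executor. If the job fails, the supervisor resubmits it, possibly on a different pool thread, up to a chosen `maxRestarts` limit. This is only the restart loop; real supervisors usually add heartbeats and back-off, which actor libraries provide out of the box.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class Supervisor {
    private final ExecutorService workers = Executors.newCachedThreadPool();

    /** Runs the job, resubmitting it after each failure, for at most maxRestarts restarts. */
    <T> T runSupervised(Callable<T> job, int maxRestarts) {
        if (maxRestarts < 0) throw new IllegalArgumentException("maxRestarts must be >= 0");
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            try {
                return workers.submit(job).get();
            } catch (ExecutionException e) {
                lastFailure = new IllegalStateException("job failed on attempt " + (attempt + 1), e.getCause());
                System.err.println(lastFailure.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while supervising", e);
            }
        }
        throw lastFailure; // give up after the last allowed restart
    }

    void shutdown() {
        workers.shutdown();
    }
}
```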

How to moniter memory allocated by some java method at runtime

I am creating a Java program in which my class (say A) has some predefined behavior, but users can override my class to change its behavior. So my code checks whether there is a subclass and, if so, calls its behavior. But what if the user has written some blocking code or a memory leak in it?
This may harm my process. Is there any way in Java to monitor the memory allocated by a method?
Please suggest.
but what if he has written some blocking code or memory leak in his code
First of all, I suggest you document your class well. Describe what the user is allowed to do and what not, and give use cases (if possible).
For the blocking-code part, if you have timing issues, you could wrap the execution of the method in, say, a Future and let an ExecutorService execute the code. That way you can cancel the execution if it takes too much time.
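Here is a minimal sketch of that idea. The user's override runs on a separate thread via an ExecutorService, and the caller waits at most a given timeout before calling cancel(true). Cancellation only interrupts the thread: code that ignores interruption, such as a tight loop that never checks for it, keeps running. In other words, this limits how long you wait, not how long their code runs. `GuardedCall`, `runWithTimeout` and the fallback parameter are illustrative names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class GuardedCall {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "user-override");
        t.setDaemon(true); // a stuck override won't keep the JVM alive at shutdown
        return t;
    });

    /** Runs the (possibly misbehaving) override, giving up after the timeout. */
    static <T> T runWithTimeout(Callable<T> userCode, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(userCode);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("override threw an exception", e.getCause());
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return fallback;
        }
    }
}
```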
For the memory leak issue, well, I guess you are not talking about memory leaks but about increased memory consumption caused by calling the overridden method. Memory leaks in Java are rare, after all.
You will not be able to detect the memory consumption of a method; that's not how Java works. Memory is global. What will you do if, for example, an external library is loaded (JNI), or some library on the classpath is called that now uses more memory? You just cannot tell.
Other than monitoring the overall memory consumption, there is no other way (someone please tell me if I am wrong).
Oracle has quite a good document about solving memory leaks. It suggests that one should use NetBeans Profiler as a tool.
http://www.oracle.com/technetwork/java/javase/memleaks-137499.html
I believe you can use the same debugging API for checking against misbehaving code while it is running, but that will come with a performance penalty and is probably akin to killing a fly with a sledgehammer. I personally would not let anything like that run in production. Instead, I would rely on rigorous testing and peer review.
For external monitoring, you can use VisualVM or JConsole (part of the JDK); internally, you can use the Runtime class:
Runtime rt = Runtime.getRuntime();
long totalMem = rt.totalMemory();   // heap currently reserved by the JVM
long maxMem = rt.maxMemory();       // maximum the heap may grow to (e.g. set by -Xmx)
long freeMem = rt.freeMemory();     // unused part of totalMem
long usedMem = totalMem - freeMem;  // heap currently in use
Via the Thread class, you can check the status of all threads. I have never used it directly, because application servers or batch-processing APIs do that job, so I don't need to reinvent the wheel. I suggest using tools like VisualVM.
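Here is a minimal sketch of checking every live thread's state from inside the application with Thread.getAllStackTraces(). A thread that stays BLOCKED or WAITING across several checks is a hint, not proof, that some code is stuck. Threads that share a name collapse into one entry in this simple map.

```java
import java.util.Map;
import java.util.TreeMap;

final class ThreadStates {
    /** Snapshot of thread name -> state for all live threads (sorted by name). */
    static Map<String, Thread.State> snapshot() {
        Map<String, Thread.State> states = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            states.put(t.getName(), t.getState());
        }
        return states;
    }
}
```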
EDIT: Watch also this thread: Why do threads share the heap space?
You cannot analyze the heap usage of a single thread. If you have problems with the execution of foreign code, you should separate it as well as you can from other threads and analyze the thread or heap dumps. This can be done, as mentioned, with VisualVM or JConsole, which were also added to the JDK by Oracle (or Sun).
Depending on what sort of behavior the subclass can have, we might think of options. For example, if it's a database-related operation, we can force them to clean up connections; if it's file-based, we can force them to read the file through your class and check how big the file is; if it's an HTTP call or some other streaming functionality, we can look at enforcing constraints accordingly.
If you're just worried about heap utilization and memory leaks there, you might want to look at http://java.dzone.com/tips/getting-jvm-heap-size-used, which explains how to get runtime memory usage programmatically. But then you'll have to do periodic checks, and you can never be sure whether a given memory usage is caused by the subclass behavior.
I just found this while I was trying to build an agent that records memory allocations:
The post "How to track any object creation in Java since freeMemory() only reports long-lived objects?" mentions an open-source project, Java Allocation Instrumenter, that you can use to register your own callback (it has examples too), and with it you can obtain what you need.
I started working on a similar project a few days ago, and while researching I found your question and the post above.
I personally needed this kind of code in some unit tests, to check whether too many objects are allocated inside critical methods, and found that the Runtime class was not appropriate because the garbage collector may interfere, and the test recorded negative numbers for allocated memory.
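As an alternative to the Java Allocation Instrumenter, this is a swapped-in technique, not the one from the post. HotSpot-based JVMs (Oracle/OpenJDK) expose a per-thread allocation counter through the com.sun.management.ThreadMXBean extension, and garbage collection does not make it go negative the way Runtime.freeMemory() differences can. This is a minimal sketch for a unit test. It is JVM-specific, which is why it checks for support. It counts only bytes allocated by the current thread, and the figure includes any incidental allocations made in between.

```java
import java.lang.management.ManagementFactory;

final class AllocationMeter {
    /** Bytes allocated by the current thread while running `action`, or -1 if unsupported. */
    static long allocatedBytes(Runnable action) {
        java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
        if (!(std instanceof com.sun.management.ThreadMXBean)) {
            return -1; // not a HotSpot-style JVM
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) std;
        if (!bean.isThreadAllocatedMemorySupported() || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        action.run();
        return bean.getThreadAllocatedBytes(id) - before; // monotonic counter, unaffected by GC
    }
}
```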

Load existing class objects in JVM from another JVM

How to load existing class objects in JVM from another JVM?
I am analyzing a rare scenario in my server. I do not have proper logs in my server to help me solve the situation, and I believe that it may be a problem with a particular (user-defined) class object.
Say for example below is the class:
public class MyRequest
{
    private byte[] someData;

    public byte[] getData()
    {
        return someData;
    }
}
Currently, hundreds of instances of the above class are in my server JVM's memory. I want to know whether it is possible to load all of these objects and access their data/methods (getData()).
I do not want to create a new instance of the MyRequest class (I know that is pretty easy). I want to load the existing objects from my JVM through another Java process.
P.S.: I cannot kill my server for any reason.
P.S.: And I cannot install any tools like VisualVM; moreover, such tools tell us the object types and memory, but not the exact data.
Basically, it won't work.
If you can't attach a debugger, you can't do anything.
If you could attach a debugger, you should be able find and look at those instances, but you won't be able to get them to do something they weren't designed to do. In particular, if they are not designed to be serializable, you won't be able to serialize them.
I think your best bet is to change your server code to improve the logging, and then restart it with a debugger agent ... and wait for the problem to recur.
And of course, if you have a debugger attached, you don't need to move objects to another JVM. You can just look at their state directly.
However, there's a catch. Many "amazingly rare" scenarios are actually related to threading, thread-safety and timing problems. And many things you can do to observe the effects of such a bug are liable to alter the program's behaviour.
FOLLOWUP
So if we know the starting address of the virtual memory for that JVM, can we not know the data? Assuming all objects are within the JVM's memory space.
It is not as simple as that:
Locations of objects on the Java heap are not predictable.
Locations of thread stacks are not predictable.
and so on.
It may be theoretically possible to dump the memory of any process, and reconstruct the execution state of the JVM, and "read" the state of the objects. But you'd need specialized tools and/or a great deal of knowledge of JVM internals to do this. I'm not even sure if the tools exist ...
In short, it is not practical, AFAIK.
Objects and their references (aliases) are bound to the current running JVM. There is no possibility to share them between several JVMs.
If you want to "share" data between two JVMs, you must serialize this data, which means sending it from one JVM to the other. This also requires the classes whose instances are to be serialized to implement the Serializable interface. Note that arrays automatically implement Serializable.
You can either stream those serializable objects yourself using sockets, output and input streams (which is much effort) or you can use RMI for calling remote methods and just stream your data. In either case, all objects are copied and built up again in the other JVM. There is no chance to have them shared.
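This minimal sketch illustrates that serialization copies rather than shares. An object round-tripped through ObjectOutputStream/ObjectInputStream comes back as a new instance holding equal data. Here the stream goes into a byte array; over a socket or RMI the mechanics are the same. `Snapshot` is a hypothetical Serializable holder for the byte[] returned by getData().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class Snapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] data; // arrays are Serializable, as noted above

    Snapshot(byte[] data) {
        this.data = data;
    }
}

final class CopyDemo {
    /** Serializes and deserializes the object; the result is a copy, never the same instance. */
    static Snapshot roundTrip(Snapshot original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Snapshot) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```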
In case of application servers, RMI calls are typically invoked by just using EJBs. But you need an application server; just using a web server is not enough.
Load existing class objects in JVM from another JVM
It's not possible.
Note that you can tell the JVM to dump its state to disk (with a kill signal or similar), so that you can use post-mortem tools to analyze the state of your program.
Keywords are "core" and "hprof" and I have not done this myself yet.
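On the "hprof" keyword: HotSpot JVMs can also write an hprof heap dump of themselves through the HotSpotDiagnosticMXBean. The file contains field values, including byte[] contents, not just types and sizes, so it can be inspected offline. This is a minimal sketch with several caveats. It only works from code already running inside the target JVM, so it would have to be deployed ahead of time. The bean is HotSpot-specific. The file name must not already exist and should end in .hprof, since recent JDKs reject other suffixes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import com.sun.management.HotSpotDiagnosticMXBean;

final class HeapDumper {
    /** Writes an hprof dump of this JVM into a new temp directory and returns its path. */
    static Path dump(boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            Path file = Files.createTempDirectory("heapdump").resolve("app.hprof");
            bean.dumpHeap(file.toString(), liveObjectsOnly); // live=true: only reachable objects
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```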
