I am having a hard time tracking this down since the profiler keeps crashing (hotspot error). Before I go too deep into figuring it out I'd like to know if I really have a problem or not :-)
I have a few thread pools created via Executors.newFixedThreadPool(10). The threads connect to different web sites and, on occasion, I get a "connection refused" error and wind up throwing an exception.
When I later call Future.get() to get the result, I catch the ExecutionException that wraps the exception thrown when the connection could not be made.
The program uses a fairly constant amount of memory up until the point in time that the exceptions get thrown (they tend to happen in batches when a particular site is overloaded). After that point the memory again remains constant but at a higher level.
So my question is: is this memory behaviour (as reported by "top" on Unix) expected because the exceptions just triggered something, or do I probably have an actual leak that I'll need to track down? Additionally, when Future.get() throws an exception, is there anything else I need to do besides catch it (such as call Future.cancel() on the Future)?
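To make the setup concrete, here is a minimal sketch of the pattern I am describing (fetchSite and the URL are just placeholders):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FetchExample {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        Future<String> result = pool.submit(new Callable<String>() {
            public String call() throws Exception {
                return fetchSite("http://example.com");
            }
        });
        try {
            System.out.println("fetched " + result.get().length() + " chars");
        } catch (ExecutionException e) {
            // get() rethrows the task's exception wrapped in ExecutionException;
            // the task has already finished at this point, so there is nothing left to cancel
            System.err.println("fetch failed: " + e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private static String fetchSite(String url) throws Exception {
        // stand-in for the real HTTP call
        throw new java.net.ConnectException("Connection refused (simulated)");
    }
}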
EDIT: so I did take a look with a couple of tools, and from the Java point of view there is nothing going on memory-leak-wise. I'll play around with some other code that lives for a long time and throws an exception after a while, and see if the memory reported by "top" also increases. Seems like it may just be some sort of oddity.
Does your Java process ever actually exit with a java.lang.OutOfMemoryError? If not, it's unlikely that you have a leak. Of course, you can always attach to the Java process with JConsole, capture a heap dump, and open it in a free tool like HPjmeter to find out really quickly.
The JVM starts with the Xms amount of heap, then grabs more from the system up to Xmx. Even if the GC frees heap inside the JVM, the Java process keeps holding on to the heap it has grabbed from the OS, so in top it never appears to shrink (experience with the Sun JVM (Java 1.5/6) on Red Hat Linux).
It is quite possible that heap has been freed and is available inside the JVM, but it is not released back to the OS, so in top the Java process appears to be using a lot of memory.
You can use VisualVM, a tool that comes with the JDK. Just run your app for an extended time, connect the tool, and take a look at the heap telemetry. If it keeps gradually growing, it might be a memory leak. You can also take a heap dump with this tool and see which objects are accumulating.
Related
I have a system in Scala with a lot of simultaneous threads and system calls. This system has a problem: memory usage increases over time.
The image below shows the memory usage over one day. When it reaches the limit, the process shuts down, and I have put a watchdog in place to restart it.
I periodically run the command
jcmd <pid> GC.run
This makes the memory increase more slowly, but the leak still happens.
I analysed it with jvisualvm, comparing two distinct moments in time, 40 minutes apart. The image below shows the comparison between these two moments. Notice that there is an increase in the number of instances of some classes such as ConcurrentHashMap$HashEntry, SNode, WeakReference, char[] and String, as well as of many classes in the package scala.collection.concurrent.
What can be causing the memory leak?
Edit 1:
Investigating in JVisualVM, I noticed objects of the CNode and INode classes, which live in a TrieMap that is instantiated inside the sbt.TrapExit$App class. Here is the object hierarchy figure:
First capture a heap dump when your application crashes due to an out-of-memory issue. Add the following flags when starting the JVM:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dump
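For example (the jar name here is just a placeholder), the full launch might look like:
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dump -jar yourapp.jar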
Next you need to analyze the heap dump to figure out the source of the memory leak. I recommend using Eclipse MAT. The Leak Suspects report should give you a sense of what objects are actually causing the leak.
Without seeing the implementation it's hard to say. The title of your post suggests that there is a memory leak in Scala itself, but have you checked your implementation for problems with releasing objects?
Have you checked the following:
Do you limit the number of actors at all?
Do you set timeouts for the system calls?
Do you allow the actors to be removed from the heap once they have performed their tasks?
Did you work out how many actors can fit into your memory, or are you just creating "hundreds of actors" and hoping that the JVM will know "what to do"?
What I'm trying to say is that maybe you are running out of memory simply because you create too many objects which are never released, either because they are still performing their tasks (no timeout) or because you have created too many of them.
Maybe you need to scale your application across several JVMs? How many JVMs do you use?
I am having a really weird issue with a Java application.
Essentially it is a web page that uses Magnolia (a CMS); there are 4 instances available in the production environment. Sometimes the CPU goes to 100% in a Java process.
So, first approach was to make a thread dump, and check the offending thread, what I found was weird:
"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000000ce37800 nid=0x7dcb runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000000ce39000 nid=0x7dcc runnable
Ok, that is pretty weird; I have never had a problem with the garbage collector like that. So the next thing we did was to activate JMX and, using jvisualvm, inspect the machine: the heap memory usage was really high (95%).
Naive approach: increase the memory so the problem takes more time to appear. Result: on the restarted server with increased memory (6 GB!) the problem appeared 20 hours after the restart, while on other servers with less memory (4 GB!) that had been running for 10 days, the problem still took a few more days to reappear. Also, I tried to take the Apache access log from the failing server and use JMeter to replay the requests against a local server in an attempt to reproduce the error... it did not work either.
Then I investigated the logs a little bit more and found these errors:
info.magnolia.module.data.importer.ImportException: Error while importing with handler [brightcoveplaylist]:GC overhead limit exceeded
at info.magnolia.module.data.importer.ImportHandler.execute(ImportHandler.java:464)
at info.magnolia.module.data.commands.ImportCommand.execute(ImportCommand.java:83)
at info.magnolia.commands.MgnlCommand.executePooledOrSynchronized(MgnlCommand.java:174)
at info.magnolia.commands.MgnlCommand.execute(MgnlCommand.java:161)
at info.magnolia.module.scheduler.CommandJob.execute(CommandJob.java:91)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Another example
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2894)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at java.lang.StackTraceElement.toString(StackTraceElement.java:175)
at java.lang.String.valueOf(String.java:2838)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at java.lang.Throwable.printStackTrace(Throwable.java:529)
at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)
at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)
at info.magnolia.module.templatingkit.functions.STKTemplatingFunctions.getReferencedContent(STKTemplatingFunctions.java:417)
at info.magnolia.module.templatingkit.templates.components.InternalLinkModel.getLinkNode(InternalLinkModel.java:90)
at info.magnolia.module.templatingkit.templates.components.InternalLinkModel.getLink(InternalLinkModel.java:66)
at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at freemarker.ext.beans.BeansWrapper.invokeMethod(BeansWrapper.java:866)
at freemarker.ext.beans.BeanModel.invokeThroughDescriptor(BeanModel.java:277)
at freemarker.ext.beans.BeanModel.get(BeanModel.java:184)
at freemarker.core.Dot._getAsTemplateModel(Dot.java:76)
at freemarker.core.Expression.getAsTemplateModel(Expression.java:89)
at freemarker.core.BuiltIn$existsBI._getAsTemplateModel(BuiltIn.java:709)
at freemarker.core.BuiltIn$existsBI.isTrue(BuiltIn.java:720)
at freemarker.core.OrExpression.isTrue(OrExpression.java:68)
Then I found out that this problem is due to the garbage collector using a ton of CPU but not being able to free much memory.
Ok, so it is a problem with the MEMORY that manifests itself in the CPU, so if the memory usage problem is solved, the CPU should be fine. I took a heap dump, but unfortunately it was just too big to open (the file was 10GB). Anyway, I ran the server locally, loaded it a little bit, and took a heap dump. After opening it, I found something interesting:
There are a TON of instances of
AbstractReferenceMap$WeakRef ==> Takes 21.6% of the memory, 9 million instances
AbstractReferenceMap$ReferenceEntry ==> Takes 9.6% of the memory, 3 million instances
In addition, I have found a Map which seems to be used as a "cache" (horrible but true). The problem is that this map is NOT synchronized and it is shared among threads (being static). The problem could be not only concurrent writes but also the fact that, with the lack of synchronization, there is no guarantee that thread A will see the changes made to the map by thread B. However, I am unable to link this suspicious map to the findings in the Eclipse memory analyzer, as it does not use AbstractReferenceMap; it is just a normal HashMap.
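For illustration only (class and method names are hypothetical), the pattern I mean looks roughly like this; a ConcurrentHashMap or proper locking would be the usual fix:

import java.util.HashMap;
import java.util.Map;

public class SomeHelper {
    // static and shared across threads, with no synchronization at all
    private static final Map<String, Object> CACHE = new HashMap<String, Object>();

    static Object lookup(String key) {
        Object value = CACHE.get(key);     // thread B may never see thread A's puts
        if (value == null) {
            value = loadExpensively(key);
            CACHE.put(key, value);         // concurrent puts can corrupt the HashMap
        }
        return value;
    }

    private static Object loadExpensively(String key) {
        return new Object();               // stand-in for the real lookup
    }
}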
Unfortunately, we do not use the AbstractReferenceMap classes directly (obviously the code uses them, but not directly), so I seem to have hit a dead end.
Problems for me are
I cannot reproduce the error
I cannot figure out where the hell the memory is leaking (if that is the case)
Any ideas at all?
The 'no-op' finalize() methods should definitely be removed as they are likely to make any GC performance problems worse. But I suspect that you have other memory leak issues as well.
Advice:
First get rid of the useless finalize() methods.
If you have other finalize() methods, consider getting rid of them. (Depending on finalization to do things is generally a bad idea ...)
Use a memory profiler to try to identify the objects that are being leaked, and what is causing the leakage. There are lots of SO questions ... and other resources on finding leaks in Java code. For example:
How to find a Java Memory Leak
Troubleshooting Guide for Java SE 6 with HotSpot VM, Chapter 3.
Now to your particular symptoms.
First of all, the place where the OutOfMemoryErrors were thrown is probably irrelevant.
However, the fact that you have huge numbers of AbstractReferenceMap$WeakRef and AbstractReferenceMap$ReferenceEntry objects is a strong indication that something in your application or the libraries it is using is doing a huge amount of caching ... and that that caching is implicated in the problem. (The AbstractReferenceMap class is part of the Apache Commons Collections library. It is the superclass of ReferenceMap and ReferenceIdentityMap.)
You need to track down the map object (or objects) that those WeakRef and ReferenceEntry objects belong to, and the (target) objects that they refer to. Then you need to figure out what is creating it / them and figure out why the entries are not being cleared in response to the high memory demand.
Do you have strong references to the target objects elsewhere (which would stop the WeakRefs from being broken)?
Is / are the map(s) being used incorrectly so as to cause a leak? (Read the javadocs carefully ...)
Are the maps being used by multiple threads without external synchronization? That could result in corruption, which potentially could manifest as a massive storage leak.
Unfortunately, these are only theories and there could be other things causing this. And indeed, it is conceivable that this is not a memory leak at all.
Finally, your observation that the problem is worse when the heap is bigger. To me, this is still consistent with a Reference / cache-related issue.
Reference objects are more work for the GC than regular references.
When the GC needs to "break" a Reference, that creates more work; e.g. processing the Reference queues.
Even when that happens, the resulting unreachable objects still can't be collected until the next GC cycle at the earliest.
So I can see how a 6Gb heap full of References would significantly increase the percentage of time spent in the GC ... compared to a 4Gb heap, and that could cause the "GC Overhead Limit" mechanism to kick in earlier.
But I reckon that this is an incidental symptom rather than the root cause.
With a difficult debugging problem, you need to find a way to reproduce it. Only then will you be able to test experimental changes and determine if they make the problem better or worse. In this case, I'd try writing loops that rapidly create & delete server connections, that create a server connection and rapidly send it memory-expensive requests, etc.
After you can reproduce it, try reducing the heap size to see if you can reproduce it faster. But do that second since a small heap might not hit the "GC overhead limit" which means the GC is spending excessive time (98% by some measure) trying to recover memory.
For a memory leak, you need to figure out where in the code it's accumulating references to objects. E.g. does it build a Map of all incoming network requests?
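A classic shape for that kind of leak (names are hypothetical) is a static collection that only ever grows:

import java.util.HashMap;
import java.util.Map;

public class RequestRegistry {
    // every request ever seen stays strongly reachable -- a textbook leak
    private static final Map<String, byte[]> SEEN = new HashMap<String, byte[]>();

    public static void record(String requestId, byte[] payload) {
        SEEN.put(requestId, payload);      // nothing ever removes old entries
    }
}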
A web search https://www.google.com/search?q=how+to+debug+java+memory+leaks shows many helpful articles on how to debug Java memory leaks, including tips on using tools like the Eclipse Memory Analyzer that you're using. A search for the specific error message https://www.google.com/search?q=GC+overhead+limit+exceeded is also helpful.
The no-op finalize() methods shouldn't cause this problem but they may well exacerbate it. The doc on finalize() reveals that having a finalize() method forces the GC to twice determine that the instance is unreferenced (before and after calling finalize()).
So once you can reproduce the problem, try deleting those no-op finalize() methods and see if the problem takes longer to reproduce.
It's significant that there are many AbstractReferenceMap$WeakRef instances in memory. The point of a weak reference is to refer to an object without forcing it to stay in memory. AbstractReferenceMap is a Map that lets one make the keys and/or values be weak references or soft references. (The point of a soft reference is to try to keep an object in memory but let the GC free it when memory gets low.) Anyway, all those WeakRef instances in memory are probably exacerbating the problem but shouldn't keep the referenced Map keys/values in memory. What are they referring to? What else is referring to those objects?
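For anyone unfamiliar with the semantics, here is a tiny sketch of the difference a strong reference makes:

import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) {
        Object value = new Object();
        WeakReference<Object> weak = new WeakReference<Object>(value);

        value = null;                       // drop the last strong reference
        System.gc();                        // only a hint, but usually enough for a demo
        System.out.println(weak.get());     // very likely prints null

        // If anything else still held a strong reference to the object,
        // weak.get() would keep returning it and the cache entry could never be freed.
    }
}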
Try a tool that locates the leaks in your source code such as plumbr
There are a number of possibilities, perhaps some of which you've explored.
It's definitely a memory leak of some sort.
If your server has user sessions, and your user sessions aren't expiring or being disposed of properly when the user is inactive for more than X minutes/hours, you will get a buildup of used memory.
If you have one or more maps of something that your program generates, and you don't clear the map of old/unneeded entries, you could again get a buildup of used memory. For example, I once considered adding a map to keep track of process threads so that a user could get info from each thread, until my boss pointed out that at no point were finished threads getting removed from the map, so if the user stayed logged in and active, they would hold onto those threads forever.
You should try doing a load test on a non-production server where you simulate normal usage of your app by large numbers of users. Maybe even limit the server's memory even lower than usual.
Good luck, memory issues are a pain to track down.
You say that you have already tried jvisualvm to inspect the machine. Maybe try it again, like this:
This time look at the "Sampler -> Memory" tab.
It should tell you which (types of) objects occupy the most memory.
Then find out where such objects are usually created and removed.
A lot of times 'weird' errors can be caused by java agents plugged into the JVM. If you have any agents running (e.g. jrebel/liverebel, newrelic, jprofiler), try running without them first.
Weird things can also happen when running JVM with non-standard parameters (-XX); certain combinations are known to cause problems; which parameters are you using currently?
Memory leak can also be in Magnolia itself, have you tried googling "magnolia leak"? Are you using any 3rd-party magnolia modules? If possible, try disabling/removing them.
The problem might be connected to just one part of your application. You can try reproducing the problem by "replaying" your access logs on your staging/development server.
If nothing else works, if it were me, I would do the following:
- trying to replicate the problem on an "empty" Magnolia instance (without any of my code)
- trying to replicate the problem on an "empty" Magnolia instance (without 3rd party modules)
- trying to upgrade all software (magnolia, 3rd-party modules, JVM)
- finally try to run the production site with YourKit and try to find the leak
My guess is that you have an automated import running which invokes some instance of ImportHandler. That handler is configured to make a backup of all the nodes it is going to update (I think this is the default option), and since you probably have a lot of data in your data type, and since all of this is done in-session, you run out of memory. Try to find out which import job it is and disable backup for it.
HTH,
Jan
It appears that your memory leaks are emanating from your arrays. If an array slot still holds a reference to an object you have logically removed, the garbage collector cannot reclaim that object. My advice is, when you remove an object from an array, to assign null to its former position so that the garbage collector can see it is no longer referenced and reclaim it. I doubt this will be your exact problem, but it is always good to know these things and to check whether it applies.
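The classic illustration of this advice (if I remember correctly, it is the stack example from Effective Java) looks like this minimal sketch:

import java.util.Arrays;
import java.util.EmptyStackException;

public class SimpleStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    public void push(Object e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = e;
    }

    public Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object result = elements[--size];
        elements[size] = null;   // clear the slot so the GC can reclaim the object
        return result;
    }
}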
It is also good to assign an object reference to null when you need to remove it/clean it up. This is because the finalize() method is sketchy and evil, and sometimes will not be called by the garbage collector. The best workaround is to call it (or a similar cleanup method) yourself; that way, you are assured that the cleanup was performed successfully. As Joshua Bloch says in his book Effective Java, 2nd edition, Item 7, page 27: avoid finalizers. "Finalizers are unpredictable, often dangerous and generally unnecessary". You can see the section here.
Because there is no code displayed, I cannot see whether any of these methods would be useful, but it is still worth knowing these things. Hope these tips help you!
As recommended above, I'd get in touch with the devs of Magnolia, but meanwhile:
You are getting this error because the GC doesn't collect much on a run:
The concurrent collector will throw an OutOfMemoryError if too much
time is being spent in garbage collection: if more than 98% of the
total time is spent in garbage collection and less than 2% of the heap
is recovered, an OutOfMemoryError will be thrown.
Since you can't change the implementation, I would recommend changing the configuration of the GC so that it runs less frequently, making it less likely to fail in this way.
Here is an example config just to get you started on the parameters; you will have to figure out your own sweet spot. The GC logs will probably help with that.
My VM params are as follows:
-Xms6G
-Xmx6G
-XX:MaxPermSize=1G
-XX:NewSize=2G
-XX:MaxTenuringThreshold=8
-XX:SurvivorRatio=7
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled
-XX:CMSInitiatingOccupancyFraction=60
-XX:+HeapDumpOnOutOfMemoryError
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-Xloggc:logs/gc.log
I am looking into how to use JConsole to detect memory leaks.
I see that under Memory Pool in my MBeans I can define a UsageThreshold for my Tenured Generation.
So if my application exceeds this threshold, the heap memory becomes red in the Memory tab.
Question: How does this help? I mean, how am I supposed to use this setting to analyse my memory? How am I supposed to figure out this value?
In my opinion, the UsageThreshold parameter is not the most helpful one for detecting memory leaks (but if someone knows some tricks with it, please do share). In my experience, that parameter is more helpful for seeing visually whether my application is getting too near my max heap size and is in danger of an OutOfMemoryError.
Still, regarding using JConsole to search for memory leaks, I don't think there is a silver bullet for the process. But what I usually do is the following:
If a memory leak exists, it means that the objects (the ones that are leaking) won't get collected; hence your Tenured Generation won't fully recover after any number of GCs.
With the application running, I connect JConsole and try to spot a leak by observing the Memory tab. If, after several computations in my application and after various GCs have occurred (including pressing the Perform GC button, which results in a full GC), the memory never goes back down to, or at least close to, the value it started at, there is a good chance that something is leaking. When the leak is big, you can even see a "staircase" pattern in the memory graph.
Keep in mind that if your application has long-running computations that consume memory, this analysis must be done carefully. You must understand when those processes have finished. For example, run just one of those computations and track the total evolution of memory before, during and afterwards.
Also, I suggest you try VisualVM instead, because it also allows you to create heap dumps, which you can use to see which objects are still in memory and to explore the reference graph to understand why they are not being collected.
You can use jmap to see the histogram and/or to create heap dumps, and study your memory consumption with tools like Eclipse MAT or YourKit.
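For example (replace <pid> with the process id):
jmap -histo:live <pid>
jmap -dump:live,format=b,file=heap.hprof <pid>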
JConsole is used more for monitoring and running MBeans and less for analysis; in my experience, JVisualVM is better for that, since you can also use it to sample your code and see which methods are consuming CPU.
If, on purpose, I create an application that crunches data while suffering from memory-leaks, I can notice that the memory as reported by, say:
Runtime.getRuntime().freeMemory()
starts oscillating between 1 and 2 MB of free memory.
The application then enters a loop that goes like this: GC, process some data, GC, etc., but because the GC happens so often, the application basically isn't doing much else anymore. Even the GUI takes ages to respond (and no, I'm not talking about EDT issues here; the VM really is basically stuck in some endless GC'ing mode).
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
Note that I'm not talking about out-of-memory errors nor about detecting the memory leak itself.
I'm talking about detecting that an application is running so low on memory that it is basically calling the GC all the time, leaving hardly any time to do something else (in my hypothetical example: crunching data).
Would it work, for example, to repeatedly read how much memory is available during, say, one minute, and see that if the number has been "oscillating" between different values all below, say, 4 MB, conclude that there's been some leak and that the application has become unusable?
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
I don't think so. You can find out roughly how much heap memory is free at any given instant, but AFAIK you cannot reliably determine when you are running out of memory. (Sure, you can do things like scraping the GC log files, or trying to pick patterns in the free memory oscillations. But these are likely to be unreliable and fragile in the face of JVM changes.)
However, there is another (and IMO better) approach.
In recent versions of Hotspot (version 1.6 and later, I believe), you can tune the JVM / GC so that it will give up and throw an OOME sooner. Specifically, the JVM can be configured to check that:
the ratio of free heap to total heap is greater than a given threshold after a full GC, and/or
the time spent running the GC is less than a certain percentage of the total.
The relevant JVM parameters are "UseGCOverheadLimit", "GCTimeLimit" and "GCHeapFreeLimit". Unfortunately, Hotspot's tuning parameters are not well documented on the public web, but these ones are all listed here.
Assuming that you want your application to do the sensible thing ... give up when it doesn't have enough memory to run properly anymore ... then just launch the JVM with a smaller "GCTimeLimit" or a larger "GCHeapFreeLimit" than the defaults.
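For example (the main class is a placeholder and the exact values are illustrative, not a recommendation):
java -XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=10 com.example.Main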
EDIT
I've discovered that the MemoryPoolMXBean API allows you to look at the peak usage of individual memory pools (heaps), and set thresholds. However, I've never tried this, and the APIs have lots of hints that suggest that not all JVMs implement the full API. So, I would still recommend the HotSpot tuning option approach (see above) over this one.
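A minimal sketch of what that API looks like (the 80% figure is arbitrary, and not every pool reports a maximum):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class PoolThresholdDemo {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax();
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported() && max > 0) {
                // flag the pool once it is more than ~80% full
                pool.setUsageThreshold((long) (max * 0.8));
            }
        }
        // ... later, e.g. from a monitoring thread:
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.isUsageThresholdSupported() && pool.isUsageThresholdExceeded()) {
                System.err.println(pool.getName() + " exceeded its threshold: " + pool.getUsage());
            }
        }
    }
}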
You can use getHeapMemoryUsage.
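For example, via the MemoryMXBean:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageDemo {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() can be -1 if no maximum is defined
        System.out.println("heap used: " + heap.getUsed() / (1024 * 1024) + " MB"
                + " of max " + heap.getMax() / (1024 * 1024) + " MB");
    }
}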
I see two attack vectors.
Either monitor your memory consumption.
When you more or less constantly use most of the available memory, it is very likely that you have a memory leak (or are just using too much memory). The VM will constantly try to free some memory without much success => constantly high memory usage.
You need to distinguish that from a large zigzag pattern, which often happens without being an indicator of a memory problem. Basically, you use more and more memory, but when the GC finds time to do its job it finds lots of garbage to throw out, so everything is fine.
The other attack vector is to monitor how often the GC runs and how successful it is. If it runs often with only small gains in memory, it is likely you have a problem.
I don't know if you can access this kind of information directly from your program. But if nothing else, I think you can specify parameters on startup that make the GC log information to a file, which in turn could be parsed.
What you could do is spawn a thread that wakes up periodically, calculates the amount of used memory, and records the result. Then you can do regression analysis on the results to estimate the rate of memory growth in your application. If you know the rate of growth and the maximum amount of memory, you can predict (with some confidence) when your application will run out of memory.
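A rough sketch of that idea (the sampling interval is arbitrary, and the regression itself is left out):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MemoryWatcher {
    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            private long previousUsed = -1;

            public void run() {
                Runtime rt = Runtime.getRuntime();
                long used = rt.totalMemory() - rt.freeMemory();
                if (previousUsed >= 0) {
                    System.out.println("used=" + used / (1024 * 1024) + " MB, grew by "
                            + (used - previousUsed) / 1024 + " KB since last sample");
                    // feed 'used' into a regression over many samples to estimate
                    // the growth rate and extrapolate when -Xmx will be reached
                }
                previousUsed = used;
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}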
You can pass arguments to your Java virtual machine that give you GC diagnostics, such as:
-verbose:gc This flag turns on the logging of GC information. Available in all JVMs.
-XX:+PrintGCTimeStamps Prints the times at which the GCs happen relative to the start of the application.
If you capture that output in a file, your application can periodically read and parse it to know when a GC has happened, so you can work out the average time between GCs.
I think the JVM does exactly this for you and throws java.lang.OutOfMemoryError: GC overhead limit exceeded. So if you catch OutOfMemoryError and check for that message then you have what you want, don't you?
See this question for more details
I've been using Plumbr for memory leak detection and it has been a great experience, though the licence is very expensive: http://plumbr.eu/
I've been tasked with debugging a Java (J2SE) application which, after some period of activity, begins to throw OutOfMemoryErrors. I am new to Java, but have programming experience. I'm interested in getting your opinions on what a good approach to diagnosing a problem like this might be.
So far I've employed JConsole to get a picture of what's going on. I have a hunch that there are objects which are not being released properly and are therefore not being cleaned up during garbage collection.
Are there any tools I might use to get a picture of the object ecosystem? Where would you start?
I'd start with a proper Java profiler. JConsole is free, but it's nowhere near as full featured as the ones that cost money. I used JProfiler, and it was well worth the money. See https://stackoverflow.com/questions/14762/please-recommend-a-java-profiler for more options and opinions.
Try the Eclipse Memory Analyzer, or any other tool that can process a Java heap dump, and then run your app with the flag that generates a heap dump when you run out of memory (-XX:+HeapDumpOnOutOfMemoryError).
Then analyze the heap dump and look for suspiciously high object counts.
See this article for more information on the heap dump.
EDIT: Also, please note that your app may just legitimately require more memory than you initially thought. You might try increasing the java minimum and maximum memory allocation to something significantly larger first and see if your application runs indefinitely or simply gets slightly further.
The latest version of the Sun JDK includes VisualVM which is essentially the Netbeans profiler by itself. It works really well.
http://www.yourkit.com/download/index.jsp is the only tool you'll need.
You can take snapshots at (1) app start time, and (2) after running app for N amount of time, then comparing the snapshots to see where memory gets allocated. It will also take a snapshot on OutOfMemoryError so you can compare this snapshot with (1).
For instance, the latest project I had to troubleshoot threw OutOfMemoryError exceptions, and after firing up YourKit I realised that most memory was in fact being allocated to some ehcache "LFU" class; the point being that we had specified loads of a certain POJO to be cached in memory, but had not specified a large enough -Xms and -Xmx (starting and maximum JVM memory allocation).
I've also used Linux's vmstat, e.g. some Linux platforms just don't have enough swap enabled or don't allocate contiguous blocks of memory; and then there's jstat (bundled with the JDK).
UPDATE see https://stackoverflow.com/questions/14762/please-recommend-a-java-profiler
You can also add an UncaughtExceptionHandler to your application's threads (via Thread.setUncaughtExceptionHandler or Thread.setDefaultUncaughtExceptionHandler). This will catch 'uncaught' exceptions, like an OutOfMemoryError, and you will at least have an idea of where the exception was thrown. Usually that is not where the problem is, but rather just the 'new' that couldn't be satisfied. As a rule I always add an UncaughtExceptionHandler to a thread, if nothing else to add logging.
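A minimal sketch of that, using the standard Thread API:

public class Main {
    public static void main(String[] args) {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                // logs anything (including OutOfMemoryError) that would otherwise
                // just kill the thread silently
                System.err.println("Uncaught in thread " + t.getName());
                e.printStackTrace();
            }
        });
        // ... start the rest of the application here
    }
}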