I have a memory leak in the web application (servlet) that I am working on. I suspect one cause and would like to hear your thoughts on it.
I use HashMaps, HashSets, etc. as an in-memory DB (about 20 MB of data loaded). These maps and sets get reloaded once every 10 minutes. There is a huge number of simultaneous requests. I read that the GC moves objects that survive a number of collection cycles into a generation (old and permanent generations) that is checked, and garbage collected, less often. I think that my use of static maps and sets is causing the leak. What do you think?
As Romain noted, the static map is a suspect. If for some reason you can't regularly clean it up explicitly, you may consider using a WeakHashMap instead, which is
A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations.
Unfortunately, as of Java 6 there seems to be no WeakHashSet class in the standard library, but several implementations can be found on the net.
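For what it's worth, a set with weak elements can also be built from the standard library alone: Collections.newSetFromMap (added in Java 6) wraps any empty map, including a WeakHashMap. A minimal sketch; the WeakSets helper name is made up for illustration and is not part of the JDK:

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical helper, not a JDK class.
final class WeakSets {
    private WeakSets() {}

    // Returns a Set backed by a WeakHashMap: the set itself never keeps an
    // element alive, so an element can disappear once nothing else holds a
    // strong reference to it.
    static <T> Set<T> newWeakSet() {
        return Collections.newSetFromMap(new WeakHashMap<T, Boolean>());
    }
}
```

As with WeakHashMap, when an element actually disappears depends on when the GC runs, so size() can shrink at unpredictable moments.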
It is not a leak if you have removed all references to it. If you're clearing out your map entirely, then it's not the source of a leak. You should consider the fact that the JVM chooses not to GC tenured generation very often as irrelevant to you - all that matters is that you don't have a reference to it, so the JVM could GC it if it wants to.
There are different strategies that JVMs can use to manage GC, so I'm speaking in generalities instead of specifics here, but GCing tenured spaces tends to be very expensive and has a high impact on the application, so the JVM chooses not to do it often in general.
If you're looking at the amount of heap space used, you'll see a sawtooth pattern as items are added and eventually collected. Don't be concerned about where the top of the sawtooth is, be concerned about where the bottom is (and how close to the maximum amount of heap space available that is).
One way to test if it really is a leak is to load test your app for a long period of time. If you have a leak, the base amount of memory that your app is using will go up over time (the bottom of the sawtooth). If you don't, it will remain constant. If you do have a leak, you can use a profiler to help you find it.
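To watch the bottom of the sawtooth during such a load test, you could log the used heap at intervals. A rough sketch using only java.lang.Runtime; the HeapBaseline class and its method names are invented for this example:

```java
// Hypothetical helper for logging the heap baseline during a load test.
final class HeapBaseline {
    private HeapBaseline() {}

    // Heap currently in use, in bytes. This includes garbage that has not
    // been collected yet, so a single reading can sit anywhere on the sawtooth.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Requests a collection first so the reading is closer to the bottom of
    // the sawtooth. System.gc() is only a request; the JVM may ignore it.
    static long usedHeapAfterGcRequest() {
        System.gc();
        return usedHeapBytes();
    }
}
```

If the values returned by usedHeapAfterGcRequest() keep climbing over hours of steady load, that points to a leak; if they level off, it probably isn't one.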
Static Maps are a known source of leaks. The reason is that people put things in and never remove them. If every ten minutes you simply clear the cache and then reload it, you should be fine.
I would bet that you are not clearing it properly. The GC part is working properly; I would not worry that it is the issue.
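One way to make the "clear and reload" step hard to get wrong is to build a fresh map and swap it in with a single assignment, instead of mutating the shared one. This is only a sketch under assumptions: the ReferenceData class, its String keys and values, and the reload job are all invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache holder; names and types are illustrative only.
final class ReferenceData {
    // volatile so request threads always see the most recently published map
    private static volatile Map<String, String> data = Collections.emptyMap();

    private ReferenceData() {}

    // Called by the reload job (for example every 10 minutes) with fresh rows.
    static void reload(Map<String, String> freshRows) {
        // Copy into a new map and publish it in one assignment. The previous
        // map becomes unreachable once in-flight requests stop using it.
        data = Collections.unmodifiableMap(new HashMap<String, String>(freshRows));
    }

    static String lookup(String key) {
        return data.get(key);
    }

    static int size() {
        return data.size();
    }
}
```

With this shape nothing from an earlier load can linger in the static field, and readers never see a half-reloaded map. The cost is that two copies of the data exist briefly during each reload.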
You may also want to consider using WeakReference if you have some way of falling back to the real data if part of your cache is GC-ed but then subsequently required.
I suggest that you check the heap contents using a heap dump and a heap analyzer (such as JVisualVM). This will help you find leakage suspects. The fact that the old generation is collected less frequently does not mean that more memory is leaking; remember that while it may seem full, only a portion of it represents live objects, and the rest is cleaned up by the next major GC. As others said, the problem may be caused by incomplete cleaning of the static collections.
The permanent generation never receives promoted objects. It is an out-of-heap area reserved for other purposes, such as reflective information of loaded classes, and interned strings.
I have a memory leak in Java in which I have 9600 ImapClients in my heap dump and only 7800 MonitoringTasks. This is a problem since every ImapClient should be owned by a MonitoringTask, so those extra 1800 ImapClients are leaked.
One problem is I can't isolate them in the heap dump and see what's keeping them alive. So far I've only been able to pinpoint them by using external evidence to guess at which ImapClients are dangling. I'm learning OQL which I believe can solve this but it's coming slowly, and it'll take a while before I can understand how to perform something recursive like this in a new query language.
Determining a leak exists is difficult, so here is my full situation:
This process was spewing OOMEs a week ago. I thought I fixed it, and I'm trying to verify whether my fix worked without waiting another full week to see if it spews OOMEs again.
This task creates 7000-9000 ImapClients on start then under normal operation connects and disconnects very few of them.
I checked another process running older pre-OOME code, and it showed numbers of 9000/9100 instead of 7800/9600. I do not know why old code will be different from new code but this is evidence of a leak.
The point of this question is so I can determine if there is a leak. There is a business rule that every ImapClient should be referenced by a MonitoringTask. If the query I am asking about comes up empty, there is no leak. If it comes up with objects, then together with this business rule it is not only evidence of a leak but conclusive proof of one.
Your expectations are incorrect; there is no actual evidence of any leaks occurring
The Garbage Collector's goal is to free space when it is needed and only then; anything else is a waste of resources. There is absolutely no benefit, only downsides, in attempting to keep as much free space as possible available all the time.
Just because something is a candidate for garbage collection doesn't mean it will ever actually be collected, and there is no way to force garbage collection either.
I don't see any mention of OutOfMemoryError anywhere.
What you are concerned about you can't control, not directly anyway
What you should focus on is what is in your control: making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms.
Java heap memory isn't like manually managed memory in other languages; those rules don't apply
What are considered memory leaks in other languages don't have the same root cause as in Java with its garbage collection system.
Most likely in Java, memory isn't consumed by one single uber-object that is leaking (a dangling reference in other environments).
Intermediate objects may be kept around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may find candidates but, because plenty of memory is still available, decide that flushing them out right now would cost too much time, and wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic; if you are doing degenerate things, it will not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These unreferenced objects may simply not have reached the point at which the garbage collector decides to expunge them, or some other object (a List, for example) may hold references to them that you don't realize still exist. The latter is what is most commonly called a leak in Java; more specifically, a reference leak.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code; the garbage collection system just might not be under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other language or runtime with manual memory management. You don't get to decide what is in memory at the level of detail you expect.
Check your code for finalizers, especially anything relating to ImapClient.
It could be that your MonitoringTasks are being easily collected whereas your ImapClients have finalizers, and therefore stay on the heap (though dead) until the finalizer thread has run.
The obvious answer is to add a WeakHashMap<X, Object> (and Y) to your code -- one tracking all instances of X and another tracking all instances of Y (make them static members of the class and insert every object into the map in the constructor with a null 'value'). Then you can at any time iterate over these maps to find all live instances of X and Y and see which Xs are not referenced by Ys. You might want to trigger a full GC first, to ignore objects that are dead and not yet collected.
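A sketch of that idea follows. The ImapClient and MonitoringTask classes here are bare stand-ins for the real ones in the question (which surely look different), and LeakCheck is a made-up name; only the tracking pattern is the point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Stand-in for the real class. Weak keys mean the registry never keeps a
// client alive by itself.
final class ImapClient {
    private static final Map<ImapClient, Boolean> LIVE =
            Collections.synchronizedMap(new WeakHashMap<ImapClient, Boolean>());

    ImapClient() {
        LIVE.put(this, Boolean.TRUE); // register every instance on construction
    }

    static List<ImapClient> liveInstances() {
        synchronized (LIVE) { // required when iterating a synchronizedMap
            return new ArrayList<ImapClient>(LIVE.keySet());
        }
    }
}

// Stand-in for the real class; assumed to own exactly one client.
final class MonitoringTask {
    private static final Map<MonitoringTask, Boolean> LIVE =
            Collections.synchronizedMap(new WeakHashMap<MonitoringTask, Boolean>());

    final ImapClient client;

    MonitoringTask(ImapClient client) {
        this.client = client;
        LIVE.put(this, Boolean.TRUE);
    }

    static List<MonitoringTask> liveInstances() {
        synchronized (LIVE) {
            return new ArrayList<MonitoringTask>(LIVE.keySet());
        }
    }
}

final class LeakCheck {
    private LeakCheck() {}

    // Clients that are still on the heap but not owned by any live task.
    static List<ImapClient> orphanedClients() {
        Set<ImapClient> owned =
                Collections.newSetFromMap(new IdentityHashMap<ImapClient, Boolean>());
        for (MonitoringTask task : MonitoringTask.liveInstances()) {
            owned.add(task.client);
        }
        List<ImapClient> orphans = new ArrayList<ImapClient>();
        for (ImapClient client : ImapClient.liveInstances()) {
            if (!owned.contains(client)) {
                orphans.add(client);
            }
        }
        return orphans;
    }
}
```

Two caveats: WeakHashMap compares keys with equals()/hashCode(), so this assumes the tracked classes use identity equality, and the result can include clients that are already unreachable but not yet collected, which is why requesting a full GC first helps.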
I've been battling some memory leaks, and I'm currently baffled by this issue. There's a web application classloader that was supposed to be garbage collected, but it isn't (even after I fixed several leaks). I dumped the heap with jmap and browsed it with jhat, found the classloader and checked the rootset references.
If I exclude weak refs, the list is empty! How is that possible, since an object held only by weak references should get garbage collected? (I performed GC many times in jconsole)
If I include weak refs, I get a list of references, all of which come from one of the following fields:
java.lang.reflect.Proxy.loaderToCache
java.lang.reflect.Proxy.proxyClasses
java.io.ObjectStreamClass$Caches.localDescs
java.io.ObjectStreamClass$Caches.reflectors
java.lang.ref.Finalizer.unfinalized
I couldn't find any reason why any of those references should prevent garbage collecting the classloader. Is it a gc bug? Special undocumented case? jmap/jhat bug? Or what?
And the weirdest thing... after sitting idle and gc-ing from time to time for about 40 min, without changing anything, it finally decided to unload classes and collect the classloader.
Note:
If you make a claim about delayed collection of classloaders or weak references, then please specify the circumstances in which it happens, and ideally:
provide a link to an authoritative article that supports your claim
provide a sample program that demonstrates the behavior
If you think the behavior is implementation-dependent, then please focus on what happens in the oracle or icedtea jvm, version 6 or 7 (pick any one of them and be specific).
I'd really like to get to the bottom of this. I actually put some effort into reproducing the issue in a test program, and I failed - the classloader was instantly collected on System.gc() every time unless there was a strong reference to it.
It looks like there's a soft reference involved somewhere. That's the only explanation I could find for the delayed collection (about 40 min). I initially thought soft references were kept until the memory runs out, but I found that that's not the case.
From this page: "softly reachable objects will remain alive for some amount of time after the last time they were referenced. The default value is one second of lifetime per free megabyte in the heap. This value can be adjusted using the -XX:SoftRefLRUPolicyMSPerMB flag"
So I adjusted that flag to 1, and the classloader was collected within seconds!!
I think the soft reference comes from ObjectStreamClass. The question is why jhat doesn't show it in the rootset references. Is it because it's neither strong nor weak? Or because it already found weak references from the same static fields? Or some other reason? Either way, I think this needs to be improved in jhat.
Classes reside in a special memory space, the permanent generation. To unload a classloader, the GC has to include the perm space in the scope of a collection. Different GC algorithms behave a little differently, but generally the GC will try to avoid perm space collection.
In my experience, even if the classloader is not reachable, the JVM may end up with an OutOfMemoryError before it tries to collect the perm space.
I am building a Java web app, using the Play! Framework. I'm hosting it on playapps.net. I have been puzzling for a while over the provided graphs of memory consumption. Here is a sample:
The graph comes from a period of consistent but nominal activity. I did nothing to trigger the falloff in memory, so I presume this occurred because the garbage collector ran as it has almost reached its allowable memory consumption.
My questions:
Is it fair for me to assume that my application does not have a memory leak, as it appears that all the memory is correctly reclaimed by the garbage collector when it does run?
(from the title) Why is java waiting until the last possible second to run the garbage collector? I am seeing significant performance degradation as the memory consumption grows to the top fourth of the graph.
If my assertions above are correct, then how can I go about fixing this issue? The other posts I have read on SO seem opposed to calls to System.gc(), ranging from neutral ("it's only a request to run GC, so the JVM may just ignore you") to outright opposed ("code that relies on System.gc() is fundamentally broken"). Or am I off base here, and I should be looking for defects in my own code that is causing this behavior and intermittent performance loss?
UPDATE
I have opened a discussion on PlayApps.net pointing to this question and mentioning some of the points here; specifically #Affe's comment regarding the settings for a full GC being set very conservatively, and #G_H's comment about settings for the initial and max heap size.
Here's a link to the discussion, though you unfortunately need a playapps account to view it.
I will report the feedback here when I get it; thanks so much everyone for your answers, I've already learned a great deal from them!
Resolution
Playapps support, which is still great, didn't have many suggestions for me, their only thought being that if I was using the cache extensively this may be keeping objects alive longer than need be, but that isn't the case. I still learned a ton (woo hoo!), and I gave #Ryan Amos the green check as I took his suggestion of calling System.gc() every half day, which for now is working fine.
Any detailed answer is going to depend on which garbage collector you're using, but there are some things that are basically the same across all (modern, sun/oracle) GCs.
Every time you see the usage in the graph go down, that is a garbage collection. The only way heap gets freed is through garbage collection. The thing is there are two types of garbage collections, minor and full. The heap gets divided into two basic "areas." Young and tenured. (There are lots more subgroups in reality.) Anything that is taking up space in Young and is still in use when the minor GC comes along to free up some memory, is going to get 'promoted' into tenured. Once something makes the leap into tenured, it sits around indefinitely until the heap has no free space and a full garbage collection is necessary.
So one interpretation of that graph is that your young generation is fairly small (by default it can be a fairly small % of total heap on some JVMs) and you're keeping objects "alive" for comparatively very long times. (perhaps you're holding references to them in the web session?) So your objects are 'surviving' garbage collections until they get promoted into tenured space, where they stick around indefinitely until the JVM is well and good truly out of memory.
Again, that's just one common situation that fits with the data you have. Would need full details about the JVM configuration and the GC logs to really tell for sure what's going on.
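If you can't get at the GC logs on a hosted platform, the JVM's own management beans give a rough picture of how often each collector has run. A small sketch; the GcCounters class is invented here, and the collector names it prints depend on which GC the JVM is using:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that summarises the JVM's collectors.
final class GcCounters {
    private GcCounters() {}

    // One line per collector: its name, how many collections it has done,
    // and the approximate accumulated collection time in milliseconds.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }
}
```

Typically one bean covers the young generation and another the old generation, so comparing the two counts over time shows how rare the full collections really are.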
Java won't run the garbage collector until it has to, because garbage collection slows things down quite a bit and shouldn't be run that frequently. I think you would be OK to schedule a collection more frequently, such as every 3 hours. If an application never consumes its full memory, there should be no reason to ever run the garbage collector, which is why Java only runs it when memory use is very high.
So basically, don't worry about what others say: do what works best. If you find performance improvements from running the garbage collector at 66% memory, do it.
I am noticing that the graph isn't sloping strictly upward until the drop, but has smaller local variations. Although I'm not certain, I don't think memory use would show these small drops if there was no garbage collection going on.
There are minor and major collections in Java. Minor collections occur frequently, whereas major collections are rarer and diminish performance more. Minor collections probably tend to sweep up stuff like short-lived object instances created within methods. A major collection will remove a lot more, which is what probably happened at the end of your graph.
Now, some answers that were posted while I'm typing this give good explanations regarding the differences in garbage collectors, object generations and more. But that still doesn't explain why it would take so absurdly long (nearly 24 hours) before a serious cleaning is done.
Two things of interest that can be set for a JVM at startup are the maximum allowed heap size and the initial heap size. The maximum is a hard limit: once you reach it and garbage collection can no longer free enough memory, allocating new space for objects or other data gives you an OutOfMemoryError. However, internally there's a soft limit as well: the current heap size. A JVM doesn't immediately gobble up the maximum amount of memory. Instead, it starts at your initial heap size and then increases the heap when it's needed. Think of it a bit as the RAM of your JVM, which can increase dynamically.
If the actual memory use of your application starts to reach the current heap size, a garbage collection will typically be instigated. This might reduce the memory use, so an increase in heap size isn't needed. But it's also possible that the application currently does need all that memory and would exceed the heap size. In that case, it is increased provided that it hasn't already reached the maximum set limit.
Now, what might be your case is that the initial heap size is set to the same value as the maximum. Suppose that would be so, then the JVM will immediately seize all that memory. It will take a very long time before the application has accumulated enough garbage to reach the heap size in memory usage. But at that moment you'll see a large collection. Starting with a small enough heap and allowing it to grow keeps the memory use limited to what's needed.
This is assuming that your graph shows heap use and not allocated heap size. If that's not the case and you are actually seeing the heap itself grow like this, something else is going on. I'll admit I'm not savvy enough regarding the internals of garbage collection and its scheduling to be absolutely certain of what's happening here, most of this is from observation of leaking applications in profilers. So if I've provided faulty info, I'll take this answer down.
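To tell heap use apart from allocated heap size, the distinction made above, the standard MemoryMXBean reports both. A minimal sketch; HeapSizes is a made-up class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper that reports the heap figures discussed above.
final class HeapSizes {
    private HeapSizes() {}

    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // init      = initial heap size requested at startup (-1 if undefined)
    // used      = bytes currently occupied by objects, live or not yet collected
    // committed = current heap size the JVM has reserved from the OS
    // max       = upper limit the heap may grow to (-1 if undefined)
    static String describe() {
        MemoryUsage u = heap();
        return "init=" + u.getInit()
                + " used=" + u.getUsed()
                + " committed=" + u.getCommitted()
                + " max=" + u.getMax();
    }
}
```

If committed equals max right after startup, the initial heap size was probably set to the maximum, which is the situation described above.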
As you might have noticed, this does not affect you. The garbage collection only kicks in if the JVM feels there is a need for it to run and this happens for the sake of optimization, there's no use of doing many small collections if you can make a single full collection and do a full cleanup.
The current JVM contains some really interesting algorithms, and the garbage collection itself is divided into 3 different regions. You can find a lot more about this here; here's a sample:
Three types of collection algorithms
The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.
Copy/scavenge collection
Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.
Mark-compact collection
As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space.
The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts all remaining gaps of dead objects. The mark-compact algorithm occupies more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.
Incremental (train) collection
The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses. Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection.
Incremental collection breaks up old object collection pauses into many tiny pauses even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation.
The -Xincgc and -Xnoincgc parameters control how you use incremental collection. The next release of HotSpot JVM, version 1.4, will attempt continuous, pauseless GC that will probably be a variation of the incremental algorithm. I won't discuss incremental collection since it will soon change.
This generational garbage collector is one of the most efficient solutions we have for the problem nowadays.
I had an app that produced a graph like that and acted as you describe. I was using the CMS collector (-XX:+UseConcMarkSweepGC). Here is what was going on in my case.
I did not have enough memory configured for the application, so over time I was running into fragmentation problems in the heap. This caused GCs with greater and greater frequency, but it did not actually throw an OOME or fail out of CMS to the serial collector (which it is supposed to do in that case), because the stats it keeps only count application paused time (GC blocks the world); application concurrent time (GC runs alongside application threads) is ignored for those calculations. I tuned some parameters, mainly gave it a whole crap load more heap (with a very large new space), set -XX:CMSFullGCsBeforeCompaction=1, and the problem stopped occurring.
Probably you do have memory leaks that are cleared every 24 hours.
I have this class and I'm testing insertions with different data distributions. I'm doing this in my code:
...
AVLTree tree = new AVLTree();
//insert the data from the first distribution
//get results
...
tree = new AVLTree();
//insert the data from the next distribution
//get results
...
I'm doing this for 3 distributions. Each one should be tested 14 times, with the 2 lowest/highest values removed before computing the average. This should be done 2000 times, each time with 1000 more elements. In other words, it goes 1000, 2000, 3000, ..., 2000000.
The problem is, I can only get as far as 100000. When I tried 200000, I ran out of heap space. I increased the available heap space with -Xmx in the command line to 1024m and it didn't even complete the tests with 200000. I tried 2048m and again, it wouldn't work.
What I'm thinking is that the garbage collector isn't getting rid of the old trees once I do tree = new AVLTree(). But why? I thought that the elements from the old trees would no longer be accessible and their memory would be cleaned up.
The garbage collector should have no trouble cleaning up your old tree objects, so I can only assume there's some other allocation that you're doing that's not being cleaned up.
Java has a good tool to watch the GC in progress (or not in your case), JVisualVM, which comes with the JDK.
Just run that and it will show you which objects are taking up the heap, and you can both trigger and see the progress of GC's. Then you can target those for pools so they can be re-used by you, saving the GC the work.
Also look into this option, which will probably stop the error that halts the program. Your program will then finish, but it may take a long time, because your app will fill up the heap and then run very slowly.
-XX:-UseGCOverheadLimit
Which JVM are you using, and what JVM parameters have you used to configure the GC?
Your explanation suggests there is a memory leak in your code. If you have a tool like JProfiler, use it to find out where the memory leak is.
There's no reason those trees shouldn't be collected, although I'd expect that before you ran out of memory you would see long pauses as the system ran a full GC. As it's been noted here that that's not what you're seeing, you could try running with flags like -XX:+PrintGC, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps to give you some more information on exactly what's going on, along with perhaps some sort of running count of roughly where you are. You could also explicitly tell the JVM to use a different garbage-collection algorithm.
However, it still seems unlikely to me. What other code is running? Is it possible there's something in the AVLTree class itself that's keeping its instances from being GC'd? What about manually logging in finalize() on that class to ensure that (some of them, at least) are collectible (e.g. make a few and manually call System.gc())?
GC params here, a nice ref on garbage collection from sun here that's well worth reading.
The Java garbage collector isn't guaranteed to collect an object as soon as it becomes unreachable (it does not work by reference counting). So if you're writing code that only creates and discards a lot of objects, it's possible to expend all of the heap space before the gc has a chance to run. Alternatively, Pax's suggestion that there is a memory leak in your code is also a strong possibility.
If you are only doing benchmarking, then you may want to call System.gc() between tests, or even re-run your program for each distribution.
We noticed this in a server product. When making a lot of tiny objects that quickly get thrown away, the garbage collector can't keep up. The problem is more pronounced when the tiny objects have pointers to larger objects (e.g. an object that points to a large char[]). The GC doesn't seem to realize that if it frees up the tiny object, it can then free the larger object. Even when calling System.gc() directly, this was still a huge problem (both in 1.5 and 1.6 VMs)!
What we ended up doing and what I recommend to you is to maintain a pool of objects. When your object is no longer needed, throw it into the pool. When you need a new object, grab one from the pool or allocate a new one if the pool is empty. This will also save a small amount of time over pure allocation because Java doesn't have to clear (bzero) the object.
If you're worried about the pool getting too large (and thus wasting memory), you can either remove an arbitrary number of objects from the pool on a regular basis, or use weak references (for example, using java.util.WeakHashMap). One of the advantages of using a pool is that you can track the allocation frequency and totals, and you can adjust things accordingly.
We're using pools of char[] and byte[], and we maintain separate "bins" of sizes in the pool (for example, we always allocate arrays of size that are powers of two). Our product does a lot of string building, and using pools showed significant performance improvements.
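To make the idea concrete, here is a rough sketch of a size-binned char[] pool along those lines. It is not our production code; the CharArrayPool name, the per-bin cap, and the synchronization strategy are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pool with one bin per power-of-two array length.
final class CharArrayPool {
    private final Map<Integer, ArrayDeque<char[]>> bins =
            new HashMap<Integer, ArrayDeque<char[]>>();
    private final int maxPerBin; // cap so the pool itself cannot grow unbounded

    CharArrayPool(int maxPerBin) {
        this.maxPerBin = maxPerBin;
    }

    // Smallest power of two that is >= n, for n between 1 and 2^30.
    static int roundUpToPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("size out of range: " + n);
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Returns a pooled array of at least minLength chars, or a new one if the
    // matching bin is empty. A reused array still holds its old contents.
    synchronized char[] acquire(int minLength) {
        int size = roundUpToPowerOfTwo(minLength);
        ArrayDeque<char[]> bin = bins.get(size);
        char[] pooled = (bin == null) ? null : bin.pollFirst();
        return pooled != null ? pooled : new char[size];
    }

    // Hands an array back. Arrays whose length is not a power of two did not
    // come from this pool and are ignored; full bins drop the array for the GC.
    synchronized void release(char[] array) {
        int size = array.length;
        if (Integer.bitCount(size) != 1) {
            return;
        }
        ArrayDeque<char[]> bin = bins.get(size);
        if (bin == null) {
            bin = new ArrayDeque<char[]>();
            bins.put(size, bin);
        }
        if (bin.size() < maxPerBin) {
            bin.addFirst(array);
        }
    }
}
```

Typical use is acquire(n) before building a string and release(buf) in a finally block afterwards. Whether pooling beats plain allocation depends on the JVM version and the workload, so measure before adopting it.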
Note: In general, the GC does a fine job. We just noticed that with small objects that point to larger structures, the GC doesn't seem to clean up the objects fast enough, especially when the VM is under CPU load. Also, System.gc() is just a hint to the JVM that now would be a good time to collect. Calling it too frequently causes a significant performance hit.
Given that you're just doing this for testing purposes, it might just be good housekeeping to invoke the garbage collector directly using System.gc() (thus asking it to make a pass). It won't help you if there is a memory leak, but if there isn't, it might buy you back enough memory to get through your test.
I have a cache which has soft references to the cached objects. I am trying to write a functional test for behavior of classes which use the cache specifically for what happens when the cached objects are cleared.
The problem is: I can't seem to reliably get the soft references to be cleared. Simply using up a bunch of memory doesn't do the trick: I get an OutOfMemory before any soft references are cleared.
Is there any way to get Java to more eagerly clear up the soft references?
Found here:
"It is guaranteed though that all SoftReferences will get cleared before OutOfMemoryError is thrown, so they theoretically can't cause an OOME."
So does this mean that the above scenario MUST mean I have a memory leak somewhere with some class holding a hard reference on my cached object?
The problem is: I can't seem to reliably get the soft references to be cleared.
This is not unique to SoftReferences. Due to the nature of garbage collection in Java, there is no guarantee that anything that is garbage-collectable will actually be collected at any point in time. Even with a simple bit of code:
Object temp = new Object();
temp = null;
System.gc();
there is no guarantee that the Object instantiated in the first line is garbage collected at this point, or in fact at any point. It's simply one of the things you have to live with in a memory-managed language: you're giving up declarative power over these things. And yes, that can make it hard to definitively test for memory leaks at times.
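If you want to observe whether a particular object gets collected, the usual trick is to hold only a WeakReference to it and poll after requesting a GC. A sketch, with the GcProbe name and the 10 ms pause chosen arbitrarily for illustration:

```java
import java.lang.ref.WeakReference;

// Hypothetical probe; it can show that an object was collected, but a
// negative result proves nothing, because System.gc() is only a request.
final class GcProbe {
    private GcProbe() {}

    // Requests a GC up to maxAttempts times and reports whether the
    // reference was cleared in that window.
    static boolean clearedAfterGcRequests(WeakReference<?> ref, int maxAttempts) {
        for (int i = 0; i < maxAttempts && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(10); // give a concurrent collector a moment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }
}
```

A true result means the object really was unreachable and got collected. A false result only means it was not collected within those attempts, which is exactly the uncertainty described above.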
That said, as per the Javadocs you quoted, SoftReferences should definitely be cleared before an OutOfMemoryError is thrown (in fact, that's the entire point of them and the only way they differ from the default object references). It would thus sound like there is some sort of memory leak in that you're holding onto harder references to the objects in question.
If you use the -XX:+HeapDumpOnOutOfMemoryError option to the JVM, and then load the heap dump into something like jhat, you should be able to see all the references to your objects and thus see if there are any references beside your soft ones. Alternatively you can achieve the same thing with a profiler while the test is running.
There is also the following JVM parameter for tuning how soft references are handled:
-XX:SoftRefLRUPolicyMSPerMB=<value>
Where 'value' is the number of milliseconds a soft reference will remain for every free MB of memory. The default is 1000 (1 s per MB), so an object that is only softly reachable will last about 1 s if only 1 MB of heap space is free.
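As a back-of-the-envelope illustration of that rule, the estimate is just free megabytes times the flag value. The SoftRefLifetime class is made up for this example, and the real JVM policy is only approximated by it:

```java
// Hypothetical calculator for the rule of thumb described above.
final class SoftRefLifetime {
    private SoftRefLifetime() {}

    // Approximate time, in milliseconds, that a softly reachable object
    // survives after its last access: msPerMb for every free MB of heap.
    static long estimatedLifetimeMillis(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }
}
```

For example, with 200 MB free and the default of 1000 the estimate is 200000 ms (a bit over 3 minutes), whereas with the flag set to 1 it drops to 200 ms.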
You can force all SoftReferences to be cleared in your tests with this piece of code.
If you really wanted to, you can call clear() on your SoftReference to clear it.
That said, if the JVM is throwing an OutOfMemoryError and your SoftReference has not been cleared yet, then this means that you must have a hard reference to the object somewhere else. To do otherwise would invalidate the contract of SoftReference. Otherwise, you are never guaranteed that the SoftReference is cleared: as long as there is still memory available, the JVM does not need to clear any SoftReferences. On the other hand, it is allowed to clear them next time it does a GC cycle, even if it doesn't need to.
Also, you can consider looking into WeakReferences, since the VM tends to be more aggressive in clearing them. Technically, the VM isn't ever required to clear a WeakReference, but it is supposed to clean them up the next time it does a GC cycle if the object would otherwise be considered dead. If you are trying to test what happens when your cache is cleared, using WeakReferences should help your entries go away faster.
Also, remember that both of these are dependent on the JVM doing a GC cycle. Unfortunately, there is no way to guarantee that one of those ever happens. Even if you call System.gc(), the garbage collector may decide that it is doing just peachy and choose to do nothing.
In a typical JVM implementation (Sun) you need to trigger a full GC more than once to get the SoftReferences cleaned. The reason is that SoftReferences require the GC to do more work, for example because of the mechanism that lets you be notified when the objects are reclaimed.
IMHO using a lot of SoftReferences in an application server is evil, because the developer does not have much control over when they are released.
Garbage collection and references such as soft references are non-deterministic, so it's not really possible to reliably arrange for soft references to be cleared at a given point so that your test can judge how your cache reacts. I would suggest you simulate the reference clearing in a more definite way, by mocking etc. Your tests will be reproducible and more valuable than just hoping for the GC to clean up references. Relying on the GC is a really bad thing to do and will just introduce additional problems rather than help you improve the quality of your cache and its collaborating components.
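One way to do that without a mocking library is to give the cache a test hook that calls clear() on the SoftReference, which leaves it in the same state as after a GC has cleared it. A minimal sketch; SoftCache and simulateGcClear are invented names, not anything from the question's code:

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache used only to illustrate deterministic testing.
final class SoftCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> entries =
            new ConcurrentHashMap<K, SoftReference<V>>();

    void put(K key, V value) {
        entries.put(key, new SoftReference<V>(value));
    }

    // Returns null if the key is absent or its referent has been cleared.
    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key, ref); // drop the stale entry
        }
        return value;
    }

    // Test hook: clears the reference on demand instead of waiting for the GC.
    void simulateGcClear(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref != null) {
            ref.clear();
        }
    }

    int size() {
        return entries.size();
    }
}
```

A test can then put a value, call simulateGcClear, and assert on how the classes using the cache behave when get returns null, with no dependence on memory pressure.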
From the documentation and my experience I'd say yes: you must have a reference somewhere else.
I'd suggest using a debugger that can show you all references to an object (such as Eclipse 3.4 when debugging Java 6) and just check when the OOM is thrown.
If you use Eclipse, there is a tool named Memory Analyzer that makes heap dump debugging easier.
Does the cached object have a finalizer? An object with a finalizer stays strongly reachable until its finalizer has run, so even if the SoftReference is cleared the memory will not be reclaimed until a later GC cycle.
If you have a cache which is a Map of SoftReferences and you want them cleared you can just clear() the map and they will all be cleaned up (including their references)