Reducing Java heap size

I have an application that uses a lot of memory diff'ing the contents of two potentially huge (100k+ files each) directories. It makes sense to me that such an operation would use a lot of memory, but once my diff'ing operation is done, the heap remains the same size.
I basically have code that instantiates a class to store the filename, file size, path, and modification date for each file on the source and target. I save the additions, deletions, and updates in other arrays. I then clear() my source and target arrays (which could be 100k+ entries each by now), leaving relatively small additions, deletions, and updates arrays.
After I clear() my target and source arrays though, the memory usage (as visible via VisualVM and Windows Task Manager) doesn't drop. I'm not experienced enough with VisualVM (or any profiler for that matter) to figure out what is taking up all this memory. VisualVM's heap dump lists the top few objects with a retained size of a few megabytes.
Anything to help point me in the right direction?

If the used heap goes down after a garbage collection, then it likely works as expected. Java increases its heap when it needs more memory, but does not free it -- it prefers to keep it in case the application needs more memory again. See "Is there a way to lower Java heap when not in use?" for more on why the heap is not reduced after the used portion shrinks.

The VM grows or shrinks the heap based on the command-line parameters -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio. It will shrink the heap when the free percentage exceeds -XX:MaxHeapFreeRatio, whose default is 70.
There is a short discussion of this in Oracle's bug #6498735.
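If it helps, you can watch the used-versus-committed gap from inside the application itself. This is a minimal sketch (not the asker's code; whether the committed size ever shrinks depends on the collector and the free-ratio flags above) that allocates roughly 100 MB, drops the references, requests a collection, and prints both figures:

public class HeapWatch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        byte[][] blocks = new byte[100][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];   // allocate roughly 100 MB
        }
        print("after allocation", rt);
        blocks = null;                           // drop all references
        System.gc();                             // a request, not a guarantee
        print("after clearing + GC", rt);        // 'used' drops; 'committed' often does not
    }

    static void print(String label, Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.printf("%s: used=%d MB, committed=%d MB, max=%d MB%n",
                label, used >> 20, rt.totalMemory() >> 20, rt.maxMemory() >> 20);
    }
}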

Depending on your code, you might be creating memory leaks that the garbage collector simply cannot reclaim.
I would suggest instrumenting your code in order to find potential memory leaks. Once that is ruled out or fixed, I would start looking at the code itself for possible improvements.
Note, for instance, that if you use a try/catch/finally block, the finally block might not be called at all (e.g. if the JVM exits or the thread is killed first), or at least not immediately. If you do some resource freeing in a finally block, this might be the answer.
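As a side note, if you are on Java 7 or later, try-with-resources removes most of the room for error here. A minimal sketch (the file path and class name are made up for illustration):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFirstLine {
    public static String firstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();   // reader.close() runs automatically, even on exceptions
        }
    }
}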
Nevertheless read up on the subject, for instance here: http://www.toptal.com/java/hunting-memory-leaks-in-java

Related

Java Heap Size Reduction

BACKGROUND
I recently wrote a Java application that consumes a specified number of MB. I am doing this purposefully to see how another Java application reacts to specific RAM loads (I am sure there are tools for this purpose, but this was the fastest). The memory consumer app is very simple: I enter the number of MB I want to consume and create a vector of that many bytes. I also have a reset button that removes the elements of the vector and prompts for a new number of bytes.
QUESTION
I noticed that the heap size of the Java process never shrinks once the vector is cleared. I tried clear(), but the heap remains the same size. It seems like the heap grows with the elements, but even though the elements are removed, the size remains. Is there a way in Java code to reduce heap size? Is there a detail about the Java heap that I am missing? I feel like this is an important question, because if I wanted to keep a low memory footprint in any Java application, I would need a way to keep the heap from growing, or at least from staying large for long stretches of time.
Try triggering garbage collection by making a call to System.gc().
This might help you - When does System.gc() do anything
Calling GC excessively is not recommended.
You should set the max heap size with the -Xmx option and watch the memory allocation of your app. Also, use weak references for objects that have a short lifecycle, and the GC will remove them automatically.
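To illustrate the weak-reference suggestion, here is a minimal sketch (the 50 MB buffer is arbitrary, and System.gc() remains only a hint, so the second line may print either result):

import java.lang.ref.WeakReference;

public class WeakBufferDemo {
    public static void main(String[] args) {
        WeakReference<byte[]> cache = new WeakReference<byte[]>(new byte[50 * 1024 * 1024]);
        System.out.println("buffer still reachable before GC? " + (cache.get() != null));
        System.gc();   // only a hint; the weakly referenced buffer may or may not survive
        System.out.println("buffer still reachable after GC?  " + (cache.get() != null));
    }
}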

Why is the default size of PermGen so small?

What would be the purpose of limiting the size of the PermGen space on a Java JVM? Why not always set it equal to the max heap size? Why does Java default to such a small number as 64MB? Are they trying to force people to notice PermGen issues in their code by doing this?
If my app uses 85MB of PermGen, then it might be safe to set it to 96MB, but why set it so small if it's really just part of the main heap? Wouldn't it be efficient to allow the JVM to use as much PermGen as the heap allows?
The PermGen is set to disappear in JDK8.
What would be the purpose of limiting the size of the Permgen space on a Java JVM?
Not exhausting resources.
Why not always set it equal to the max heap size?
The PermGen is not part of the Java heap. Besides, even if it were, it wouldn't be of much help to the application to fill the heap with class metadata and constant Strings, since you'd then get "OutOfMemoryError: Java heap space" errors instead.
Conceptually to the programmer, you could argue that a "Permanent Generation" is largely pointless. If you need to load a class or other "permanent" data and there is memory space left, then in principle you may as well just load it somewhere and not care about calling the aggregate of these items a "generation" at all.
However, the rationale is probably more that:
there is potentially a benefit (e.g. from a processor cache point of view) from having all code/class metadata near together in memory space, and to guarantee this it is easier to allocate fixed sized area(s);
similarly, memory space where code/class metadata is stored potentially has certain "special" properties (notably, you don't want it to get paged out to disk if you can help it) and the system may not be able to set such properties on memory in a very granular way, so it is more practical to have all "special" objects together in one (or a small number of) contiguous block of memory space;
having permanent objects all together helps avoid fragmenting the remaining memory space and again, the most practical way to do this is to allocate one contiguous block of memory of fixed size from the outset.
So as I see things, most of the time the reason for allocating a permanent "generation" is really more for practical implementation reasons than because the programmer really cares terribly much.
On the other hand, the situation isn't usually terrible for the programmer either: the amount of permanent generation needed is usually predictable, so that you should be able to allocate the required amount with decent leeway. So if you find you are unexpectedly exceeding the allocation, this may well be a signal that "something serious is wrong".
N.B. It is probably the case that some of the issues that the PermGen originally was designed to solve are not such big issues on modern 64-bit processors with larger processor caches. If it is removed in future releases of Java, this is likely a sign that the JVM designers feel it has now "served its purpose".
PermGen is where class data and other static stuff (like string literals) are allocated.
You'd rather allocate memory to the Java heap for your application data (-Xms and -Xmx), which is where young (short-lived) objects and tenured objects (ones the JVM realizes need to stay around longer) go.
So the historic PermGen 64MB default may be arbitrary, but having you explicitly set it lets you know (and control) how much static data your application is causing the JVM to store.
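If you want to size it from measurement rather than guesswork, the standard management API will show how full each pool actually is. A rough sketch (pool names vary by JVM version and collector, e.g. "PS Perm Gen" before Java 8 and "Metaspace" afterwards; max prints as -1 when a pool has no limit):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemoryPoolReport {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            System.out.printf("%-25s used=%d KB, max=%d KB%n",
                    pool.getName(), u.getUsed() >> 10, u.getMax() >> 10);
        }
    }
}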

Why does java wait so long to run the garbage collector?

I am building a Java web app, using the Play! Framework. I'm hosting it on playapps.net. I have been puzzling for a while over the provided graphs of memory consumption. Here is a sample: [memory usage graph not reproduced here]
The graph comes from a period of consistent but nominal activity. I did nothing to trigger the falloff in memory, so I presume this occurred because the garbage collector ran as it has almost reached its allowable memory consumption.
My questions:
Is it fair for me to assume that my application does not have a memory leak, as it appears that all the memory is correctly reclaimed by the garbage collector when it does run?
(from the title) Why is Java waiting until the last possible second to run the garbage collector? I am seeing significant performance degradation as the memory consumption grows toward the top fourth of the graph.
If my assertions above are correct, then how can I go about fixing this issue? The other posts I have read on SO seem opposed to calls to System.gc(), ranging from neutral ("it's only a request to run GC, so the JVM may just ignore you") to outright opposed ("code that relies on System.gc() is fundamentally broken"). Or am I off base here, and I should be looking for defects in my own code that is causing this behavior and intermittent performance loss?
UPDATE
I have opened a discussion on PlayApps.net pointing to this question and mentioning some of the points here; specifically @Affe's comment regarding the settings for a full GC being set very conservatively, and @G_H's comment about settings for the initial and max heap size.
Here's a link to the discussion, though you unfortunately need a playapps account to view it.
I will report the feedback here when I get it; thanks so much everyone for your answers, I've already learned a great deal from them!
Resolution
Playapps support, which is still great, didn't have many suggestions for me; their only thought was that if I was using the cache extensively, this may be keeping objects alive longer than need be, but that isn't the case. I still learned a ton (woo hoo!), and I gave @Ryan Amos the green check as I took his suggestion of calling System.gc() every half day, which for now is working fine.
Any detailed answer is going to depend on which garbage collector you're using, but there are some things that are basically the same across all (modern, sun/oracle) GCs.
Every time you see the usage in the graph go down, that is a garbage collection. The only way heap gets freed is through garbage collection. The thing is, there are two types of garbage collections: minor and full. The heap gets divided into two basic "areas": young and tenured. (There are lots more subgroups in reality.) Anything that is taking up space in young and is still in use when the minor GC comes along to free up some memory is going to get 'promoted' into tenured. Once something makes the leap into tenured, it sits around indefinitely until the heap has no free space and a full garbage collection is necessary.
So one interpretation of that graph is that your young generation is fairly small (by default it can be a fairly small % of total heap on some JVMs) and you're keeping objects "alive" for comparatively very long times. (Perhaps you're holding references to them in the web session?) So your objects are 'surviving' garbage collections until they get promoted into tenured space, where they stick around indefinitely until the JVM is well and truly out of memory.
Again, that's just one common situation that fits with the data you have. You would need full details about the JVM configuration and the GC logs to really tell for sure what's going on.
Java won't run the garbage collector until it has to, because the garbage collector slows things down quite a bit and shouldn't be run that frequently. I think you would be OK to schedule a collection more frequently, such as every 3 hours. If an application never consumes the full memory, there should be no reason to ever run the garbage collector, which is why Java only runs it when memory usage is very high.
So basically, don't worry about what others say: do what works best. If you find performance improvements from running the garbage collector at 66% memory, do it.
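A minimal sketch of that scheduling idea (the 3-hour interval is just the figure mentioned above, and System.gc() is still only a request to the JVM):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGc {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.gc();   // request a full collection; the JVM may still ignore it
            }
        }, 3, 3, TimeUnit.HOURS);
        // ... rest of the application; call scheduler.shutdown() when finished
    }
}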
I am noticing that the graph isn't sloping strictly upward until the drop, but has smaller local variations. Although I'm not certain, I don't think memory use would show these small drops if there was no garbage collection going on.
There are minor and major collections in Java. Minor collections occur frequently, whereas major collections are rarer and diminish performance more. Minor collections probably tend to sweep up stuff like short-lived object instances created within methods. A major collection will remove a lot more, which is what probably happened at the end of your graph.
Now, some answers that were posted while I'm typing this give good explanations regarding the differences in garbage collectors, object generations and more. But that still doesn't explain why it would take so absurdly long (nearly 24 hours) before a serious cleaning is done.
Two things of interest that can be set for a JVM at startup are the maximum allowed heap size and the initial heap size. The maximum is a hard limit: once you reach that, further garbage collection doesn't reduce memory usage, and if you need to allocate new space for objects or other data, you'll get an OutOfMemoryError. However, internally there's a soft limit as well: the current heap size. A JVM doesn't immediately gobble up the maximum amount of memory. Instead, it starts at your initial heap size and then increases the heap when it's needed. Think of it a bit like the RAM of your JVM, which can increase dynamically.
If the actual memory use of your application starts to reach the current heap size, a garbage collection will typically be instigated. This might reduce the memory use, so an increase in heap size isn't needed. But it's also possible that the application currently does need all that memory and would exceed the heap size. In that case, it is increased provided that it hasn't already reached the maximum set limit.
Now, what might be your case is that the initial heap size is set to the same value as the maximum. Suppose that is so; then the JVM will immediately seize all that memory. It will take a very long time before the application has accumulated enough garbage to reach the heap size in memory usage. But at that moment you'll see a large collection. Starting with a small enough heap and allowing it to grow keeps the memory use limited to what's needed.
This is assuming that your graph shows heap use and not allocated heap size. If that's not the case and you are actually seeing the heap itself grow like this, something else is going on. I'll admit I'm not savvy enough regarding the internals of garbage collection and its scheduling to be absolutely certain of what's happening here, most of this is from observation of leaking applications in profilers. So if I've provided faulty info, I'll take this answer down.
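One way to tell the two figures apart from inside the application is the standard management API; a minimal sketch ('used' is what this answer calls heap use, while 'committed' is the allocated heap size):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapFigures {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // init roughly corresponds to -Xms, max to -Xmx, committed to the current heap size
        System.out.printf("init=%d MB used=%d MB committed=%d MB max=%d MB%n",
                heap.getInit() >> 20, heap.getUsed() >> 20,
                heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}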
As you might have noticed, this does not affect you. The garbage collection only kicks in if the JVM feels there is a need for it to run, and this happens for the sake of optimization: there's no use doing many small collections if you can make a single full collection and do a full cleanup.
The current JVM contains some really interesting algorithms, and garbage collection itself is divided into 3 different regions. You can find a lot more about this here; here's a sample:
Three types of collection algorithms
The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.
Copy/scavenge collection
Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.
Mark-compact collection
As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space.
The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts all remaining gaps of dead objects. The mark-compact algorithm occupies more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.
Incremental (train) collection
The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses. Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection.
Incremental collection breaks up old object collection pauses into many tiny pauses even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation.
The -Xincgc and -Xnoincgc parameters control how you use incremental collection. The next release of HotSpot JVM, version 1.4, will attempt continuous, pauseless GC that will probably be a variation of the incremental algorithm. I won't discuss incremental collection since it will soon change.
This generational garbage collector is one of the most efficient solutions we have for the problem nowadays.
I had an app that produced a graph like that and acted as you describe. I was using the CMS collector (-XX:+UseConcMarkSweepGC). Here is what was going on in my case.
I did not have enough memory configured for the application, so over time I was running into fragmentation problems in the heap. This caused GCs with greater and greater frequency, but it did not actually throw an OOME or fall back from CMS to the serial collector (which it is supposed to do in that case), because the stats it keeps only count application paused time (GC blocks the world); application concurrent time (GC runs with application threads) is ignored for those calculations. I tuned some parameters, mainly gave it a whole crap load more heap (with a very large new space), set -XX:CMSFullGCsBeforeCompaction=1, and the problem stopped occurring.
Probably you do have memory leaks that are cleared every 24 hours.

Why is the maximum size of the Java heap fixed?

It is not possible to increase the maximum size of Java's heap after the VM has started. What are the technical reasons for this? Do the garbage collection algorithms depend on having a fixed amount of memory to work with? Or is it for security reasons, to prevent a Java application from DOS'ing other applications on the system by consuming all available memory?
In Sun's JVM, last I knew, the entire heap must be allocated in a contiguous address space. I imagine that for large heap values, it's pretty hard to add to your address space after startup while ensuring it stays contiguous. You probably need to get it at startup, or not at all. Thus, it is fixed.
Even if it isn't all used immediately, the address space for the entire heap is reserved at startup. If it cannot reserve a large enough contiguous block of address space for the value of -Xmx that you pass it, it will fail to start. This is why it's tough to allocate >1.4GB heaps on 32-bit Windows: it's hard to find a contiguous block of address space of that size or larger, since some DLLs like to load in certain places, fragmenting the address space. This isn't really an issue when you go 64-bit, since there is so much more address space.
This is almost certainly for performance reasons. I could not find a terrific link detailing this further, but here is a pretty good quote from Peter Kessler (full link - be sure to read the comments) that I found when searching. I believe he works on the JVM at Sun.
The reason we need a contiguous memory region for the heap is that we have a bunch of side data structures that are indexed by (scaled) offsets from the start of the heap. For example, we track object reference updates with a "card mark array" that has one byte for each 512 bytes of heap. When we store a reference in the heap we have to mark the corresponding byte in the card mark array. We right shift the destination address of the store and use that to index the card mark array. Fun addressing arithmetic games you can't do in Java that you get to (have to :-) play in C++.
This was in 2004 - I'm not sure what's changed since then, but I am pretty sure it still holds. If you use a tool like Process Explorer, you can see that the virtual size (add the virtual size and private size memory columns) of the Java application includes the total heap size (plus other required space, no doubt) from the point of startup, even though the memory 'used' by the process will be nowhere near that until the heap starts to fill up...
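To make the quoted arithmetic concrete, here is a schematic sketch only. Java itself has no pointer arithmetic, so plain long values stand in for addresses; the shift by 9 corresponds to the 512-byte cards mentioned above, and subtracting a single heap base is exactly what requires the heap to be one contiguous range:

public class CardMarkSketch {
    static final int CARD_SHIFT = 9;     // 2^9 = 512 bytes of heap per card
    static byte[] cardTable;
    static long heapBase;

    static void init(long base, long heapBytes) {
        heapBase = base;
        cardTable = new byte[(int) (heapBytes >> CARD_SHIFT)];   // one byte per 512-byte card
    }

    static void markCardFor(long storeAddress) {
        // index = (address - start of heap) / 512, done with a right shift
        cardTable[(int) ((storeAddress - heapBase) >> CARD_SHIFT)] = 1;
    }

    public static void main(String[] args) {
        init(0x10000000L, 64L << 20);            // pretend heap: 64 MB starting at 0x10000000
        markCardFor(0x10000000L + 12345);        // a reference store 12345 bytes into the heap
        System.out.println("card dirty? " + (cardTable[12345 >> CARD_SHIFT] == 1));
    }
}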
Historically there has been a reason for this limitation, which was to prevent Applets in the browser from eating up all of the user's memory. The Microsoft VM, which never had such a limitation, actually allowed this, which could lead to a sort of denial-of-service attack against the user's computer. It was only a year ago that Sun introduced, in the 1.6.0 Update 10 VM, a way to let applets specify how much memory they want (limited to a certain fixed share of the physical memory) instead of always limiting them to 64MB even on computers that have 8GB or more available.
Now that the JVM has evolved, it should have been possible to get rid of this limitation when the VM is not running inside a browser, but Sun obviously never considered it such a high-priority issue, even though numerous bug reports have been filed to finally allow the heap to grow.
I think the short, snarky answer is that Sun hasn't found it worth the time and cost to develop.
The most compelling use case for such a feature is on the desktop, IMO, and Java has always been a disaster on the desktop when it comes to the mechanics of launching the JVM. I suspect that those who think the most about those issues tend to focus on the server side and view any other details best left to native wrappers. It is an unfortunate decision, but it should just be one of the decision points when deciding on the right platform for an application.
My gut feel is that it has to do with memory management with respect to the other applications running on the operating system.
If you set the maximum heap size to, for example, the amount of RAM on the box, you effectively let the VM decide how much memory it requires (up to this limit). The problem with this is that the VM could cripple the machine it is running on, because it will take over all the memory on the box before it decides that it needs to garbage collect.
When you specify a max heap size, what you're saying to the VM is: you are allowed to use this amount of memory before you need to start garbage collecting. You cannot have more, because if you take more, then the other applications running on the box will slow down and you will start swapping to disk.
Also be aware that there are two values with respect to memory: "current heap size" and "max heap size". The current heap size is how much memory the heap is currently occupying, and, if the VM requires more, it can resize the heap, but it cannot resize the heap above the value of the maximum heap size.
From IBM's performance tuning tips (so may not be directly applicable to Sun's VMs)
The Java heap parameters influence the behavior of garbage collection. Increasing the heap size supports more object creation. Because a large heap takes longer to fill, the application runs longer before a garbage collection occurs. However, a larger heap also takes longer to compact and causes garbage collection to take longer.
The JVM has thresholds it uses to manage the JVM's storage. When the thresholds are reached, the garbage collector gets invoked to free up unused storage. Therefore, garbage collection can cause significant degradation of Java performance. Before changing the initial and maximum heap sizes, you should consider the following information:
In the majority of cases you should set the maximum JVM heap size to a value higher than the initial JVM heap size. This allows the JVM to operate efficiently during normal, steady-state periods within the confines of the initial heap, but also to operate effectively during periods of high transaction volume by expanding the heap up to the maximum JVM heap size. In some rare cases where absolute optimal performance is required, you might want to specify the same value for both the initial and maximum heap size. This will eliminate some overhead that occurs when the JVM needs to expand or contract the size of the JVM heap. Make sure the region is large enough to hold the specified JVM heap.
Beware of making the Initial Heap Size too large. While a large heap size initially improves performance by delaying garbage collection, a large heap size ultimately affects response time when garbage collection eventually kicks in because the collection process takes more time.
So, I guess the reason that you can't change the value at runtime is because it may not help: either you have enough space in your heap or you don't. Once you run out, a GC cycle will be triggered. If that doesn't free up the space, you're stuffed anyway. You'd need to catch the OutOfMemoryError, increase the heap size, and then retry your calculation, hoping that this time you have enough memory.
In general the VM won't use the maximum heap size unless you need it, so if you think you might need to expand the memory at runtime, you could just specify a large maximum heap size.
I admit that's all a bit unsatisfying, and seems a bit lazy, since I can imagine a reasonable garbage collection strategy which would increase the heap size when GC fails to free enough space. Whether my imagination translates to a high performance GC implementation is another matter though ;)
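Since -Xmx genuinely cannot be raised while the JVM is running, the closest practical version of the "catch it and retry" idea above is a launcher that reruns the work in a fresh JVM with a bigger heap. A sketch under that assumption; "worker.Main" and the -Xmx values are placeholders, not a real API:

import java.util.Arrays;

public class RetryWithBiggerHeap {
    public static void main(String[] args) throws Exception {
        for (String xmx : Arrays.asList("-Xmx512m", "-Xmx1g", "-Xmx2g")) {
            // Launch the real workload (placeholder class name) in a child JVM.
            Process worker = new ProcessBuilder("java", xmx, "worker.Main")
                    .inheritIO()
                    .start();
            if (worker.waitFor() == 0) {
                return;   // the worker finished without exhausting its heap
            }
            System.err.println("worker failed with " + xmx + "; retrying with a larger heap");
        }
    }
}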

Question about java garbage collection

I have this class and I'm testing insertions with different data distributions. I'm doing this in my code:
...
AVLTree tree = new AVLTree();
//insert the data from the first distribution
//get results
...
tree = new AVLTree();
//insert the data from the next distribution
//get results
...
I'm doing this for 3 distributions. Each one should be tested 14 times, with the 2 lowest/highest values removed before computing the average. This should be done for 2000 input sizes, stepping by 1000 elements each time. In other words, it goes 1000, 2000, 3000, ..., 2000000.
The problem is, I can only get as far as 100000. When I tried 200000, I ran out of heap space. I increased the available heap space with -Xmx in the command line to 1024m and it didn't even complete the tests with 200000. I tried 2048m and again, it wouldn't work.
What I'm thinking is that the garbage collector isn't getting rid of the old trees once I do tree = new AVLTree(). But why? I thought that the elements from the old trees would no longer be accessible and their memory would be cleaned up.
The garbage collector should have no trouble cleaning up your old tree objects, so I can only assume there's some other allocation that you're doing that's not being cleaned up.
Java has a good tool to watch the GC in progress (or not in your case), JVisualVM, which comes with the JDK.
Just run that and it will show you which objects are taking up the heap, and you can both trigger and see the progress of GC's. Then you can target those for pools so they can be re-used by you, saving the GC the work.
Also look into the option below, which will probably stop the error that's halting your program, so the program will finish; it may take a long time, though, because your app will fill up the heap and then run very slowly.
-XX:-UseGCOverheadLimit
Which JVM are you using, and what JVM parameters have you used to configure GC?
Your explanation suggests there is a memory leak in your code. If you have a tool like JProfiler, use it to find out where the memory leak is.
There's no reason those trees shouldn't be collected, although I'd expect that before you ran out of memory you would see long pauses as the system ran a full GC. Since that's not what you're seeing, you could try running with flags like -XX:+PrintGC, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps to get more information on exactly what's going on, along with perhaps some sort of running count of roughly where you are. You could also explicitly tell the garbage collector to use a different garbage-collection algorithm.
However, it still seems unlikely to me. What other code is running? Is it possible there's something in the AVLTree class itself that's keeping its instances from being GC'd? What about manually logging from finalize() on that class to ensure that (some of them, at least) are collectible (e.g. make a few and manually call System.gc())?
GC params here, a nice ref on garbage collection from sun here that's well worth reading.
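A quick-and-dirty version of that collectibility check might look like the sketch below. TrackedAVLTree is a stand-in, since the asker's AVLTree isn't shown, and finalize() is a blunt instrument, but it is adequate for a one-off experiment:

public class CollectibilityCheck {
    static class TrackedAVLTree /* would extend or wrap the real AVLTree */ {
        @Override
        protected void finalize() {
            System.out.println("an old tree was collected");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            TrackedAVLTree tree = new TrackedAVLTree();
            // ... insert a distribution, gather results, then let 'tree' go out of scope ...
        }
        System.gc();          // only a hint; the finalizer may run slightly later
        Thread.sleep(1000);   // give the finalizer thread a moment to print
    }
}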
The Java garbage collector isn't guaranteed to run as soon as an object becomes unreachable. So if you're writing code that is only creating and discarding a lot of objects, it's possible to exhaust all of the heap space before the GC has a chance to run. Alternatively, Pax's suggestion that there is a memory leak in your code is also a strong possibility.
If you are only doing benchmarking, then you may want to call the Java GC explicitly (System.gc()) between tests, or even re-run your program for each distribution.
We noticed this in a server product. When making a lot of tiny objects that quickly get thrown away, the garbage collector can't keep up. The problem is more pronounced when the tiny objects have pointers to larger objects (e.g. an object that points to a large char[]). The GC doesn't seem to realize that if it frees up the tiny object, it can then free the larger object. Even when calling System.gc() directly, this was still a huge problem (both in 1.5 and 1.6 VMs)!
What we ended up doing and what I recommend to you is to maintain a pool of objects. When your object is no longer needed, throw it into the pool. When you need a new object, grab one from the pool or allocate a new one if the pool is empty. This will also save a small amount of time over pure allocation because Java doesn't have to clear (bzero) the object.
If you're worried about the pool getting too large (and thus wasting memory), you can either remove an arbitrary number of objects from the pool on a regular basis, or use weak references (for example, using java.util.WeakHashMap). One of the advantages of using a pool is that you can track the allocation frequency and totals, and you can adjust things accordingly.
We're using pools of char[] and byte[], and we maintain separate "bins" of sizes in the pool (for example, we always allocate arrays of size that are powers of two). Our product does a lot of string building, and using pools showed significant performance improvements.
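A much simplified sketch of that pooling scheme (the real product code isn't shown here; this just bins char[] buffers by power-of-two length and hands them back out instead of reallocating):

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class CharArrayPool {
    private final Map<Integer, ArrayDeque<char[]>> bins = new HashMap<Integer, ArrayDeque<char[]>>();

    public synchronized char[] acquire(int minLength) {
        int size = Integer.highestOneBit(Math.max(1, minLength - 1)) * 2;  // round up to a power of two
        ArrayDeque<char[]> bin = bins.get(size);
        char[] buf = (bin == null) ? null : bin.poll();
        return (buf != null) ? buf : new char[size];
    }

    public synchronized void release(char[] buf) {
        ArrayDeque<char[]> bin = bins.get(buf.length);
        if (bin == null) {
            bin = new ArrayDeque<char[]>();
            bins.put(buf.length, bin);
        }
        bin.push(buf);   // buffer goes back to its size bin for reuse
    }
}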
Note: In general, the GC does a fine job. We just noticed that with small objects that point to larger structures, the GC doesn't seem to clean up the objects fast enough especially when the VM is under CPU load. Also, System.gc() is just a hint to help schedule the finalizer thread to do more work. Calling it too frequently causes a significant performance hit.
Given that you're just doing this for testing purposes, it might just be good housekeeping to invoke the garbage collector directly using System.gc() (thus forcing it to make a pass). It won't help you if there is a memory leak, but if there isn't, it might buy you back enough memory to get through your test.
