As part of a memory analysis, we've found the following:
         percent         live           alloc'ed       stack class
 rank   self  accum     bytes objs      bytes    objs  trace name
    3  3.98% 19.85%  24259392  808 3849949016 1129587 359697 byte[]
    4  3.98% 23.83%  24259392  808 3849949016 1129587 359698 byte[]
You'll notice that many objects are allocated, but few remain live. This is for a simple reason - the two byte arrays are allocated for each instance of a "client" that is generated. Clients are not reusable - each one can only handle one request and is then thrown away. The byte arrays always have the same size (30000).
We're considering moving to a pool (apache's GenericObjectPool) of byte arrays, as normally there are a known number of active clients at any given moment (so the pool size shouldn't fluctuate much). This way, we can save on memory allocation and garbage collection. The question is, would the pool cause a severe CPU hit? Is this idea a good idea at all?
Thanks for your help!
I think there are good GC-related reasons to avoid this sort of allocation behaviour. Depending on the size of the heap and the free space in eden at the time of allocation, simply allocating a 30000-element byte[] could be a serious performance hit: it could easily be bigger than the TLAB (so the allocation is not a bump-the-pointer event), and there may not even be enough space available in eden, in which case the array is allocated directly in tenured. That in itself is likely to cause another hit down the line through increased full GC activity (particularly with CMS, due to fragmentation).
Having said that, the comments from fdreger are completely valid too. A multithreaded object pool is a bit of a grim thing that is likely to cause headaches. You mention that each client handles a single request only. If this request is serviced by a single thread only, then a ThreadLocal byte[] that is wiped at the end of the request could be a good option. If the request is short-lived relative to your typical young GC period, then the young->old reference issue may not be a big problem (as the probability of any given request being handled during a GC is small, even if you're guaranteed to get this periodically).
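The ThreadLocal idea could be sketched roughly as follows. This is only an illustration under assumptions: the class name RequestBuffer and its acquire/release methods are made up for the example, and it assumes each request really is handled start to finish by one thread.

```java
import java.util.Arrays;

/** Hypothetical per-thread scratch buffer; the names are illustrative only. */
final class RequestBuffer {
    // 30000 matches the array size mentioned in the question.
    static final int SIZE = 30000;

    // Each thread lazily gets its own array, so no locking is needed.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[SIZE]);

    /** Returns the calling thread's buffer. */
    static byte[] acquire() {
        return BUFFER.get();
    }

    /** Wipes the calling thread's buffer so no data leaks into the next request. */
    static void release() {
        Arrays.fill(BUFFER.get(), (byte) 0);
    }

    public static void main(String[] args) {
        byte[] buf = acquire();
        buf[0] = 42;                          // stand-in for real request handling
        release();
        System.out.println(buf[0]);           // prints 0
        System.out.println(acquire() == buf); // prints true
    }
}
```

The wipe costs a pass over 30000 bytes per request, so whether this beats a fresh allocation is something to measure rather than assume.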
Probably pooling will not help you much if at all - possibly it will make things worse, although it depends on a number of factors (what GC are you using, how long the objects live, how much memory is available, etc.):
The time of GC depends mostly on the number of live objects. The collector (I assume you run a vanilla Java JRE) does not visit dead objects and does not deallocate them one by one. It frees whole areas of memory after copying the live objects away (this keeps memory neat and compacted). 100 dead objects can be collected as fast as 100000. On the other hand, all the live objects must be copied, so if you, say, have a pool of 100 objects and only 50 are used at a given time, keeping the unused objects is going to cost you.
If your arrays currently tend to live shorter than the time needed to get tenured (copied to the old generation space), there is another problem: your pooled arrays will certainly live long enough. This will produce a situation where there are a lot of references from the old generation to the young one, and GCs are optimized with the reverse situation in mind.
Actually it is quite possible that pooling arrays will make your GC SLOWER than creating new ones; this is usually the case with cheap objects.
Another cost of pooling comes from synchronizing objects across threads and cleaning them up after use. Both are trickier than they sound.
Summing up, unless you are well aware of the internals of your GC and understand how it works under the hood, AND have results from a profiler that show that managing all the arrays is a bottleneck - DO NOT POOL. In most cases it is a bad idea.
If garbage collection in your case is really a performance hit (often cleaning up the eden space does not take much time if not many objects survive), and it is easy to plug in the object pool, try it, and measure it.
This certainly depends on your application's need.
The pool would work out much better as long as you always hold a reference to it; that way the garbage collector never reclaims the pool, and it only needs to be created once (you could declare it static to be on the safe side). It would occupy memory permanently, but I doubt that will be a problem for your application.
For example in a service adapter you might:
a. have an input data model and an output data model, maybe even immutable, with different classes and use Object Mappers to transform between classes and create some short-lived objects along the way
b. have a single data model, some of the classes might be mutable, but the same object that was created for the input is also sent as output
There are other use cases where you'd have to choose between clear code with many objects and less clear code with fewer objects, and I would like to know if garbage collection still carries weight in this decision.
I should make this a comment as IMO it does not qualify as an answer, but it will not fit.
Even if the answer(s) are going to most probably be - do whatever makes your code more readable (and to be honest I still follow that all the time); we have faced this issue of GC in our code base.
Suppose that you want to create a graph of users (we had to - around 1/2 million) and load all their properties in memory and do some aggregations on them and filtering, etc. (it was not my decision). Because these graph objects were pretty heavy, once loaded, even with 16GB of heap, the JVM would fail with OOM or GC would take huge pauses. And it's understandable - lots of data requires lots of memory, you can't run away from it. The solution that was proposed and that actually worked was to model that with simple BitSets, where each bit would be a property and a potential linkage to some other data; this is far from readable and extremely complicated to maintain to this day. Lots of shifts, lots of intricacies of the data - you have to know at all times what the third bit means, for example; there's no getter for usernameIncome, let's say - you have to do quite a lot of shifts and map that to a lookup table, etc. But it would keep the GC overhead pretty low, at least in the ranges where we were OK with that.
So unless you can prove that GC is taking up that much of your app's time, you are probably even safer simply adding more RAM and increasing the heap (unless you have a leak). I would still go for clear code like 99.(99) % of the time.
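To give a feel for the trade-off, here is a much simplified sketch of that style. It is not the actual code from that project: the class UserFlags and the two properties are hypothetical, and it shows only the simplest case of one BitSet per boolean property, indexed by user id.

```java
import java.util.BitSet;

/** Hypothetical columnar store: one BitSet per boolean property, indexed by user id. */
final class UserFlags {
    private final BitSet active = new BitSet();
    private final BitSet premium = new BitSet();

    void setActive(int userId, boolean value) { active.set(userId, value); }
    void setPremium(int userId, boolean value) { premium.set(userId, value); }

    /** Counts users that are both active and premium without any per-user objects. */
    int countActivePremium() {
        BitSet both = (BitSet) active.clone(); // copy so the stored column is not modified
        both.and(premium);
        return both.cardinality();
    }

    public static void main(String[] args) {
        UserFlags flags = new UserFlags();
        flags.setActive(1, true);
        flags.setActive(2, true);
        flags.setPremium(2, true);
        flags.setPremium(3, true);
        System.out.println(flags.countActivePremium()); // prints 1
    }
}
```

Half a million users cost roughly 64 KB per property this way, and the collector sees a handful of long[] arrays instead of millions of small objects. The readability cost described above shows up as soon as properties stop being simple booleans.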
Newer versions of Java have quite sophisticated mechanisms to handle very short-lived objects, so it's not as bad as it was in the past. With a modern JVM I'd say that you don't need to worry about garbage collection times if you create many objects, which is a good thing since many more of them are now being created on the fly than was the case with older versions of Java.
What's still valid is to keep the number of created objects low if creating them comes with high costs, e.g. accessing a database to retrieve data, network operations, etc.
As other people have said I think it's better to write your code to solve the problem in an optimum way for that problem rather than thinking about what the garbage collector (GC) will do.
The key to working with the GC is to look at the lifespan of your objects. The heap is (typically) divided into two main regions called generations to signify how long objects have been alive (thus young and old generations). To minimise the impact of GC you want your objects to become eligible for collection while they are still in the young generation (either in the Eden space or a survivor space, but preferably Eden space). Collection of objects in the Eden space is effectively free, as the GC does nothing with them, it just ignores them and resets the allocation pointer(s) when a minor GC is finished.
Rather than explicitly calling the GC via System.gc() it's much better to tune your heap. For example, you can set the size of the young generation using command line options like -XX:NewRatio=n, where n is the ratio of the old generation to the young generation (e.g. setting it to 3 makes young:old 1:3, so the young generation will be one quarter of the heap). Alternatively, you can set the size explicitly using -XX:NewSize=n and -XX:MaxNewSize=m. The GC may resize the young generation during collections, so setting these two values to be the same will keep it at a fixed size.
You can profile your code to establish the rate of object creation and how long your objects typically live for. This will give you the information to (ideally) configure your heap to minimise the number of objects being promoted into the old generation. What you really don't want is objects being promoted and then becoming garbage shortly thereafter.
Alternatively, you may want to look at the Zing JVM from Azul (full disclosure, I work for them). This uses a different GC algorithm, called C4, which enables compaction of the heap concurrently with application threads and so eliminates most of the impact of the GC on application latency.
I know that creating an object takes time and that's why the flyweight pattern exists.
What I would like to know is what increases the time of creating a single object the most?
I thought it might be the search for a slightly larger space in memory, but I guess it is only slightly larger than each of the fields the object has. Then maybe it is the travel to the correct address in memory while we are looking for the value of a specific field, but then again: the only thing we added is looking up the address of the object.
There are 3 ways object creation is costly:
1) The object allocation itself. This is actually pretty cheap (on the order of nanoseconds), but take into account that many objects have "embedded" objects which are implicitly allocated as well. Additionally, running the constructor (initializing the object) often costs more than the actual allocation.
2) Any allocation consumes Eden space, so the higher the allocation rate, the more CPU is consumed by GC (NewGen GC runs more frequently).
3) CPU caches. If you allocate temporary objects (e.g. an Integer when putting to a HashMap), those temporary objects are put in the L1 cache, evicting some other data. If you use them only once, this does not pay off. Therefore a high allocation rate (especially of temporaries/immutables) leads to cache misses, which can cause a significant slowdown (depending on what the app is actually trying to achieve).
Another issue is life cycle. The VM handles short-lived or very long-lived objects best. If your application creates a lot of objects that die in middle age (e.g. caches), you will get more frequent full GCs.
Regarding the flyweight pattern: it depends. If it's a very small object, flyweight frequently will not pay off. However, if your usage pattern involves many allocations of the flyweight candidate object, flyweighting will pay off. That's the reason Integer.valueOf caches the boxed values from -128 to 127 by default.
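The built-in Integer flyweight is easy to observe. Integer.valueOf is specified to return cached instances for values from -128 to 127; outside that range the result depends on the JVM and its settings, so the sketch below makes no promise about the second call. The class and method names are made up for the example.

```java
/** Small demo of the boxed Integer cache; names are illustrative only. */
final class IntegerCacheDemo {
    /** True when boxing the same value twice yields the same object. */
    static boolean sameInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // identity comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // prints true: inside the guaranteed range
        System.out.println(sameInstance(1000)); // typically false with default settings
    }
}
```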
In modern JVMs object creation is not as costly as it once was. It mostly needs to bump a pointer. In fact, in modern JVMs the JIT compiler can avoid heap allocation altogether for some objects that never escape the method that creates them, and that's basically free.
And regarding the flyweight pattern: it is not used because object creation is costly; rather, it is used to minimize memory use by sharing as much data as possible with other similar objects.
Imagine a stateless web application that processes many requests, say a thousand per sec. All data that we create inside the processing cycle will be immediately discarded at the end. The app will use serialization/deserialization a lot, so we create and discard about 50-200kb on each request.
I think it could put a big pressure on garbage collector, because GC will have to discard a large amount of short-living objects. Does it make sense to implement a pool of byte arrays to reuse them for serialization/deserialization purposes? Did anybody have such experience?
The garbage collection mechanism is built on the premise that a lot of objects created exist for a very short period of time. The first pool of objects is called the Eden Space (see this SO answer for the origin of the name) and is monitored regularly. So I would expect the garbage collector would be able to handle this.
Like most (all?) performance questions, I would measure your particular use case before applying premature optimisations. Numerous configurations are available for tuning garbage collection including parallel GC strategies (noting your 'stop the world' comment below)
I am building a Java web app, using the Play! Framework. I'm hosting it on playapps.net. I have been puzzling for a while over the provided graphs of memory consumption. Here is a sample:
The graph comes from a period of consistent but nominal activity. I did nothing to trigger the falloff in memory, so I presume this occurred because the garbage collector ran as it has almost reached its allowable memory consumption.
My questions:
Is it fair for me to assume that my application does not have a memory leak, as it appears that all the memory is correctly reclaimed by the garbage collector when it does run?
(from the title) Why is java waiting until the last possible second to run the garbage collector? I am seeing significant performance degradation as the memory consumption grows to the top fourth of the graph.
If my assertions above are correct, then how can I go about fixing this issue? The other posts I have read on SO seem opposed to calls to System.gc(), ranging from neutral ("it's only a request to run GC, so the JVM may just ignore you") to outright opposed ("code that relies on System.gc() is fundamentally broken"). Or am I off base here, and I should be looking for defects in my own code that is causing this behavior and intermittent performance loss?
UPDATE
I have opened a discussion on PlayApps.net pointing to this question and mentioning some of the points here; specifically #Affe's comment regarding the settings for a full GC being set very conservatively, and #G_H's comment about settings for the initial and max heap size.
Here's a link to the discussion, though you unfortunately need a playapps account to view it.
I will report the feedback here when I get it; thanks so much everyone for your answers, I've already learned a great deal from them!
Resolution
Playapps support, which is still great, didn't have many suggestions for me, their only thought being that if I was using the cache extensively this may be keeping objects alive longer than need be, but that isn't the case. I still learned a ton (woo hoo!), and I gave #Ryan Amos the green check as I took his suggestion of calling System.gc() every half day, which for now is working fine.
Any detailed answer is going to depend on which garbage collector you're using, but there are some things that are basically the same across all (modern, sun/oracle) GCs.
Every time you see the usage in the graph go down, that is a garbage collection. The only way heap gets freed is through garbage collection. The thing is there are two types of garbage collections, minor and full. The heap gets divided into two basic "areas." Young and tenured. (There are lots more subgroups in reality.) Anything that is taking up space in Young and is still in use when the minor GC comes along to free up some memory, is going to get 'promoted' into tenured. Once something makes the leap into tenured, it sits around indefinitely until the heap has no free space and a full garbage collection is necessary.
So one interpretation of that graph is that your young generation is fairly small (by default it can be a fairly small % of total heap on some JVMs) and you're keeping objects "alive" for comparatively very long times (perhaps you're holding references to them in the web session?). So your objects are 'surviving' garbage collections until they get promoted into tenured space, where they stick around indefinitely until the JVM is well and truly out of memory.
Again, that's just one common situation that fits with the data you have. Would need full details about the JVM configuration and the GC logs to really tell for sure what's going on.
Java won't run the garbage cleaner until it has to, because the garbage cleaner slows things down quite a bit and shouldn't be run that frequently. I think you would be OK to schedule a cleaning more frequently, such as every 3 hours. If an application never consumes full memory, there should be no reason to ever run the garbage cleaner, which is why Java only runs it when the memory is very high.
So basically, don't worry about what others say: do what works best. If you find performance improvements from running the garbage cleaner at 66% memory, do it.
I am noticing that the graph isn't sloping strictly upward until the drop, but has smaller local variations. Although I'm not certain, I don't think memory use would show these small drops if there was no garbage collection going on.
There are minor and major collections in Java. Minor collections occur frequently, whereas major collections are rarer and diminish performance more. Minor collections probably tend to sweep up stuff like short-lived object instances created within methods. A major collection will remove a lot more, which is what probably happened at the end of your graph.
Now, some answers that were posted while I'm typing this give good explanations regarding the differences in garbage collectors, object generations and more. But that still doesn't explain why it would take so absurdly long (nearly 24 hours) before a serious cleaning is done.
Two things of interest that can be set for a JVM at startup are the maximum allowed heap size, and the initial heap size. The maximum is a hard limit, once you reach that, further garbage collection doesn't reduce memory usage and if you need to allocate new space for objects or other data, you'll get an OutOfMemoryError. However, internally there's a soft limit as well: the current heap size. A JVM doesn't immediately gobble up the maximum amount of memory. Instead, it starts at your initial heap size and then increases the heap when it's needed. Think of it a bit as the RAM of your JVM, that can increase dynamically.
If the actual memory use of your application starts to reach the current heap size, a garbage collection will typically be instigated. This might reduce the memory use, so an increase in heap size isn't needed. But it's also possible that the application currently does need all that memory and would exceed the heap size. In that case, it is increased provided that it hasn't already reached the maximum set limit.
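The three numbers discussed here (the hard maximum, the current heap size, and the memory actually in use) can be read from java.lang.Runtime. A minimal sketch, with a made-up class name:

```java
/** Prints the hard limit, the current heap size and the used memory of this JVM. */
final class HeapSizes {
    /** Bytes currently in use: current heap size minus the free space inside it. */
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("max heap (hard limit): " + rt.maxMemory() / mb + " MB");
        System.out.println("current heap size:     " + rt.totalMemory() / mb + " MB");
        System.out.println("used:                  " + usedBytes(rt) / mb + " MB");
    }
}
```

Logging these values periodically should show whether a graph like the one in the question is tracking used memory or the heap size itself.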
Now, what might be your case is that the initial heap size is set to the same value as the maximum. Suppose that would be so, then the JVM will immediately seize all that memory. It will take a very long time before the application has accumulated enough garbage to reach the heap size in memory usage. But at that moment you'll see a large collection. Starting with a small enough heap and allowing it to grow keeps the memory use limited to what's needed.
This is assuming that your graph shows heap use and not allocated heap size. If that's not the case and you are actually seeing the heap itself grow like this, something else is going on. I'll admit I'm not savvy enough regarding the internals of garbage collection and its scheduling to be absolutely certain of what's happening here, most of this is from observation of leaking applications in profilers. So if I've provided faulty info, I'll take this answer down.
As you might have noticed, this does not affect you. The garbage collection only kicks in if the JVM feels there is a need for it to run and this happens for the sake of optimization, there's no use of doing many small collections if you can make a single full collection and do a full cleanup.
The current JVM contains some really interesting algorithms, and garbage collection itself is divided into 3 different types; you can find a lot more about this here. Here's a sample:
Three types of collection algorithms
The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.
Copy/scavenge collection
Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.
Mark-compact collection
As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space.
The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts all remaining gaps of dead objects. The mark-compact algorithm occupies more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.
Incremental (train) collection
The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses. Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection.
Incremental collection breaks up old object collection pauses into many tiny pauses even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation.
The -Xincgc and -Xnoincgc parameters control how you use incremental collection. The next release of HotSpot JVM, version 1.4, will attempt continuous, pauseless GC that will probably be a variation of the incremental algorithm. I won't discuss incremental collection since it will soon change.
This generational garbage collector is one of the most efficient solutions we have for the problem nowadays.
I had an app that produced a graph like that and acted as you describe. I was using the CMS collector (-XX:+UseConcMarkSweepGC). Here is what was going on in my case.
I did not have enough memory configured for the application, so over time I was running into fragmentation problems in the heap. This caused GCs with greater and greater frequency, but it did not actually throw an OOME or fail out of CMS to the serial collector (which it is supposed to do in that case) because the stats it keeps only count application paused time (GC blocks the world), application concurrent time (GC runs with application threads) is ignored for those calculations. I tuned some parameters, mainly gave it a whole crap load more heap (with a very large new space), set -XX:CMSFullGCsBeforeCompaction=1, and the problem stopped occurring.
You probably do have a memory leak that gets cleared every 24 hours.
I have this class and I'm testing insertions with different data distributions. I'm doing this in my code:
...
AVLTree tree = new AVLTree();
//insert the data from the first distribution
//get results
...
tree = new AVLTree();
//insert the data from the next distribution
//get results
...
I'm doing this for 3 distributions. Each one should be tested 14 times, with the 2 lowest and 2 highest values removed before computing the average. This should be done 2000 times, each time adding 1000 elements. In other words, it goes 1000, 2000, 3000, ..., 2000000.
The problem is, I can only get as far as 100000. When I tried 200000, I ran out of heap space. I increased the available heap space with -Xmx in the command line to 1024m and it didn't even complete the tests with 200000. I tried 2048m and again, it wouldn't work.
What I'm thinking is that the garbage collector isn't getting rid of the old trees once I do tree = new AVLTree(). But why? I thought that the elements from the old trees would no longer be accessible and their memory would be cleaned up.
The garbage collector should have no trouble cleaning up your old tree objects, so I can only assume there's some other allocation that you're doing that's not being cleaned up.
Java has a good tool to watch the GC in progress (or not in your case), JVisualVM, which comes with the JDK.
Just run that and it will show you which objects are taking up the heap, and you can both trigger and see the progress of GC's. Then you can target those for pools so they can be re-used by you, saving the GC the work.
Also look into this option, which will probably stop the error that halts your program. Your program will then finish, but it may take a long time because your app will fill up the heap and then run very slowly.
-XX:-UseGCOverheadLimit
Which JVM are you using, and what JVM parameters have you used to configure GC?
Your explanation suggests there is a memory leak in your code. If you have a tool like JProfiler, use it to find out where the memory leak is.
There's no reason those trees shouldn't be collected, although I'd expect that before you ran out of memory you should see long pauses as the system ran a full GC. As it's been noted here that that's not what you're seeing, you could try running with flags like -XX:+PrintGC, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps to give you some more information on exactly what's going on, along with perhaps some sort of running count of roughly where you are. You could also explicitly tell the JVM to use a different garbage-collection algorithm.
However, it still seems unlikely to me. What other code is running? Is it possible there's something in the AVLTree class itself that's keeping its instances from being GC'd? What about manually logging the finalize() on that class to ensure that (some of them, at least) are collectible (e.g. make a few and manually call System.gc())?
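A running count of collections can also be read from inside the program through the standard java.lang.management beans, which may be handy to print between test runs. This is only a sketch; the class name GcStats is made up, and the collector names it prints depend on which GC the JVM is using.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads collection counts and times from the JVM's garbage collector beans. */
final class GcStats {
    /** Sums collection counts over all collectors; undefined (-1) counts are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total: " + totalCollections());
    }
}
```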
GC params here, a nice ref on garbage collection from sun here that's well worth reading.
The Java garbage collector isn't guaranteed to collect an object as soon as it becomes unreachable (Java does not use reference counts). So if you're writing code that is only creating and deleting a lot of objects, it's possible to expend all of the heap space before the gc has a chance to run. Alternatively, Pax's suggestion that there is a memory leak in your code is also a strong possibility.
If you are only doing benchmarking, then you may want to call System.gc() between tests, or even re-run your program for each distribution.
We noticed this in a server product. When making a lot of tiny objects that quickly get thrown away, the garbage collector can't keep up. The problem is more pronounced when the tiny objects have pointers to larger objects (e.g. an object that points to a large char[]). The GC doesn't seem to realize that if it frees up the tiny object, it can then free the larger object. Even when calling System.gc() directly, this was still a huge problem (both in 1.5 and 1.6 VMs)!
What we ended up doing and what I recommend to you is to maintain a pool of objects. When your object is no longer needed, throw it into the pool. When you need a new object, grab one from the pool or allocate a new one if the pool is empty. This will also save a small amount of time over pure allocation because Java doesn't have to clear (bzero) the object.
If you're worried about the pool getting too large (and thus wasting memory), you can either remove an arbitrary number of objects from the pool on a regular basis, or use weak references (for example, using java.util.WeakHashMap). One of the advantages of using a pool is that you can track the allocation frequency and totals, and you can adjust things accordingly.
We're using pools of char[] and byte[], and we maintain separate "bins" of sizes in the pool (for example, we always allocate arrays of size that are powers of two). Our product does a lot of string building, and using pools showed significant performance improvements.
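A binned pool along these lines could look roughly like the sketch below. It is not our production code: the class ByteArrayPool and its methods are hypothetical, it has no upper bound on pool size, and it assumes requested sizes are between 1 and 2^30.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical byte[] pool with one bin per power-of-two size. */
final class ByteArrayPool {
    // bins.get(i) holds arrays of length 2^i, for i = 0..30
    private final List<ConcurrentLinkedQueue<byte[]>> bins = new ArrayList<>();

    ByteArrayPool() {
        for (int i = 0; i <= 30; i++) {
            bins.add(new ConcurrentLinkedQueue<>());
        }
    }

    /** Index of the smallest power of two that is >= minSize (minSize in 1..2^30). */
    static int binFor(int minSize) {
        return 32 - Integer.numberOfLeadingZeros(minSize - 1);
    }

    /** Returns a pooled array of at least minSize bytes, or a new one if the bin is empty. */
    byte[] acquire(int minSize) {
        int bin = binFor(minSize);
        byte[] pooled = bins.get(bin).poll();
        // Pooled arrays are NOT cleared; callers must treat the contents as garbage.
        return pooled != null ? pooled : new byte[1 << bin];
    }

    /** Puts an array back; arrays whose length is not a power of two are dropped. */
    void release(byte[] array) {
        if (array.length > 0 && Integer.bitCount(array.length) == 1) {
            bins.get(binFor(array.length)).offer(array);
        }
    }

    public static void main(String[] args) {
        ByteArrayPool pool = new ByteArrayPool();
        byte[] a = pool.acquire(30000);
        System.out.println(a.length);                 // prints 32768
        pool.release(a);
        System.out.println(pool.acquire(20000) == a); // prints true: same bin, array reused
    }
}
```

Rounding up to a power of two wastes some space per array, but it means any request can be served from a small, fixed number of bins.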
Note: In general, the GC does a fine job. We just noticed that with small objects that point to larger structures, the GC doesn't seem to clean up the objects fast enough especially when the VM is under CPU load. Also, System.gc() is just a hint to help schedule the finalizer thread to do more work. Calling it too frequently causes a significant performance hit.
Given that you're just doing this for testing purposes, it might just be good housekeeping to invoke the garbage collector directly using System.gc() (thus forcing it to make a pass). It won't help you if there is a memory leak, but if there isn't, it might buy you back enough memory to get through your test.