I'm using Guava cache for my application and was wondering the what the default behavior would be if the maximumSize was not set. I understand the behavior when the maximumSize is set as it's explained in https://github.com/google/guava/wiki/CachesExplained#size-based-eviction.
But what happens when the maximumSize is not set and JVM run out of heap space? I assume that garbage collector will run and will free up space which means that entries will be dropped from the cache?
Under the covers a Cache is just a fancy Map, so it has similar space limitations. Like Map it can't contain more than than Integer.MAX_VALUE entries (since that's the return type of size()), so your theoretical upper-bound cache size is ~2 billion elements. You might also be interested in Guava's awesome element-cost analysis, which details the exact number of bytes used by different data structures.
Of course in practice the real concern isn't usually the number of elements in the cache (its size), but the amount of memory consumed by the objects being cached. This is independent from the cache's size - a single cached object could be large enough consume all your heap.
By default Cache doesn't do anything special in this case, and the JVM crashes. Most of the time this is what you want - dropping elements from the cache silently could break your program's assumptions.
If you really do want to drop entries as you approach out-of-memory conditions you can use soft references with CacheBuilder.softValues(). The JVM will attempt to garbage collect soft references when it's at risk of running out of free heap space. I would encourage you to only use this option as a last resort - the JVM has to do extra work to handle soft references, and needing to use them is often a hint that you could be doing something else a different way.
Related
For example in a service adapter you might:
a. have an input data model and an output data model, maybe even immutable, with different classes and use Object Mappers to transform between classes and create some short-lived objects along the way
b. have a single data model, some of the classes might be mutable, but the same object that was created for the input is also sent as output
There are other use-cases when you'd have to choose between clear code with many objects and less clear code with less objects and I would like to know if Garbage Collection still has a weight in this decision.
I should make this a comment as IMO it does not qualify as an answer, but it will not fit.
Even if the answer(s) are going to most probably be - do whatever makes your code more readable (and to be honest I still follow that all the time); we have faced this issue of GC in our code base.
Suppose that you want to create a graph of users (we had to - around 1/2 million) and load all their properties in memory and do some aggregations on them and filtering, etc. (it was not my decision), because these graph objects where pretty heavy - once loaded even with 16GB of heap the JVM would fail with OOM or GC would take huge pauses. And it's understandable - lots of data requires lots of memory, you can't run away from it. The solution proposed and that actually worked was to model that with simple BitSets - where each bit would be a property and a potential linkage to some other data; this is by far not readable and extremely complicated to maintain to this day. Lots of shifts, lots of intrinsics of the data - you have to know at all time what the 3-bit means for example, there's no getter for usernameIncome let's say - you have to do quite a lot shifts and map that to a search table, etc. But it would keep the GC pretty low, at least in the ranges where we were OK with that.
So unless you can prove that GC is taken your app time so much - you probably are even safer simply adding more RAM and increasing it(unless you have a leak). I would still go for clear code like 99.(99) % of the time.
Newer versions of Java have quite sophisticated mechanisms to handle very short-living objects so it's not as bad as it was in the past. With a modern JVM I'd say that you don't need to worry about garbage collection times if you create many objects, which is a good thing since there are now many more of them being created on the fly that this was the case with older versions of Java.
What's still valid is to keep the number of created objects low if the creation is coming with high costs, e.g. accessing a database to retrieve data from, network operations, etc.
As other people have said I think it's better to write your code to solve the problem in an optimum way for that problem rather than thinking about what the garbage collector (GC) will do.
The key to working with the GC is to look at the lifespan of your objects. The heap is (typically) divided into two main regions called generations to signify how long objects have been alive (thus young and old generations). To minimise the impact of GC you want your objects to become eligible for collection while they are still in the young generation (either in the Eden space or a survivor space, but preferably Eden space). Collection of objects in the Eden space is effectively free, as the GC does nothing with them, it just ignores them and resets the allocation pointer(s) when a minor GC is finished.
Rather than explicitly calling the GC via System.gc() it's much better to tune your heap. For example, you can set the size of the young generation using command line options like -XX:NewRatio=n, where n signifies the ratio of new to old (e.g. setting it to 3 will make the ratio of new:old 1:3 so the young generation will be 1 quarter of the heap). Alternatively, you can set the size explicitly using -XX:NewSize=n and -XX:MaxNewSize=m. The GC may resize the heap during collections so setting these values to be the same will keep it at a fixed size.
You can profile your code to establish the rate of object creation and how long your objects typically live for. This will give you the information to (ideally) configure your heap to minimise the number of objects being promoted into the old generation. What you really don't want is objects being promoted and then becoming garbage shortly thereafter.
Alternatively, you may want to look at the Zing JVM from Azul (full disclosure, I work for them). This uses a different GC algorithm, called C4, which enables compaction of the heap concurrently with application threads and so eliminates most of the impact of the GC on application latency.
This is a memory data security question.
Does java garbage collection securely wipe out garbage data?
Apparently after a chunk of data is garbage-collected, I cannot retrieve it anymore, but can a hacker still memory-dump to retrieve the data?
As other users already mentioned here, JVMs don't clean memory securely after garbage collection because it would affect performance badly. That's why many programs (especially security libraries) use mutable structures instead of immutable (char arrays instead of strings etc) and clean data themselves when they are no more needed.
Unfortunately, even such approach doesn't always work. Let's look at this scenario:
You create a char array with a password.
JVM performs a garbage collection and moves your char array to another place in the memory, leaving the memory previously occupied by it intact, just marking it as a free block. So, we now have a 'dirty copy' of your password.
You are done working with your password and explicitly zero all the characters in your char array thinking that everything is safe now.
An attacker makes a dump of your memory and finds your password in the memory where it was placed the first time, before step 2.
I can think of only one possible solution for this problem:
Use G1 garbage collector.
Make your sensitive data a single block (array of primitive values) that is large enough to occupy more than half of the region size, used by G1 (by default, this size depends on the maximum heap size, but you can also specify it manually). That would force the collector to treat your data as so called 'humongous object'. Such objects are not moved in memory by G1 GC.
In such case when you erase some data in your block manually, you can be sure that no other 'dirty copies' of the same data exist somewhere in the heap.
Another solution would be to use off-heap data that you can handle manually as you like, but that wouldn't be pure Java.
This depends on the JVM implementation and possibly options within it but I would assume that it won't clear the data. Garbage collection needs only track which areas are available. Setting all of that data to 0 or something else is a lot of unecessary writes. It's for this reason you will often see APIs use a char array for passwords instead of Strings.
Specifically Oracle JVM won't clear the space, it only copies data between Eden and Survivor spaces, objects that are no longer used just stay there as a garbage that will be overwritten eventually. Similar thing happens in the OldGen, some places are marked as used, and when object becomes eligible for garbage collection, the place it occupied is marked as not used. It will also be overwritten eventaully, given enough application time.
I need to get rough estimation of memory usage of my Infinispan cache ( which is implemented using version 5.3.0) - ( For learning purposes )
Since there is no easy way to do this, I came up with following procedure.
Add cache listener to listen cache put/remove events and log the size of inserted entry using jamm library which uses java.lang.instrument.Instrumentation.getObjectSize. But i'm little skeptical about it, whether it returns right memory usage for cache. Am I doing this measurement correctly ? Am I missing something here or do I need consider more factors to do this ?
If you don't need real-time information, the easiest way to find the actual memory usage is to take a heap dump and look at the cache objects' retained sizes (e.g. with Eclipse MAT).
Your method is going to ignore the overhead of Infinispan's internal structures. Normally, the per-entry overhead should be somewhere around 150 bytes, but sometimes it can be quite big - e.g. when you enable eviction and Infinispan allocates structures based on the configured size (https://issues.jboss.org/browse/ISPN-4126).
I am building a Java web app, using the Play! Framework. I'm hosting it on playapps.net. I have been puzzling for a while over the provided graphs of memory consumption. Here is a sample:
The graph comes from a period of consistent but nominal activity. I did nothing to trigger the falloff in memory, so I presume this occurred because the garbage collector ran as it has almost reached its allowable memory consumption.
My questions:
Is it fair for me to assume that my application does not have a memory leak, as it appears that all the memory is correctly reclaimed by the garbage collector when it does run?
(from the title) Why is java waiting until the last possible second to run the garbage collector? I am seeing significant performance degradation as the memory consumption grows to the top fourth of the graph.
If my assertions above are correct, then how can I go about fixing this issue? The other posts I have read on SO seem opposed to calls to System.gc(), ranging from neutral ("it's only a request to run GC, so the JVM may just ignore you") to outright opposed ("code that relies on System.gc() is fundamentally broken"). Or am I off base here, and I should be looking for defects in my own code that is causing this behavior and intermittent performance loss?
UPDATE
I have opened a discussion on PlayApps.net pointing to this question and mentioning some of the points here; specifically #Affe's comment regarding the settings for a full GC being set very conservatively, and #G_H's comment about settings for the initial and max heap size.
Here's a link to the discussion, though you unfortunately need a playapps account to view it.
I will report the feedback here when I get it; thanks so much everyone for your answers, I've already learned a great deal from them!
Resolution
Playapps support, which is still great, didn't have many suggestions for me, their only thought being that if I was using the cache extensively this may be keeping objects alive longer than need be, but that isn't the case. I still learned a ton (woo hoo!), and I gave #Ryan Amos the green check as I took his suggestion of calling System.gc() every half day, which for now is working fine.
Any detailed answer is going to depend on which garbage collector you're using, but there are some things that are basically the same across all (modern, sun/oracle) GCs.
Every time you see the usage in the graph go down, that is a garbage collection. The only way heap gets freed is through garbage collection. The thing is there are two types of garbage collections, minor and full. The heap gets divided into two basic "areas." Young and tenured. (There are lots more subgroups in reality.) Anything that is taking up space in Young and is still in use when the minor GC comes along to free up some memory, is going to get 'promoted' into tenured. Once something makes the leap into tenured, it sits around indefinitely until the heap has no free space and a full garbage collection is necessary.
So one interpretation of that graph is that your young generation is fairly small (by default it can be a fairly small % of total heap on some JVMs) and you're keeping objects "alive" for comparatively very long times. (perhaps you're holding references to them in the web session?) So your objects are 'surviving' garbage collections until they get promoted into tenured space, where they stick around indefinitely until the JVM is well and good truly out of memory.
Again, that's just one common situation that fits with the data you have. Would need full details about the JVM configuration and the GC logs to really tell for sure what's going on.
Java won't run the garbage cleaner until it has to, because the garbage cleaner slows things down quite a bit and shouldn't be run that frequently. I think you would be OK to schedule a cleaning more frequently, such as every 3 hours. If an application never consumes full memory, there should be no reason to ever run the garbage cleaner, which is why Java only runs it when the memory is very high.
So basically, don't worry about what others say: do what works best. If you find performance improvements from running the garbage cleaner at 66% memory, do it.
I am noticing that the graph isn't sloping strictly upward until the drop, but has smaller local variations. Although I'm not certain, I don't think memory use would show these small drops if there was no garbage collection going on.
There are minor and major collections in Java. Minor collections occur frequently, whereas major collections are rarer and diminish performance more. Minor collections probably tend to sweep up stuff like short-lived object instances created within methods. A major collection will remove a lot more, which is what probably happened at the end of your graph.
Now, some answers that were posted while I'm typing this give good explanations regarding the differences in garbage collectors, object generations and more. But that still doesn't explain why it would take so absurdly long (nearly 24 hours) before a serious cleaning is done.
Two things of interest that can be set for a JVM at startup are the maximum allowed heap size, and the initial heap size. The maximum is a hard limit, once you reach that, further garbage collection doesn't reduce memory usage and if you need to allocate new space for objects or other data, you'll get an OutOfMemoryError. However, internally there's a soft limit as well: the current heap size. A JVM doesn't immediately gobble up the maximum amount of memory. Instead, it starts at your initial heap size and then increases the heap when it's needed. Think of it a bit as the RAM of your JVM, that can increase dynamically.
If the actual memory use of your application starts to reach the current heap size, a garbage collection will typically be instigated. This might reduce the memory use, so an increase in heap size isn't needed. But it's also possible that the application currently does need all that memory and would exceed the heap size. In that case, it is increased provided that it hasn't already reached the maximum set limit.
Now, what might be your case is that the initial heap size is set to the same value as the maximum. Suppose that would be so, then the JVM will immediately seize all that memory. It will take a very long time before the application has accumulated enough garbage to reach the heap size in memory usage. But at that moment you'll see a large collection. Starting with a small enough heap and allowing it to grow keeps the memory use limited to what's needed.
This is assuming that your graph shows heap use and not allocated heap size. If that's not the case and you are actually seeing the heap itself grow like this, something else is going on. I'll admit I'm not savvy enough regarding the internals of garbage collection and its scheduling to be absolutely certain of what's happening here, most of this is from observation of leaking applications in profilers. So if I've provided faulty info, I'll take this answer down.
As you might have noticed, this does not affect you. The garbage collection only kicks in if the JVM feels there is a need for it to run and this happens for the sake of optimization, there's no use of doing many small collections if you can make a single full collection and do a full cleanup.
The current JVM contains some really interesting algorithms and the garbage collection itself id divided into 3 different regions, you can find a lot more about this here, here's a sample:
Three types of collection algorithms
The HotSpot JVM provides three GC algorithms, each tuned for a specific type of collection within a specific generation. The copy (also known as scavenge) collection quickly cleans up short-lived objects in the new generation heap. The mark-compact algorithm employs a slower, more robust technique to collect longer-lived objects in the old generation heap. The incremental algorithm attempts to improve old generation collection by performing robust GC while minimizing pauses.
Copy/scavenge collection
Using the copy algorithm, the JVM reclaims most objects in the new generation object space (also known as eden) simply by making small scavenges -- a Java term for collecting and removing refuse. Longer-lived objects are ultimately copied, or tenured, into the old object space.
Mark-compact collection
As more objects become tenured, the old object space begins to reach maximum occupancy. The mark-compact algorithm, used to collect objects in the old object space, has different requirements than the copy collection algorithm used in the new object space.
The mark-compact algorithm first scans all objects, marking all reachable objects. It then compacts all remaining gaps of dead objects. The mark-compact algorithm occupies more time than the copy collection algorithm; however, it requires less memory and eliminates memory fragmentation.
Incremental (train) collection
The new generation copy/scavenge and the old generation mark-compact algorithms can't eliminate all JVM pauses. Such pauses are proportional to the number of live objects. To address the need for pauseless GC, the HotSpot JVM also offers incremental, or train, collection.
Incremental collection breaks up old object collection pauses into many tiny pauses even with large object areas. Instead of just a new and an old generation, this algorithm has a middle generation comprising many small spaces. There is some overhead associated with incremental collection; you might see as much as a 10-percent speed degradation.
The -Xincgc and -Xnoincgc parameters control how you use incremental collection. The next release of HotSpot JVM, version 1.4, will attempt continuous, pauseless GC that will probably be a variation of the incremental algorithm. I won't discuss incremental collection since it will soon change.
This generational garbage collector is one of the most efficient solutions we have for the problem nowadays.
I had an app that produced a graph like that and acted as you describe. I was using the CMS collector (-XX:+UseConcMarkSweepGC). Here is what was going on in my case.
I did not have enough memory configured for the application, so over time I was running into fragmentation problems in the heap. This caused GCs with greater and greater frequency, but it did not actually throw an OOME or fail out of CMS to the serial collector (which it is supposed to do in that case) because the stats it keeps only count application paused time (GC blocks the world), application concurrent time (GC runs with application threads) is ignored for those calculations. I tuned some parameters, mainly gave it a whole crap load more heap (with a very large new space), set -XX:CMSFullGCsBeforeCompaction=1, and the problem stopped occurring.
Probably you do have memory leaks that's cleared every 24 hours.
I have memory leak in the web application (servlet) that I am working on. I am suspicious of 1 cause and wanted to hear your ideas about it.
I use hashmaps,hashsets etc. as DB (about 20MB data loaded). These maps,sets get reloaded once in every 10 min. There are huge amount of simultaneous requests. I read that, GC passes objects, that are not collected for a time period/cycle, to a generation (old and permanent generations) which is less checked or garbage collected. I think that my usage for static maps,sets is causing me leak problem. What do you think ?
As Romain noted, the static map is a suspect. If for some reason you can't regularly clean it up explicitly, you may consider using a WeakHashMap instead, which is
A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations.
Unfortunately, as of Java6 there seems to be no WeakHashSet in the standard library, but several implementations can be found on the net.
It is not a leak if you have removed all references to it. If you're clearing out your map entirely, then it's not the source of a leak. You should consider the fact that the JVM chooses not to GC tenured generation very often as irrelevant to you - all that matters is that you don't have a reference to it, so the JVM could GC it if it wants to.
There are different strategies that JVMs can use to manage GC, so I'm speaking in generalities instead of specifics here, but GCing tenured spaces tends to be very expensive and has a high impact on the application, so the JVM chooses not to do it often in general.
If you're looking at the amount of heap space used, you'll see a sawtooth pattern as items are added and eventually collected. Don't be concerned about where the top of the sawtooth is, be concerned about where the bottom is (and how close to the maximum amount of heap space available that is).
One way to test if it really is a leak is to load test your app for a long period of time. If you have a leak, the base amount of memory that your app is using will go up over time (the bottom of the sawtooth). If you don't, it will remain constant. If you do have a leak, you can use a profiler to help you find it.
Static Maps are a known source of leaks. The reason being that people put stuff in and do not remove them. If every ten minutes you simply clear the cache and then reload then you should be fine.
I would bet that you are not clearing it properly. The GC part is working properly, I would not worry that it is the issue.
You may also want to consider using WeakReference if you have some way of falling back to the real data if part of your cache is GC-ed but then subsequently required.
I suggest that you check the heap contents using a heap dump and a heap analyzer (such as JVisualVM). This will help you find leakage suspects. The fact that the old generation is collected less frequently does not mean that more memory is leaking; remember that while it may seem full, only a portion of it represents live objects, and the other portion is clearned by the next major GC. Like others said, the problem may be because of incomplete cleaning of the static collections.
The permanent generation never receives promoted objects. It is an out-of-heap area reserved for other purposes, such as reflective information of loaded classes, and interned strings.