Effective Java says:
There is a severe performance penalty for using finalizers.
Why is it slower to destroy an object using the finalizers?
Because of the way the garbage collector works. For performance, most Java GCs use a copying collector, where short-lived objects are allocated into an "eden" block of memory, and when it's time for that generation of objects to be collected, the GC just needs to copy the objects that are still "alive" to a more permanent storage space, and then it can wipe (free) the entire "eden" memory block at once. This is efficient because most Java code will create many thousands of instances of objects (boxed primitives, temporary arrays, etc.) with lifetimes of only a few seconds.
When you have finalizers in the mix, though, the GC can't simply wipe an entire generation at once. Instead, it needs to figure out all the objects in that generation that need to be finalized, and queue them on a thread that actually executes the finalizers. In the meantime, the GC can't finish cleaning up the objects efficiently. So it either has to keep them alive longer than they should be, or it has to delay collecting other objects, or both. Plus you have the arbitrary wait time of actually executing the finalizers.
All these factors add up to a significant runtime penalty, which is why deterministic finalization (using a close() method or similar to explicitly finalize the object's state) is usually preferred.
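To illustrate what deterministic finalization looks like in practice, here is a minimal sketch using AutoCloseable and try-with-resources. TempBuffer and DeterministicCleanupDemo are made-up names for illustration only; a real class would release a file handle or socket in close().

```java
// Hypothetical resource, used only for illustration.
class TempBuffer implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    // Explicit cleanup: runs exactly when the caller leaves the try block.
    @Override
    public void close() {
        closed = true; // a real class would release a file handle, socket, etc.
    }
}

class DeterministicCleanupDemo {
    // Uses the resource and returns it so the caller can inspect its state.
    static TempBuffer useAndRelease() {
        TempBuffer seen = null;
        try (TempBuffer buffer = new TempBuffer()) {
            seen = buffer;
            // ... work with the buffer ...
        } // close() is called here, even if the body throws
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("closed after try block: " + useAndRelease().isClosed());
    }
}
```

The GC never has to treat such an object specially: by the time it becomes unreachable, its state has already been released.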
Having actually run into one such problem:
In the Sun HotSpot JVM, finalizers are processed on a thread that is given a fixed, low priority. In a high-load application, it's easy to create finalization-required objects faster than the low-priority finalization thread can process them. Meanwhile, the space on the heap used by the finalization-pending objects is unavailable for other uses. Eventually, your application may spend all of its time garbage collecting, because all of the available memory is in use by objects pending finalization.
This is, of course, in addition to the other many reasons to not use finalizers that are described in Effective Java.
I just picked up my copy of Effective Java off my desk to see what he's referring to.
If you read Chapter 2, Section 6, he goes into good detail about the various performance hits.
You can't know when the finalizer will run, or even if it will at all. Because those resources may never be claimed, you will have to run with fewer resources.
I would recommend reading the entirety of the section - it explains things much better than I can parrot here.
If you read the documentation of finalize() closely, you will notice that finalizers enable an object to prevent being collected by the GC.
If no finalizer is present, the object simply can be removed and does not need any more attention. But if there is a finalizer, it needs to be checked afterwards, if the object didn't become "visible" again.
Without knowing exactly how the current Java garbage collection is implemented (actually, because there are different Java implementations out there, there are also different GCs), you can assume that the GC has to do some additional work if an object has a finalizer, because of this feature.
My thought is this:
Java is a garbage collected language, which deallocates memory based on its own internal algorithms. Every so often, the GC scans the heap, determines which objects are no longer referenced, and de-allocates the memory.
A finalizer interrupts this and forces the deallocation of memory outside of the GC cycle, potentially causing inefficiencies.
I think best practices are to use finalizers only when ABSOLUTELY necessary such as freeing file handles or closing DB connections which should be done deterministically.
One reason I can think of is that explicit memory cleanup is unnecessary if your resources are all Java Objects, and not native code.
As far as I understand the documentation of System.gc() this call will point the GC towards regions of memory that the caller was ‘working’ on. There’s no guarantee for any clean up whatsoever happening after the method returns.
But say there was now obsolete data and the GC ‘decided’ to free the memory used by that data. Does this mean the freeing happened before the method returns? And if yes, is there a way to delay the freeing of memory itself? Would it make sense?
Say the statement above is true;
I am aware that simply delegating the call of System.gc() to another thread would make no sense following the logic implied by the documentation.
Would it, on the other hand, make sense to delegate references to obsolete data to another thread while simultaneously voiding the previous references anywhere else, and then call the GC on that thread?
For instance; say a singleton thread instance acts as a consumer and it simply consumes objects.
public static void consumeForGC(Object... args)
The objects are passed by reference which should (must?!) hinder the GC from freeing their allocated memory space. So when now calling the GC in the scope of this consumeForGC(…) in which the last known references to the given arguments are, does this achieve similar behavior to simply calling it (preferably at the end) of a caller and waiting for the call to return? Besides being very hacky, it would probably only increase the chances for some allocations to be released sooner, but at least it could give some control over which those are. The rest of the program could also carry on because it doesn’t have to wait for the GC to finish whatever it will actually do.
I hope this question is not too irrelevant; nonetheless, I am curious to hear what you guys think about this.
As far as I understand the documentation of System.gc() this call will point the GC towards regions of memory that the caller was ‘working’ on.
It does not really "point towards memory". Rather invoking System.gc() can trigger a GC cycle that will run with somewhat different parameters compared to those automatically triggered by memory pressure or (in some concurrent collectors) background timers.
A GC cycle is always application-global, cleaning up some or all of the objects that are deemed unreachable from GC roots.
As a rule of thumb one shouldn't trigger System.gc without analysis of GC logs and application profiling to identify whether doing so improves some metrics. Used incorrectly it'll lead to excessive CPU consumption, decreased throughput and increased latency.
There’s no guarantee for any clean up whatsoever happening after the method returns.
The specification is vague; in practice it depends on the JVM, the selected garbage collector and config flags. E.g. on HotSpot there are flags DisableExplicitGC, ExplicitGCInvokesConcurrent and several other flags controlling its behavior.
But changing these parameters can impact other parts of the system, e.g. direct ByteBuffer allocations can resort to calling System.gc to trigger reclamation of unused buffers. If manual GCing is disabled or insufficiently aggressive it could lead to OOMEs when allocating direct buffers.
Does this mean the freeing happened before the method returns? And if yes, is there a way to delay the freeing of memory itself? Would it make sense?
It does not promise to clean any specific objects, or any objects at all for that matter. If it does trigger a GC, it will only return once that cycle is complete, but that does not guarantee that all unreachable objects have been collected.
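As a rough way to observe this yourself, the following sketch measures the used heap around a System.gc() call. GcRequestDemo and usedHeapBytes are hypothetical names, the numbers from Runtime are only approximations, and the result will vary with the JVM, the collector and flags such as DisableExplicitGC, so no particular outcome should be expected.

```java
class GcRequestDemo {
    // Heap currently in use as reported by the JVM (an approximation).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[][] garbage = new byte[64][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1024 * 1024]; // 64 MiB in total
        }
        long before = usedHeapBytes();

        garbage = null; // make the arrays unreachable
        System.gc();    // only a request; the JVM may ignore it entirely

        long after = usedHeapBytes();
        System.out.println("used before: " + before + " bytes, used after: " + after + " bytes");
    }
}
```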
After having read some articles about JVMs I was wondering if there are specific reasons one sees only usage of trace-based garbage collectors.
I know the JVM specification leaves these kinds of design choices to the implementors of JVMs.
Is the assumption that reference-counting garbage collectors are inferior in regards to circular references the only reason or are there more reasons?
Garbage collectors either use tracing or are reference counted. (The EpsilonGC is a no-reclamation GC which uses neither, but that will simply blow up when it reaches the memory limit.)
Generally the JVM uses tracing reference collectors with a stop-the-world pause occasionally because it adds the least impact to the runtime. Reference counted garbage collectors are more predictable, but incur runtime costs during the application's threads to deal with the reference counts.
It's worth noting that whichever garbage collector is picked, it will have different effects on the code generation. For example, JVMs which have single threaded garbage collectors (such as when running on a single CPU) may have less overhead with assigning references than when a multi-threaded or multi-generational garbage collector runs. In those cases, mark cards or region pointers may be needed to highlight where and how references need to be checked. The exact requirements are dependent upon the garbage collector itself; for example, with Shenandoah and ZGC the generated code is different from using G1GC or CMS/Parallel.
So it would be possible to have a JVM GC that used reference counting; instead of generating a reference to a card mark, you could increase or decrease the object count.
However, reference counting GCs have a particular weakness in that they can't detect cycles. If you have A pointing to B and B pointing to A, but otherwise are unreachable, they can't be GC'd if you're using a pure reference counting strategy. Different languages that use reference counting deal with this in different ways; for example, Swift requires the programmer to determine a 'weak' reference which will be broken thereby allowing the cycle to be cleaned up.
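The cycle problem can be shown with a toy model. This is only a sketch of pure reference counting written in Java for illustration; CountedNode, retain, release and pointTo are made-up names, and no JVM manages memory this way.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of pure reference counting.
class CountedNode {
    final String name;
    int refCount = 0;
    boolean freed = false;
    final List<CountedNode> fields = new ArrayList<>();

    CountedNode(String name) {
        this.name = name;
    }

    // Called whenever a new reference to this node is created.
    void retain() {
        refCount++;
    }

    // Called whenever a reference to this node is dropped.
    void release() {
        refCount--;
        if (refCount == 0) {
            freed = true;
            for (CountedNode child : fields) {
                child.release(); // dropping this node drops its outgoing references
            }
            fields.clear();
        }
    }

    // Stores a reference to target in one of this node's fields.
    void pointTo(CountedNode target) {
        target.retain();
        fields.add(target);
    }
}

class RefCountCycleDemo {
    // A -> B only: both nodes are freed once the local references go away.
    static boolean chainIsFreed() {
        CountedNode a = new CountedNode("A");
        CountedNode b = new CountedNode("B");
        a.retain();
        b.retain();
        a.pointTo(b);
        a.release();
        b.release();
        return a.freed && b.freed;
    }

    // A <-> B: each node keeps the other's count at 1, so neither is freed.
    static boolean cycleIsFreed() {
        CountedNode a = new CountedNode("A");
        CountedNode b = new CountedNode("B");
        a.retain();
        b.retain();
        a.pointTo(b);
        b.pointTo(a);
        a.release();
        b.release();
        return a.freed || b.freed;
    }

    public static void main(String[] args) {
        System.out.println("chain freed: " + chainIsFreed());
        System.out.println("cycle freed: " + cycleIsFreed());
    }
}
```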
The JVM doesn't (easily) provide a way of annotating such weak circles, so it's possible that a reference counting GC with Java would end up leaking such references, or it may require a full tracing pass periodically to evict such cycles. (Yes, you can use various Weak/Phantom references but they're almost a code smell in themselves.)
The other thing you can say is that a reference counted GC doesn't have any additional overhead when in a steady state, because once the object graphs are constructed, you don't need to visit them again. This may result in better cache locality for objects.
If a JVM gets into an allocation free state then the GC doesn't need to run, but likely young GCs will continue, and there are other periodic operations (biased lock revocation etc.) which may inadvertently trigger a GC. Generally though a JVM can be tuned so that its object creation set is in the eden gen only, which is a fairly minimal operation and won't require re-scanning the whole heap periodically.
If you're interested, you can give writing your own GC a go to find out.
I have a complex java application running on a large dataset. The application performs reasonably fast but as time goes it seems to eat lots of memory and slow down. Is there a way to run the JVM garbage collector without re-starting the application?
No, you can't force garbage collection.
Even using
System.gc();
You can only make a request for garbage collection; it is up to the JVM whether to do it or not.
Also, garbage collectors are smart enough to collect unused memory when required, so instead of forcing garbage collection you should check whether you are handling objects in a wrong way.
If you are handling objects in a wrong way (like keeping references to unnecessary objects), there is hardly anything the JVM can do to free the memory.
From Doc
Calling the gc method suggests that the Java Virtual Machine expend
effort toward recycling unused objects in order to make the memory
they currently occupy available for quick reuse. When control returns
from the method call, the Java Virtual Machine has made a best effort
to reclaim space from all discarded objects.
Open Bug regarding System.gc() documentation
The documentation for System.gc() is extremely misleading and fails to
make reference to the recommended practise of never calling
System.gc().
The choice of language leaves it unclear what the behaviour would be
when System.gc() is called and what external factors will influence
the behaviour.
A few useful links to visit when you think you should force the JVM to free up some memory:
1. How does garbage collection work
2. When does System.gc() do anything
3. Why is it bad practice to call System.gc()?
They all say:
1. You don't have control over GC in Java; even System.gc() doesn't guarantee it.
2. It's also bad practice, as forcing it may have an adverse effect on performance.
3. Revisit your design and let the JVM do its work :)
You should not rely on System.gc(). If you feel like you need to force the GC to run, it usually means that there is something wrong with your code/design. The GC will run and clear your unused objects once they are eligible for collection. Please verify your design and think more about memory management, and also look for loops in object references.
The
System.gc()
call in Java suggests to the VM that it run garbage collection, though it doesn't guarantee that it will actually do so. Nevertheless, it is the best solution you have. As mentioned in other responses, the jvisualvm utility (present in the JDK since JDK 6 update 7) provides a garbage collection function as well.
EDIT:
Your question whetted my appetite for the topic and I came across this resource:
oracle gc resource
The application performs reasonably fast but as time goes it seems to eat lots of memory and slow down.
These are classic symptoms of a Java memory leak. It is likely that somewhere in your application there is a data structure that just keeps growing. As the heap gets close to full, the JVM spends an increasing proportion of its time running the GC in a (futile) attempt to claw back some space.
Forcing the GC won't fix this, because the GC can't collect the data structure. In fact forcing the GC to run just makes the application slower.
The cure for the problem is to find what is causing the memory leak, and fix it.
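As an illustration of the kind of fix this usually means, here is a sketch contrasting a map that grows forever with a size-bounded one. BoundedCache and LeakSketch are hypothetical names, and capping a cache is only one possible cure; your leak may have a different cause, which a heap profiler will show.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// A map that evicts its least recently used entry once it exceeds maxEntries.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order: eldest entry = least recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}

class LeakSketch {
    // Simulates a long-running application that caches one value per request.
    static int fill(Map<Integer, byte[]> cache, int requests) {
        for (int i = 0; i < requests; i++) {
            cache.put(i, new byte[128]);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // Everything ever put into the plain map stays reachable, so no GC can free it.
        System.out.println("unbounded entries: " + fill(new HashMap<>(), 1000));
        // The bounded map drops old entries, which then become collectable.
        System.out.println("bounded entries: " + fill(new BoundedCache<>(100), 1000));
    }
}
```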
Performance gain or loss depends on how often you need garbage collection, how much memory your JVM has and how much your program needs.
There is no certainty of garbage collection when you call System.gc() (it's just a hint to the JVM), but there is at least a probability. With a large enough number of calls, you can achieve some statistically derived performance multiplier, valid only for your system setup.
The graph below shows the resource consumption of an example program's executions; the JVM was given heaps of 1GB (no gc), 1GB (gc), 3GB (gc) and 3GB (no gc) respectively for the four trials.
At first, when the JVM was given only 1GB of memory while the program needed 3.75GB, it took more than 50 seconds for the producer thread pool to complete its job, because having less garbage management led to a poor object creation rate.
The second example is about 40% faster because System.gc() is called between each production of 150MB of object data.
In the third example, the JVM is given 3GB of memory while keeping System.gc() on. More memory gave more performance, as expected.
But when I turned System.gc() off in the same 3GB environment, it was faster!
Even if we cannot force it, we can get some percentage gain or loss of performance by trying System.gc(), if we try long enough. At least on my Windows 7 64-bit operating system with the latest JVM.
Garbage collector runs automatically. You can't force the garbage collector.
I do not suggest that you do that but to force the garbage collector to run from within your java code you can just use all the available memory, this works because the garbage collector will run before the JVM throws OutOfMemoryError...
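try {
    List<Object> tempList = new ArrayList<Object>();
    while (true) {
        // Allocate in 1 MiB chunks; a single byte[Integer.MAX_VALUE] request can
        // fail immediately with "Requested array size exceeds VM limit".
        tempList.add(new byte[1024 * 1024]);
    }
} catch (OutOfMemoryError ome) {
    // OK, Garbage Collector will have run now...
}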
My answer is going to be different than the others but it will lead to the same point.
Explain:
YES, it is possible to force the garbage collector, by using two methods together in this order:
System.gc ();
System.runFinalization ();
These two method calls will force the garbage collector to execute the finalize() method of any unreachable object and free the memory. However, the performance of the software will drop considerably, because the garbage collector runs in its own thread, which you have no way of controlling, and depending on the algorithm the garbage collector uses this could lead to unnecessary extra processing. It is better to check your code, because it must be broken if you need to invoke the garbage collector for it to work well.
NOTE: keep in mind that this only works if the finalize() method does not reassign the object; if it does, the object stays alive and is resurrected, which is technically possible.
After reading this question, I was reminded of when I was taught Java and told never to call finalize() or run the garbage collector because "it's a big black box that you never need to worry about". Can someone boil the reasoning for this down to a few sentences? I'm sure I could read a technical report from Sun on this matter, but I think a nice, short, simple answer would satisfy my curiosity.
The short answer: Java garbage collection is a very finely tuned tool. System.gc() is a sledge-hammer.
Java's heap is divided into different generations, each of which is collected using a different strategy. If you attach a profiler to a healthy app, you'll see that it very rarely has to run the most expensive kinds of collections because most objects are caught by the faster copying collector in the young generation.
Calling System.gc() directly, while technically not guaranteed to do anything, in practice will trigger an expensive, stop-the-world full heap collection. This is almost always the wrong thing to do. You think you're saving resources, but you're actually wasting them for no good reason, forcing Java to recheck all your live objects “just in case”.
If you are having problems with GC pauses during critical moments, you're better off configuring the JVM to use the concurrent mark/sweep collector, which was designed specifically to minimise time spent paused, than trying to take a sledgehammer to the problem and just breaking it further.
The Sun document you were thinking of is here: Java SE 6 HotSpot™ Virtual Machine Garbage Collection Tuning
(Another thing you might not know: implementing a finalize() method on your object makes garbage collection slower. Firstly, it will take two GC runs to collect the object: one to run finalize() and the next to ensure that the object wasn't resurrected during finalization. Secondly, objects with finalize() methods have to be treated as special cases by the GC because they have to be collected individually, they can't just be thrown away in bulk.)
Don't bother with finalizers.
Switch to incremental garbage collection.
If you want to help the garbage collector, null out references to objects you no longer need. Fewer paths to follow = more explicitly garbage.
Don't forget that (non-static) inner class instances keep references to their parent class instance. So an inner class thread keeps a lot more baggage than you might expect.
In a very related vein, if you're using serialization, and you've serialized temporary objects, you're going to need to clear the serialization caches, by calling ObjectOutputStream.reset() or your process will leak memory and eventually die.
Downside is that non-transient objects are going to get re-serialized.
Serializing temporary result objects can be a bit more messy than you might think!
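A minimal sketch of the reset() pattern, assuming the temporaries are written to a long-lived stream. SerializationResetDemo and roundTrip are hypothetical names, and an in-memory stream stands in for whatever socket or file you actually write to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class SerializationResetDemo {
    // Writes count temporary arrays to one stream, then reads them back.
    // Returns how many objects were read back with the expected content.
    static int roundTrip(int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < count; i++) {
                    int[] temporary = {i, i * 2};
                    out.writeObject(temporary);
                    // Without this, the stream keeps a reference to every object
                    // it has written, so the temporaries can never be collected.
                    out.reset();
                }
            }

            int matching = 0;
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                for (int i = 0; i < count; i++) {
                    int[] value = (int[]) in.readObject();
                    if (value[0] == i && value[1] == i * 2) {
                        matching++;
                    }
                }
            }
            return matching;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("objects read back: " + roundTrip(1000));
    }
}
```

Calling reset() after every object is the simplest choice; resetting every N objects instead trades some memory for less re-serialization.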
Consider using soft references. If you don't know what soft references are, have a read of the javadoc for java.lang.ref.SoftReference
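For a feel of how a soft reference is typically used, here is a small sketch of a single-value cache that reloads its value if the GC has cleared it. SoftCache, loader and loads are made-up names for illustration, not part of any library.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Caches one value softly: the GC may clear it when memory runs low.
class SoftCache<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);
    int loads = 0; // how often the loader has run, for demonstration only

    SoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get(); // null if never loaded or cleared by the GC
        if (value == null) {
            value = loader.get();
            loads++;
            ref = new SoftReference<>(value);
        }
        return value; // the caller's strong reference keeps it alive while in use
    }
}

class SoftCacheDemo {
    public static void main(String[] args) {
        SoftCache<String> cache = new SoftCache<>(() -> "expensive result");
        String first = cache.get();
        String second = cache.get();
        System.out.println(first + " / " + second + " / loads: " + cache.loads);
    }
}
```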
Steer clear of Phantom references and Weak references unless you really get excitable.
Finally, if you really can't tolerate the GC use Realtime Java.
No, I'm not joking.
The reference implementation is free to download and Peter Dibble's book from Sun is really good reading.
As far as finalizers go:
They are virtually useless. They aren't guaranteed to be called in a timely fashion, or indeed, at all (if the GC never runs, neither will any finalizers). This means you generally shouldn't rely on them.
Finalizers are not guaranteed to be idempotent. The garbage collector takes great care to guarantee that it will never call finalize() more than once on the same object. With well-written objects, it won't matter, but with poorly written objects, calling finalize multiple times can cause problems (e.g. double release of a native resource ... crash).
Every object that has a finalize() method should also provide a close() (or similar) method. This is the function you should be calling. e.g., FileInputStream.close(). There's no reason to be calling finalize() when you have a more appropriate method that is intended to be called by you.
Assuming finalizers are similar to their .NET namesake then you only really need to call these when you have resources such as file handles that can leak. Most of the time your objects don't have these references so they don't need to be called.
It's bad to try to collect the garbage because it's not really your garbage. You have told the VM to allocate some memory when you created objects, and the garbage collector is hiding information about those objects. Internally the GC is performing optimisations on the memory allocations it makes. When you manually try to collect the garbage you have no knowledge about what the GC wants to hold onto and get rid of; you are just forcing its hand. As a result you mess up internal calculations.
If you knew more about what the GC was holding internally then you might be able to make more informed decisions, but then you've missed the benefits of GC.
The real problem with closing OS handles in finalize is that finalizers are executed in no guaranteed order. And if you have handles to things that block (think e.g. sockets), your code can potentially get into a deadlock situation (not trivial at all).
So I'm for explicitly closing handles in a predictable orderly manner. Basically code for dealing with resources should follow the pattern:
SomeStream s = null;
...
try {
    s = openStream();
    ....
    s.io();
    ...
} finally {
    if (s != null) {
        s.close();
        s = null;
    }
}
It gets even more complicated if you write your own classes that work via JNI and open handles. You need to make sure handles are closed (released) and that it will happen only once. A frequently overlooked OS handle in desktop J2SE is Graphics[2D]. Even BufferedImage.getGraphics() can potentially return a handle that points into a video driver (actually holding the resource on the GPU). If you don't release it yourself and leave it to the garbage collector to do the work, you may run into strange OutOfMemory and similar situations where you have run out of video-card-mapped bitmaps but still have plenty of memory. In my experience it happens rather frequently in tight loops working with graphics objects (extracting thumbnails, scaling, sharpening, you name it).
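One way to get the "released exactly once" guarantee on newer JVMs is java.lang.ref.Cleaner (Java 9 and later), sketched below. This is a swapped-in technique rather than what the answer above describes; NativeHandle, Release and the releases counter are hypothetical, and the counter stands in for the real JNI call that would free the handle.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Counts release calls, for demonstration only.
    static final AtomicInteger releases = new AtomicInteger();

    // The cleanup action must not reference the NativeHandle itself,
    // otherwise the handle could never become unreachable.
    private static final class Release implements Runnable {
        private final long handle;

        Release(long handle) {
            this.handle = handle;
        }

        @Override
        public void run() {
            releases.incrementAndGet(); // a real class would free 'handle' via JNI here
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeHandle(long handle) {
        this.cleanable = CLEANER.register(this, new Release(handle));
    }

    // Explicit release; the Cleaner runs the action at most once,
    // no matter how often close() is called.
    @Override
    public void close() {
        cleanable.clean();
    }
}

class NativeHandleDemo {
    public static void main(String[] args) {
        int before = NativeHandle.releases.get();
        NativeHandle h = new NativeHandle(42L);
        h.close();
        h.close(); // second call is a no-op
        System.out.println("releases: " + (NativeHandle.releases.get() - before));
    }
}
```

If close() is never called, the Cleaner runs the action some time after the object becomes unreachable, so it is a safety net only and not a replacement for closing explicitly.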
Basically, the GC does not take over the programmer's responsibility for correct resource management. It only takes care of memory and nothing else. Stream.finalize calling close() would IMHO be better implemented by throwing new RuntimeException("garbage collecting the stream that is still open"). It would save hours and days of debugging and cleaning up code after sloppy amateurs have left loose ends.
Happy coding.
Peace.
The GC does a lot of optimization on when to properly finalize things.
So unless you're familiar with how the GC actually works and how it tags generations, manually calling finalize or starting a GC will probably hurt performance more than help.
Avoid finalizers. There is no guarantee that they will be called in a timely fashion. It could take quite a long time before the Memory Management system (i.e., the garbage collector) decides to collect an object with a finalizer.
Many people use finalizers to do things like close socket connections or delete temporary files. By doing so you make your application behaviour unpredictable and tied to when the JVM is going to GC your object. This can lead to "out of memory" scenarios, not due to the Java Heap being exhausted, but rather due to the system running out of handles for a particular resource.
One other thing to keep in mind is that introducing calls to System.gc() or similar hammers may show good results in your environment, but they won't necessarily translate to other systems. Not everyone runs the same JVM; there are many: Sun, IBM J9, BEA JRockit, Harmony, OpenJDK, etc. These JVMs all conform to the JCK (those that have been officially tested, that is), but have a lot of freedom when it comes to making things fast. GC is one of those areas that everyone invests in heavily. Using a hammer will oftentimes destroy that effort.
Python uses the reference count method to handle object life time. So an object that has no more use will be immediately destroyed.
But, in Java, the GC(garbage collector) destroys objects which are no longer used at a specific time.
Why does Java choose this strategy and what is the benefit from this?
Is this better than the Python approach?
There are drawbacks to using reference counting. One of the most mentioned is circular references: suppose A references B, B references C and C references B. If A were to drop its reference to B, both B and C will still have a reference count of 1 and won't be deleted with traditional reference counting. CPython (reference counting is not part of Python itself, but part of the C implementation thereof) catches circular references with a separate garbage collection routine that it runs periodically...
Another drawback: Reference counting can make execution slower. Each time an object is referenced and dereferenced, the interpreter/VM must check to see if the count has gone down to 0 (and then deallocate if it did). Garbage Collection does not need to do this.
Also, Garbage Collection can be done in a separate thread (though it can be a bit tricky). On machines with lots of RAM and for processes that use memory only slowly, you might not want to be doing GC at all! Reference counting would be a bit of a drawback there in terms of performance...
Actually reference counting and the strategies used by the Sun JVM are all different types of garbage collection algorithms.
There are two broad approaches for tracking down dead objects: tracing and reference counting. In tracing, the GC starts from the "roots" (things like stack references) and traces all reachable (live) objects. Anything that can't be reached is considered dead. In reference counting, each time a reference is modified the objects involved have their counts updated. Any object whose reference count drops to zero is considered dead.
With basically all GC implementations there are trade-offs, but tracing is usually good for high throughput (i.e. fast) operation but has longer pause times (larger gaps where the UI or program may freeze up). Reference counting can operate in smaller chunks but will be slower overall. It may mean fewer freezes but poorer performance overall.
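The tracing half can be sketched as a simple reachability walk. This is a toy model over object names, not how a real collector represents the heap; MarkPhaseSketch and reachable are hypothetical names, and the example needs Java 9 or later for Map.of and List.of.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MarkPhaseSketch {
    // heap maps each object name to the names it references;
    // roots stand in for stack and static references.
    static Set<String> reachable(Map<String, List<String>> heap, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) { // first visit: follow its outgoing references
                pending.addAll(heap.getOrDefault(current, List.of()));
            }
        }
        return marked; // anything not in this set is considered dead
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
            "A", List.of("B"),
            "B", List.of(),
            "C", List.of("D"), // C and D reference each other
            "D", List.of("C")  // but nothing reachable points to them
        );
        System.out.println("live objects: " + reachable(heap, List.of("A")));
    }
}
```

Note that the C/D cycle is never visited, so a tracing collector reclaims it without any special handling.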
Additionally a reference counting GC requires a cycle detector to clean up any objects in a cycle that won't be caught by their reference count alone. Perl 5 didn't have a cycle detector in its GC implementation and could leak memory that was cyclic.
Research has also been done to get the best of both worlds (low pause times, high throughput):
http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/urc-oopsla-2003.pdf
Darren Thomas gives a good answer. However, one big difference between the Java and Python approaches is that with reference counting in the common case (no circular references) objects are cleaned up immediately rather than at some indeterminate later date.
For example, I can write sloppy, non-portable code in CPython such as
def parse_some_attrs(fname):
    return open(fname).read().split("~~~")[2:4]
and the file descriptor for that file I opened will be cleaned up immediately because as soon as the reference to the open file goes away, the file is garbage collected and the file descriptor is freed. Of course, if I run Jython or IronPython or possibly PyPy, then the garbage collector won't necessarily run until much later; possibly I'll run out of file descriptors first and my program will crash.
So you SHOULD be writing code that looks like
def parse_some_attrs(fname):
    with open(fname) as f:
        return f.read().split("~~~")[2:4]
but sometimes people like to rely on reference counting to always free up their resources because it can sometimes make your code a little shorter.
I'd say that the best garbage collector is the one with the best performance, which currently seems to be the Java-style generational garbage collectors that can run in a separate thread and has all these crazy optimizations, etc. The differences to how you write your code should be negligible and ideally non-existent.
I think the article "Java theory and practice: A brief history of garbage collection" from IBM should help explain some of the questions you have.
One big disadvantage of Java's tracing GC is that from time to time it will "stop the world" and freeze the application for a relatively long time to do a full GC. If the heap is big and the object tree complex, it will freeze for a few seconds. Also, each full GC visits the whole object tree over and over again, something that is probably quite inefficient. Another drawback of the way Java does GC is that you have to tell the JVM what heap size you want (if the default is not good enough); the JVM derives from that value several thresholds that will trigger the GC process when there is too much garbage stacking up in the heap.
I presume that this is actually the main cause of the jerky feeling of Android (based on Java), even on the most expensive cellphones, in comparison with the smoothness of iOS (based on Objective-C, and using RC).
I'd love to see a jvm option to enable RC memory management, and maybe keeping GC only to run as a last resort when there is no more memory left.
Garbage collection is faster (more time efficient) than reference counting, if you have enough memory. For example, a copying gc traverses the "live" objects and copies them to a new space, and can reclaim all the "dead" objects in one step by marking a whole memory region. This is very efficient, if you have enough memory. Generational collections use the knowledge that "most objects die young"; often only a few percent of objects have to be copied.
[This is also the reason why gc can be faster than malloc/free]
Reference counting is much more space efficient than garbage collection, since it reclaims memory the very moment it becomes unreachable. This is nice when you want to attach finalizers to objects (e.g. to close a file once the File object becomes unreachable). A reference counting system can work even when only a few percent of the memory is free. But having to increment and decrement counters upon each pointer assignment costs a lot of time, and some kind of garbage collection is still needed to reclaim cycles.
So the trade-off is clear: if you have to work in a memory-constrained environment, or if you need precise finalizers, use reference counting. If you have enough memory and need the speed, use garbage collection.
Reference counting is particularly difficult to do efficiently in a multi-threaded environment. I don't know how you'd even start to do it without getting into hardware assisted transactions or similar (currently) unusual atomic instructions.
Reference counting is easy to implement. JVMs have had a lot of money sunk into competing implementations, so it shouldn't be surprising that they implement very good solutions to very difficult problems. However, it's becoming increasingly easy to target your favourite language at the JVM.
The latest Sun Java VMs actually have multiple GC algorithms which you can tweak. The Java VM specification intentionally omits specifying actual GC behaviour to allow different (and multiple) GC algorithms for different VMs.
For example, for all the people who dislike the "stop-the-world" approach of the default Sun Java VM GC behaviour, there are VMs such as IBM's WebSphere Real Time which allow real-time applications to run on Java.
Since the Java VM spec is publicly available, there is (theoretically) nothing stopping anyone from implementing a Java VM that uses CPython's GC algorithm.
Late in the game, but I think one significant rationale for RC in python is its simplicity. See this email by Alex Martelli, for example.
(I could not find a link outside the Google cache; the email dates from 13 October 2005 on the Python list.)