How to do user-controlled cache eviction / garbage collection in Java? - java

I have a situation where there are 2 files A and B, and data is being written continuously in both of them (like a stream).
Now I know that both files A and B are going to be competing for memory and the garbage collector is going to decide what page for what file will be replaced.
I want to control garbage collection by making the garbage collector favor file A (i.e. garbage collector should always choose eviction of pages of file B compared to A). Other possibility is to force writing of file B to disk instead of caching in memory.
Can these things happen in java?

I suspect you are confusing memory management with garbage collection. Yes, garbage collection is a form of memory management, but it's not what you are talking about when discussing "which pages of memory will be swapped out to disk when memory space is low." That's not garbage collection because there are still active references to the A and B files. The garbage collector won't reclaim an object until there are no references to it.
You want to control memory page swapping not garbage collection. I'm sure I'll be corrected in comments if I'm wrong about this, but I don't think you can control in Java which pages of memory get swapped to disk when available memory is low.

You cannot force Java to perform a garbage collection.
But you can call System.gc() to request that the JVM run one.
To make an object eligible for garbage collection, you can set the references to it to null. Once no references remain, the garbage collector can remove the object from the heap the next time it runs.

Java has automatic garbage collection: it identifies which objects are in use and which are not, and deletes the unused objects.
A good source about garbage collection within Java is here

The description of your problem lacks certain details, specifically, are the writes to your files sequential or is there random access involved?
As geneSummons correctly points out, you have memory management in the JVM confused with that of the Operating System. Even sun.misc.Unsafe will not allow you control over paging activity at the OS level from a Java application.
What you may want to look at is using memory mapped files, but that does depend on whether you are using random access for your writes. If all you're doing is writing sequentially this is most likely no use. Although this does not give you control over the paging of the files at the OS level it may provide you with a more efficient way of solving your problem.
There is a useful article on this subject, https://howtodoinjava.com/java-7/nio/java-nio-2-0-memory-mapped-files-mappedbytebuffer-tutorial/
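For what it's worth, here is a minimal sketch of the two options mentioned in this thread: writing through a memory-mapped region, and asking the OS to flush written data to disk (the "force writing of file B" idea from the question). The class and method names are made up for illustration. Neither call gives you control over which pages the OS evicts; force() only requests that pending changes be written to the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of any library.
final class MappedWriteSketch {

    // Writes data at the start of the file through a memory mapping,
    // then asks the OS to write the mapped pages back to the device.
    static void writeMapped(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buffer.put(data);
            buffer.force(); // request write-back of the mapped region
        }
    }

    // Appends data with ordinary channel writes, then forces it to disk
    // instead of leaving it in the OS page cache until later.
    static void appendAndForce(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer source = ByteBuffer.wrap(data);
            while (source.hasRemaining()) {
                channel.write(source);
            }
            channel.force(true); // true = also flush file metadata
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-sketch", ".bin");
        writeMapped(file, "hello".getBytes(StandardCharsets.UTF_8));
        appendAndForce(file, " world".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

For purely sequential writes the second method is probably all you need; the mapped variant mostly pays off with random access.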

Related

Exclusion of elements in a Java file

I have some doubts about the garbage collector and how I can clear memory in Java.
I have a program that writes a binary search tree to a file. I wrote one function that inserts an element and another that removes an element; the remove method puts the removed elements in a region of the file that I call "empty blocks" (which is a stack). In C there is a function, free(), that releases memory; in Java there is the garbage collector, which runs at the JVM's discretion. How can I free the memory of these blocks in the file (the removed elements)?
Is there a way to free the memory of an element on file in Java (the element is of type int)?
I put the elements that I remove in a space in the file that I call “empty blocks ”(Which is a stack)
Whatever data structure you use to track your data will be in an object of some class.
When that object no longer has any references pointing to it, that object becomes a candidate for garbage collection. No need for you to do anything except not hang on to any reference longer than needed.
The garbage collector may clear the unneeded object immediately, or may clear it later. Either way, we as Java programmers do not care. Eventually the memory will be freed up.
If the reference variable pointing to an object is a local variable, that reference is dropped when the local variable goes out of scope.
If the reference variable is a member field on another object, the object in question will be released when the other object becomes garbage.
If the reference variable is static, you should assign null explicitly to let the referenced object become garbage. In Java, static variables stay in memory throughout the execution run of your app.
In the first two cases, you can release the object sooner by setting the reference variable to null. Generally this is not needed, but doing so may be wise if a large amount of memory is at stake. Ditto if other precious resources are being needlessly held.
Is there a way to free the memory of an element on file in Java (the element is of type int)?
Your question is really hard to understand, but I think you are asking about freeing up disk blocks in a data structure stored in a file [1].
There is no Java support for this. If you write a data structure to a file, the problem of reclaiming space in the file is yours, not Java's. Indeed, I don't think that a typical OS will allow you to (literally) free disk blocks in the middle of a file [2].
There may be 3rd-party libraries that support this kind of thing, but I don't have the background knowledge to make a recommendation.
If I have correctly understood what you are asking, your discussion of C's malloc / free versus Java's garbage collection is only peripherally relevant. Both of these schemes are for managing memory, not space in a random access file. Now you could conceivably implement similar schemes for managing space in a file, but you would need to take account of the different characteristics of memory and disk I/O. (Even if you are mapping the file into memory.)
[1] If you are actually talking about managing objects in heap memory in Java, your best bet is to just let the garbage collector deal with it; see Basil's answer. There are also 3rd-party libraries for storing objects in off-heap memory, but it is unclear if they would help you. I understand that such libraries typically leave it to the programmer to decide when to free an object. (They are not garbage collected.)
[2] It would be a bad idea. If the disk blocks thus freed were then used in a different file, you would get a lot of file fragmentation. That would be bad for file I/O performance.
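To illustrate the point that reclaiming space inside a file is the application's job, here is a rough sketch of a free-list scheme for fixed-size int blocks. Everything in it (the class name IntBlockFile, its methods, and keeping the free stack in memory rather than in the file itself) is an assumption made for the example, not something Java or the question's program provides. Removed blocks are not returned to the OS; they are remembered and reused by later inserts.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size-record file that recycles removed slots.
final class IntBlockFile implements AutoCloseable {
    private final RandomAccessFile file;
    // Stack of offsets whose blocks were removed and can be reused.
    private final Deque<Long> freeBlocks = new ArrayDeque<>();

    IntBlockFile(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    // Stores a value, reusing a freed block if one exists, and returns
    // the offset it was written to. Otherwise the file grows by 4 bytes.
    long insert(int value) throws IOException {
        long offset = freeBlocks.isEmpty() ? file.length() : freeBlocks.pop();
        file.seek(offset);
        file.writeInt(value);
        return offset;
    }

    // Marks the block as reusable. The file itself does not shrink.
    void remove(long offset) {
        freeBlocks.push(offset);
    }

    int read(long offset) throws IOException {
        file.seek(offset);
        return file.readInt();
    }

    long length() throws IOException {
        return file.length();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("blocks", ".bin");
        tmp.deleteOnExit();
        try (IntBlockFile blocks = new IntBlockFile(tmp.getPath())) {
            long first = blocks.insert(10);
            blocks.insert(20);
            blocks.remove(first);            // block becomes an "empty block"
            long reused = blocks.insert(30); // lands in the freed slot
            System.out.println(reused == first);
        }
    }
}
```

A persistent version would have to store the free stack in the file as well, which is what the question describes; the in-memory stack is only there to keep the sketch short.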

does java garbage collection securely wipe out garbage data?

This is a memory data security question.
Does java garbage collection securely wipe out garbage data?
Apparently after a chunk of data is garbage-collected, I cannot retrieve it anymore, but can a hacker still memory-dump to retrieve the data?
As other users already mentioned here, JVMs don't clean memory securely after garbage collection because it would affect performance badly. That's why many programs (especially security libraries) use mutable structures instead of immutable ones (char arrays instead of strings, etc.) and clean the data themselves when it is no longer needed.
Unfortunately, even such approach doesn't always work. Let's look at this scenario:
1. You create a char array with a password.
2. The JVM performs a garbage collection and moves your char array to another place in memory, leaving the memory previously occupied by it intact and just marking it as a free block. So, we now have a 'dirty copy' of your password.
3. You are done working with your password and explicitly zero all the characters in your char array, thinking that everything is safe now.
4. An attacker makes a dump of your memory and finds your password in the memory where it was placed the first time, before step 2.
I can think of only one possible solution for this problem:
Use the G1 garbage collector.
Make your sensitive data a single block (an array of primitive values) that is large enough to occupy more than half of the region size used by G1 (by default, this size depends on the maximum heap size, but you can also specify it manually). That forces the collector to treat your data as a so-called 'humongous object'. Such objects are not moved in memory by G1 GC.
In that case, when you erase some data in your block manually, you can be sure that no other 'dirty copies' of the same data exist somewhere in the heap.
Another solution would be to use off-heap data that you can handle manually as you like, but that wouldn't be pure Java.
This depends on the JVM implementation and possibly options within it, but I would assume that it won't clear the data. Garbage collection only needs to track which areas are available. Setting all of that data to 0 or something else is a lot of unnecessary writes. It's for this reason you will often see APIs use a char array for passwords instead of Strings.
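As a small illustration of the char-array idiom, here is a sketch that wipes the array once the password has been used. The class PasswordHandling and its authenticate method are hypothetical stand-ins. Note that this only overwrites the copy the array currently occupies; it does nothing about stale copies a moving collector may have left behind elsewhere on the heap.

```java
import java.util.Arrays;

// Hypothetical example class, not a real security API.
final class PasswordHandling {

    // Placeholder check; a real implementation would compare against a
    // stored hash rather than just looking at the length.
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    // Uses the password and then overwrites the array in place, so the
    // characters do not linger until the array happens to be reused.
    static boolean loginAndWipe(char[] password) {
        try {
            return authenticate(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        boolean ok = loginAndWipe(password);
        System.out.println(ok + ", wiped: " + (password[0] == '\0'));
    }
}
```

With a String there is no equivalent of the fill step, because the contents cannot be modified after creation.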
Specifically, the Oracle JVM won't clear the space; it only copies data between the Eden and Survivor spaces, and objects that are no longer used just stay there as garbage that will be overwritten eventually. A similar thing happens in the OldGen: some places are marked as used, and when an object becomes eligible for garbage collection, the place it occupied is marked as not used. It will also be overwritten eventually, given enough application time.

java - How can Garbage Collector quickly know which objects do not have references to them any more?

I understand that in Java, if an object doesn't have any references to it any more, the garbage collector will reclaim it back some time later.
But how does the garbage collector know that an object has or has not references associated to it?
Is garbage collector using some kind of hashmap or table?
Edit:
Please note that I am not asking how GC works in general. Really, I am not asking that.
I am asking specifically how the GC knows, efficiently, which objects are live and which are dead.
That's why I ask in my question whether the GC maintains some kind of hashmap or set and constantly updates the number of references each object has.
A typical modern JVM uses several different types of garbage collectors.
One type that's often used for objects that have been around for a while is called Mark-and-Sweep. It basically involves starting from known "live" objects (the so-called garbage collection roots), following all chains of object references, and marking every reachable object as "live".
Once this is done, the sweep stage can reclaim those objects that haven't been marked as "live".
For this process to work, the JVM has to know the location in memory of every object reference. This is a necessary condition for a garbage collector to be precise (which Java's is).
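To make the mark and sweep stages concrete, here is a toy model over an explicit object graph. It is only a sketch: the Node class, the list standing in for the heap, and the method names are invented for the example, and a real JVM works on raw memory with far more machinery. It does show why no reference-count table is needed and why unreachable cycles are still reclaimed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model of mark-and-sweep; not how HotSpot is implemented.
final class MarkSweepSketch {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: flag every node reachable from the roots.
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.marked) {
                continue; // already visited, also guards against cycles
            }
            node.marked = true;
            pending.addAll(node.refs);
        }
    }

    // Sweep phase: remove unmarked nodes from the toy heap, reset the
    // marks of survivors, and return the names of what was reclaimed.
    static List<String> sweep(List<Node> heap) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false;
            } else {
                reclaimed.add(node.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d); // c and d reference each other
        d.refs.add(c); // but nothing reachable points at them
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Arrays.asList(a));
        System.out.println(sweep(heap)); // prints [c, d]
    }
}
```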
Java has a variety of different garbage collection strategies, but they all basically work by keeping track of which objects are reachable from known active objects.
A great summary can be found in the article How Garbage Collection works in Java but for the real low-down, you should look at Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
Beginning with the J2SE Platform version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work.
The most important of these observed properties is infant mortality. ...
I.e. many objects like iterators only live for a very short time, so younger objects are more likely to be eligible for garbage collection than much older objects.
For more up to date tuning guides, take a look at:
Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning
Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide (Java SE 8)
Incidentally, be careful about trying to second-guess your garbage collection strategy. I've known many a program's performance to be trashed by overzealous use of System.gc() or inappropriate -XX options.
GC will know that object can be removed as quickly as it is possible. You are not expected to manage this process.
But you can ask GC very politely to run using System.gc(). It is just a tip to the system. GC does not have to run at that moment, it does not have to remove your specific object etc. Because GC is the BIG boss and we (Java programmers) are just its slaves... :(
The truth is that the garbage collector does not, in general, quickly know which objects no longer have any incoming references. And, in fact, an object can be garbage even when there are incoming references to it.
The garbage collector uses a traversal of the object graph to find the objects that are reachable. Objects that are not reached in this traversal are deemed garbage, even if they are part of a cycle of references. The delay between an object being unreachable, and the garbage collector actually collecting the object, could be arbitrarily long.
There is no efficient way - it will still require traversal of the heap, but there is a hacky way: when the heap is divided into smaller pieces (thus no need to scan the entire heap). This is the reason we have generational garbage collectors, so that the scanning takes less time.
This is relatively "easy" to answer when your entire application is stopped and you can analyze the graph of objects. It all starts from GC roots (I'll let you find the documentation for what these are), but basically these are "roots" that are not collected by the GC.
From here a certain scan starts that analyzes the "live" objects: objects that have a direct (or transitive) connection to these roots, and are thus not reclaimable. In graph theory this is known as coloring (traversing) your graph using 3 colors: black, grey and white. White means it is not (yet known to be) connected to the roots, grey means its sub-graph is not yet traversed, black means traversed and connected to the roots. So basically, to know what exactly is dead/alive right now, you take your heap, which is all white initially, and color everything reachable black. Everything that is still white is garbage. It is interesting that "garbage" is really identified by a GC by knowing what is actually alive. There are some drawings to visualize this here for example.
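The three-color scheme can be sketched on a toy graph like this. The class name, the adjacency-map representation, and the assumption that every object appears as a key in the map are all mine, purely for illustration; this is the stop-the-world case where nothing mutates the graph during the scan.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy tri-color marking over a graph given as name -> referenced names.
final class TriColorSketch {

    enum Color { WHITE, GREY, BLACK }

    static Map<String, Color> colorize(Map<String, List<String>> graph,
                                       Collection<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String node : graph.keySet()) {
            color.put(node, Color.WHITE); // everything starts out white
        }
        Deque<String> grey = new ArrayDeque<>();
        for (String root : roots) {
            color.put(root, Color.GREY);  // roots are live by definition
            grey.push(root);
        }
        while (!grey.isEmpty()) {
            String node = grey.pop();
            List<String> targets = graph.getOrDefault(node, Collections.<String>emptyList());
            for (String target : targets) {
                if (color.get(target) == Color.WHITE) {
                    color.put(target, Color.GREY); // discovered, not yet scanned
                    grey.push(target);
                }
            }
            color.put(node, Color.BLACK); // all outgoing references scanned
        }
        return color; // whatever is still WHITE is garbage
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b"));
        graph.put("b", Collections.<String>emptyList());
        graph.put("c", Arrays.asList("d")); // unreachable cycle
        graph.put("d", Arrays.asList("c"));
        System.out.println(colorize(graph, Arrays.asList("a")));
    }
}
```

When the loop ends no grey objects remain, which is the invariant that lets the collector treat every white object as dead.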
But this is the simple scenario: when your application is entirely stopped (for seconds at times) and you can scan the heap. This is called a STW - stop the world event and people hate these usually. This is what parallel collectors do: stop everything, do whatever GC has to (including finding garbage), let the application threads start after that.
What happens when your app is running and you are scanning the heap? Concurrently? G1/CMS do this. Think about it: how can you reason about a leaf of a graph being alive or not when your app can change that leaf via a different thread?
Shenandoah, for example, solves this by "intercepting" changes to the graph. While running concurrently with your application, it will catch all the changes and insert these into some thread-local special queues, called SATB queues (snapshot-at-the-beginning queues), instead of altering the heap directly. When that is finished, a very short STW event will occur and these queues will be drained. Still under the STW, what that drain has "caused" is computed, i.e. extra coloring of the graph. This is far simplified, just FYI. G1 and CMS do it differently AFAIK.
So in theory, the process is not really that complicated, but implementing it concurrently is the most challenging part.

Are there any objects that are not subject to garbage collection?

In java(1.6 or earlier) , are there any type of objects that are not subject to garbage collection?
All Java objects are subject to garbage collection. However, native resources are not directly managed by the garbage collector: some, like window handles (JFrame), are freed by the garbage collector when a finalize() method is implemented; others need manual resource management.
Also the jvm does not have to collect existing objects before it shuts down, this can cause subtle bugs like data not being flushed to disk.
Last there are extensions to the java spec for real time systems or smart cards which include unmanaged memory for performance and resource reasons. However this does not apply to the standard jvm.
Maybe you have heard about weak, soft and phantom references. Check this
http://weblogs.java.net/blog/2006/05/04/understanding-weak-references
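For readers who haven't used them, here is a small sketch of how a weak reference behaves. The class name ReferenceKinds and the describe helper are made up for the example. Only the first check is deterministic; whether the referent has been cleared after System.gc() depends on the JVM, so the code reports what it sees rather than assuming an outcome.

```java
import java.lang.ref.WeakReference;

// Illustrative only; the class and method names are hypothetical.
final class ReferenceKinds {

    // Reports whether the referent of a weak reference is still available.
    static String describe(WeakReference<?> ref) {
        return ref.get() == null ? "cleared" : "still reachable";
    }

    public static void main(String[] args) {
        StringBuilder data = new StringBuilder("payload");
        WeakReference<StringBuilder> weak = new WeakReference<>(data);

        // While a strong reference exists, the weak reference returns the object.
        System.out.println(weak.get() == data); // prints true

        // Drop the only strong reference. The object is now weakly
        // reachable, so the collector is allowed to clear the reference.
        data = null;
        System.gc(); // a request, not a command

        System.out.println(describe(weak));
    }
}
```

Soft references follow the same API shape (java.lang.ref.SoftReference) but are generally kept around until memory gets tight, which makes them a common choice for caches.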
It depends on what you mean by 'objects'. Values of primitive types (note that String is an object, not a primitive) and all data that was not allocated on the JVM heap (using operator new) are not subject to GC. Everything else is subject to GC.
Depending on the implementation even static fields are kept in an "object" (which you can see in a heap dump) which are cleaned up when the Class is discarded.
What you could be referring to is proxied data structures. These include GUI components, Threads, and direct/memory mapped ByteBuffers. In every case, the Object is on the heap, however there are data structure(s) not on the heap.

Why do finalizers have a "severe performance penalty"?

Effective Java says :
There is a severe performance penalty for using finalizers.
Why is it slower to destroy an object using the finalizers?
Because of the way the garbage collector works. For performance, most Java GCs use a copying collector, where short-lived objects are allocated into an "eden" block of memory, and when it's time for that generation of objects to be collected, the GC just needs to copy the objects that are still "alive" to a more permanent storage space, and then it can wipe (free) the entire "eden" memory block at once. This is efficient because most Java code will create many thousands of instances of objects (boxed primitives, temporary arrays, etc.) with lifetimes of only a few seconds.
When you have finalizers in the mix, though, the GC can't simply wipe an entire generation at once. Instead, it needs to figure out all the objects in that generation that need to be finalized, and queue them on a thread that actually executes the finalizers. In the meantime, the GC can't finish cleaning up the objects efficiently. So it either has to keep them alive longer than they should be, or it has to delay collecting other objects, or both. Plus you have the arbitrary wait time of actually executing the finalizers.
All these factors add up to a significant runtime penalty, which is why deterministic finalization (using a close() method or similar to explicitly finalize the object's state) is usually preferred.
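A minimal sketch of the deterministic alternative: implement AutoCloseable and let try-with-resources call close() at a known point. The TrackedResource class is hypothetical; in real code close() would release a file handle, socket, or similar.

```java
// Hypothetical resource wrapper used only to illustrate the pattern.
final class TrackedResource implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    void use() {
        if (!open) {
            throw new IllegalStateException("resource already closed");
        }
        // ... work with the underlying resource here ...
    }

    // Called explicitly or by try-with-resources; no finalizer involved,
    // so the GC can reclaim the object like any other.
    @Override
    public void close() {
        open = false;
    }

    public static void main(String[] args) {
        TrackedResource outer;
        try (TrackedResource resource = new TrackedResource()) {
            outer = resource;
            resource.use();
        } // close() runs here, even if use() had thrown
        System.out.println(outer.isOpen()); // prints false
    }
}
```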
Having actually run into one such problem:
In the Sun HotSpot JVM, finalizers are processed on a thread that is given a fixed, low priority. In a high-load application, it's easy to create finalization-required objects faster than the low-priority finalization thread can process them. Meanwhile, the space on the heap used by the finalization-pending objects is unavailable for other uses. Eventually, your application may spend all of its time garbage collecting, because all of the available memory is in use by objects pending finalization.
This is, of course, in addition to the other many reasons to not use finalizers that are described in Effective Java.
I just picked up my copy of Effective Java off my desk to see what he's referring to.
If you read Chapter 2, Section 6, he goes into good detail about the various performance hits.
You can't know when the finalizer will run, or even if it will at all. Because those resources may never be claimed, you will have to run with fewer resources.
I would recommend reading the entirety of the section - it explains things much better than I can parrot here.
If you read the documentation of finalize() closely, you will notice that finalizers enable an object to prevent being collected by the GC.
If no finalizer is present, the object can simply be removed and does not need any more attention. But if there is a finalizer, the GC has to check afterwards whether the object has become "visible" (reachable) again.
Without knowing exactly how the current Java garbage collection is implemented (actually, because there are different Java implementations out there, there are also different GCs), you can assume that the GC has to do some additional work if an object has a finalizer, because of this feature.
My thought is this:
Java is a garbage collected language, which deallocates memory based on its own internal algorithms. Every so often, the GC scans the heap, determines which objects are no longer referenced, and de-allocates the memory.
A finalizer interrupts this and forces the deallocation of memory outside of the GC cycle, potentially causing inefficiencies.
I think best practices are to use finalizers only when ABSOLUTELY necessary such as freeing file handles or closing DB connections which should be done deterministically.
One reason I can think of is that explicit memory cleanup is unnecessary if your resources are all Java Objects, and not native code.
