Garbage Collection behavior - java

During start-up of my application, the database is queried, objects are created from the result of the query, and they are inserted into an ArrayList. The ArrayList is later looped over and another data structure is built from it. The ArrayList (which is huge) is then garbage collected. My question is: is it a strain on the garbage collector to collect such a big object at once? What if I create a QUEUE data structure instead of an ArrayList? Reading objects from the queue would make them eligible for GC as I go. Is that less strain on the GC? I am aware that the GC could run at any time and there are no guarantees about when it executes. More than the timing of execution, what I would like to understand is: is it more work for the GC to collect from a contiguous region of memory (ArrayList) than from a QUEUE, in which memory allocation is not contiguous?

is it more work for the GC to collect from a contiguous region of memory (ArrayList) than from a QUEUE, in which memory allocation is not contiguous?
It is more work to clean up a linked-list-based queue than an ArrayList. This is because an ArrayList is just two objects (the list and its backing array), whereas a linked queue has one node object per element.
If you want to reduce the GC load, process the data as you read it. That way you won't need a queue or a list at all, and you might find you have processed all the data by the time it has downloaded, i.e. it could be quite a bit faster too.
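A minimal sketch of that idea, using a BufferedReader as a stand-in for the database cursor or network stream (the processing step is a hypothetical placeholder):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ProcessAsYouRead {
    // Stand-in for per-record processing; replace with your own logic.
    static String process(String record) {
        return record.toUpperCase();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a database cursor or network stream.
        BufferedReader source = new BufferedReader(new StringReader("a\nb\nc"));
        String line;
        while ((line = source.readLine()) != null) {
            // Each record is handled and becomes garbage immediately;
            // no list of all records is ever built.
            System.out.println(process(line));
        }
    }
}
```

Because only one record is reachable at any moment, the GC deals with many short-lived small objects instead of one huge long-lived list, which is exactly the workload generational collectors are optimized for.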

The biggest strain here comes from keeping objects which are "huge in size" in memory. It can cause the GC to work more frequently if other objects need to be created on the heap, or even lead to an "out of memory" error as the size of your DB and ArrayList increases.
Any solution that would allow you to decrease the size of memory allocated to "huge" objects will help. If you can build your queue in such a way that queue elements are released fast without waiting for all other objects to be read from a DB, go for it.
As Peter mentioned in his answer, it would be even better to process an object as soon as it was read from a DB without queuing it or adding to a list.
One possible solution would be to redesign your data access layer and use ResultSet (http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html), which is available on any Java platform I can think of. Since the result set is kept on the DB side, you can read records one at a time and decrease the strain on your memory significantly.
Another approach would be to implement pagination, e.g. by changing your original query so that only a portion of the ArrayList's contents is read from the DB at a time.
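A sketch of the pagination idea, assuming a hypothetical users table with an id column and a database that supports LIMIT/OFFSET (the table, column names, and page size are illustrative only):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PagedLoader {
    static final int PAGE_SIZE = 1000;

    // Reads one page of rows starting at the given offset and returns
    // how many rows it saw; a real version would process each row here.
    static int loadPage(Connection con, int offset) throws SQLException {
        int rows = 0;
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT name FROM users ORDER BY id LIMIT ? OFFSET ?")) {
            ps.setInt(1, PAGE_SIZE);
            ps.setInt(2, offset);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // handle rs.getString("name") here, one row at a time
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

The caller would loop, advancing the offset by PAGE_SIZE until loadPage returns fewer rows than a full page; at most one page of objects is live on the heap at a time.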

Related

How to do user control cache eviction/ garbage collection in java?

I have a situation where there are 2 files A and B, and data is being written continuously in both of them (like a stream).
Now I know that both files A and B are going to be competing for memory and the garbage collector is going to decide what page for what file will be replaced.
I want to control garbage collection by making the garbage collector favor file A (i.e. garbage collector should always choose eviction of pages of file B compared to A). Other possibility is to force writing of file B to disk instead of caching in memory.
Can these things happen in java?
I suspect you are confusing memory management with garbage collection. Yes, garbage collection is a form of memory management, but it's not what you are talking about when discussing "which pages of memory will be swapped out to disk when memory space is low." That's not garbage collection, because there are still active references to the A and B files. The garbage collector won't do anything until there are no references to an object.
You want to control memory page swapping not garbage collection. I'm sure I'll be corrected in comments if I'm wrong about this, but I don't think you can control in Java which pages of memory get swapped to disk when available memory is low.
You cannot forcefully ask Java to do garbage collection.
But you can call System.gc() to request the JVM to do a garbage collection.
To make an object eligible for garbage collection, you can remove all references to it, e.g. by assigning null to the variables that point to it. That way you can be sure that when the garbage collector runs, it finds the object unreachable and removes it from the heap.
Java has automatic garbage collection: it identifies which objects are in use and which are not, and deletes the unused objects.
A good source about garbage collection within Java is here
The description of your problem lacks certain details, specifically, are the writes to your files sequential or is there random access involved?
As geneSummons correctly points out, you have memory management in the JVM confused with that of the Operating System. Even sun.misc.Unsafe will not allow you control over paging activity at the OS level from a Java application.
What you may want to look at is memory-mapped files, but that depends on whether you are using random access for your writes. If all you're doing is writing sequentially, this is most likely of no use. Although it does not give you control over the paging of the files at the OS level, it may provide a more efficient way of solving your problem.
There is a useful article on this subject, https://howtodoinjava.com/java-7/nio/java-nio-2-0-memory-mapped-files-mappedbytebuffer-tutorial/
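A small sketch of a memory-mapped write with MappedByteBuffer (the file name and mapping size are arbitrary; this illustrates the API, not a way to control which file's pages get evicted):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWrite {
    // Writes an int through a memory-mapped region and reads it back.
    static int writeAndRead() {
        try {
            Path p = Files.createTempFile("mapped", ".dat");
            try (FileChannel ch = FileChannel.open(p,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Map 4 KiB of the file; the OS pages it in and out as needed.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buf.putInt(0, 42);  // random-access write through memory
                buf.force();        // ask the OS to flush dirty pages to disk
                return buf.getInt(0);
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndRead());
    }
}
```

Note that force() is the closest you get to the "force writing to disk instead of caching" idea from the question, but the kernel, not the JVM, still decides which pages stay resident.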

does java garbage collection securely wipe out garbage data?

This is a memory data security question.
Does java garbage collection securely wipe out garbage data?
Apparently after a chunk of data is garbage-collected, I cannot retrieve it anymore, but can a hacker still memory-dump to retrieve the data?
As other users have already mentioned here, JVMs don't clean memory securely after garbage collection because it would hurt performance badly. That's why many programs (especially security libraries) use mutable structures instead of immutable ones (char arrays instead of Strings, etc.) and clean the data themselves when it is no longer needed.
Unfortunately, even such approach doesn't always work. Let's look at this scenario:
1. You create a char array containing a password.
2. The JVM performs a garbage collection and moves your char array to another place in memory, leaving the memory previously occupied by it intact and merely marking it as a free block. So we now have a 'dirty copy' of your password.
3. You finish working with your password and explicitly zero all the characters in your char array, thinking that everything is safe now.
4. An attacker makes a dump of your memory and finds your password at the location where it was first placed, before step 2.
I can think of only one possible solution for this problem:
Use G1 garbage collector.
Make your sensitive data a single block (array of primitive values) that is large enough to occupy more than half of the region size, used by G1 (by default, this size depends on the maximum heap size, but you can also specify it manually). That would force the collector to treat your data as so called 'humongous object'. Such objects are not moved in memory by G1 GC.
In such case when you erase some data in your block manually, you can be sure that no other 'dirty copies' of the same data exist somewhere in the heap.
Another solution would be to use off-heap data that you can handle manually as you like, but that wouldn't be pure Java.
This depends on the JVM implementation, and possibly on options within it, but I would assume that it won't clear the data. Garbage collection only needs to track which areas are available; setting all of that data to 0 (or anything else) would be a lot of unnecessary writes. It's for this reason that you will often see APIs use a char array for passwords instead of Strings.
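A minimal sketch of that char[] pattern, with a hypothetical check step standing in for real password handling:

```java
import java.util.Arrays;

public class PasswordWipe {
    // Uses the password, then overwrites it before returning.
    static boolean checkAndWipe(char[] password) {
        try {
            // Hypothetical check; real code would hash and compare.
            return password.length > 0;
        } finally {
            // Overwrite the characters so this copy, at least, no longer
            // holds the secret -- the GC will never do this for you.
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        char[] pw = {'s', 'e', 'c', 'r', 'e', 't'};
        checkAndWipe(pw);
        System.out.println(Arrays.toString(pw)); // all NULs
    }
}
```

As the scenario above shows, this wipes only the current copy; any stale copy left behind by a moving collector is out of the program's reach.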
Specifically, the Oracle JVM won't clear the space; it only copies data between the Eden and Survivor spaces, and objects that are no longer used just stay there as garbage that will eventually be overwritten. A similar thing happens in the old generation: some areas are marked as used, and when an object becomes eligible for garbage collection, the space it occupied is marked as unused. It too will eventually be overwritten, given enough application time.

OutOfMemoryError:GC Overhead limit exceeded

In one of our Java applications we have got
OutOfMemoryError: GC overhead limit exceeded.
We have used HashMaps in some places for storing data. From the logs I can identify that it is reproducing at the same place.
I wanted to ask if Garbage Collector spends more time in clearing up the hashmaps?
Upon looking at the code( i cant share here ), I have found that that there is a Hashmap created like
HashMap topo = new HashMap();
but this hashmap is never used.
Is this a kind of memory leak in my application ?
If this HashMap is created inside a method which does some processing and is not used elsewhere, and this method is accessed by multiple threads, say 20, then in such a case would creating the HashMap as above cause the garbage collector to spend more time recovering the heap and throw an OOME?
Please let me know if you need some more details.
In one of our Java applications we have got OutOfMemoryError: GC overhead limit exceeded. We have used HashMaps in some places for storing data. From the logs I can identify that it is reproducing at the same place.
If the HashMap keeps growing (and is most likely marked static), meaning you keep adding things to it and never remove them, then one fine day it will lead to an OutOfMemoryError.
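The ever-growing static map described above might look like this (the class name, key, and payload are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class TopoCache {
    // Static map that only ever grows: every entry stays strongly
    // reachable for the life of the application, so the GC can never
    // reclaim it and the heap eventually fills up.
    static final Map<String, byte[]> CACHE = new HashMap<>();

    static void handleRequest(String key) {
        CACHE.put(key, new byte[1024]); // added on every call, never removed
    }
}
```

This is the classic shape of a Java "memory leak": nothing is leaked in the C sense, but reachable-yet-useless entries accumulate until GC overhead dominates.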
I wanted to ask if Garbage Collector spends more time in clearing up
the hashmaps?
The garbage collector spends time on objects which are unreferenced, weakly referenced, or softly referenced. Wherever it finds such objects, it will clear them depending on the need.
Upon looking at the code (I can't share it here), I have found that there is a HashMap created like HashMap topo = new HashMap();, but this HashMap is never used. Is this a kind of memory leak in my application?
If this HashMap is created inside a method which does some processing and is not used elsewhere, and this method is accessed by multiple threads, say 20, then in such a case would creating the HashMap as above cause the garbage collector to spend more time recovering the heap and throw an OOME?
If the HashMap is local to a method, and the method exits after doing some processing, then it should be garbage collected soon after the method exits. Since the map is local to the method, each thread will have its own separate copy of the map, and once a thread finishes executing the method, that map is eligible for GC.
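A sketch of that local-map case (the method body is a made-up placeholder):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalMapDemo {
    static int process(int n) {
        // Local to the method: each call (and hence each of the 20 threads)
        // gets its own map on its own stack frame.
        Map<Integer, Integer> topo = new HashMap<>();
        for (int i = 0; i < n; i++) {
            topo.put(i, i * i);
        }
        // When this method returns, 'topo' becomes unreachable and the
        // whole map is eligible for GC.
        return topo.get(n - 1);
    }
}
```

Such short-lived maps die young in the Eden space and are very cheap for a generational collector; on their own they should not cause "GC overhead limit exceeded".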
You need to look for long-lifetime objects and structures, which might be the actual problem, rather than wildly grasping at some clueless manager's idea of a potential problem.
See:
How to find memory leaks using visualvm
How to find a Java Memory Leak
Look out especially for static or application-lifetime Maps or Lists which are added to during the application's lifetime rather than just at initialization. It will most likely be one, or several, of these that are accumulating.
Note also that inner classes (Listeners, Observers) can capture references to their containing scope & prevent these from being GC'ed indefinitely.
Please let me know if you need some more details.
You need some more details. You need to profile your application to see what objects are consuming the heap space.
Then, if some of the sizeable objects are no longer actually being used by your application, you have a memory leak. Look at the references to these objects to find out why they're still being held in memory when they're no longer useful, and then modify your code to no longer hold these references.
Alternatively, you may find that all of the objects in memory are what you would expect as your working set. Then either you need to increase the heap size, or refactor your application to work with a smaller working set (e.g. streaming events one at a time rather than reading an entire list; storing the last session details in the database rather than in memory; etc.).

Remove ArrayList Object from Memory

I have a bunch of objects in an ArrayList. If I call ArrayList.remove(object), do I need to do anything else to remove the object from memory? I am adding and removing objects from this list at a fairly quick pace, so if they don't get removed from memory they will start taking up space and slow down the game.
- When you call ArrayList.remove(object), you just remove the object from the List, not from memory.
- It is up to the garbage collector to decide when it is going to remove the object from the heap; under normal circumstances an object is ready for garbage collection as soon as there are no references to it anymore.
- There is a classic example of why String, which is an object in Java, should not be used for storing passwords; char[] should be used instead.
See this link...
Why is char[] preferred over String for passwords?
Java does automatic garbage collection. So once an object is no longer referred to, it can be deleted; that doesn't mean it will be deleted immediately. Garbage collection is automatic; you can request that it be done by calling System.gc(), however this is just a suggestion to run it.
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/System.html#gc%28%29
No, you don't have to do anything else, as long as that's the only place that's referencing the object. Welcome to the joys of a garbage-collected language! Java will clean up old, unreferenced objects when it decides that it needs to reclaim some memory.
If you chew through the heap quickly enough, you can nudge the GC along with some JVM args. We have an app that handles billions of operations a day and have tuned it heavily with the ergonomic GC settings; I'd encourage you to play with the adaptive size policy and the max pause setting primarily. Run the program with a profiler, exercising the ArrayList as you normally would for a while (several minutes), and see how the ambient state of the various heap generations looks. You may end up having to also tweak the memory allocations for the generations.
ArrayList.remove removes the object from the list; then, if the object is not referenced by any other objects, the GC will delete it.
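A small illustration of the point above (the int[] is just a stand-in for a game object):

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveDemo {
    static int addAndRemove() {
        List<int[]> pool = new ArrayList<>();
        int[] obj = new int[1024];
        pool.add(obj);

        pool.remove(obj); // the list no longer references the array
        obj = null;       // drop our own reference as well

        // With no remaining references, the int[] is now eligible for GC;
        // the collector reclaims its memory whenever it next runs.
        return pool.size();
    }

    public static void main(String[] args) {
        System.out.println(addAndRemove()); // prints 0
    }
}
```

If the game holds the same object elsewhere (another list, a field, a listener), that reference keeps it alive even after ArrayList.remove succeeds.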

what's more efficient? to empty an object or create a new one?

how expensive is 'new'? I mean, should I aim at reusing the same object, or, if the object is 'out of scope', is that the same as emptying it?
example, say a method creates a list:
List<Integer> list = new ArrayList<Integer>();
at the end of the method the list is no longer in use - does that mean there's no memory allocated to it anymore, or does it mean that there's a null pointer to it (since it was 'created')?
Alternately, I can send a 'list' to the method and empty it at the end of the method with: list.removeAll(list); will that make any difference from memory point of view?
Thanks!
It's an ArrayList, so creating a new object means allocating a slab of memory and zeroing it, plus some bookkeeping overhead. Clearing the list means zeroing the memory. This view would lead you to believe that clearing an existing object is faster. But it's likely that the JVM is optimized to make memory allocations fast, so probably none of this matters. So just write clear, readable code and don't worry about it. This is Java, after all, not C.
at the end of the method the list is no longer in use - does it mean that there's no memory allocated to it anymore or does it mean that there's a null pointer to it (since it was 'created').
It means there are no references to it and the object is eligible for GC.
Alternately, I can send a 'list' to the method and empty it at the end of the method with: list.removeAll(list); will that make any difference from memory point of view?
It's a tradeoff between time and space. Removing the elements from the list is time-consuming, even though you don't need to create new objects.
With the GC capabilities of the latest JVMs, it is OK to create new objects when required (though avoiding object creation inside loops is best). Long-lived references to an object can keep it ineligible for GC and may cause a memory leak if not handled properly.
I don't know much about memory footprints in Java, but I think emptying a List in order to reuse it is not such a good idea, because of the performance impact of emptying the List. I also think it is not a good idea from an OO perspective, because an object should have just one purpose.
At the end of a method the object is indeed out of scope. But that doesn't mean it is garbage collected, or even eligible for garbage collection, because others might still reference that List. So basically: if there are no references to that List, it is eligible for garbage collection, but when it will actually be collected is still unknown. The List may be sitting in the young generation (Eden or a survivor space) or may already have been promoted to the Tenured space.
The Eden space is where objects are first allocated; when garbage collection happens and the object is still alive, it is moved to a survivor space. If it survives past that, it moves on to the Tenured space, where garbage collection happens less frequently. But all this depends on how long an object lives, who refers to it, and where it was allocated.
how expensive is 'new'?
It definitely incurs some overhead. But it depends on how complex the object is. If you are creating an object with just few primitives, not that expensive. But if you are creating objects inside objects, may be collections of objects, if your constructor is reading some properties file to initialize object's member variables, EXPENSIVE!
But to be frank, if we need to create a new object, we have create it, there is no alternative. And if we don't need to and if we are still creating that is kind of bad programming.
at the end of the method the list is no longer in use - does it mean that there's no memory allocated to it anymore or does it mean that there's a null pointer to it (since it was 'created').
Once the object has no references to it, it is out of scope and becomes eligible for garbage collection. So even if it still has memory allocated, that memory will be reclaimed by the GC at some later point, whenever it runs; we need not worry about it (and we cannot guarantee when the GC will run).
Emptying the collection at the end will not, I think, make things any better, because the same thing happens to all the individual objects in the collection as to the collection itself: they become eligible for GC.
For small lists, it is probably a bit cheaper to clear() the list.
For the asymptotic case of really large lists in a really large heap, it boils down to whether the GC can zero a large chunk of memory faster than the for loop in clear() can. And I think it probably can.
However, my advice would be to ignore this unless you have convincing evidence (from profiling) that you have a high turn-over of ArrayList objects. (It is a bad idea to optimize based solely on your intuition.)
It depends on how costly the object is, both in terms of initialization required and how large it's memory footprint is. It also depends heavily on the kind of application (what else does the application spend time on).
For your example with the ArrayList, its already very hard to give a definite answer - depending on how many entries there are in the list, clear() can be very expensive or very cheap, while a new ArrayList has almost constant cost.
The general rule of thumb is: don't bother with reusing objects until you have measured that you have a performance problem, and then be very sure that creating the objects is the cause of that problem. Most likely there are more rewarding optimization opportunities in your application. A profiler will help identify the places where you spend the most time. Focus on those and on better algorithms.
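For reference, the two alternatives discussed in this thread might be sketched like this (the element added is just a placeholder):

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseVsNew {
    static final List<Integer> reused = new ArrayList<>();

    // Option 1: reuse the same list, emptying it each time.
    static List<Integer> viaClear() {
        reused.clear();          // O(n): nulls out the backing array slots
        reused.add(1);
        return reused;
    }

    // Option 2: allocate a fresh list and let the old one be collected.
    static List<Integer> viaNew() {
        List<Integer> fresh = new ArrayList<>(); // cheap: small header + empty backing array
        fresh.add(1);
        return fresh;
    }
}
```

Note a subtlety of option 1: clear() keeps the grown backing array, so a reused list retains its peak capacity, trading memory for fewer reallocations.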
