Java optimization to prevent heapspace out of memory - java

I have a program that, in one particular situation, fails with a heap space out-of-memory error.
Let's assume we have two ArrayLists: the first contains many T objects, and the second contains W objects that are created from the T objects of the first list.
We loop over the first list like this (after the loop, the list is no longer needed):
public void funct(ArrayList<T> list)
{
ArrayList<W> list2 = new ArrayList<W>();
for (int i = 0 ; i < list.size() ; i++)
{
W temp = new W();
temp.set(list.get(i));
temp.saveToDB();
list2.add(temp);
}
// more code! from this point on the `list` is useless
}
My code is very similar to this, but when list contains a very large number of objects I often get the heap space OutOfMemoryError during the for loop, and I'd like to solve this problem.
I do not know very well how the GC works in Java, but surely there are several possible optimizations in the previous example.
Since the list is not used after the for loop, my first idea was to switch from a for loop to a while loop and empty the list as we go through it:
public void funct(ArrayList<T> list)
{
ArrayList<W> list2 = new ArrayList<W>();
while (list.size() > 0)
{
W temp = new W();
temp.set(list.remove(0));
temp.saveToDB();
list2.add(temp);
}
// more code! from this point on the `list` is useless
}
Is this modification useful?
How can I optimize the above code better, and how can I prevent the heap space OutOfMemoryError? (Increasing the -Xmx and -Xms values is not an option.)

You can try setting both -XX:NewSize and -XX:MaxNewSize to about 40% of your -Xmx value.
These parameters should speed up garbage collection, because your object creation rate is high.
For more help : check here

It really depends on many things. How big are the W and T objects?
One optimization you could certainly make is ArrayList<W> list2 = new ArrayList<W>(list.size());
This way the ArrayList does not need to grow its backing array many times.
That will not make much difference, though. The real problem is probably the size and number of your W and T objects. Have you thought of using different data structures to manage a smaller portion of the objects at a time?

If you did some memory profiling, you would discover that the largest source of heap exhaustion is the W instances, which you retain by adding them to list2. The ArrayList itself adds a very small overhead per object contained (just 4 bytes if properly pre-sized, worst case 8 bytes), so even if you retain list, this cannot matter much.
You will not be able to reduce the heap pressure unless you change your approach so that it no longer retains each and every W instance created in the loop.
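A minimal sketch of the non-retaining approach described above, assuming the code after the loop does not need the W objects once they are saved. The class and method names are hypothetical, and the `save` callback is a stand-in for `temp.saveToDB()`:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class NonRetainingConverter {
    /**
     * Converts each source item, passes the result to the save callback,
     * and then lets it go. Returns the number of items processed.
     */
    static <T, W> int convertAndSave(List<T> source,
                                     Function<T, W> convert,
                                     Consumer<W> save) {
        int processed = 0;
        for (T item : source) {
            // Only one converted object is referenced by this method at a time;
            // once the iteration ends it is unreachable unless the callback kept it.
            W converted = convert.apply(item);
            save.accept(converted); // hypothetical stand-in for temp.saveToDB()
            processed++;
        }
        return processed;
    }
}
```

If the later code does need some information from the W objects, keeping only that information (a count, a list of ids) instead of the objects themselves keeps the footprint small.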

You continue to reference all the items from the original list:
temp.set(list.get(i)) // you probably store the passed reference somewhere
If the T object is large and you don't need all of its fields, try to use a projection of it:
temp.set( extractWhatINeed( list.get(i) ) )
This involves creating a new class with fewer fields than T (the return type of the extract method).
Once you no longer reference the original items, they become eligible for GC (as soon as the list itself is no longer referenced).
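A rough sketch of such a projection. The `Heavy` and `Summary` classes and their fields are made up for illustration; only the method name `extractWhatINeed` comes from the answer above:

```java
final class ProjectionExample {
    /** Hypothetical large source object; only id and name are needed later. */
    static final class Heavy {
        final long id;
        final String name;
        final byte[] payload; // large field that the target object does not need

        Heavy(long id, String name, byte[] payload) {
            this.id = id;
            this.name = name;
            this.payload = payload;
        }
    }

    /** Small projection that holds only the fields actually used. */
    static final class Summary {
        final long id;
        final String name;

        Summary(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Copies the needed fields so that no reference to the Heavy object is kept. */
    static Summary extractWhatINeed(Heavy source) {
        return new Summary(source.id, source.name);
    }
}
```

The projection shares the `name` string with the source object but holds no reference to the source itself, so the large payload can be collected once nothing else refers to it.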

Related

Java: Does calling clear() on a large list help with quick garbage collection?

Load 1.5 Million Records from Database 1
Load 1.5 Million Records from Database 2
List<DannDB> dDb = fromNamedQuery(); //return em.createNamedQuery("").getResultList();
List<LannDB> lDb = fromNamedQuery();
Compare their data.
Update/persist into the database (using JPA)
and the program ends after two hours.
The same iteration runs every third hour and often fails with an out-of-memory error.
Does either of the following statements help? Does the object go out of scope with this?
dDb.clear();
or
dDb = null
or what else can I do?
Assuming that your goal is to reduce the occurrence of OOMEs over all other considerations ...
Assigning null to the List object will make the entire list eligible for garbage collection. Calling clear() will have a similar effect, though it will depend on the List implementation. (For example, calling clear() on an ArrayList doesn't release the backing array. It just nulls the array cells.)
If you can recycle an ArrayList for a list of roughly the same size as the original, you can avoid the garbage while growing the list. (But we don't know this is an ArrayList!)
Another factor in your use-case is that:
List<DannDB> dDb = fromNamedQuery();
is (presumably) going to create a new list anyway. That would render a clear() pointless. (Just assign null to dDb, or let the variable go out of scope or be reassigned the new list.)
A final issue is that it is conceivable that the list is finalizable. That could mean that the list object takes longer to delete.
Overall, I can't say which of assigning null and calling clear() will be better for the memory footprint. Or that either of these will make a significant difference. But there is no reason why you can't try both alternatives, and observe what happens.
The only other things I can suggest are:
Increase the heap size (and the RAM footprint).
Change the application so that you don't need to hold entire database snapshots in memory. Depending on the nature of the comparison, you could do it in "chunks" or by streaming the records (see footnote 1).
The last one is the only solution that is scalable; i.e. that will work with an ever larger number of records. (Modulo the time taken to deal with more records.)
Running System.gc() is unlikely to help. And since the real problem is that you are getting OOMEs, anything that tries to get the JVM to shrink the heap by giving memory back to the OS is counterproductive.
1 - Those of you who are old enough will remember the classic way of implementing a payroll system with magnetic tape storage. If you can select from the two data sources in the same key order, you may be able to use the classic approach to compare them. For example, reading two result sets in parallel.
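The "classic approach" from the footnote can be sketched as a merge over two inputs that are sorted by the same key. This is an illustration under stated assumptions, not the answerer's code: the class and method names are hypothetical, keys are non-null and unique within each input, and the iterators stand in for whatever streams the records (for example two result sets ordered by key).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SortedMergeCompare {
    /**
     * Walks two key-ordered inputs in parallel and returns the keys that
     * occur in exactly one of them. Only the current key of each side is
     * held while scanning; the inputs are never loaded as a whole.
     */
    static <K extends Comparable<K>> List<K> keysInOnlyOneSide(Iterator<K> left,
                                                               Iterator<K> right) {
        List<K> differences = new ArrayList<>();
        K l = advance(left);
        K r = advance(right);
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {          // key present on both sides
                l = advance(left);
                r = advance(right);
            } else if (cmp < 0) {    // key only on the left side
                differences.add(l);
                l = advance(left);
            } else {                 // key only on the right side
                differences.add(r);
                r = advance(right);
            }
        }
        // Whatever remains on one side has no counterpart on the other.
        for (; l != null; l = advance(left)) differences.add(l);
        for (; r != null; r = advance(right)) differences.add(r);
        return differences;
    }

    /** Returns the next key, or null when the input is exhausted. */
    private static <K> K advance(Iterator<K> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

The result list still grows with the number of differences; if that can be large, the differences could be written out as they are found instead of being collected.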
In the case of SQL, you can obtain your two ResultSets and compare their data iteratively. This way, you don't have to hold all your data in memory in the first place.
For demonstration purposes, I assume that your data looks like this:
String email1 | String email2 | int someInt
abc#def.ghi | jkl#mno.pqr | 1234567
xyz#gmail.com | | 8901234
To detect a difference between two ResultSets of this database:
// Requires java.sql.ResultSet, java.sql.SQLException and java.util.Objects.
boolean equals(ResultSet a, ResultSet b) throws SQLException {
    while (a.next() && b.next()) {
        // Objects.equals tolerates SQL NULL, which getString returns as null.
        String aEmail1 = a.getString(1);
        String bEmail1 = b.getString(1);
        if (!Objects.equals(aEmail1, bEmail1)) return false;
        String aEmail2 = a.getString(2);
        String bEmail2 = b.getString(2);
        if (!Objects.equals(aEmail2, bEmail2)) return false;
        int aSomeInt = a.getInt(3);
        int bSomeInt = b.getInt(3);
        if (aSomeInt != bSomeInt) return false;
        // One set has reached its last row while the other has not.
        if (a.isLast() != b.isLast())
            throw new IllegalArgumentException(
                "ResultSets have different amounts of rows!"
            );
    }
    return true;
}
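To copy the contents of ResultSet newData into ResultSet oldData (and thereby into its underlying database):
// Requires java.sql.ResultSet and java.sql.SQLException.
// oldData must be an updatable ResultSet (ResultSet.CONCUR_UPDATABLE).
void updateA(ResultSet oldData, ResultSet newData) throws SQLException {
    while (oldData.next() && newData.next()) {
        String newEmail1 = newData.getString(1);
        oldData.updateString(1, newEmail1);
        String newEmail2 = newData.getString(2);
        oldData.updateString(2, newEmail2);
        int newSomeInt = newData.getInt(3);
        oldData.updateInt(3, newSomeInt);
        // The update* methods only change the current row in memory;
        // updateRow() writes it back to the database.
        oldData.updateRow();
        if (oldData.isLast() != newData.isLast())
            throw new IllegalArgumentException(
                "ResultSets have different amounts of rows!"
            );
    }
}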
You can of course leave out the if(a.isLast()!=b.isLast()) ... and if(oldData.isLast()!=newData.isLast()) ... checks if you don't care whether the two sets have the same number of rows.
The thing is that, by default, the heap does not shrink once it has been allocated (I mean the memory obtained from the operating system). If your Java application needed 2 GB of RAM at one point, it will by default keep that memory reserved from the operating system.
If you can, try to change the design of your application so that it does not load all data into memory first, but only loads what it really needs to do its work.
If you really need the two big batches at the same time, think about using the following Java command line argument: "-XX:+UseAdaptiveSizePolicy", which would make it possible to shrink the heap space after big memory usages.
You can also call the garbage collector via "System.gc();", but a) that does not shrink the allocated heap memory without the suggested command line argument, and b) you really should not have to think about this. Java will run it on its own when needed.
Edit: Improved my first explanation a bit.
Best for memory usage would be for the list to not go out of scope. So it would be better (memory wise) to just modify the content one by one, keeping only one temporary entry object instead of a whole other list.
So you could create getNextFromNamedQuery() and hasNextInNamedQuery() methods and set the data at the current index, e.g.:
int i = 0;
while (hasNextInNamedQuery()) {
    // Reuse existing slots; grow the list only when it is too short.
    if (dDb.size() <= i) dDb.add(getNextFromNamedQuery());
    else dDb.set(i, getNextFromNamedQuery());
    i++;
}

Does the Java compiler insert a free when an allocated object goes out of scope in a block?

I am scratching my head trying to understand the point of the following code
Map<String, Set<MyOtherObj>> myMap = myapi.getMyMap();
final MyObj[] myObjList;
{
final List<MyObj> list = new ArrayList<>(myMap.size());
for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
final int myCount = MyUtility.getCount(entry.getValue());
if (myCount <= 0)
continue;
list.add(new MyObj(entry.getKey(), myCount));
}
if (list.isEmpty())
return;
myObjList = list.toArray(new MyObj[list.size()]);
}
Which can be rewritten as the following:
Map<String, Set<MyOtherObj>> myMap = myapi.getMyMap();
final List<MyObj> list = new ArrayList<>(myMap.size());
for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
final int myCount = MyUtility.getCount(entry.getValue());
if (myCount <= 0)
continue;
list.add(new MyObj(entry.getKey(), myCount));
}
if (list.isEmpty())
return;
The only reasons I can think of for putting the ArrayList in a block and then copying its content to an array are:
The size of ArrayList is bigger than the size of list, so reassigning ArrayList to array save space
There is some sort of compiler magic or gc magic that deallocates and reclaim the memory use by ArrayList immediately after the block scope ends (eg. like rust), otherwise we are now sitting on up to 2 times amount of space until gc kicks in.
So my question is, does the first code sample make sense, is it more efficient?
This code currently processes 20k messages per second.
As stated in this answer:
Scope is a language concept that determines the validity of names. Whether an object can be garbage collected (and therefore finalized) depends on whether it is reachable.
So, no, the scope is not relevant to garbage collection, but for maintainable code, it’s recommended to limit the names to the smallest scope needed for their purpose. This, however, does not apply to your scenario, where a new name is introduced to represent the same thing that apparently still is needed.
You suggested the possible motivation
The size of ArrayList is bigger than the size of list, so reassigning ArrayList to array save space
but you can achieve the same when declaring the variable list as ArrayList<MyObj> rather than List<MyObj> and call trimToSize() on it after populating it.
There’s another possible reason, the idea that subsequently using a plain array was more efficient than using the array encapsulated in an ArrayList. But, of course, the differences between these constructs, if any, rarely matter.
Speaking of esoteric optimizations, specifying an initial array size when calling toArray was believed to be an advantage, until someone measured and analyzed it and found that, e.g., myObjList = list.toArray(new MyObj[0]); would actually be more efficient in real life.
Anyway, we can’t look into the author’s mind, which is the reason why any deviation from straight-forward code should be documented.
Your alternative suggestion:
There is some sort of compiler magic or gc magic that deallocates and reclaim the memory use by ArrayList immediately after the block scope ends (eg. like rust), otherwise we are now sitting on up to 2 times amount of space until gc kicks in.
is missing the point. Any space optimization in Java is about minimizing the amount of memory occupied by objects still alive. It doesn’t matter whether unreachable objects have been identified as such, it’s already sufficient that they are unreachable, hence, potentially reclaimable. The garbage collector will run when there is an actual need for memory, i.e. to serve a new allocation request. Until then, it doesn’t matter whether the unused memory contains old objects or not.
So the code may be motivated by a space saving attempt and in that regard, it’s valid, even without an immediate freeing. As said, you could achieve the same in a simpler fashion by just calling trimToSize() on the ArrayList. But note that if the capacity does not happen to match the size, trimToSize()’s shrinking of the array doesn’t work differently behind the scenes, it implies creating a new array and letting the old one become subject to garbage collection.
But the fact that there’s no immediate freeing and there’s rarely a need for immediate freeing should allow the conclusion that space saving attempts like this would only matter in practice, when the resulting object is supposed to persist a very long time. When the lifetime of the copy is shorter than the time to the next garbage collection, it didn’t save anything and all that remains, is the unnecessary creation of a copy. Since we can’t predict the time to the next garbage collection, we can only make a rough categorization of the object’s expected lifetime (long or not so long)…
The general approach is to assume that in most cases, the higher capacity of an ArrayList is not a problem and the performance gain matters more. That’s why this class maintains a higher capacity in the first place.
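To make the two space-saving variants discussed in this answer concrete, here is a small sketch. The class and method names and the filtering rule (skip empty strings) are hypothetical; `trimToSize()` and `toArray(T[])` are the standard `ArrayList` methods the answer refers to.

```java
import java.util.ArrayList;

final class TrimExample {
    /** Variant 1: keep the ArrayList and shrink its backing array to its size. */
    static ArrayList<String> buildTrimmed(int capacityHint, String... values) {
        ArrayList<String> list = new ArrayList<>(capacityHint);
        for (String value : values) {
            if (!value.isEmpty()) {
                list.add(value);
            }
        }
        // Declared as ArrayList (not List) so that trimToSize() is accessible.
        // If capacity and size differ, this allocates a new, smaller array.
        list.trimToSize();
        return list;
    }

    /** Variant 2: copy into a plain array, as the code in the question does. */
    static String[] buildArray(int capacityHint, String... values) {
        ArrayList<String> list = new ArrayList<>(capacityHint);
        for (String value : values) {
            if (!value.isEmpty()) {
                list.add(value);
            }
        }
        // Passing a zero-length array lets toArray allocate the result itself.
        return list.toArray(new String[0]);
    }
}
```

Either way a second array is created and the oversized one becomes garbage, so neither variant frees anything immediately.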
No, it is done for the same reason as empty lines are added to the code.
The variables in the block are scoped to that block, and can no longer be used after the block. So one does not need to pay attention to those block variables.
So this is more readable:
A a;
{ B b; C c; ... }
...
Than:
A a;
B b;
C c;
...
...
It is an attempt to structure the code so that it is more readable. For instance, above one can read "a declaration of A a;" and then a block that probably fills a.
Lifetime analysis in the JVM works fine without such blocks, just as there is absolutely no need to set variables to null at the end of their usage.
Sometimes blocks are also abused to repeat blocks with same local variables:
A a1;
{ B b; C c; ... a1 ... }
A a2;
{ B b; C c; ... a2 ... }
A a3;
{ B b; C c; ... a3 ... }
Needless to say, this is the opposite of good style.

How to garbage collect arrays of objects?

I'm working on a huge program in Java, and I'm now trying to avoid loitering objects to improve its memory usage. I instantiate some objects in the constructor and keep them until the end of the program, but they are not always used. My question is specifically about garbage collecting arrays of objects.
For example, when the user presses a menu item, a JDialog with lots of components in it is shown. These components are instantiated when the program starts, but I want to instantiate them when necessary and free them when they are not needed.
For example:
JRadioButton[] Options = new JRadioButton[20];
for (int i = 0; i < 20; i++) {
Options[i] = new JRadioButton(Labels[i]);
}
If I want to free the arrays, what should I do?
This:
for (int i = 0; i < 20; i++) {
Options[i] = null;
Labels[i] = null;
}
Or simply:
Options = null;
Labels = null;
Thanks in advance
First, a Java object will be garbage collected only if it is not reachable (and it might have other references than your array). Then GC runs at nearly unpredictable times (so the memory might be freed much later).
Clearing the array's elements won't release the whole array, but could release each element (provided it becomes unreachable).
Setting a variable to null might release the array (and of course all the elements).
But for such a small program, perhaps the GC never runs at all.
Read at least the garbage collection article on Wikipedia, and perhaps the GC Handbook.
Notice that the liveness of an object is a whole-program property (actually a whole-process property: liveness of values is relevant in a particular execution, not in your source code). In other words, you could do Options = null; and still have the object in Options[4] reachable by some other reference path.
If Options holds the only reference to the array, either works to make the objects unreachable and release the objects to the garbage collector.
If something else is still referencing the array, it won't be released anyway, so the first option is the only one that will release the contents. Note that the first option will only release the contents, Options will still reference the actual Array unless you also set Options to null.
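The difference between the two options when a second reference to the array exists can be illustrated with a small sketch. The class is hypothetical, and plain strings stand in for the JRadioButton elements so that it runs without a GUI:

```java
import java.util.Arrays;

final class ArrayReleaseDemo {
    /** Drops only the owner's reference; returns what another holder still sees. */
    static Object[] dropReferenceOnly() {
        Object[] options = { "a", "b", "c" };
        Object[] alias = options; // e.g. a reference kept by another component
        options = null;           // array and elements stay reachable through alias
        return alias;
    }

    /** Nulls the elements; returns what another holder still sees. */
    static Object[] clearElements() {
        Object[] options = { "a", "b", "c" };
        Object[] alias = options;
        // The elements are no longer referenced by the array, but the array
        // object itself is still reachable through alias.
        Arrays.fill(options, null);
        return alias;
    }
}
```

In the first case nothing became collectable; in the second, the elements did (assuming nothing else references them), but the array did not.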
Doing
Options = null;
Labels = null;
should be enough to release those objects. There is no need to null the elements unless there is another reference to the array. However when there are other references to the array I do not think it is wise to null the elements. The other references are there for a reason. When they no longer need the array and its contents they should release their references.
Both will work, but the recommended order is to do the first one (null the elements) and then the second one (null the references).
Here is the source code from ArrayList clear() method
// Let gc do its work
for (int i = 0; i < size; i++)
elementData[i] = null;
Another way to do same thing is
Arrays.fill(Options, null);
It does nothing different: it iterates over the array and sets each element to null.

Short-lived vs Long-lived for static data

I understand how the generational garbage collection used by the HotSpot JVM works. We have a method that returns a list of static data, along the lines of
public List<String> getList() {
List<String> ret = new ArrayList<String>();
ret.add("foo");
ret.add("bar");
return ret;
}
I was considering rewriting this as
private static List<String> list = null;
...
public List<String> getList() {
if (list == null) { .. initialize here .. }
return list;
}
this would only create the list once. This single list instance would eventually make its way into the tenured generation. Since this is a rather large app, using this design pattern in many places would mean there are a lot of these static lists in the tenured generation - increasing memory usage of the app.
If we follow the pattern of creating and returning a new list every time, said list would never make it out of eden before being garbage collected. There would be a bit more work involved - having to create and fill the list, and work in garbage collecting - but we would use less memory overall since the list doesn't last long.
This question is more academic in nature, as either pattern will work. Storing a static list will increase memory usage by an insignificant amount, and creating the list every time will increase workload by an insignificant amount. Which pattern to use probably depends on a number of factors: how often the list is used, how much memory pressure the app is under, etc. Which pattern would you target, and why?
Storing a static list will increase memory usage by an insignificant amount
It will use more space when there is no real need for it. It will be the same if there is one usage of it. But if you have more than one copy, the static version is more efficient.
Often it's not the best case which will kill you, it's the worst case. In the worst case there will be a large number (millions?) of copies of the collection.
As Peter Lawrey points out, in the non-static case, sometimes you will be using a lot more memory than necessary while the GC is waiting to do its thing.
I would concentrate on what's more readable. To me, Guava's ImmutableList.of in a static final field says immediately this data isn't going to change.
(Or Collections.unmodifiableList(Arrays.asList(...)) if you don't want Guava)
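A minimal sketch of the Guava-free variant mentioned above. The class name is hypothetical; the list contents are taken from the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class StaticData {
    // Created once when the class is initialized; the wrapper rejects all
    // modification attempts, so sharing one instance between callers is safe.
    private static final List<String> LIST =
            Collections.unmodifiableList(Arrays.asList("foo", "bar"));

    /** Returns the same shared, read-only instance on every call. */
    static List<String> getList() {
        return LIST;
    }
}
```

Compared with the lazily initialized `list == null` check in the question, a `static final` field needs no null check, and its initialization is thread-safe because it happens during class initialization.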

vector and garbage collector

I'm running a java program that uses many vectors. I'm afraid my use of them is causing the garbage collector not to work.
I have many threads that do:
vec.addAll(<collection>);
and other threads that do:
vec.remove(0);
I have printouts that show the vector is empty from time to time but I was wondering if the memory is actually freed.
Do I need to worry?
If the objects in your Vectors are not referenced anywhere (by the Vector or by any other code), then they will get collected at the garbage collector's discretion. 99.999% of the time the garbage collector won't need your help with this.
However, even after the garbage collector frees the objects, it may not give heap memory back to the operating system, so your process may appear to hold more memory than it should.
Additionally, I'm not very familiar with the implementation of the Vector class (as others have pointed out, you should really be using ArrayList instead), but when you call .remove() I don't think the underlying array is ever resized downward. So if you stuff several thousand objects into a Vector and then delete them all, it will probably still have several thousand bytes of empty array allocated. The solution in this case is to call vector.trimToSize().
The memory will eventually be freed if there are no references to the objects in question. Seeing your vectors become empty indicates that at least they are not holding on to references. If there is nothing else with references to those objects, they will be cleaned up, but only when the garbage collector chooses to do so (which is always before you run out of memory).
(Caveat: Obviously calling remove(0) will only remove the first element from the Vector, not multiple elements.)
Assuming your Vector is empty then you do not need to worry about garbage collection if the objects in your vector are not being referenced elsewhere. However, if there are still other references to the objects then there is no way they can be garbage collected.
To verify this, I'd recommend running a profiler (e.g. JProfiler) and periodically "snapping" the object count for the type of object being stored in your Vector, and then monitor this count to see if it increased over time.
One other piece of advice: Vector is obsolete; You should consider using LinkedList or ArrayList instead, which are thread-unsafe equivalents. If you wish to make them thread-safe you should initialise them using Collections.synchronizedList(new ArrayList());
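A small sketch of the synchronized-list replacement, assuming several threads call addAll on a shared list as in the question. The class and method names are hypothetical. Note that compound operations such as "check for emptiness, then remove" still need an explicit lock on the list, which is the object the wrapper itself synchronizes on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SynchronizedListDemo {
    /** Lets several threads append batches to one shared list; returns its final size. */
    static int fillFromThreads(int threads, int batchesPerThread) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int b = 0; b < batchesPerThread; b++) {
                    shared.addAll(Arrays.asList(1, 2, 3)); // each call is atomic
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.size();
    }

    /** Removes and returns the first element, or null if the list is empty. */
    static <E> E pollFirst(List<E> synchronizedList) {
        // isEmpty() followed by remove(0) is a check-then-act sequence, so both
        // calls must happen under the same lock.
        synchronized (synchronizedList) {
            return synchronizedList.isEmpty() ? null : synchronizedList.remove(0);
        }
    }
}
```

Removing from the front of an ArrayList shifts all remaining elements, just as Vector.remove(0) does, so for queue-like usage a dedicated queue class may be a better fit.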
No, you don't. If the vector is empty it is not referencing objects you once put in there. If you want to know what is holding on to memory, you can get a profiler and look at what is consuming the memory.
No.
Just trust the implementors of the standard library. They probably have done a good job.
If you really worry about memory leaks, call this from time to time :
System.out.println("Total Memory: " + Runtime.getRuntime().totalMemory());
System.out.println("Free Memory: " + Runtime.getRuntime().freeMemory());
The implementation of java.util.Vector.remove(int) from JDK 1.6 is:
public synchronized E remove(int index) {
modCount++;
if (index >= elementCount)
throw new ArrayIndexOutOfBoundsException(index);
Object oldValue = elementData[index];
int numMoved = elementCount - index - 1;
if (numMoved > 0)
System.arraycopy(elementData, index+1, elementData, index, numMoved);
elementData[--elementCount] = null; // Let gc do its work
return (E)oldValue;
}
As you can see, if elements are removed from the vector, the vector will not keep a reference to them, and hence will not impede their garbage collection.
However, the vector itself (and the potentially large array backing it) might not be reclaimed.
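The retained backing array can be observed through Vector's public capacity() method. The following sketch (hypothetical class name) fills a vector, empties it again, and reports the capacity before and after trimToSize():

```java
import java.util.Vector;

final class VectorCapacityDemo {
    /** Returns {capacity after removing all elements, capacity after trimToSize()}. */
    static int[] capacities(int elements) {
        Vector<Integer> vec = new Vector<>();
        for (int i = 0; i < elements; i++) {
            vec.add(i);
        }
        while (!vec.isEmpty()) {
            vec.remove(vec.size() - 1); // remove(int index): drops the last element
        }
        int afterRemoval = vec.capacity(); // backing array has not shrunk
        vec.trimToSize();                  // replaces it with an array of length size()
        return new int[] { afterRemoval, vec.capacity() };
    }
}
```

The removed elements were already collectable before the trim; trimToSize() only releases the empty slots of the backing array.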
