Short-lived vs Long-lived for static data - java

I understand how the generational garbage collection used by the HotSpot JVM works. We have a method that returns a list of static data, along the lines of:
public List<String> getList() {
    List<String> ret = new ArrayList<String>();
    ret.add("foo");
    ret.add("bar");
    return ret;
}
I was considering rewriting this as
private static List<String> list = null;
...
public List<String> getList() {
    if (list == null) { .. initialize here .. }
    return list;
}
This would create the list only once. This single list instance would eventually make its way into the tenured generation. Since this is a rather large app, using this design pattern in many places would mean a lot of these static lists end up in the tenured generation, increasing the memory usage of the app.
If we follow the pattern of creating and returning a new list every time, the list would never make it out of eden before being garbage collected. There would be a bit more work involved, in creating and filling the list and in garbage collecting it, but we would use less memory overall since the list doesn't live long.
This question is more academic in nature, as either pattern will work. Storing a static list will increase memory usage by an insignificant amount, and creating the list every time will increase workload by an insignificant amount. Which pattern to use probably depends on a number of factors: how often the list is used, how much memory pressure the app is under, etc. Which pattern would you target, and why?

Storing a static list will increase memory usage by an insignificant amount
It will use more space when there is no real need for it, and the same amount if there is exactly one usage of it. But if more than one copy is alive at a time, the static version is more efficient.
Often it's not the best case that will kill you, it's the worst case. In the worst case there could be a large number (millions?) of copies of the collection.

As Peter Lawrey points out, in the non-static case, sometimes you will be using a lot more memory than necessary while the GC is waiting to do its thing.
I would concentrate on what's more readable. To me, Guava's ImmutableList.of in a static final field says immediately this data isn't going to change.
(Or Collections.unmodifiableList(Arrays.asList(...)) if you don't want Guava)
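A minimal sketch of that static final variant (the field name is mine; pick one of the two initializers):

private static final List<String> LIST = ImmutableList.of("foo", "bar");
// or, JDK only:
// private static final List<String> LIST =
//         Collections.unmodifiableList(Arrays.asList("foo", "bar"));

public List<String> getList() {
    return LIST; // safe to hand out: the contents can never change
}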


Java: Does calling clear() on a big list help in quick garbage collection?

Load 1.5 million records from database 1:
    List<DannDB> dDb = fromNamedQuery(); // return em.createNamedQuery("").getResultList();
Load 1.5 million records from database 2:
    List<LannDB> lDb = fromNamedQuery();
Compare their data.
Update/persist into the database (using JPA).
The program ends after two hours. The same run happens every three hours, and many times it gives an OutOfMemoryError.
Does the following statement work; does the object go out of scope with this?
dDb.clear();
or
dDb = null;
or what else can I do?
Assuming that your goal is to reduce the occurrence of OOMEs over all other considerations ...
Assigning null to the list variable will make the entire list eligible for garbage collection. Calling clear() will have a similar effect, though it depends on the List implementation. (For example, calling clear() on an ArrayList doesn't release the backing array. It just nulls the array cells.)
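To make the difference concrete, a small sketch (the two assignments are alternatives, not a sequence):

List<DannDB> dDb = fromNamedQuery();
// ... use dDb ...

// Alternative A: the list object and its backing array both become unreachable
dDb = null;

// Alternative B: the cells are nulled, but this ArrayList keeps its
// full-capacity backing array for as long as the list itself is reachable
dDb.clear();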
If you can recycle an ArrayList for a list of roughly the same size as the original, you can avoid the garbage while growing the list. (But we don't know this is an ArrayList!)
Another factor in your use-case is that:
List<DannDB> dDb = fromNamedQuery();
is (presumably) going to create a new list anyway. That would render a clear() pointless. (Just assign null to dDb, or let the variable go out of scope or be reassigned the new list.)
A final issue is that it is conceivable that the list is finalizable. That could mean that the list object takes longer to delete.
Overall, I can't say which of assigning null and calling clear() will be better for the memory footprint. Or that either of these will make a significant difference. But there is no reason why you can't try both alternatives, and observe what happens.
The only other things I can suggest are:
Increase the heap size (and the RAM footprint).
Change the application so that you don't need to hold entire database snapshots in memory. Depending on the nature of the comparison, you could do it in "chunks" or by streaming the records [1].
The last one is the only solution that is scalable; i.e. that will work with an ever larger number of records. (Modulo the time taken to deal with more records.)
Running System.gc() is unlikely to help. And since the real problem is that you are getting OOMEs, anything that tries to get the JVM to shrink the heap by giving memory back to the OS is counterproductive.
[1] Those of you who are old enough will remember the classic way of implementing a payroll system with magnetic tape storage. If you can select from the two data sources in the same key order, you may be able to use the classic approach to compare them; for example, reading two result sets in parallel.
In the case of SQL, you can get your two ResultSets and compare their data iteratively. This way, you don't have to save all your data in the first place.
I assume that your data looks like this for demonstration purposes:
email1 (String)    email2 (String)    someInt (int)
abc@def.ghi        jkl@mno.pqr        1234567
xyz@gmail.com                         8901234
To detect a difference between two ResultSets of this database:
boolean equals(ResultSet a, ResultSet b) throws SQLException {
    // assumes both queries return their rows in the same order
    while (a.next() && b.next()) {
        String aEmail1 = a.getString(1);
        String bEmail1 = b.getString(1);
        if (!aEmail1.equals(bEmail1)) return false;
        String aEmail2 = a.getString(2);
        String bEmail2 = b.getString(2);
        if (!aEmail2.equals(bEmail2)) return false;
        int aSomeInt = a.getInt(3);
        int bSomeInt = b.getInt(3);
        if (aSomeInt != bSomeInt) return false;
        if (a.isLast() != b.isLast())
            throw new IllegalArgumentException(
                "ResultSets have different amounts of rows!"
            );
    }
    return true;
}
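A hypothetical way to call this (the connection names and the query are mine; both queries must return their rows in the same key order for a row-by-row comparison to be meaningful):

// connection1 and connection2 are assumed to be open java.sql.Connections
try (Statement s1 = connection1.createStatement();
     Statement s2 = connection2.createStatement();
     ResultSet a = s1.executeQuery("SELECT email1, email2, someInt FROM data ORDER BY email1");
     ResultSet b = s2.executeQuery("SELECT email1, email2, someInt FROM data ORDER BY email1")) {
    System.out.println(equals(a, b) ? "no differences" : "data differs");
}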
To overwrite the contents of ResultSet oldData (and its underlying database) with the contents of ResultSet newData:
void updateA(ResultSet oldData, ResultSet newData) throws SQLException {
    // oldData must be an updatable ResultSet (ResultSet.CONCUR_UPDATABLE)
    while (oldData.next() && newData.next()) {
        String newEmail1 = newData.getString(1);
        oldData.updateString(1, newEmail1);
        String newEmail2 = newData.getString(2);
        oldData.updateString(2, newEmail2);
        int newSomeInt = newData.getInt(3);
        oldData.updateInt(3, newSomeInt);
        oldData.updateRow(); // pushes the changed row to the database
        if (oldData.isLast() != newData.isLast())
            throw new IllegalArgumentException(
                "ResultSets have different amounts of rows!"
            );
    }
}
You can of course leave out the if (a.isLast() != b.isLast()) ... and if (oldData.isLast() != newData.isLast()) ... checks if you don't care whether the two sets have the same number of rows.
The thing is that, by default, once-allocated heap memory does not shrink (I mean the memory allocated from the operating system). If your Java application at one time needed 2 GB of RAM, it will by default keep that memory reserved from the operating system.
If you can, try to change the design of your application to not load all data into memory first, but only load what you really need to do your work.
If you really need the two big batches at the same time, think about using the following Java command line argument: -XX:+UseAdaptiveSizePolicy, which makes it possible to shrink the heap space after big memory usage.
You can also call the garbage collector via System.gc(), but (a) it does not shrink the allocated heap memory without the suggested command line argument, and (b) really, you should not rely on this; Java will run the GC on its own in time.
Edit: Improved my first explanation a bit.
Memory-wise, the best option would be for the list not to go out of scope at all: instead of building a whole new list each run, modify the existing content entry by entry, keeping only one temporary entry object instead of a whole second list. So you could create getNextFromNamedQuery() and hasNextInNamedQuery() methods and set the data at the current index, e.g.:
int i = 0;
while (hasNextInNamedQuery()) {
    if (dDb.size() <= i) dDb.add(getNextFromNamedQuery());
    else dDb.set(i, getNextFromNamedQuery());
    i++;
}
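One caveat worth adding (my note, not part of the original answer): if the fresh query can return fewer rows than the list currently holds, the stale tail should be dropped too, for example:

// remove leftover entries beyond the last index written
while (dDb.size() > i) {
    dDb.remove(dDb.size() - 1);
}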

Does the Java compiler insert a free when a pointer is allocated and goes out of scope in a block?

I am scratching my head trying to understand the point of the following code
Map<String, Set<MyOtherObj>> myMap = myapi.getMyMap();
final MyObj[] myObjList;
{
    final List<MyObj> list = new ArrayList<>(myMap.size());
    for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
        final int myCount = MyUtility.getCount(entry.getValue());
        if (myCount <= 0)
            continue;
        list.add(new MyObj(entry.getKey(), myCount));
    }
    if (list.isEmpty())
        return;
    myObjList = list.toArray(new MyObj[list.size()]);
}
This can be rewritten into the following:
Map<String, Set<MyOtherObj>> myMap = myapi.getMyMap();
final List<MyObj> list = new ArrayList<>(myMap.size());
for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
    final int myCount = MyUtility.getCount(entry.getValue());
    if (myCount <= 0)
        continue;
    list.add(new MyObj(entry.getKey(), myCount));
}
if (list.isEmpty())
    return;
The only reasons I can think of for putting the ArrayList in a block and then copying the content into an array are:
1. The size of the ArrayList is bigger than the size of the list, so reassigning the ArrayList to an array saves space.
2. There is some sort of compiler magic or GC magic that deallocates and reclaims the memory used by the ArrayList immediately after the block scope ends (e.g. like Rust); otherwise we are now sitting on up to twice the amount of space until GC kicks in.
So my question is: does the first code sample make sense, and is it more efficient?
This code currently executes 20k messages per second.
As stated in this answer:
Scope is a language concept that determines the validity of names. Whether an object can be garbage collected (and therefore finalized) depends on whether it is reachable.
So, no, the scope is not relevant to garbage collection, but for maintainable code, it’s recommended to limit the names to the smallest scope needed for their purpose. This, however, does not apply to your scenario, where a new name is introduced to represent the same thing that apparently still is needed.
You suggested the possible motivation
The size of the ArrayList is bigger than the size of the list, so reassigning the ArrayList to an array saves space.
but you can achieve the same by declaring the variable list as ArrayList<MyObj> rather than List<MyObj> and calling trimToSize() on it after populating it.
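A minimal sketch of that alternative, reusing the names from the question:

final ArrayList<MyObj> list = new ArrayList<>(myMap.size());
// ... populate list exactly as in the question ...
list.trimToSize(); // replaces the backing array with one of length list.size()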
There’s another possible reason, the idea that subsequently using a plain array was more efficient than using the array encapsulated in an ArrayList. But, of course, the differences between these constructs, if any, rarely matter.
Speaking of esoteric optimizations: specifying the final array size when calling toArray was believed to be an advantage, until someone measured and analyzed it (see Aleksey Shipilëv's "Arrays of Wisdom of the Ancients"), finding that, e.g., myObjList = list.toArray(new MyObj[0]); is actually more efficient in real life.
Anyway, we can’t look into the author’s mind, which is the reason why any deviation from straight-forward code should be documented.
Your alternative suggestion:
There is some sort of compiler magic or GC magic that deallocates and reclaims the memory used by the ArrayList immediately after the block scope ends (e.g. like Rust); otherwise we are now sitting on up to twice the amount of space until GC kicks in.
is missing the point. Any space optimization in Java is about minimizing the amount of memory occupied by objects still alive. It doesn’t matter whether unreachable objects have been identified as such, it’s already sufficient that they are unreachable, hence, potentially reclaimable. The garbage collector will run when there is an actual need for memory, i.e. to serve a new allocation request. Until then, it doesn’t matter whether the unused memory contains old objects or not.
So the code may be motivated by a space saving attempt and in that regard, it’s valid, even without an immediate freeing. As said, you could achieve the same in a simpler fashion by just calling trimToSize() on the ArrayList. But note that if the capacity does not happen to match the size, trimToSize()’s shrinking of the array doesn’t work differently behind the scenes, it implies creating a new array and letting the old one become subject to garbage collection.
But the fact that there’s no immediate freeing and there’s rarely a need for immediate freeing should allow the conclusion that space saving attempts like this would only matter in practice, when the resulting object is supposed to persist a very long time. When the lifetime of the copy is shorter than the time to the next garbage collection, it didn’t save anything and all that remains, is the unnecessary creation of a copy. Since we can’t predict the time to the next garbage collection, we can only make a rough categorization of the object’s expected lifetime (long or not so long)…
The general approach is to assume that in most cases, the higher capacity of an ArrayList is not a problem and the performance gain matters more. That’s why this class maintains a higher capacity in the first place.
No, it is done for the same reason that empty lines are added to code.
The variables in the block are scoped to that block and can no longer be used after it, so one does not need to pay attention to those block variables when reading what follows.
So this is more readable:
A a;
{ B b; C c; ... }
...
Than:
A a;
B b;
C c;
...
...
It is an attempt to structure the code more readably. For instance, above one can read it as "a declaration of A a, and then a block, probably filling a".
Lifetime analysis in the JVM is fine. Just as there is absolutely no need to set variables to null at the end of their usage.
Sometimes blocks are also abused to repeat blocks with the same local variables:
A a1;
{ B b; C c; ... a1 ... }
A a2;
{ B b; C c; ... a2 ... }
A a3;
{ B b; C c; ... a3 ... }
Needless to say, this is the opposite of making the code better style.

When will the new String() objects in memory get cleared after invoking the intern() method?

List<String> list = new ArrayList<>();
for (int i = 0; i < 1000; i++)
{
    StringBuilder sb = new StringBuilder();
    String string = sb.toString();
    string = string.intern();
    list.add(string);
}
In the above sample, after invoking string.intern(), when will the 1000 objects created on the heap (by sb.toString()) be cleared?
Edit 1:
If there is no guarantee that these objects will be cleared, and assuming the GC hasn't run, is it pointless to use string.intern() at all (in terms of memory usage)?
Is there any way to reduce memory usage / object creation while using the intern() method?
Your example is a bit odd, as it creates 1000 empty strings. If you want to get such a list with consuming minimum memory, you should use
List<String> list = Collections.nCopies(1000, "");
instead.
If we assume that there is something more sophisticated going on, not creating the same string in every iteration, well, then there is no benefit in calling intern(). What will happen is implementation dependent. But when calling intern() on a string that is not in the pool, it will just be added to the pool in the best case; in the worst case, another copy will be made and added to the pool.
At this point, we have no savings yet, but potentially created additional garbage.
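A small illustration of this (my example; it relies on string literals being pooled automatically):

String a = new String("hello");   // a new heap instance, distinct from the literal
String b = a.intern();            // returns the canonical instance from the pool
System.out.println(a == b);       // false: the literal "hello" was already pooled
System.out.println(b == "hello"); // true: same canonical instance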
Interning at this point can only save you some memory, if there are duplicates somewhere. This implies that you construct duplicate strings first, to look up their canonical instance via intern() afterwards, so having the duplicate string in memory until garbage collected, is unavoidable. But that’s not the real problem with interning:
- In older JVMs, there was special treatment of interned strings that could result in worse garbage collection performance or even running out of resources (i.e. the fixed size "PermGen" space).
- In HotSpot, the string pool holding the interned strings is a fixed size hash table, yielding hash collisions, hence poor performance, when referencing significantly more strings than the table size. Before Java 7, update 40, the default size was about 1,000, not even sufficient to hold all string constants for any nontrivial application without hash collisions, not to speak of manually added strings. Later versions use a default size of about 60,000, which is better, but still a fixed size that should discourage you from adding an arbitrary number of strings.
- The string pool has to obey the inter-thread semantics mandated by the language specification (as it does for string literals), hence it needs to perform thread safe updates that can degrade performance.
Keep in mind that you pay the price of the disadvantages named above, even in the cases that there are no duplicates, i.e. there is no space saving. Also, the acquired reference to the canonical string has to have a much longer lifetime than the temporary object used to look it up, to have any positive effect on the memory consumption.
The latter touches your literal question. The temporary instances are reclaimed when the garbage collector runs the next time, which will be when the memory is actually needed. There is no need to worry about when this will happen, but well, yes, up to that point, acquiring a canonical reference had no positive effect, not only because the memory hasn’t been reused up to that point, but also, because the memory was not actually needed until then.
This is the place to mention the new String Deduplication feature. This does not change string instances, i.e. the identity of those objects, as that would change the semantics of the program, but changes identical strings to use the same char[] array. Since these character arrays are the biggest payload, this may still achieve great memory savings, without the performance disadvantages of using intern(). Since this deduplication is done by the garbage collector, it will only be applied to strings that survived long enough to make a difference. Also, this implies that it will not waste CPU cycles when there is still plenty of free memory.
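For reference (my addition): this deduplication belongs to the G1 collector and is enabled with a JVM flag, available since Java 8u20 (MyApp stands in for your main class):

java -XX:+UseG1GC -XX:+UseStringDeduplication MyApp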
However, there might be cases where manual canonicalization is justified. Imagine we're parsing a source code file or XML file, or importing strings from an external source (Reader or database) where such canonicalization will not happen by default, but duplicates may occur with a certain likelihood. If we plan to keep the data for further processing for a longer time, we might want to get rid of duplicate string instances.
In this case, one of the best approaches is to use a local map, not being subject to thread synchronization, dropping it after the process, to avoid keeping references longer than necessary, without having to use special interaction with the garbage collector. This implies that occurrences of the same strings within different data sources are not canonicalized (but still being subject to the JVM’s String Deduplication), but it’s a reasonable trade-off. By using an ordinary resizable HashMap, we also do not have the issues of the fixed intern table.
E.g.
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();
    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    HashMap<CharSequence,String> cache = new HashMap<>();
    while (m.find()) {
        result.add(cache.computeIfAbsent(
            cb.subSequence(m.start(), m.end()), Object::toString));
    }
    return result;
}
Note the use of the CharBuffer here: it wraps the input sequence, and its subSequence method returns another wrapper with different start and end indices, implementing the right equals and hashCode methods for our HashMap; computeIfAbsent will only invoke the toString method if the key was not present in the map before. So, unlike with intern(), no String instance will be created for already encountered strings, saving the most expensive aspect of it, the copying of the character arrays.
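For example, a hypothetical use of the method above (TOKEN_PATTERN is assumed here to be a simple word pattern, purely for demonstration):

static final Pattern TOKEN_PATTERN = Pattern.compile("\\w+"); // assumed definition

List<String> tokens = parse("foo bar foo baz foo");
// equal tokens now share a single String instance:
System.out.println(tokens.get(0) == tokens.get(2)); // true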
If we have a really high likelihood of duplicates, we may even save the creation of wrapper instances:
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();
    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    HashMap<CharSequence,String> cache = new HashMap<>();
    while (m.find()) {
        cb.limit(m.end()).position(m.start());
        String s = cache.get(cb);
        if (s == null) {
            s = cb.toString();
            cache.put(CharBuffer.wrap(s), s);
        }
        result.add(s);
    }
    return result;
}
This creates only one wrapper per unique string, but it also has to perform one additional hash lookup for each unique string when putting. Since the creation of a wrapper is quite cheap, you really need a significantly large number of duplicate strings, i.e. a small number of unique strings compared to the total number, to benefit from this trade-off.
As said, these approaches are very efficient, because they use a purely local cache that is just dropped afterwards. With this, we don’t have to deal with thread safety nor interact with the JVM or garbage collector in a special way.
You can open JMC and check GC under the Memory tab inside the MBean Server of the particular JVM: when it ran and how much it cleared. Still, there is no fixed guarantee of when it will be called. You can also initiate a GC under Diagnostic Commands on a specific JVM.
Hope it helps.

Java optimization to prevent heap space out of memory

OK, I have a problem: in a particular situation my program gets an out-of-memory error from heap space.
Let's assume we have two ArrayLists: the first one contains many T objects, and the second one contains W objects that are created from the T objects of the first list.
We cycle through them in this way (after the cycle the list is no longer used):
public void funct(ArrayList<T> list)
{
    ArrayList<W> list2 = new ArrayList<W>();
    for (int i = 0; i < list.size(); i++)
    {
        W temp = new W();
        temp.set(list.get(i));
        temp.saveToDB();
        list2.add(temp);
    }
    // more code! from this point on the `list` is useless
}
My code is pretty similar to this one, but when list contains tons of objects I often get the heap space out-of-memory error (during the for cycle), and I'd like to solve this problem.
I do not know very well how the GC works in Java, but surely in the previous example there are a lot of possible optimizations.
Since the list is not used anymore after the for cycle, I thought as a first optimization to switch from the for loop to a while loop and empty the list as we cycle through it:
public void funct(ArrayList<T> list)
{
    ArrayList<W> list2 = new ArrayList<W>();
    while (list.size() > 0)
    {
        W temp = new W();
        temp.set(list.remove(0)); // note: remove(0) shifts all remaining elements, O(n) per call
        temp.saveToDB();
        list2.add(temp);
    }
    // more code! from this point on the `list` is useless
}
Is this modification useful?
How can I better optimize the above code, and how can I prevent the heap space out-of-memory error? (Increasing the -Xmx and -Xms values is not a possibility.)
You can try to set -XX:MaxNewSize and -XX:NewSize to 40% of your -Xmx.
These parameters will speed up the GC calls, because your creation rate is high.
For more help: check here.
It really depends on many things. How big are the W and T objects?
One optimization you could surely do is ArrayList<W> list2 = new ArrayList<>(list.size());
This way your ArrayList does not need to resize many times.
That will not make much difference, though. The real problem is probably the size and number of your W and T objects. Have you thought of using different data structures to manage a smaller portion of objects at a time?
If you did some memory profiling, you would have discovered that the largest source of heap exhaustion is the W instances, which you retain by adding them to list2. The ArrayList itself adds a very small overhead per contained object (just 4 bytes if properly pre-sized, worst case 8 bytes), so even if you retain list, this cannot matter much.
You will not be able to lessen the heap pressure without changing your approach towards not retaining each and every W instance you create in your loop.
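A minimal sketch of that non-retention approach, assuming the downstream code can be reworked to process each W immediately instead of reading it back from list2:

public void funct(ArrayList<T> list)
{
    for (T t : list)
    {
        W temp = new W();
        temp.set(t);
        temp.saveToDB();
        // process temp here; once unreferenced it is eligible for GC,
        // so only one W is alive at any time
    }
}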
You continue to reference all the items from the original list:
temp.set(list.get(i)); // you probably store the passed reference somewhere
If the T object has a big size and you don't need all of its fields, try to use a projection of it.
temp.set( extractWhatINeed( list.get(i) ) );
This will involve creating a new class with fewer fields than T (the return type of the extract method).
Now, when you no longer reference the original items, they become eligible for GC (once the list itself is no longer referenced).

Java Collection#clear reclaim memory

With the following function:
Collection#clear
how can I attempt to reclaim the memory that could be freed by an invocation? Code sample:
public class Foo
{
    private static Collection<Bar> bars;

    public static void main(String[] args) {
        bars = new ArrayList<Bar>();
        for (int i = 0; i < 100000; i++)
        {
            bars.add(new Bar());
        }
        bars.clear();
        // how to get memory back here
    }
}
EDIT
What I am looking for is similar to how ArrayList.remove reclaims memory by copying the new smaller array.
It is more efficient to only reclaim memory when you need to. In this case it is much simpler/faster to let the GC do it asynchronously when there is a need to. You can give the JVM a hint using System.gc(), but this is likely to be slower and to complicate your program.
how ArrayList.remove reclaims memory by copying the new smaller array.
It doesn't do this. It never shrinks the array, nor would you need to.
If you really need to make the collection smaller, which I seriously doubt, you can create a new ArrayList which has a copy of the elements you want to keep.
bars = null;
would be the best. clear() doesn't guarantee to release any memory, only to reset the logical contents to "empty".
In fact, bars = null; doesn't guarantee that memory will be immediately released either. However, it makes the object previously pointed to by bars, and all its dependents, "ready for garbage collection" ("finalization", really, but let's keep this simple). If the JVM finds itself needing memory, it will collect these objects (another simplification here: this depends on the exact garbage collection algorithm the JVM is configured to use).
You can't.
At some point after there are no more references to the objects, the GC will collect them for you.
EDIT: To force the ArrayList to release its reference to the giant empty array, call trimToSize().
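A short sketch of that suggestion, continuing the question's example (the cast is needed because trimToSize() is declared on ArrayList, not on the Collection interface):

bars.clear();                         // size is now 0, but the backing array keeps its old capacity
((ArrayList<Bar>) bars).trimToSize(); // replaces it with a minimal (empty) array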
You can't force memory reclamation, that will happen when garbage collection occurs.
If you use clear() you will clear the references to objects that were contained in the collection. If there are no other references to those objects, then they will be reclaimed next time GC is run.
The collection itself (which contains just references, not the objects referred to) will not be resized. The only way to get back the storage used by the collection itself is to set the reference bars to null, so that the collection will eventually be reclaimed.
