Recently I have come across an article about memory optimization in Android, but I think my question is more of a general Java type. I couldn't find any information on this, so I will be grateful if you could point me to a good resource to read.
The article I'm talking about can be found here.
My question relates to the following two snippets:
Non-optimal version:
List<Chunk> mTempChunks = new ArrayList<Chunk>();
for (int i = 0; i < 10000; i++) {
    mTempChunks.add(new Chunk(i));
}
for (int i = 0; i < mTempChunks.size(); i++) {
    Chunk c = mTempChunks.get(i);
    Log.d(TAG, "Chunk data: " + c.getValue());
}
Optimized version:
Chunk c;
int length = mTempChunks.size();
for (int i = 0; i < length; i++) {
    c = mTempChunks.get(i);
    Log.d(TAG, "Chunk data: " + c.getValue());
}
The article also contains the following lines (related to the first snippet):
In the second loop of the code snippet above, we are creating a new chunk object for each iteration of the loop. So it will essentially create 10,000 objects of type ‘Chunk’ and occupy a lot of memory.
What I'm striving to understand is why new object creation is mentioned at all, since I can only see the creation of a reference to an already existing object on the heap. I know that a reference itself costs 4-8 bytes depending on the system, but the references go out of scope very quickly in this case, and apart from that I don't see any additional overhead.
Maybe it's the creation of numerous references to existing objects that is considered expensive?
Please tell me what I'm missing out here, and what is the real difference between the two snippets in terms of memory consumption.
Thank you.
There are two differences:
Non-optimal:
i < mTempChunks.size()
Chunk c = mTempChunks.get(i);
Optimal:
i < length
c = mTempChunks.get(i);
In the non-optimal code, the size() method is called for each iteration of the loop, and a new reference to a Chunk object is created. In the optimal code, the overhead of repeatedly calling size() is avoided, and the same reference is recycled.
However, the author of that article seems to be wrong in suggesting that 10000 temporary objects are created in the second non-optimal loop. Certainly, 10000 temp objects are created, but in the first, not the second loop, and there's no way to avoid that. In the second non-optimal loop, 10000 references are created. So in a way it is less than optimal, although the author mistakes the trees for the forest.
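To make that distinction concrete, here is a commented, self-contained sketch of the same two loops (the Chunk class below is a minimal stand-in, since the article's version isn't shown); the only allocations of Chunk objects happen in the first loop, while the second loop merely copies references:
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the article's Chunk class.
class Chunk {
    private final int value;
    Chunk(int value) { this.value = value; }
    int getValue() { return value; }
}

class ChunkDemo {
    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<Chunk>();
        for (int i = 0; i < 10000; i++) {
            chunks.add(new Chunk(i));        // the only place Chunk objects are allocated
        }
        int length = chunks.size();          // size() evaluated once, not on every iteration
        for (int i = 0; i < length; i++) {
            Chunk c = chunks.get(i);         // copies an existing reference into a local slot; no new Chunk
            System.out.println(c.getValue());
        }
    }
}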
Further References:
1. Avoid Creating Unnecessary Objects.
2. Use Enhanced For Loop Syntax.
EDIT:
I have been accused of being a charlatan. For those who say that calling size() has no overhead, I can do no better than quoting the official docs:
3. Avoid Internal Getters/Setters.
EDIT 2:
In my answer, I initially made the mistake of saying that memory for references is allocated at compile-time on the stack. I realize now that that statement is wrong; that's actually the way things work in C++, not Java. The world of Java is C++ upside down: while memory for references is indeed allocated on the stack, in Java even that happens at runtime. Mind blown!
References:
1. Runtime vs compile time memory allocation in java.
2. Where is allocated variable reference, in stack or in the heap?.
3. The Structure of the Java Virtual Machine - Frames.
I am scratching my head trying to understand the point of the following code
Map<String, Set<MyOtherObj>> myMap = myapi.getMyMap();
final MyObj[] myObjList;
{
final List<MyObj> list = new ArrayList<>(myMap.size());
for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
final int myCount = MyUtility.getCount(entry.getValue());
if (myCount <= 0)
continue;
list.add(new MyObj(entry.getKey(), myCount));
}
if (list.isEmpty())
return;
myObjList = list.toArray(new MyObj[list.size()]);
}
Which can be rewritten into the following
Map<String, Set<MyOtherObj>> myMap = myapi.getMyMap();
final List<MyObj> list = new ArrayList<>(myMap.size());
for (Entry<String, Set<MyOtherObj>> entry : myMap.entrySet()) {
final int myCount = MyUtility.getCount(entry.getValue());
if (myCount <= 0)
continue;
list.add(new MyObj(entry.getKey(), myCount));
}
if (list.isEmpty())
return;
The only reasons I can think of why we put the ArrayList in a block and then copy its contents into an array are:
1. The capacity of the ArrayList is bigger than its actual size, so copying the contents into an array saves space.
2. There is some sort of compiler magic or GC magic that deallocates and reclaims the memory used by the ArrayList immediately after the block scope ends (e.g. like Rust); otherwise we are now sitting on up to twice the amount of space until the GC kicks in.
So my question is: does the first code sample make sense, and is it more efficient?
This code currently executes 20k messages per second.
As stated in this answer:
Scope is a language concept that determines the validity of names. Whether an object can be garbage collected (and therefore finalized) depends on whether it is reachable.
So, no, the scope is not relevant to garbage collection, but for maintainable code, it’s recommended to limit the names to the smallest scope needed for their purpose. This, however, does not apply to your scenario, where a new name is introduced to represent the same thing that apparently still is needed.
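To illustrate that distinction, here is a minimal, self-contained sketch (class and variable names are made up); whether the array is reported as collected depends on reachability and on what the collector actually does, not on the block having ended:
import java.lang.ref.WeakReference;

class ScopeVsReachability {
    public static void main(String[] args) {
        WeakReference<byte[]> ref;
        {
            byte[] data = new byte[1000000];
            ref = new WeakReference<byte[]>(data);
        }
        // 'data' is out of scope here, yet the array may or may not still be reachable
        // through the frame's local variable slot; scope alone decides neither.
        System.gc(); // only a hint; the JVM is free to ignore it
        System.out.println(ref.get() == null ? "collected" : "still reachable");
    }
}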
You suggested the possible motivation
The capacity of the ArrayList is bigger than its actual size, so copying the contents into an array saves space
but you can achieve the same when declaring the variable list as ArrayList<MyObj> rather than List<MyObj> and calling trimToSize() on it after populating it.
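A minimal sketch of that trimToSize() alternative (the element type and sizes are made up, not taken from the question):
import java.util.ArrayList;

class TrimDemo {
    static ArrayList<String> build() {
        ArrayList<String> list = new ArrayList<String>(1000); // generous initial capacity, like new ArrayList<>(myMap.size())
        list.add("a");
        list.add("b");
        list.trimToSize(); // backing array shrinks to length 2; note this allocates a new, smaller array
        return list;
    }
}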
There’s another possible reason: the idea that subsequently using a plain array is more efficient than using the array encapsulated in an ArrayList. But, of course, the differences between these constructs, if any, rarely matter.
Speaking of esoteric optimizations, specifying an initial array size when calling toArray was believed to be an advantage, until someone measured and analyzed, only to find that myObjList = list.toArray(new MyObj[0]); is actually more efficient in real life.
Anyway, we can’t look into the author’s mind, which is the reason why any deviation from straight-forward code should be documented.
Your alternative suggestion:
There is some sort of compiler magic or GC magic that deallocates and reclaims the memory used by the ArrayList immediately after the block scope ends (e.g. like Rust); otherwise we are now sitting on up to twice the amount of space until the GC kicks in.
is missing the point. Any space optimization in Java is about minimizing the amount of memory occupied by objects still alive. It doesn’t matter whether unreachable objects have been identified as such, it’s already sufficient that they are unreachable, hence, potentially reclaimable. The garbage collector will run when there is an actual need for memory, i.e. to serve a new allocation request. Until then, it doesn’t matter whether the unused memory contains old objects or not.
So the code may be motivated by a space saving attempt, and in that regard it’s valid, even without an immediate freeing. As said, you could achieve the same in a simpler fashion by just calling trimToSize() on the ArrayList. But note that if the capacity does not happen to match the size, trimToSize()’s shrinking of the array doesn’t work differently behind the scenes: it implies creating a new array and letting the old one become subject to garbage collection.
But the fact that there’s no immediate freeing, and that there’s rarely a need for immediate freeing, should allow the conclusion that space saving attempts like this only matter in practice when the resulting object is supposed to persist for a very long time. When the lifetime of the copy is shorter than the time to the next garbage collection, it didn’t save anything, and all that remains is the unnecessary creation of a copy. Since we can’t predict the time to the next garbage collection, we can only make a rough categorization of the object’s expected lifetime (long or not so long)…
The general approach is to assume that in most cases, the higher capacity of an ArrayList is not a problem and the performance gain matters more. That’s why this class maintains a higher capacity in the first place.
No, it is done for the same reason as empty lines are added to the code.
The variables in the block are scoped to that block, and can no longer be used after the block. So one does not need to pay attention to those block variables.
So this is more readable:
A a;
{ B b; C c; ... }
...
Than:
A a;
B b;
C c;
...
...
It is an attempt to structure the code so it reads better. For instance, above one can read it as "a declaration of A a, and then a block probably filling a".
Lifetime analysis in the JVM is fine. Just as there is absolutely no need to set variables to null at the end of their usage.
Sometimes blocks are also abused to repeat blocks with same local variables:
A a1;
{ B b; C c; ... a1 ... }
A a2;
{ B b; C c; ... a2 ... }
A a3;
{ B b; C c; ... a3 ... }
Needless to say, this is the opposite of better code style.
I am trying to understand the actual process behind object creations in Java - and I suppose other programming languages.
Would it be wrong to assume that object initialization in Java is the same as when you use malloc for a structure in C?
Example:
Foo f = new Foo(10);
typedef struct foo Foo;
Foo *f = malloc(sizeof(Foo));
Is this why objects are said to be on the heap rather than the stack? Because they are essentially just pointers to data?
In C, malloc() allocates a region of memory in the heap and returns a pointer to it. That's all you get. Memory is uninitialized and you have no guarantee that it's all zeros or anything else.
In Java, calling new does a heap based allocation just like malloc(), but you also get a ton of additional convenience (or overhead, if you prefer). For example, you don't have to explicitly specify the number of bytes to be allocated. The compiler figures it out for you based on the type of object you're trying to allocate. Additionally, object constructors are called (which you can pass arguments to if you'd like to control how initialization occurs). When new returns, you're guaranteed to have an object that's initialized.
But yes, at the end of the call both the result of malloc() and new are simply pointers to some chunk of heap-based data.
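For illustration, here is a made-up stand-in for Foo (the question doesn't show its definition) showing what new gives you that malloc() doesn't: the size is implicit and the constructor has run before the reference is returned:
class Foo {
    private final int size;

    Foo(int size) {
        this.size = size; // guaranteed to have run by the time 'new Foo(10)' returns
    }

    int size() {
        return size;
    }
}

class NewVsMalloc {
    public static void main(String[] args) {
        Foo f = new Foo(10);          // heap allocation plus constructor call; 'f' itself is just a reference
        System.out.println(f.size()); // prints 10; uninitialized memory is never visible
    }
}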
The second part of your question asks about the differences between a stack and a heap. Far more comprehensive answers can be found by taking a course on (or reading a book about) compiler design. A course on operating systems would also be helpful. There are also numerous questions and answers on SO about the stacks and heaps.
Having said that, I'll give a general overview that I hope isn't too verbose and that explains the differences at a fairly high level.
Fundamentally, the main reason to have two memory management systems, i.e. a heap and a stack, is for efficiency. A secondary reason is that each is better at certain types of problems than the other.
Stacks are somewhat easier for me to understand as a concept, so I start with stacks. Let's consider this function in C...
int add(int lhs, int rhs) {
int result = lhs + rhs;
return result;
}
The above seems fairly straightforward. We define a function named add() and pass in the left and right addends. The function adds them and returns a result. Please ignore all edge-case stuff such as overflows that might occur, at this point it isn't germane to the discussion.
The add() function's purpose seems pretty straightforward, but what can we tell about its lifecycle? Especially its memory utilization needs?
Most importantly, the compiler knows a priori (i.e. at compile time) how large the data types are and how many will be used. The lhs and rhs arguments are sizeof(int), 4 bytes each. The variable result is also sizeof(int). The compiler can tell that the add() function uses 4 bytes * 3 ints or a total of 12 bytes of memory.
When the add() function is called, a hardware register called the stack pointer will have an address in it that points to the top of the stack. In order to allocate the memory the add() function needs to run, all the function-entry code needs to do is issue one single assembly language instruction to decrement the stack pointer register value by 12. In doing so, it creates storage on the stack for three ints, one each for lhs, rhs, and result. Getting the memory space you need by executing a single instruction is a massive win in terms of speed, because single instructions tend to execute in one clock tick (1 billionth of a second on a 1 GHz CPU).
Also, from the compiler's view, it can create a map to the variables that looks an awful lot like indexing an array:
lhs: ((int *)stack_pointer_register)[0]
rhs: ((int *)stack_pointer_register)[1]
result: ((int *)stack_pointer_register)[2]
Again, all of this is very fast.
When the add() function exits it has to clean up. It does this by adding the 12 bytes back to the stack pointer register. It's similar to a call to free(), but it only uses one CPU instruction and it only takes one tick. It's very, very fast.
Now consider a heap-based allocation. This comes into play when we don't know a priori how much memory we're going to need (i.e. we'll only learn about it at runtime).
Consider this function:
#include <stdlib.h>

int addRandom(int count) {
    int numberOfBytesToAllocate = sizeof(int) * count;
    int *array = malloc(numberOfBytesToAllocate);
    int result = 0;

    if (array != NULL) {
        for (int i = 0; i < count; ++i) {
            array[i] = (int) random();
            result += array[i];
        }
        free(array);
    }

    return result;
}
Notice that the addRandom() function doesn't know at compile time what the value of the count argument will be. Because of this, it doesn't make sense to try to define array like we would if we were putting it on the stack, like this:
int array[count];
If count is huge it could cause our stack to grow too large and overwrite other program segments. When this stack overflow happens your program crashes (or worse).
So, in cases where we don't know how much memory we'll need until runtime, we use malloc(). Then we can just ask for the number of bytes we need when we need it, and malloc() will go check if it can vend that many bytes. If it can, great, we get it back, if not, we get a NULL pointer which tells us the call to malloc() failed. Notably though, the program doesn't crash! Of course you as the programmer can decide that your program isn't allowed to run if resource allocation fails, but programmer-initiated termination is different than a spurious crash.
So now we have to come back to look at efficiency. The stack allocator is super fast - one instruction to allocate, one instruction to deallocate, and it's done by the compiler, but remember the stack is meant for things like local variables of a known size so it tends to be fairly small.
The heap allocator on the other hand is several orders of magnitude slower. It has to do a lookup in tables to see if it has enough free memory to be able to vend the amount of memory the user wants. It has to update those tables after it vends the memory to make sure no one else can use that block (this bookkeeping may require the allocator to reserve memory for itself in addition to what it plans to vend). The allocator has to employ locking strategies to make sure it vends memory in a thread-safe way. And when memory is finally free()d, which happens at different times and in no predictable order typically, the allocator has to find contiguous blocks and stitch them back together to repair heap fragmentation. If that sounds like it's going to take more than a single CPU instruction to accomplish all of that, you're right! It's very complicated and it takes a while.
But heaps are big. Much larger than stacks. We can get lots of memory from them and they're great when we don't know at compile time how much memory we'll need. So we trade off speed for a managed memory system that declines us politely instead of crashing when we try to allocate something too large.
I hope that helps answer some of your questions. Please let me know if you'd like clarification on any of the above.
I have these two dummy pieces of code (let's consider they are written in either Java or C#, all variables are local):
Code 1:
int a;
int b = 0;
for (int i = 1; i < 10 ; i++)
{
a = 10;
b += i;
// a lot of more code that doesn't involve assigning new values to "a"
}
Code 2:
int b = 0;
for (int i = 1; i < 10 ; i++)
{
int a = 10;
b += i;
// a lot of more code that doesn't involve assigning new values to "a"
}
At first glance I would say both pieces of code consume the same amount of memory, but Code 1 is more CPU efficient because it creates and allocates variable a just once.
Then I read that garbage collectors are extremely efficient, to the point that Code 2 would be the more memory (and CPU?) efficient one: keeping variable a inside the loop makes it belong to Gen0, so it would be garbage collected before variable b.
So, when used with a Garbage Collected language, Code 2 is the more efficient. Am I right?
A few points:
ints (and other primitives) are never allocated on the heap. They live directly on the thread stack; "allocation" and "deallocation" are simple moves of a pointer, and happen once (when the function is entered, and immediately after it returns), regardless of scope.
primitives that are accessed often are usually stored in a register for speed, again regardless of scope.
in your case a (and possibly b as well, together with the whole loop) will be "optimized away": the optimizer is smart enough to detect a situation where a variable's value changes but is never read, and to skip the redundant operations. Or, if there is code that actually looks at a but does not modify it, it will likely be replaced by the optimizer with the constant value 10, which just appears inline everywhere a is referenced.
New objects (if you did something like String a = new String("foo") for example instead of int) are always allocated in the young generation, and only get transferred into the old gen after they survive a few minor collections. This means that, in most cases, when an object is allocated inside a function and never referenced from outside, it will never make it to the old gen regardless of its exact scope, unless your heap structure desperately needs tuning.
As pointed out in the comment, sometimes the VM might decide to allocate a large object directly in the old gen (this is true for Java too, not just .NET), so the point above applies in most cases, but not always. However, in relation to this question, it does not make any difference, because if the decision is made to allocate an object in the old gen, it is made without regard to the scope of its initial reference anyway.
From performance and memory standpoint your two snippets are identical. From the readability perspective though, it is always a good idea to declare all variables in the narrowest possible scope.
Before the code in snippet 2 is actually executed, it's going to end up being transformed to look like the code in snippet 1 behind the scenes (whether by the compiler or the runtime). As a result, the performance of the two snippets is going to be identical, as they'll compile into functionally the same code at some point.
Note that for very short-lived variables it's actually quite possible for them to not have memory allocated for them at all. They may well be stored entirely in a register, involving no memory allocation at all.
The class variables are like this:
Button[] tab_but = new Button[440];
static int ii;
After initializing tab_but, I'm running the following test.
for (int j = 0; j < 9999; j++) {
String newLabel = String.valueOf(ii);
for (int i = 0; i < 440; i++) {
tab_but[i].setLabel(newLabel);
}
ii += 1;
}
And it eventually fails with 'out of memory'.
As I profiled it, Object[] allocation was increasing rapidly while it ran.
I think I'm only replacing the label, so the previous label object (a String) should be cleaned up, right?
Why does that kind of memory leak occur?
Please advise and thanks.
I strongly suspect there's something you haven't shown us here. 10000 strings is nothing in terms of memory. If each string is, say, 64 bytes (and that's almost certainly larger than reality) then those 10000 strings take up 640K. I'm assuming you have rather more memory than that, and you haven't set the maximum heap size to something tiny?
Could you provide a short but complete program which demonstrates the problem here?
I wonder whether it's not the strings which are causing the problem, but the fact that you've got 4.4 million UI events being generated - and because you're never letting the UI handle them, they're all building up with no way of them getting cleared. That would make rather more sense (even though it's still not that many objects) - but I'm unsure why you'd see this in real life - obviously the example you've given isn't a particularly realistic one, and you must have come up with it having run out of memory in a more normal program...
I believe when you do String newLabel = String.valueOf(ii); you're creating a new string. When you assign it to the label with setLabel(), a reference is saved that's overwritten the next time around. Thus, a memory leak.
The garbage collector in Java isn't instant. When there are no more references to an object it becomes available to be garbage collected.
You're creating (and discarding) 9999 String objects. You're running out of memory before they can be collected.
Which of the following would be more optimal on a Java 6 HotSpot VM?
final Map<Foo,Bar> map = new HashMap<Foo,Bar>(someNotSoLargeNumber);
for (int i = 0; i < someLargeNumber; i++)
{
doSomethingWithMap(map);
map.clear();
}
or
final int someNotSoLargeNumber = ...;
for (int i = 0; i < someLargeNumber; i++)
{
final Map<Foo,Bar> map = new HashMap<Foo,Bar>(someNotSoLargeNumber);
doSomethingWithMap(map);
}
I think they're both as clear to the intent, so I don't think style/added complexity is an issue here.
Intuitively it looks like the first one would be better as there's only one 'new'. However, given that no reference to the map is held onto, would HotSpot be able to determine that a map of the same size (Entry[someNotSoLargeNumber] internally) is being created for each loop, and then reuse the same block of memory (i.e. not do a lot of memory allocation, just zeroing, which might be quicker than calling clear() for each loop)?
An acceptable answer would be a link to a document describing the different types of optimisations the HotSpot VM can actually do, and how to write code to assist HotSpot (rather than naive attempts at optimising the code by hand).
Don't spend your time on such micro-optimizations unless your profiler says you should do it. In particular, Sun claims that modern garbage collectors do very well with short-lived objects, and new() becomes cheaper and cheaper.
Garbage collection and performance on DeveloperWorks
That's a pretty tight loop over a "fairly large number", so generally I would say move the instantiation outside of the loop. But, overall, my guess is you aren't going to notice much of a difference, as I am willing to bet that doSomethingWithMap will take up the majority of the time, giving the GC plenty of opportunity to catch up.
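If you do want numbers rather than intuition, a naive sketch like the one below (all names and sizes are invented) can give a first impression, though a profiler or a harness such as JMH is the right tool, since hand-rolled timing loops are easily distorted by JIT warm-up and dead-code elimination:
import java.util.HashMap;
import java.util.Map;

class MapReuseTiming {
    static final int ITERATIONS = 100000;
    static final int MAP_SIZE = 32;

    // Variant 1: one map, cleared on every iteration.
    static long reuseOneMap() {
        long start = System.nanoTime();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(MAP_SIZE);
        for (int i = 0; i < ITERATIONS; i++) {
            doSomethingWithMap(map);
            map.clear();
        }
        return System.nanoTime() - start;
    }

    // Variant 2: a fresh map on every iteration.
    static long newMapPerIteration() {
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            Map<Integer, Integer> map = new HashMap<Integer, Integer>(MAP_SIZE);
            doSomethingWithMap(map);
        }
        return System.nanoTime() - start;
    }

    static void doSomethingWithMap(Map<Integer, Integer> map) {
        for (int k = 0; k < MAP_SIZE; k++) {
            map.put(k, k);
        }
    }

    public static void main(String[] args) {
        // Repeat so the JIT has a chance to warm up before the later measurements.
        for (int run = 0; run < 5; run++) {
            System.out.printf("run %d: reuse=%d ms, fresh=%d ms%n",
                    run, reuseOneMap() / 1000000, newMapPerIteration() / 1000000);
        }
    }
}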