Why is the heap used for managing object references in Java? - java

Functions and variables are stored on the stack, while strings and object references are stored on the heap. Why is there a difference in how they are stored?

Some of the comments above offer links to the difference. The stack is temporary. Chunks of the stack are used as your code nests. Think of a big chunk of memory in which you keep track only of the very top of the memory. When you call a method, the system knows how much memory is required to remember where to return to when the method exits and enough space for the variables required for the method. The stack pointer then points higher up in memory, giving you all the space it just skipped over. When your method returns, the stack pointer is returned to where it was before your method was called. Any variables that were there are now gone.
It's not entirely that simple in a complex world like Java, but I still think of the stack in assembly language, which is where I first encountered it. (I'm old.) Close enough for this discussion.
The heap is different. The heap is managed memory with complex structures that keep track of the memory you've used. If you say new Foo(), Java knows how big a Foo is and it asks the heap for enough space to hold one. Much more complex things happen around managing that. But when your method returns, that object still exists. If it were allocated on the stack, there'd be real problems, because the stack unwinds when your method returns. But your memory in the heap is still allocated, and your object can continue to exist.
Again, it's not that simple, but maybe it makes sense.
Space on the stack exists only as long as your method is running. (I presume if you nest inside {}, it might allocate more space. I don't know.) Space on the heap persists until objects are freed, but that can be far longer than the duration of a method call.
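To make the lifetime difference concrete, here is a minimal sketch. The names LifetimeDemo and makeFoo are made up for illustration, and Foo is given a single field only so there is something to observe. The local variables vanish when the method returns, but the object created with new is still reachable afterwards.

```java
// Hypothetical names (Foo, makeFoo, LifetimeDemo) chosen only for illustration.
class Foo {
    final int value;

    Foo(int value) {
        this.value = value;
    }
}

class LifetimeDemo {
    // 'local' and the reference 'f' live in makeFoo's stack frame;
    // the Foo object itself is allocated on the heap.
    static Foo makeFoo() {
        int local = 41;
        Foo f = new Foo(local + 1);
        return f; // the frame is discarded here, the object is not
    }

    public static void main(String[] args) {
        Foo kept = makeFoo();
        // makeFoo's locals are gone, but the object is still usable.
        System.out.println(kept.value);
    }
}
```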

Related

What is the difference between stack data structure and stack memory? [duplicate]

I'm studying for my data organization final and I'm going over stacks and heaps because I know they will be on the final and I'm going to need to know the differences.
I know what the Stack is and what the Heap is.
But I'm confused on what a stack is and what a heap is.
The Stack is a place in the RAM where memory is stored; if it runs out of space, a stack overflow occurs. Objects are stored here by default, it deallocates memory when objects go out of scope, and it is faster.
The Heap is a place in the RAM where memory is stored; if it runs out of space, the OS will assign it more. For an object to be stored on the Heap, it needs to be created with the new operator, and it will only be deallocated if told. Fragmentation problems can occur, it is slower than the Stack, and it handles large amounts of memory better.
But what is a stack, and what is a heap? is it the way memory is stored? for example a static array or static vector is a stack type and a dynamic array, linked list a heap type?
Thank you all!
"The stack" and "the heap" are memory lumps used in a specific way by a program or operating system. For example, the call stack can hold data pertaining to function calls and the heap is a region of memory specifically used for dynamically allocating space.
Contrast these with stack and heap data structures.
A stack can be thought of as an array where the last element in will be the first element out. Operations on this are called push and pop.
A heap is a tree-based data structure in which each node's value is greater than or equal to that of the node's children (a max-heap; a min-heap reverses the ordering).
On a side note, keep in mind that "the stack", "the heap", and the stack/heap data structures are not unique to any given programming language; they are simply concepts in the field of computer science.
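For the data-structure sense of the two words, a small sketch using the standard java.util collections may help. The class and method names are invented for this example. ArrayDeque is used as the stack, and PriorityQueue (a binary heap) with a reversed comparator acts as a max-heap.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.PriorityQueue;

class DataStructureDemo {
    // Stack data structure: the last element pushed is the first popped.
    static int lastInFirstOut() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        return stack.pop();
    }

    // Heap data structure: PriorityQueue is a binary heap, and
    // reverseOrder() makes poll() return the largest element first.
    static int largestFirst() {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        heap.add(5);
        heap.add(9);
        heap.add(2);
        return heap.poll();
    }

    public static void main(String[] args) {
        System.out.println(lastInFirstOut()); // 3
        System.out.println(largestFirst());   // 9
    }
}
```

Neither of these has anything to do with where the JVM places the objects: both collections are themselves ordinary heap-allocated objects.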
I won't get into virtual memory (read about that if you want) so let's simplify and say you have RAM of some size.
You have your code, some static initialized data, and some static uninitialized data (static here means, as in C++, things like global variables).
When you compile something, the compiler (and linker) will organize and translate your code into machine code (ones and zeroes) in the following way:
The binary file (like the object files) is organized into segments, which become portions of RAM when loaded.
First you have the DATA segment. This is the segment that contains the values of initialized variables. So if you have variables, e.g. int a = 3, b = 4, they will go to the DATA segment (4 bytes of RAM containing 00000003h, and another 4 bytes containing 00000004h, in hexadecimal notation). They are stored consecutively.
Then you have the CODE segment. All your code is translated into machine code (1s and 0s) and stored in this segment consecutively.
Then you have the BSS segment. This is where uninitialized global variables go (all static variables that weren't initialized).
Then you have the STACK segment. This is reserved for the stack. The stack size is determined by the operating system by default. You can change this value, but I won't get into this now. All local variables go here. When you call some function, first the function arguments are pushed onto the stack, then the return address (where to come back to when you exit the function), then some CPU registers are pushed here, and finally all local variables declared in the function get their reserved space on the stack.
And you have HEAP segment. This is part of the RAM (size is also determined by OS) where the objects and data are stored using operator new.
Then all of the segments are piled one after the other: DATA, CODE, BSS, STACK, HEAP. There are some other segments, but they are not of interest here. All of that is loaded into RAM by the operating system. The binary file also has some headers containing information about the location (address in memory) at which your code begins.
So in short, they are all parts of RAM, since everything that is being executed is loaded into RAM (it can't be in ROM, which is read-only, nor on the HDD, since the HDD is just for storing files).
When specifically referring to C++'s memory model, the heap and stack refer to areas of memory. It is easy to confuse this with the stack data structure and heap data structure. They are, however, separate concepts.
When discussing programming languages, stack memory is called 'the stack' because it behaves like a stack data structure. The heap is a bit of a misnomer, as it does not necessarily (or likely) use a heap data structure. See Why are two different concepts both called "heap"? for a discussion of why C++'s heap and the data structure's names are the same, despite being two different concepts.
So to answer your question, it depends on the context. In the context of programming languages and memory management, the heap and stack refer to areas of memory with specific properties. Otherwise, they refer to specific data structures.
The technical definition of "a stack" is a Last In, First Out (LIFO) data structure where data is pushed onto and pulled off of the top. Just like with a stack of plates in the real world, you wouldn't pull one out from the middle or bottom, you [usually] wouldn't pull data out of the middle of or the bottom of a data structure stack. When someone talks about the stack in terms of programming, it can often (but not always) mean the hardware stack, which is controlled by the stack pointer register in the CPU.
As far as "a heap" goes, that generally becomes much more nebulous in terms of a definition everyone can agree on. The best definition is likely "a large amount of free memory from which space is allocated for dynamic memory management." In other words, when you need new memory, be it for an array, or an object created with the new operator, it comes from a heap that the OS has reserved for your program. This is "the heap" from the POV of your program, but just "a heap" from the POV of the OS.
The important thing for you to know about stacks is the relationship between the stack and function/method calls. Every function call reserves space on the stack, called a stack frame. This space contains your auto variables (the ones declared inside the function body). When you exit from the function, the stack frame and all the auto variables it contains disappear.
This mechanism is very cheap in terms of CPU resources used, but the lifetime of these stack-allocated variables is obviously limited by the scope of the function.
Memory allocations (objects) on the heap, on the other hand, can live "forever" or as long as you need them, without regard to the flow of control of your program. The downside is that since you don't get automatic lifetime management of these heap-allocated objects, you have to either 1) manage the lifetime yourself, or 2) use special mechanisms like smart pointers to manage the lifetime of these objects. If you get it wrong, your program has memory leaks, or accesses data that may change unexpectedly.
Re: Your question about A stack vs THE stack: When you are using multiple threads, each thread has a separate stack so that each thread can flow into and out of functions/methods independently. Most single threaded programs have only one stack: "the stack" in common terminology.
Likewise for heaps. If you have a special need, it is possible to allocate multiple heaps and choose at allocation time which heap should be used. This is much less common (and a much more complicated topic than I have mentioned here.)
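As a rough Java illustration of the one-stack-per-thread point (the class and method names here are hypothetical), two threads below run the same recursive method at the same time. Each call chain builds its frames on its own thread's stack, so the two recursions do not interfere with each other.

```java
class ThreadStackDemo {
    // Each call gets its own frame on the calling thread's stack.
    static int sumDownFrom(int n) {
        return n == 0 ? 0 : n + sumDownFrom(n - 1);
    }

    static int[] runTwoThreads() {
        int[] results = new int[2];
        Thread a = new Thread(() -> results[0] = sumDownFrom(100));
        Thread b = new Thread(() -> results[1] = sumDownFrom(200));
        a.start();
        b.start();
        try {
            // join() also makes the workers' writes visible to this thread.
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] r = runTwoThreads();
        System.out.println(r[0] + " " + r[1]); // 5050 20100
    }
}
```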

When to create variables (memory management)

You create a variable to store a value that you can refer to that variable in the future. I've heard that you must set a variable to 'null' once you're done using it so the garbage collector can get to it (if it's a field var).
If I were to have a variable that I won't be referring to again, would removing the reference/value vars I'm using (and just using the numbers when needed) save memory? For example:
    int number = 5;

    public void method() {
        System.out.println(number);
    }
Would that take more space than just plugging '5' into the println method?
I have a few integers that I don't refer to in my code ever again (game loop), but I've seen others use reference vars on things that really didn't need them. Been looking into memory management, so please let me know, along with any other advice you have to offer about managing memory
I've heard that you must set a variable to 'null' once you're done using it so the garbage collector can get to it (if it's a field var).
This is very rarely a good idea. You only need to do this if the variable is a reference to an object which is going to live much longer than the object it refers to.
Say you have an instance of Class A and it has a reference to an instance of Class B. Class B is very large and you don't need it for very long (a pretty rare situation) You might null out the reference to class B to allow it to be collected.
A better way to handle objects which don't live very long is to hold them in local variables. These are naturally cleaned up when they drop out of scope.
If I were to have a variable that I won't be referring to again, would removing the reference vars I'm using (and just using the numbers when needed) save memory?
You don't free the memory for a primitive until the object which contains it is cleaned up by the GC.
Would that take more space than just plugging '5' into the println method?
The JIT is smart enough to turn fields which don't change into constants.
Been looking into memory management, so please let me know, along with any other advice you have to offer about managing memory
Use a memory profiler instead of chasing down 4 bytes of memory. Something like 4 million bytes might be worth chasing if you have a smartphone. If you have a PC, I wouldn't bother with 4 million bytes.
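The two approaches mentioned in this answer, nulling out a field of a long-lived object versus keeping short-lived data in a local variable, might look roughly like the sketch below. Cache, release and useTemporarily are hypothetical names, and the buffer sizes are arbitrary. Note that setting the field to null only makes the array eligible for collection; when it is actually reclaimed is up to the garbage collector.

```java
class Cache {
    // Hypothetical large buffer held by a long-lived object.
    private byte[] big = new byte[1024 * 1024];

    boolean holdsBuffer() {
        return big != null;
    }

    // Dropping the only reference makes the array eligible for collection.
    void release() {
        big = null;
    }
}

class NullingDemo {
    // Usually preferable: a local reference simply goes out of scope
    // when the method returns, so nothing needs to be nulled.
    static int useTemporarily() {
        byte[] scratch = new byte[1024];
        scratch[0] = 7;
        return scratch[0];
    }

    public static void main(String[] args) {
        Cache cache = new Cache();
        cache.release();
        System.out.println(cache.holdsBuffer()); // false
        System.out.println(useTemporarily());    // 7
    }
}
```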
In your example number is a primitive, so will be stored as a value.
If you want to use a reference then you should use one of the wrapper types (e.g. Integer)
So notice variables are on the stack, the values they refer to are on the heap. So having variables is not too bad, but yes, they do create references to other entities. However, in the simple case you describe it's not really of any consequence. If it is never read again and is within a contained scope, the compiler will probably strip it out before runtime. Even if it didn't, the garbage collector will be able to safely remove it after the stack unwinds. If you are running into issues where you have too many stack variables, it's usually because you have really deep stacks. The amount of stack space needed per thread is a better place to adjust than to make your code unreadable. Setting references to null is also no longer needed.
It's really a matter of opinion. In your example, System.out.println(5) would be slightly more efficient, as you only refer to the number once and never change it. As was said in a comment, int is a primitive type and not a reference, so it doesn't take up much space. However, you might want to set actual reference variables to null only if they are used in a very complicated method. The objects that local reference variables point to become eligible for garbage collection once the method they are declared in returns, provided nothing else still refers to them.
Well, the JVM memory model works something like this: values are stored on one pile of memory called the stack, and objects are stored on another pile of memory called the heap. The garbage collector looks for garbage by looking at the objects you've made and seeing which ones aren't pointed at by anything. This is where setting a reference to null comes in; all nonprimitive (think of classes) variables are really references that point to an object on the heap, so by setting the reference you have to null, the garbage collector can see that there's nothing else pointing at the object and it can decide to garbage collect it. All Java objects are stored on the heap so they can be seen and collected by the garbage collector.
Primitive values (ints, chars, doubles, those sorts of things) held in local variables, however, aren't stored on the heap. They're created and stored temporarily as they're needed and there's not much you can do there, but thankfully the compilers nowadays are really efficient and will avoid needing to store them on the JVM stack unless they absolutely need to.
On a bytecode level, that's basically how it works. The JVM is based on a stack-based machine, with a couple of instructions to allocate objects on the heap as well, and a ton of instructions to manipulate values and to push and pop them on and off the stack. Local variables are stored on the stack, allocated objects on the heap.* These are the heap and the stack I'm referring to above. Here's a pretty good starting point if you want to get into the nitty gritty details.
In the resulting compiled code, there's a bit of leeway in terms of implementing the heap and stack. Allocation's implemented as allocation; there's really not a way around doing so. Thus the virtual machine heap becomes an actual heap, and allocations in the bytecode are allocations in actual memory. But you can get around using a stack to some extent, since instead of storing the values on a stack (and accessing a ton of memory), you can store them in registers on the CPU, which can be up to a hundred times (maybe even a thousand) faster than storing them in memory. But there are cases where this isn't possible (look up register spilling for one example of when this may happen), and using a stack to implement a stack kind of makes a lot of sense.
And quite frankly, in your case a few integers probably won't matter. The compiler will probably optimize them out by itself in this case anyway. Optimization should always happen after you get it running and notice it's a tad slower than you'd prefer it to be. Worry about making simple, elegant, working code first, then later make it fast (and hopefully still simple, elegant, working) code.
Java's actually very nicely made so that you shouldn't have to worry about nulling variables very often. Whenever you stop needing to use something, it will usually incidentally be disappearing from the scope of your program (and thus becoming eligible for garbage collection). So I guess the real lesson here is to use local variables as often as you can.
*There's also a constant pool, a local variable pool, and a couple other things in memory but you have close to no control over the size of those things and I want to keep this fairly simple.
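To give a feel for what "stack-based machine" means, here is a toy evaluator. It is not how the JVM is implemented and it does not use real bytecode; TinyStackMachine and its two-operator postfix language are invented for this sketch. It only shows the push/pop style in which operands are placed on a stack and each operation consumes the topmost values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackMachine {
    // Evaluates a whitespace-separated postfix program such as "2 3 4 * +".
    // Only integer literals, "+" and "*" are understood (toy language).
    static int run(String program) {
        Deque<Integer> operands = new ArrayDeque<>();
        for (String token : program.trim().split("\\s+")) {
            switch (token) {
                case "+":
                    operands.push(operands.pop() + operands.pop());
                    break;
                case "*":
                    operands.push(operands.pop() * operands.pop());
                    break;
                default:
                    operands.push(Integer.parseInt(token));
            }
        }
        return operands.pop();
    }

    public static void main(String[] args) {
        System.out.println(run("2 3 4 * +")); // 2 + (3 * 4) = 14
    }
}
```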

why use the stack instead of heap?

I see only a disadvantage to this: you can get a stack overflow :) Why not use only the heap?
In Java, C, and C++, the parameters to functions are passed on the stack. The plain variables inside function bodies are created on the stack.
As far as I know, the stack is limited per thread and has a default size that is relatively low: 1-8 MB.
Why not use the heap instead of the stack? Both are in memory; the OS just makes a separation, so that addresses A to B are the heap and addresses C to D are the stack.
Consider variable arguments. Say the call passes 10 variables of 4 bytes each. If you read an 11th, you may read some memory trash, which may be exactly what you want for hacking, or you may get a segmentation fault ... if the OS detects you as a bad boy. :) So security can't be a reason to use the stack.
Performance is one of many reasons: memory in the stack is trivial to book-keep; it has no holes; it can be mapped directly into the cache; it is attached on a per-thread basis.
In contrast, memory in the heap is, well, a heap of stuff; it is more difficult to book-keep; it can have holes.
Check out this answer (excellent, in my opinion) explaining some other differences.
Others have already mentioned that the stack can be faster due to simplicity of incrementing/decrementing the stack pointer. This is, however, quite a ways from the whole story.
First of all, if you're using a garbage collector that compacts the heap (i.e., most modern collectors) allocation on the heap isn't much different from allocation on the stack. You simply keep a pointer to boundary between allocated and free memory, and to allocate some space, you just move that pointer, just like you would on the stack. Objects that will have extremely short lives (like the locals in most functions) cost next to nothing in a GC cycle too. Keeping a live object accessible takes (a little) work, but an object that's no longer accessible normally involves next to no work.
There is, however, often still a substantial advantage to using the stack for most variables. Many typical programs tend to run for fairly extended periods of time using nearly constant amounts of stack space. They enter one function, create some variables, use them for a while, pop them off the stack, then repeat the same cycle in another function.
This means most of the memory toward the top of the stack is almost always in the cache. Most function calls are re-using memory that was just vacated by the previous function call. By reusing the same memory continuously, you end up with considerably better cache usage.
By contrast, when you allocate items in the heap, you typically end up allocating separate space for nearly every item. Your cache is in a constant state of "churn", throwing away the memory for objects you're no longer using to make space for newly allocated ones. Unless you use a minuscule heap, the chances of re-using an address while it's still in the cache are nearly nonexistent.
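The "just move a pointer" allocation described in this answer can be sketched with a toy allocator over a byte array. BumpAllocator and its methods are hypothetical and only model the bookkeeping; a real JVM or a native stack works on raw memory, not on a Java array.

```java
class BumpAllocator {
    private final byte[] memory;
    private int top = 0; // boundary between used and free space

    BumpAllocator(int capacity) {
        memory = new byte[capacity];
    }

    // Allocation is just a bounds check and a pointer increment.
    int allocate(int size) {
        if (size < 0 || size > memory.length - top) {
            throw new IllegalStateException("out of space");
        }
        int address = top;
        top += size;
        return address;
    }

    // Remember the current boundary, like a frame's starting point.
    int mark() {
        return top;
    }

    // Releasing everything allocated since 'mark' is a single assignment,
    // much like discarding a stack frame on return.
    void releaseTo(int mark) {
        top = mark;
    }

    public static void main(String[] args) {
        BumpAllocator a = new BumpAllocator(64);
        a.allocate(16);
        int frame = a.mark();
        a.allocate(8);
        a.releaseTo(frame);
        System.out.println(a.allocate(4)); // 16: the released space is reused
    }
}
```

The reuse of address 16 after the release is the same effect the answer credits for the stack's good cache behavior.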
I'm sure this is answered a million times online, but...
Because you don't want every method call to be a memory allocation (slow). So, you pre-allocate your stack.
Some more reasons listed here (including security).
The answer is that you get holes when you allocate and de-allocate on the heap. This means that it gets more and more difficult to allocate memory since the places that are available are different sizes. The stack only reserves what is needed and gives it all back when you get out of scope. No hassle.
If everything was on the stack, each time you passed those values on, they would have to be copied. However, unlike the heap, it doesn't need to be cleverly managed - items on the heap require garbage collection.
So they work in two different ways that suit two different uses. The stack is a quick and lightweight home for values to be held for a short time whereas the heap allows you to pass objects around without copying them.
Neither stack nor heap is perfect for every scenario - that is why they both exist.
Using the heap requires "requesting" a bit of memory from the heap, using new or some similar function. Then, when you're finished with it, you delete it again. This is very useful for variables that are long-lived and/or that take up quite a bit of space (or take up an "unknown at compile-time" space - for example if you read a string into a variable from a file, you don't necessarily know how much space it needs, and it's REALLY annoying to get a message from the program saying "String too large on line X in file Y").
On the other hand, the stack is "free" both when it comes to allocating and de-allocating (technically, any function that uses stack-space will need one extra instruction for the allocation of the stackspace, but compared to the several hundred or thousands that a call to new will involve, it's not noticeable). Of course, class objects will still have to have their respective constructors called, which may take almost any amount of time to complete, but that is true regardless of how/where the storage is allocated from.

Do Java recursive algorithms consume more heap, causing additional garbage collection?

If I have two algorithms that produce the same results, where the first is based on recursion and the other is a loop based one, which will cause more garbage collection with regards to pure program flow management ?
Recursion alone will cause additional stack use, as each layer of the call is a new element in the call stack, but will not use any extra heap (aside from perhaps a small object or two used to keep track of stack information - those may be allocated on the heap, I am not sure).
However, in a typical recursive algorithm, any extra heap objects will likely not last very long, and will be cleaned up in the next young generation collection. So they won't result in very much more garbage collection.
The JVM keeps stack frames separately from the heap, so they are not garbage collected. Recursion will not affect the heap as long as you don't create objects inside the method calls. Here is an article. But it is still slower than iteration, because additional time is needed to allocate each stack frame.
Basically (over-simplified), Java memory is divided into two segments, the stack and the heap. Call/returns are kept track of in the stack, in an obvious fashion where an entry is "pushed" on a call and "popped" on return. This scheme is "self managing", and does not need garbage collection to keep it tidy.
Of course, other heap-based objects may be allocated in each call frame, but that has nothing to do with recursion. And generally speaking if one uses a recursive algorithm it eliminates the need for a data structure to track progress that would have been heap-allocated in the non-recursive case.
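The last point, that the loop-based version often needs a heap-allocated structure to track progress, can be seen in a small tree-sum sketch. Node and TreeSum are names invented for this example. The recursive version keeps its pending work in stack frames; the iterative version keeps it in an ArrayDeque, which is a heap object.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Node {
    final int value;
    final Node left;
    final Node right;

    Node(int value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class TreeSum {
    // Pending work lives in call-stack frames; no extra heap objects.
    static int recursive(Node n) {
        return n == null ? 0 : n.value + recursive(n.left) + recursive(n.right);
    }

    // Pending work lives in an explicit, heap-allocated deque.
    static int iterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        if (root != null) {
            pending.push(root);
        }
        int sum = 0;
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            sum += n.value;
            if (n.left != null) pending.push(n.left);
            if (n.right != null) pending.push(n.right);
        }
        return sum;
    }

    public static void main(String[] args) {
        Node tree = new Node(1, new Node(2, null, null),
                new Node(3, new Node(4, null, null), null));
        System.out.println(recursive(tree)); // 10
        System.out.println(iterative(tree)); // 10
    }
}
```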

Is the stack garbage collected in Java?

The heap memory is garbage collected in Java.
Is the stack garbage collected as well?
How is stack memory reclaimed?
The memory on the stack contains method parameters and local variables (to be precise: the references for objects and the values themselves for primitive types). That memory is automatically released when you leave the method. If the variables are references (to objects), the objects themselves are on the heap and are handled by the garbage collector.
So the stack isn't garbage collected in the same way as the heap, but the stack is a form of automatic memory management in its own right (one which predates garbage collection).
A more detailed answer is given by Thomas Pornin, look into that for more details.
The stack is not garbage collected in Java.
The stack allocated for a given method call is freed when the method returns. Since that's a very simple LIFO structure, there's no need for garbage collection.
One place where the stack and garbage collection interact is that references on the stack are GC roots (which means that they are the root references from which reachability is decided).
The stack could be garbage collected. However, in most JVM implementations, it is handled as, well, a "stack", which by definition precludes garbage collection.
What we call the stack is the accumulation of method activation contexts: for each invoked method, this is the conceptual structure which contains the method arguments, local variables, a hidden pointer to the context for the calling method, and a slot to save the instruction pointer. The activation context is not accessible as such from the Java language itself. A context becomes useless when the method exits (with a return or because of a thrown exception). It so happens that when a method A calls a method B, it is guaranteed that when A regains control, the context for B has become useless. This implies that the lifetime of the context for B is a subrange of the lifetime of the context for A. Therefore, activation contexts (for a given thread) can be allocated with a LIFO ("Last In, First Out") discipline. In simpler words, a stack: a new activation context is pushed on top of the stack of contexts, and the context on top will be the first to be disposed of.
In practice, the activation contexts (also called stack frames) are concatenated, in stack order, in a dedicated area. That area is obtained from the operating system when the thread is started, and the operating system gets it back when the thread terminates. The top of the stack is designated by a specific pointer, often contained in a CPU register (this depends on whether the JVM is interpreting or compiling code). The "pointer to caller's context" is virtual; the caller's context is necessarily located just below in stack order. The GC does not intervene: the area for the stack is created and reclaimed synchronously, from the thread activity itself. This is also how it works in many languages such as C, which do not have a GC at all.
Now nothing prevents a JVM implementation from doing otherwise, e.g. allocating activation contexts in the heap and having them collected by the GC. This is not usually done in Java Virtual Machines since stack allocation is faster. But some other languages need to do such things, most notably those which play with continuations while still using a GC (e.g. Scheme and its call-with-current-continuation function), because such games break the LIFO rule explained above.
The stack part of the memory works just like a "stack". I know it sounds bad, but that's exactly how it works. Data is added to the top, on top of each other (pushed onto the stack) and is automatically removed from the top (popped off the stack) as your program runs. It is not garbage collected - and it doesn't need to be since that memory is automatically reclaimed once data is popped off the stack. And when I say reclaimed I don't mean it gets de-allocated - it's just that the location in the stack memory where the next data will be stored is decreased, as data is popped off.
Of course that's not to say that you don't need to worry at all about the stack. If you run a recursive function many times it will eventually use up all the stack space. The same if you call many functions, especially if they have many parameters and/or local variables.
But the bottom line is that the memory of the stack is used and reclaimed as functions enter and leave scope - automatically. So at the end of your program's execution all the stack memory would be free and then released back to the operating system.
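The warning about recursion using up the stack can be observed directly. The sketch below (DeepRecursion is a hypothetical name) recurses without a base case, catches the resulting StackOverflowError, and reports how deep it got. The depth reached depends on the JVM, its settings and the frame size, so no particular number should be expected.

```java
class DeepRecursion {
    static int depth = 0;

    // No base case: every call adds a frame until the stack is exhausted.
    static void recurse() {
        depth++;
        recurse();
    }

    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have been unwound by the time we get here,
            // so the stack space is available again.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Overflowed after " + measure() + " calls");
    }
}
```

Catching StackOverflowError like this is only for demonstration; in real code the fix is to bound the recursion or rewrite it iteratively.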
If you refer to the memory used on the stack, it is not garbage collected.
The Java virtual machine uses explicit bytecode instructions to reserve and release memory on the stack. These instructions are generated by the compiler and manage the lifetime of primitives like int, boolean and double and of object references on the stack.
There have been plans to implement a so called tail call optimisation, which would remove some entries from the stack once it is known that they are no longer used, but I don't know any jvm which already supports this.
So no there is no garbage collection for the stack itself, only the compiler generated push and pop instructions to manage the memory use.
The stack itself is part of a thread. The stack is allocated when the thread object is created and garbage collected after the thread has terminated and the thread object is no longer referenced.
All objects in Java are allocated on the heap. (At least as far as the spec goes, the actual implementation may allocate them on the stack if they transparently behave as if they were on the heap.)
Exactly what is collectible is a bit subtle. If the only reference to an object is in a single stack frame, and it can be shown that reference will not be used again, then the object may be collected. If the object is only used to read a field, then that field read may be optimised forward and the object collected earlier than you might expect.
This doesn't usually matter unless you are using finalisers (or presumably References). In that case you should be careful and use locks/volatile to enforce a happens-before relationship.
When threads stop, then typically the entire stack will be deallocated.
No. Data is pushed onto and popped off the stack as methods are called and return and as their inner variables come and go. You don't need to care about this.
Everything located on stack is treated as global roots by a garbage collector. So, yes, you definitely can say that stack is "garbage collected".
No. The stack is not garbage collected in Java.
Each thread has its own stack, which contains:
Method-specific values (which are short-lived), and
References to objects that were created on the heap and are referred to by the method
These values are pushed as stack frames to the stack for every method call. Since the stack follows 'Last-in First-out' order, at the end of every method call, each stack frame containing all the method specific data and the references to objects, if any, is popped out.
Hence, data in stack are automatically cleaned once the method/program goes out of scope.
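A short sketch of the two kinds of stack content listed above (FrameDemo and bump are made-up names): the primitive is copied into the callee's frame, so changing it there has no effect on the caller, while the array reference is also copied but still points at the same heap object.

```java
class FrameDemo {
    static void bump(int n, int[] box) {
        n++;      // changes only this frame's copy of the primitive
        box[0]++; // follows the copied reference to the shared heap array
    }

    // Returns {caller's primitive, array contents} after the call.
    static int[] demo() {
        int n = 1;
        int[] box = {1};
        bump(n, box);
        return new int[] {n, box[0]};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // 1 2
    }
}
```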
