Why are Java objects allocated in a heap? - java

Quoting Wikipedia 'A heap is a useful data structure when you need to remove the object with the highest (or lowest) priority'.
I am familiar with what a heap is and the kind of problems I can solve with them, but I was wondering why this data structure is the one used for the allocation of Objects in Java? Also, what determines the priority of an Object?

The quoted text is referring to a kind of data structure called a heap.
The word heap is also used for a form of dynamic memory management.
This is a case where one IT English word has taken on two different and independent meanings. (This is a fairly common phenomenon in normal English ...)
I was wondering why this data structure is the one used for the allocation of Objects in Java?
Simply, it isn't. A dynamic memory heap (such as the Java heap) is not organized using a heap data structure.
In fact, the Java heap isn't really a data structure at all. Rather it is an area of memory in which objects are allocated. Space is reclaimed by tracing the reachable objects, and then deleting the remaining objects and consolidating the remaining space.
By contrast, a C or C++ heap cannot be traced and consolidated (because there is insufficient reliable type information to allow pointers to be identified unambiguously). Therefore a C / C++ heap will include a data structure to organize the free space. However, this isn't a heap data structure in the sense of the quoted text. Typically it is an array of lists of "nodes" of the same size.
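To make the two meanings concrete, here is a minimal sketch. The class and method names (TwoKindsOfHeap, drainInPriorityOrder) are made up for illustration. java.util.PriorityQueue is a heap in the data structure sense, while every "new" in the code allocates from the heap in the memory sense, where no priority is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class TwoKindsOfHeap {
    // Heap *data structure*: PriorityQueue is backed by a binary heap,
    // so poll() always removes the smallest remaining element.
    static List<Integer> drainInPriorityOrder(List<Integer> values) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(values);
        List<Integer> drained = new ArrayList<>();
        while (!queue.isEmpty()) {
            drained.add(queue.poll());
        }
        return drained;
    }

    public static void main(String[] args) {
        // Heap *memory*: each "new" above and below places an object in the
        // JVM's heap area. The objects are simply allocated; nothing is prioritized.
        List<Integer> values = new ArrayList<>(List.of(5, 1, 4));
        System.out.println(drainInPriorityOrder(values)); // [1, 4, 5]
    }
}
```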

I will explain that with a reference to C++.
Local variables are created on the stack when they are initialized and destroyed when control leaves the block. In other words, every local variable lives inside the stack frame of its block: when the block dies, the variable dies with it.
If you don't know in advance how big your object is going to be, you have to allocate memory on the heap. An example would be a dynamically resizable array. In C++ this is done with the "new" operator (or with malloc, calloc, realloc, etc.), which means you are responsible for both creating and releasing the memory. In Java you also allocate with the "new" operator, but you do not release the memory yourself.
Objects on the heap don't get destroyed just because you leave a block. The one exception is an object created in your main function when the program exits right afterwards.
In C++ you call delete (for memory obtained with new) or free() (for memory obtained with malloc) to release a heap object. In Java, on the other hand, the garbage collector does this for you. It does so by working out which objects are still reachable from the running program and reclaiming the rest. This is not reference counting, and of course the details are more complicated than that.

Related

What is the difference between stack data structure and stack memory? [duplicate]

I'm studying for my data organization final and I'm going over stacks and heaps because I know they will be on the final and I'm going to need to know the differences.
I know what the Stack is and what the Heap is.
But I'm confused on what a stack is and what a heap is.
The Stack is a place in RAM where memory is stored; if it runs out of space, a stack overflow occurs. Objects are stored here by default, it deallocates memory when objects go out of scope, and it is faster.
The Heap is a place in RAM where memory is stored; if it runs out of space, the OS will assign it more. For an object to be stored on the Heap, it has to be created with the new operator, and it will only be deallocated if told. Fragmentation problems can occur, it is slower than the Stack, and it handles large amounts of memory better.
But what is a stack, and what is a heap? Is it the way memory is stored? For example, is a static array or static vector a stack type, and a dynamic array or linked list a heap type?
Thank you all!
"The stack" and "the heap" are memory lumps used in a specific way by a program or operating system. For example, the call stack can hold data pertaining to function calls and the heap is a region of memory specifically used for dynamically allocating space.
Contrast these with stack and heap data structures.
A stack can be thought of as an array where the last element in will be the first element out. Operations on this are called push and pop.
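A heap is a tree-based data structure in which each node's value is ordered with respect to its children: in a max-heap every node's value is greater than or equal to its children's values, and in a min-heap it is less than or equal to them.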
On a side note, keep in mind that "the stack", "the heap", and the stack/heap data structures are not unique to any given programming language; they are simply concepts in the field of computer science.
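As a small illustration of the two data structures (not of the memory regions), here is a sketch in Java. The class name and the array-encoded tree layout are assumptions chosen for the example: children of index i are taken to live at 2*i + 1 and 2*i + 2, which is the usual binary heap layout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackAndHeapStructures {
    // Stack data structure: the last element pushed is the first one popped.
    static String lastInFirstOut() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        return stack.pop() + stack.pop() + stack.pop();
    }

    // Max-heap property on an array-encoded binary tree:
    // no child may be larger than its parent.
    static boolean isMaxHeap(int[] tree) {
        for (int child = 1; child < tree.length; child++) {
            int parent = (child - 1) / 2;
            if (tree[parent] < tree[child]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lastInFirstOut());                     // cba
        System.out.println(isMaxHeap(new int[] {9, 7, 8, 3, 5})); // true
        System.out.println(isMaxHeap(new int[] {1, 7, 8}));       // false
    }
}
```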
I won't get into virtual memory (read about that if you want) so let's simplify and say you have RAM of some size.
You have your code, along with some static initialized data and some static uninitialized data (static here means, roughly, global variables in C++).
When you compile something, the compiler (and linker) will organize and translate your code into machine code (ones and zeroes) in the following way:
The binary file (like the object files) is organized into segments, which become portions of RAM once the program is loaded.
First you have the DATA segment. This is the segment that contains the values of initialized variables. So if you have variables such as int a = 3, b = 4, they will go to the DATA segment (4 bytes of RAM containing 00000003h, and another 4 bytes containing 00000004h, in hexadecimal notation). They are stored consecutively.
Then you have the CODE segment. All your code is translated into machine code (1s and 0s) and stored in this segment consecutively.
Then you have the BSS segment. This is where uninitialized global variables go (all static variables that weren't initialized).
Then you have the STACK segment. This is reserved for the stack. The stack size is determined by the operating system by default. You can change this value, but I won't get into that now. All local variables go here. When you call a function, first the function arguments are pushed onto the stack, then the return address (where to come back to when you exit the function), then some CPU registers, and finally all local variables declared in the function get their reserved space on the stack.
And you have the HEAP segment. This is the part of RAM (its size is also determined by the OS) where objects and data created with operator new are stored.
Then all of the segments are laid out one after the other: DATA, CODE, BSS, STACK, HEAP. There are some other segments, but they are not of interest here. All of this is loaded into RAM by the operating system. The binary file also has some headers containing information such as the location (address in memory) at which your code begins.
So in short, they are all parts of RAM, since everything that is being executed is loaded into RAM (it can't be in ROM, which is read-only, nor on the HDD, since the HDD is just for storing files).
When specifically referring to C++'s memory model, the heap and stack refer to areas of memory. It is easy to confuse this with the stack data structure and heap data structure. They are, however, separate concepts.
When discussing programming languages, stack memory is called 'the stack' because it behaves like a stack data structure. The heap is a bit of a misnomer, as it does not necessarily (or likely) use a heap data structure. See Why are two different concepts both called "heap"? for a discussion of why C++'s heap and the data structure's names are the same, despite being two different concepts.
So to answer your question, it depends on the context. In the context of programming languages and memory management, the heap and stack refer to areas of memory with specific properties. Otherwise, they refer to specific data structures.
The technical definition of "a stack" is a Last In, First Out (LIFO) data structure where data is pushed onto and pulled off of the top. Just like with a stack of plates in the real world, you wouldn't pull one out from the middle or bottom, you [usually] wouldn't pull data out of the middle of or the bottom of a data structure stack. When someone talks about the stack in terms of programming, it can often (but not always) mean the hardware stack, which is controlled by the stack pointer register in the CPU.
As far as "a heap" goes, that generally becomes much more nebulous in terms of a definition everyone can agree on. The best definition is likely "a large amount of free memory from which space is allocated for dynamic memory management." In other words, when you need new memory, be it for an array, or an object created with the new operator, it comes from a heap that the OS has reserved for your program. This is "the heap" from the POV of your program, but just "a heap" from the POV of the OS.
The important thing for you to know about stacks is the relationship between the stack and function/method calls. Every function call reserves space on the stack, called a stack frame. This space contains your auto variables (the ones declared inside the function body). When you exit from the function, the stack frame and all the auto variables it contains disappear.
This mechanism is very cheap in terms of CPU resources used, but the lifetime of these stack-allocated variables is obviously limited by the scope of the function.
Memory allocations (objects) on the heap, on the other hand, can live "forever", or as long as you need them, without regard to the flow of control of your program. The downside is that, since you don't get automatic lifetime management of these heap-allocated objects, you have to either 1) manage the lifetime yourself, or 2) use special mechanisms like smart pointers to manage the lifetime of these objects. If you get it wrong, your program has memory leaks or accesses data that may change unexpectedly.
Re: Your question about A stack vs THE stack: When you are using multiple threads, each thread has a separate stack so that each thread can flow into and out of functions/methods independently. Most single threaded programs have only one stack: "the stack" in common terminology.
Likewise for heaps. If you have a special need, it is possible to allocate multiple heaps and choose at allocation time which heap should be used. This is much less common (and a much more complicated topic than I have mentioned here.)
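The relationship between calls, stack frames, and per-thread stacks can be sketched in Java as follows. The class and method names are hypothetical. Each recursive call gets its own frame with its own copy of the locals, and each thread runs the same method on its own stack.

```java
class StackFrames {
    // Each call gets its own frame holding its own copy of "n" and "rest".
    static int sumTo(int n) {
        if (n == 0) {
            return 0;
        }
        int rest = sumTo(n - 1); // a new frame is pushed for the nested call
        return n + rest;         // this frame disappears when the method returns
    }

    // Runs sumTo on two threads; each thread uses its own call stack,
    // so the two recursions do not interfere with each other.
    static int[] sumOnTwoThreads(int a, int b) {
        int[] results = new int[2];
        Thread first = new Thread(() -> results[0] = sumTo(a));
        Thread second = new Thread(() -> results[1] = sumTo(b));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] results = sumOnTwoThreads(4, 10);
        System.out.println(results[0] + " " + results[1]); // 10 55
    }
}
```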

Exclusion of elements in a Java file

I have some doubts about the garbage collector and how I can clear memory in Java.
I have a program that writes a binary search tree to a file. I made one function that inserts an element and another that removes an element, but in the method that removes, I put the elements that I remove in a space in the file that I call "empty blocks" (which is a stack). In the C language there is a function, free(), that frees memory; in Java there is the garbage collector, which runs at Java's discretion. How can I free the memory of these blocks in the file (the excluded elements)?
Is there a way to free the memory of an element on file in Java (the element is of type int)?
I put the elements that I remove in a space in the file that I call "empty blocks" (which is a stack)
Whatever data structure you use to track your data will be in an object of some class.
When that object no longer has any references pointing to it, that object becomes a candidate for garbage collection. No need for you to do anything except not hang on to any reference longer than needed.
The garbage collector may clear the unneeded object immediately, or may clear it later. Either way, we as Java programmers do not care. Eventually the memory will be freed up.
If the reference variable pointing to an object is a local variable, that reference is dropped when the local variable goes out of scope.
If the reference variable is a member field on another object, the object in question will be released when the other object becomes garbage.
If the reference variable is static, you should assign null explicitly to let the referenced object become garbage. In Java, static variables stay in memory throughout the execution run of your app.
In the first two cases, you can release the object sooner by setting the reference variable to null. Generally this is not needed, but doing so may be wise if a large amount of memory is at stake. Ditto if other precious resources are being needlessly held.
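The difference between "still referenced" and "eligible for collection" can be observed with a weak reference. This is only a sketch: java.lang.ref.WeakReference is used here purely as a way to watch an object without keeping it alive, and it is not something the discussion above relies on. Note that System.gc() is just a request, so only the first method has a guaranteed result.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // While a strong reference exists, the object cannot be collected.
    static boolean survivesWhileReferenced() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a request; the JVM may ignore it
        return weak.get() == strong; // "strong" is still in use here, so this is true
    }

    // After the last strong reference is dropped, the object is merely
    // eligible for collection; when (or whether) it goes is up to the JVM.
    static boolean collectedAfterDrop() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null; // drop the only strong reference
        System.gc();
        return weak.get() == null; // often true, but not guaranteed
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileReferenced());
        System.out.println(collectedAfterDrop());
    }
}
```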
Is there a way to free the memory of an element on file in Java (the element is of type int)?
Your question is really hard to understand, but I think you are asking about freeing up disk blocks in a data structure stored in a file [1].
There is no Java support for this. If you write a data structure to a file, the problem of reclaiming space in the file is yours, not Java's. Indeed, I don't think that a typical OS will allow you to (literally) free disk blocks in the middle of a file [2].
There may be 3rd-party libraries that support this kind of thing, but I don't have the background knowledge to make a recommendation.
If I have correctly understood what you are asking, your discussion of C's malloc / free versus Java's garbage collection is only peripherally relevant. Both of these schemes are for managing memory, not space in a random access file. Now you could conceivably implement similar schemes for managing space in a file, but you would need to take account of the different characteristics of memory and disk I/O. (Even if you are mapping the file into memory.)
[1] If you are actually talking about managing objects in heap memory in Java, your best bet is to just let the garbage collector deal with it; see Basil's answer. There are also 3rd-party libraries for storing objects in off-heap memory, but it is unclear if they would help you. I understand that such libraries typically leave it to the programmer to decide when to free an object. (They are not garbage collected.)
[2] It would be a bad idea. If the disk blocks thus freed were then used in a different file, you would get a lot of file fragmentation. That would be bad for file I/O performance.
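One way such a do-it-yourself scheme for a file might look is sketched below, assuming fixed-size blocks that each hold one int and a stack of "empty block" indexes that is reused on insert. The class IntBlockFile and all of its methods are hypothetical. The stack is kept only in memory here; a real implementation would also have to persist it in the file. Note that removing an element never shrinks the file; the space is only recycled.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

class IntBlockFile implements AutoCloseable {
    private static final int BLOCK_SIZE = Integer.BYTES;
    private final RandomAccessFile file;
    private final Deque<Long> emptyBlocks = new ArrayDeque<>(); // stack of reusable block indexes
    private long blockCount = 0;

    IntBlockFile(File path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(0); // start from an empty file for this sketch
    }

    // Writes the value into a previously freed block if one exists, else appends.
    long insert(int value) throws IOException {
        long index = emptyBlocks.isEmpty() ? blockCount++ : emptyBlocks.pop();
        file.seek(index * BLOCK_SIZE);
        file.writeInt(value);
        return index;
    }

    // "Frees" a block by remembering its index; the file itself does not shrink.
    void remove(long index) {
        emptyBlocks.push(index);
    }

    int read(long index) throws IOException {
        file.seek(index * BLOCK_SIZE);
        return file.readInt();
    }

    long lengthInBytes() throws IOException {
        return file.length();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    // Inserts three values, removes one, inserts a fourth, and reports the file length.
    static long demoLength() {
        try {
            File path = File.createTempFile("blocks", ".bin");
            path.deleteOnExit();
            try (IntBlockFile blocks = new IntBlockFile(path)) {
                blocks.insert(10);
                long second = blocks.insert(20);
                blocks.insert(30);
                blocks.remove(second);
                blocks.insert(40); // reuses the block that held 20
                return blocks.lengthInBytes();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoLength()); // 12: three 4-byte blocks, one of them reused
    }
}
```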

Deleting Dynamic array in java

In C++, dynamically allocated array has to be deleted unless it is lost in the memory. In java, do we have to do the same, and if so. How do you do that
In the Java programming language, dynamic allocation of objects is achieved using the new operator. Once created, an object uses some memory, and that memory remains allocated as long as there are references to the object. When there are no references to an object, it is assumed to be no longer needed and the memory it occupies can be reclaimed. There is no explicit need to destroy an object, as Java handles the de-allocation automatically. The technique that accomplishes this is known as garbage collection. Programs that do not de-allocate memory can eventually crash when there is no memory left in the system to allocate. These programs are said to have memory leaks.
In Java, garbage collection happens automatically during the lifetime of a Java program, eliminating the need to de-allocate memory by hand and avoiding the memory leaks caused by forgotten de-allocations.
In the C language, it is the programmer's responsibility to de-allocate dynamically allocated memory using the free() function.
Read more at http://www.javatutorialhub.com/java-garbage-collection.html#wpewoJfMWffgXd8O.99
I don't understand your definition of a dynamic array. Do you mean a collection of items whose length is unknown when it is initialized? If so, you're talking about collections such as ArrayList.
You don't have to worry about deleting your objects. The JVM will take care of them once they are no longer reachable, for example after the variables referring to them go out of scope.
While your code runs, the JVM invokes a special component, the garbage collector, which removes objects that are no longer reachable from memory.
For example, when you have this code:
if (condition)
{
    String myStr = "test";
    // other code here
}
Once execution leaves the if block, the variable myStr is out of scope, so the object it referred to becomes eligible for garbage collection, provided nothing else still references it.
The developer need not worry about deallocating memory in Java. The Java garbage collector does that for you. You can read how it works here
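For the "dynamic array" case specifically, a minimal sketch with ArrayList might look like this. The class and method names are invented for the example. The list grows as elements are added, and there is no counterpart to C++'s delete[]: dropping the last reference is all it takes.

```java
import java.util.ArrayList;
import java.util.List;

class DynamicArrayDemo {
    // Builds a list whose final length is not known when it is created.
    static List<Integer> squaresUpTo(int limit) {
        List<Integer> squares = new ArrayList<>(); // grows as elements are added
        for (int i = 1; i * i <= limit; i++) {
            squares.add(i * i);
        }
        return squares;
    }

    public static void main(String[] args) {
        List<Integer> squares = squaresUpTo(30);
        System.out.println(squares); // [1, 4, 9, 16, 25]
        squares = null; // no delete[]; the list is now eligible for garbage collection
    }
}
```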

why use the stack instead of heap?

I see only disadvantage of this: you can get StackOverflow :) Why not use only Heap?
In Java, C, and C++ the parameters to functions are passed on the stack. Plain variables inside function bodies are created on the stack.
As far as I know, the stack is limited per thread and has a default size that is relatively low: 1-8 MB.
Why not use the heap instead of the stack? Both are in memory; the OS just makes a separation: from address A to B is the heap and from C to D is the stack.
There are variable arguments. Suppose it says there are 10 variables of 4 bytes each. If you read an 11th, you may read some "memory" trash, which may be exactly what you want for hacking, or you may get a segmentation fault ... if the OS detects you as a bad boy. :) So security can't be a reason to use the stack.
Performance is one of many reasons: memory in the stack is trivial to book-keep; it has no holes; it can be mapped directly into the cache; it is attached on a per-thread basis.
In contrast, memory in the heap is, well, a heap of stuff; it is more difficult to book-keep; it can have holes.
Check out this answer (excellent, in my opinion) explaining some other differences.
Others have already mentioned that the stack can be faster due to simplicity of incrementing/decrementing the stack pointer. This is, however, quite a ways from the whole story.
First of all, if you're using a garbage collector that compacts the heap (i.e., most modern collectors) allocation on the heap isn't much different from allocation on the stack. You simply keep a pointer to boundary between allocated and free memory, and to allocate some space, you just move that pointer, just like you would on the stack. Objects that will have extremely short lives (like the locals in most functions) cost next to nothing in a GC cycle too. Keeping a live object accessible takes (a little) work, but an object that's no longer accessible normally involves next to no work.
There is, however, often still a substantial advantage to using the stack for most variables. Many typical programs tend to run for fairly extended periods of time using nearly constant amounts of stack space. They enter one function, create some variables, use them for a while, pop them off the stack, then repeat the same cycle in another function.
This means most of the memory toward the top of the stack is almost always in the cache. Most function calls are re-using memory that was just vacated by the previous function call. By reusing the same memory continuously, you end up with considerably better cache usage.
By contrast, when you allocate items in the heap, you typically end up allocating separate space for nearly every item. Your cache is in a constant state of "churn", throwing away the memory for objects you're no longer using to make space for newly allocated ones. Unless you use a minuscule heap, the chances of re-using an address while it's still in the cache are nearly nonexistent.
I'm sure this is answered a million times online, but...
Because you don't want every method call to be a memory allocation (slow). So, you pre-allocate your stack.
Some more reasons listed here (including security).
The answer is that you get holes when you allocate and de-allocate on the heap. This means that it gets more and more difficult to allocate memory since the places that are available are different sizes. The stack only reserves what is needed and gives it all back when you get out of scope. No hassle.
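The "holes" problem can be simulated with a toy allocator. This is only a sketch under simplifying assumptions: the heap is a row of 8 cells, allocation is first-fit, and requested lengths are at least 1. All names are hypothetical. After freeing every other block, 4 cells are free in total, yet a request for 4 contiguous cells fails.

```java
class FragmentationDemo {
    private final boolean[] used;

    FragmentationDemo(int size) {
        used = new boolean[size];
    }

    // First-fit: returns the start of the first free run of "length" cells, or -1.
    int allocate(int length) {
        int runStart = 0;
        int runLength = 0;
        for (int i = 0; i < used.length; i++) {
            if (used[i]) {
                runLength = 0;
                runStart = i + 1;
            } else {
                runLength++;
                if (runLength == length) {
                    for (int j = runStart; j <= i; j++) {
                        used[j] = true;
                    }
                    return runStart;
                }
            }
        }
        return -1; // no hole is large enough
    }

    void free(int start, int length) {
        for (int i = start; i < start + length; i++) {
            used[i] = false;
        }
    }

    int totalFree() {
        int count = 0;
        for (boolean cell : used) {
            if (!cell) {
                count++;
            }
        }
        return count;
    }

    // Returns {total free cells, result of asking for 4 contiguous cells}.
    static int[] demo() {
        FragmentationDemo heap = new FragmentationDemo(8);
        int a = heap.allocate(2);
        heap.allocate(2);
        int c = heap.allocate(2);
        heap.allocate(2);
        heap.free(a, 2);
        heap.free(c, 2); // two separate holes of 2 cells each
        return new int[] {heap.totalFree(), heap.allocate(4)};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result[0] + " free, allocate(4) = " + result[1]); // 4 free, allocate(4) = -1
    }
}
```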
If everything was on the stack, each time you passed those values on, they would have to be copied. However, unlike the heap, it doesn't need to be cleverly managed - items on the heap require garbage collection.
So they work in two different ways that suit two different uses. The stack is a quick and lightweight home for values to be held for a short time whereas the heap allows you to pass objects around without copying them.
Neither stack nor heap is perfect for every scenario - that is why they both exist.
Using the heap requires "requesting" a bit of memory from the heap, using new or some similar function. Then, when you're finished with it, you delete it again. This is very useful for variables that are long-lived and/or that take up quite a bit of space (or take up an "unknown at compile-time" amount of space - for example, if you read a string into a variable from a file, you don't necessarily know how much space it needs, and it's REALLY annoying to get a message from the program saying "String too large on line X in file Y").
On the other hand, the stack is "free" both when it comes to allocating and de-allocating (technically, any function that uses stack-space will need one extra instruction for the allocation of the stackspace, but compared to the several hundred or thousands that a call to new will involve, it's not noticeable). Of course, class objects will still have to have their respective constructors called, which may take almost any amount of time to complete, but that is true regardless of how/where the storage is allocated from.

In Java, is there a performance difference between new and local?

In C and C++ I know that there could be a huge difference in performance between instantiating objects on the stack vs. using 'new' to create them on the heap.
Is this the same in Java?
The 'new' operator in Java is very convenient (especially when I don't have to remember freeing/deleting the objects created with 'new'), but does this mean that I can go wild with 'new'?
Erm, there is no other way in java to instantiate an object.
All objects are created with new, and all objects are created on the heap.
In Java, when you say
MyObject foo;
You're simply declaring a variable (reference). It isn't instantiated until you say
foo = new MyObject();
When all references to that object are out of scope, the object becomes eligible for garbage collection. You'll note there's no such thing as delete in Java :)
There is no allocation of objects on the stack in Java.
Only local variables (and parameters) can live on the stack and those can only contain references or primitive values, but never objects.
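A short sketch of what "only references and primitives live on the stack" means in practice; the Counter class and the method names are invented for the example. Passing an object to a method copies the reference, not the object, so changes made through the copy are visible to the caller, while reassigning the parameter is not.

```java
class ReferencesOnStack {
    static class Counter {
        int value;
    }

    // "counter" is a copy of the caller's reference; both point at the same heap object.
    static void increment(Counter counter) {
        counter.value++;
    }

    // Reassigning the parameter only changes this method's local copy of the reference.
    static void replace(Counter counter) {
        counter = new Counter();
        counter.value = 100;
    }

    static int demo() {
        Counter foo;          // declares a reference; no object exists yet
        foo = new Counter();  // the object itself is allocated on the heap
        Counter alias = foo;  // a second reference to the same object
        increment(alias);     // visible through foo as well
        replace(foo);         // has no effect on the caller's reference
        return foo.value;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1
    }
}
```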
You can't create objects on the stack, you can only have primitives and references on the stack, so the question doesn't apply to Java.
There have been attempts to use escape analysis to optimise objects which are short-lived (and possibly put them on the stack instead); however, I haven't seen any evidence this improved performance.
Part of the reason there isn't the same performance hit/benefit as there would be in C/C++ is that Java has thread-local allocation on the heap and objects are not recycled as aggressively. C/C++ has thread-local stacks, but you need additional libraries to support multi-threaded object allocation. Objects are recycled more aggressively, which increases the cost of object allocation.
One of the biggest changes coming from the C/C++ world is finding that Java has far fewer features but tries to make the most of them (there is a lot of complex optimisation going on in the JVM). On the other hand, Java has a rich/baffling array of open source libraries.
Repeat after me: there is no allocation of objects on the stack in Java
In Java, unlike C++, all objects are allocated on the heap, and the only way out is when they are garbage collected.
In Java, unlike C++, the variable falling out of scope does not mean that the destructor of the object runs; in fact, there is no destructor. So the variable might fall out of scope, but the object remains alive on the heap.
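A small sketch of that last point, with hypothetical names: the local variable disappears when the method returns, but the object it referred to stays alive because something else (here a static list) still references it.

```java
import java.util.ArrayList;
import java.util.List;

class OutlivesScope {
    static final List<StringBuilder> registry = new ArrayList<>();

    static void register(String text) {
        StringBuilder local = new StringBuilder(text);
        registry.add(local);
    } // "local" goes out of scope here, but the object is still reachable via registry

    static String demo() {
        registry.clear();
        register("kept");
        return registry.get(0).toString(); // the object outlived the variable
    }

    public static void main(String[] args) {
        System.out.println(demo()); // kept
    }
}
```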
Can I go wild with 'new'?
Yes. First, because it's the only way to instantiate an object. Second, because the JVM is so good it can create up to 2^32 lightweight objects in less than a second.
In Java, there is no way to manually allocate objects on the Stack, though the compiler may decide to allocate objects created with 'new' on the stack, see Java theory and practice: Urban performance legends, revisited.
There's really nothing to compare here: you can't create objects on the stack in Java.
If it's any comfort, however, heap-based allocation in Java is (at least usually) quite fast. Java's garbage collector periodically "cleans up" the heap, so it basically looks a lot like a stack, and allocating from it is a lot like allocating from a stack as well -- in a typical case, you have a pointer to the beginning (or end) of the free memory area, and allocating a chunk of memory simply means adding (or subtracting) the amount from that pointer, and returning the address of the beginning (then, of course, constructing an object (or objects) in that area, etc.)
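The pointer-bumping idea can be sketched as a toy allocator. This is not how any particular JVM is implemented; it is a simplified model in which the class BumpAllocator, its capacity, and the reset() method (standing in for a collection that found nothing alive) are all assumptions made for illustration.

```java
class BumpAllocator {
    private final int capacity;
    private int top = 0; // boundary between allocated and free space

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Returns the start offset of the new block, or -1 if space has run out.
    int allocate(int size) {
        if (size < 0 || size > capacity - top) {
            return -1;
        }
        int start = top;
        top += size; // allocating is just moving the pointer
        return start;
    }

    // Stands in for a compacting collection that found no live objects.
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(32);
        System.out.println(heap.allocate(16));  // 0
        System.out.println(heap.allocate(8));   // 16
        System.out.println(heap.allocate(100)); // -1
        heap.reset();
        System.out.println(heap.allocate(8));   // 0
    }
}
```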
