Memory management with Java (how to use the data segment?)

Memory management with Java (how to use the data segment?) - java

I have very large data structures that I define as static fields in a class. I think they get pushed into the heap because my code fails with that error message (heap memory exceeded). Now, I think I recall there to be a memory segment besides heap and stack that is much larger, called data. Is it possible for me to push the variables in that segment? If so, how is this accomplished? I can't afford to increase the heap size because my program will be used by others.

The only thing you could really possibly mean is the disk -- actually writing things to files.

Related

What is the difference between stack data structure and stack memory? [duplicate]

I'm studying for my data organization final and I'm going over stacks and heaps because I know they will be on the final and I'm going to need to know the differences.
I know what the Stack is and what the Heap is.
But I'm confused on what a stack is and what a heap is.
The Stack is a place in the RAM where memory is stored, if it runs out of space, a stackoverflow occurs. Objects are stored here by default, it reallocates memory when objects go out of scope, and it is faster.
The Heap is a place in the RAM where memory is stored, if it runs out of space, the OS will assign it more. For an object to be stored on the Heap it needs to be told by using the, new, operator, and will only be deallocated if told. fragmentation problems can occur, it is slower then the Stack, and it handles large amounts of memory better.
But what is a stack, and what is a heap? is it the way memory is stored? for example a static array or static vector is a stack type and a dynamic array, linked list a heap type?
Thank you all!

"The stack" and "the heap" are memory lumps used in a specific way by a program or operating system. For example, the call stack can hold data pertaining to function calls and the heap is a region of memory specifically used for dynamically allocating space.
Contrast these with stack and heap data structures.
A stack can be thought of as an array where the last element in will be the first element out. Operations on this are called push and pop.
A heap is a data structure that represents a special type of graph where each node's value is greater than that of the node's children.
On a side note, keep in mind that "the stack" or "the heap" or any of the stack/heap data structures are unique to any given programming language but are simply concepts in the field of computer science.

I won't get into virtual memory (read about that if you want) so let's simplify and say you have RAM of some size.
You have your code with static initialized data, with some static uninitialized data (static in C++ means like global vars). You have your code.
When you compile something compiler (and linker) will organize and translate your code to machine code (byte code, ones and zeroes) in a following way:
Binary file (and object files) is organized into segments (portions of RAM).
First you have DATA segment. This is the segment that contains values of initialized variables. so if u have variables i.e. int a=3, b = 4 they will go to DATA segment (4 bytes of RAM containing 00000003h, and other 4 bytes containing 000000004h, hexadecimal notation). They are stored consecutively.
Then you have Code segment. All your code is translated into machine code (1s and 0s) and stored in this segment consecutively.
Then you have BSS segment. There goes uninitialized global vars (all static vars that weren't initialized).
Then you have STACK segment. This is reserved for stack. Stack size is determined by operating system by default. You can change this value but i won't get into this now. All local variables go here. When you call some function first func args are pushed to stack, then return address (where to come back when u exit function), then some computer registers are pushed here, and finally all local variables declared in the function get their reserved space on stack.
And you have HEAP segment. This is part of the RAM (size is also determined by OS) where the objects and data are stored using operator new.
Then all of the segments are piled one after the other DATA, CODE, BSS, STACK, HEAP. There are some other segments, but they are not of interest here, and that is loaded in RAM by the operating system. Binary file also has some headers containing information from which location (address in memory) your code begins.
So in short, they are all parts of RAM, since everything that is being executed is loaded into RAM (can't be in ROM (read only), nor HDD since HDD its just for storing files.

When specifically referring to C++'s memory model, the heap and stack refer to areas of memory. It is easy to confuse this with the stack data structure and heap data structure. They are, however, separate concepts.
When discussing programming languages, stack memory is called 'the stack' because it behaves like a stack data structure. The heap is a bit of a misnomer, as it does not necessarily (or likely) use a heap data structure. See Why are two different concepts both called "heap"? for a discussion of why C++'s heap and the data structure's names are the same, despite being two different concepts.
So to answer your question, it depends on the context. In the context of programming languages and memory management, the heap and stack refer to areas of memory with specific properties. Otherwise, they refer to specific data structures.

The technical definition of "a stack" is a Last In, First Out (LIFO) data structure where data is pushed onto and pulled off of the top. Just like with a stack of plates in the real world, you wouldn't pull one out from the middle or bottom, you [usually] wouldn't pull data out of the middle of or the bottom of a data structure stack. When someone talks about the stack in terms of programming, it can often (but not always) mean the hardware stack, which is controlled by the stack pointer register in the CPU.
As far as "a heap" goes, that generally becomes much more nebulous in terms of a definition everyone can agree on. The best definition is likely "a large amount of free memory from which space is allocated for dynamic memory management." In other words, when you need new memory, be it for an array, or an object created with the new operator, it comes from a heap that the OS has reserved for your program. This is "the heap" from the POV of your program, but just "a heap" from the POV of the OS.

The important thing for you to know about stacks is the relationship between the stack and function/method calls. Every function call reserves space on the stack, called a stack frame. This space contains your auto variables (the ones declared inside the function body). When you exit from the function, the stack frame and all the auto variables it contains disappear.
This mechanism is very cheap in terms of CPU resources used, but the lifetime of these stack-allocated variables is obviously limited by the scope of the function.
Memory allocations (objects) on the heap, on the other hand, can live "forever" or as long as you need them without regards to the flow of control of your program. The down side is since you don't get automatic lifetime management of these heap allocated objects, you have to either 1) manage the lifetime yourself, or 2) use special mechanisms like smart pointers to manage the lifetime of these objects. If you get it wrong your program has memory leaks, or access data that may change unexpectedly.
Re: Your question about A stack vs THE stack: When you are using multiple threads, each thread has a separate stack so that each thread can flow into and out of functions/methods independently. Most single threaded programs have only one stack: "the stack" in common terminology.
Likewise for heaps. If you have a special need, it is possible to allocate multiple heaps and choose at allocation time which heap should be used. This is much less common (and a much more complicated topic than I have mentioned here.)

Reducing Java heap size

I have an application that uses a lot of memory diff'ing the contents of two potentially huge (100k+) directories. It makes sense to me that such an operation would use a lot of memory, but once my diff'ing operation is done, the heap remains the same size.
I basically have code that instantiates a class to store the filename, file size, path, and modification date for each file on the source and target. I save the additions, deletions, and updates in other arrays. I then clear() my source and target arrays (which could be 100k+ each by now), leaving relatively small additions, deletions, and updates arrays left.
After I clear() my target and source arrays though, the memory usage (as visible via VirtualVM and Windows Task Manager) doesn't drop. I'm not experienced enough with VirtualVM (or any profiler for that matter) to figure out what is taking up all this memory. VirtualVM's heap dump lists the top few objects with a retained size of a few megabytes.
Anything to help point me in the right direction?

If the used heap goes down after a Garbage Collection, than it likely works as expected. Java increases its heap when it needs more memory, but does not free it -- it prefers to keep it in case the application uses more memory again. See Is there a way to lower Java heap when not in use? for this topic on why the heap is not reduced after the used heap amount lowers.

The VM grows or shrinks the heap based on the command-line parameters -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio. It will shrink the heap when the free percentage hits -XX:MaxHeapFreeRatio, whose default is 70.
There is a short discussion of this in Oracle's bug #6498735.

Depending on your code you might be generating memory leaks and the Garbage collector just can't free them up.
I would suggest to instrument your code in order to find potential memory leaks. Once this is ruled out or fixed, I would start to look at the code itself for possible improvement.
Note that for instance if you use the try/catch/finally block. The finally block might not be called at all (or at least not immediately). If you do some resource freeing in a finally block this might be the answer.
Nevertheless read up on the subject, for instance here: http://www.toptal.com/java/hunting-memory-leaks-in-java

why use the stack instead of heap?

I see only disadvantage of this: you can get StackOverflow :) Why not use only Heap?
In Java, C, C++ the parameters to functions are passed on stack. The plain variables inside functions bodies are created in stack.
As I know the stack is limited per thread, has some default values, but relative low: 1-8 Mb.
Why not use the Heap instead of Stack. Both are in memory, just the OS make a separation from Address A to B is Heap and from C to D is Stack.
There are variable arguments. It says there are 10 variable of 4 byte each. If you read 11 than you maybe read some data a "memory" trash, and maybe exactly that you want for hacking or maybe you get a Segmentation fault ... if the OS detects you as bad boy. :) - So security can't be a reason for use Stack.

Performance is one of many reasons: memory in the stack is trivial to book-keep; it has no holes; it can be mapped directly into the cache; it is attached on a per-thread basis.
In contrast, memory in the heap is, well, a heap of stuff; it is more difficult to book-keep; it can have holes.
Check out this answer (excellent, in my opinion) explaining some other differences.

Others have already mentioned that the stack can be faster due to simplicity of incrementing/decrementing the stack pointer. This is, however, quite a ways from the whole story.
First of all, if you're using a garbage collector that compacts the heap (i.e., most modern collectors) allocation on the heap isn't much different from allocation on the stack. You simply keep a pointer to boundary between allocated and free memory, and to allocate some space, you just move that pointer, just like you would on the stack. Objects that will have extremely short lives (like the locals in most functions) cost next to nothing in a GC cycle too. Keeping a live object accessible takes (a little) work, but an object that's no longer accessible normally involves next to no work.
There is, however, often still a substantial advantage to using the stack for most variables. Many typical programs tend to run for fairly extended periods of time using nearly constant amounts of stack space. They enter one function, create some variables, use them for a while, pop them off the stack, then repeat the same cycle in another function.
This means most of the memory toward the top of the stack is almost always in the cache. Most function calls are re-using memory that was just vacated by the previous function call. By reusing the same memory continuously, you end up with considerably better cache usage.
By contrast, when you allocate items in the heap, you typically end up allocating separate space for nearly every item. You cache is in a constant state of "churn", throwing away the memory for objects you're no longer user to make space for newly allocated ones. Unless you use a minuscule heap, the chances of re-using an address while it's still in the cache are nearly nonexistent.

I'm sure this is answered a million times online, but...
Because you don't want every method call to be a memory allocation (slow). So, you pre-allocate your stack.
Some more reasons listed here (including security).

The answer is that you get holes when you allocate and de-allocate on the heap. This means that it gets more and more difficult to allocate memory since the places that are available are different sizes. The stack only reserves what is needed and gives it all back when you get out of scope. No hassle.

If everything was on the stack, each time you passed those values on, they would have to be copied. However, unlike the heap, it doesn't need to be cleverly managed - items on the heap require garbage collection.
So they work in two different ways that suit two different uses. The stack is a quick and lightweight home for values to be held for a short time whereas the heap allows you to pass objects around without copying them.
Neither stack nor heap is perfect for every scenario - that is why they both exist.

Using the heap requires "requesting" a bit of memory from the heap, using new or some similar function. Then, when it's finished, you delete the it again. This is very useful for variables that are long-lived and/or that take up quite a bit of space (or take up an "unknown at compile-time" space - for example if you read a string into a variable from a file, you don't necessarily know how much space it needs, and it's REALLY annoying to get a message from the program saying "String too large on line X in file Y").
On the other hand, the stack is "free" both when it comes to allocating and de-allocating (technically, any function that uses stack-space will need one extra instruction for the allocation of the stackspace, but compared to the several hundred or thousands that a call to new will involve, it's not noticeable). Of course, class objects will still have to have their respective constructors called, which may take almost any amount of time to complete, but that is true regardless of how/where the storage is allocated from.

Fast jvm start / jvm persistancy - starting jvm with data from heap dump

I am developing an in memory data structure, and would like to add persistency.
I am looking for a way to do this fast. I thought about dumpping a heap-dump once in a while.
Is there a way to load this java heap dump as is, into my memory? or is it impossible?
Otherwise, other suggestions for fast write and fast read of the entire information?
(Serialization might take a lot of time)
-----------------edited explination:--------
Since my memory might be full of small pieces of information, referencing each other - and so serialization may require me to in efficeintly scan all my memory. reloading is also possibly problematic.
On the other hand, I can define a gigantic array, and each object I create, I shall put it in the array. Links will be a long number, reperesnting the place in the array. Now, I can just dump this array as is - and also reload it as is.
There are even some jvms like JRockit that utilize the disk space, and so maybe it is possible maybe to dump as is very quickly and to re-load very quicky.
To prove my point, java dump contains all the information of the jvm, and it is produced quickly.
Sorry, but serialization of 4GB isn't even close to being in the seconds dump is.
Also, memory is memory and there are operating systems that allow you a ram memory dump quicky.
https://superuser.com/questions/164960/how-do-i-dump-physical-memory-in-linux
When you think about it... this is quite a good strategy for persistant data structures. There is quite a hype about in-memory data bases in the last decade. But why settel for that? What if I want a fibonacci heap - to be "almost persistant". That is, every 5 minutes I will dump the inforamtion (quickly) and in case of a electrical outage, I have a backup from 5 minutes ago.
-----------------end of edited explination:--------
Thank you.

In general, there is no way to do this on HotSpot.
Objects in the heap have 2 words of header, the second of which points into permgen for the class metadata (known as a klassOop). You would have to dump all of permgen as well, which includes all the pointers to compiled code - so basically the entire process.
There would be no sane way to recover the heap state correctly.
It may be better to explain precisely what you want to build & why already-existing products don't do what you need.

Use Serialization. Implement java.io.Serializable, add serialVersionUID to all of your classes, and you can persist them to any OutputStream (file, network, whatever). Just create a starting object from where all your object are reachable (even indirectly).
I don't think that Serialization would take long time, it's optimized code in the JVM.

You can use jhat or jvisualvm to load your dump to analyze it. I don't know whether the dump file can be loaded and restarted again.

Why is the default size of PermGen so small?

What would be the purpose of limiting the size of the Permgen space on a Java JVM? Why not always set it equal to the max heap size? Why does Java default to such a small number of 64MB? Are they trying to force people to notice permgen issues in their code by doing this?
If my app uses 85MB of permgen, then it might be safe to set it to 96MB but why set it so small if its just really part of the main heap? Wouldn't it be efficient to allow the JVM to use as much PermGen as the heap allows?

The PermGen is set to disappear in JDK8.
What would be the purpose of limiting the size of the Permgen space on a Java JVM?
Not exhausting resources.
Why not always set it equal to the max heap size?
The PermGen is not part of the Java heap. Besides, even if it was, it wouldn't be of much help to the application to fill the heap with class metadata and constant Strings, since you'd then get "OutOfMemoryError: Java heap size" errors instead.

Conceptually to the programmer, you could argue that a "Permanent Generation" is largely pointless. If you need to load a class or other "permanent" data and there is memory space left, then in principle you may as well just load it somewhere and not care about calling the aggregate of these items a "generation" at all.
However, the rationale is probably more that:
there is potentially a benefit (e.g. from a processor cache point of view) from having all code/class metadata near together in memory space, and to guarantee this it is easier to allocate fixed sized area(s);
similarly, memory space where code/class metadata is stored potentially has certain "special" properties (notably, you don't want it to get paged out to disk if you can help it) and the system may not be able to set such properties on memory in a very granular way, so that it is more practical to have all "special" objects together in one (or a small number of) contiguous block or memory space;
having permanent objects all together helps avoid fragmenting the remaining memory space and again, the most practical way to do this is to allocate one contiguous block of memory of fixed size from the outset.
So as I see things, most of the time the reason for allocating a permanent "generation" is really for practical implementation reasons than because the programmer really cares terribly much.
On the other hand, the situation isn't usually terrible for the programmer either: the amount of permanent generation needed is usually predictable, so that you should be able to allocate the required amount with decent leeway. So if you find you are unexpectedly exceeding the allocation, this may well be a signal that "something serious is wrong".
N.B. It is probably the case that some of the issues that the PermGen originally was designed to solve are not such big issues on modern 64-bit processors with larger processor caches. If it is removed in future releases of Java, this is likely a sign that the JVM designers feel it has now "served its purpose".

PermGen is where class data and other static stuff (like string literals) are allocated.
You'd rather allocate memory to the Java heap for your application data (Xms and Xmx, where young (short-lived) and tenured objects go (when the the JVM realizes they need to stay around longer)).
So the historic PermGen 64MB default may be arbitrary but the having you explicitly set it lets you know (and control) how much static data your application is causing the JVM to store.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.