http://courses.washington.edu/css342/zander/css332/arch.html
bottom of the page:
The C++ memory model differs from the Java memory model. In C++,
memory comes from two places, the run time stack and the memory heap.
This reads as if Java doesn't have a heap (or stack)?
I am trying to learn all the "under the bonnet" details for Java and C++.
Java has a heap and a (per-thread) stack as well. The difference is that in Java, you cannot choose where to allocate a variable or object.
Basically, all objects and their instance variables are allocated on the heap, and all method parameters and local variables (just the references in the case of objects) are allocated on the stack.
However, some modern JVMs will allocate some objects on the stack as a performance optimization when they detect that the object is only used locally.
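A minimal sketch of that split (the names here are purely illustrative): the primitive local and the reference live in the method's stack frame, while the object they point to lives on the heap.

    public class StackVsHeap {
        public static void main(String[] args) {
            int count = 42;                         // primitive local: stored in main's stack frame
            StringBuilder sb = new StringBuilder(); // the StringBuilder object is allocated on the heap
            sb.append(count);                       // 'sb' on the stack is just a reference to it
        }   // when main returns, its frame (count, sb) is popped;
            // the StringBuilder becomes unreachable and is eligible for garbage collection
    }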
Java uses a heap memory model. All objects are created on the heap; references are used to refer to them.
It also puts method frames onto a stack when processing them.
I would say it has both.
Yes, Java has both a heap (common to the entire JVM) and a stack (one stack per thread).
And having a stack and a heap is more a property of implementations than of languages.
I would even say that most Linux programs have a heap (obtained through the mmap & sbrk system calls) and a stack (at the level of the operating system, this is not dependent on the language).
What Java has, but C++ usually does not, is a garbage collector. You don't need to release unused memory in Java. But in C++ you need to release it, by calling delete, for every C++ object allocated on the heap with new.
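To illustrate the Java side of that difference, here is a trivial sketch: objects allocated with new are simply abandoned and reclaimed later by the collector, with no explicit release.

    public class NoDelete {
        public static void main(String[] args) {
            for (int i = 0; i < 1_000_000; i++) {
                byte[] block = new byte[1024]; // allocated on the heap with new
                block[0] = (byte) i;
                // no explicit release: once 'block' goes out of scope each iteration,
                // the array becomes unreachable and the garbage collector may reclaim it
            }
            // In C++, each equivalent 'new' allocation would need a matching 'delete[]'.
        }
    }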
See however Boehm's garbage collector for a GC usable in C & C++. It works very well in practice (even if it can leak in theory, being a conservative, not a precise, GC).
Some restricted C++ or C environments (in particular free standing implementations for embedded systems without operating system kernel) don't have any heap.
I haven't dived deep into how Java manages memory while a program is running, since I have been working at the application level. I recently had a case where I needed to know, owing to performance issues in an application.
I have been aware of the "stack" and "heap" regions of memory and thought that was the memory model of a Java program. However, it turns out there is much more to it than that.
For example, I came across terms like Eden, S0, S1, Old memory and so on. I was never aware of these terms before.
As Java has been changing, maybe some of these terms are no longer relevant as of Java 8.
Can anyone point me to where to get this information, and under what circumstances do we need to know about them? Are these part of main memory, that is, RAM?
Eden, S0, S1, Old memory and other memory areas exist only in the context of a specific garbage collector implementation; e.g. generational collectors like G1 divide the heap into those areas, whereas non-generational collectors like ZGC do not.
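One way to see which areas your particular JVM and collector actually use is to list the memory pools the JVM exposes. This is a small sketch using the standard java.lang.management API; the pool names printed depend on the collector in use (for instance, G1 exposes pools named after its Eden, Survivor and Old regions, while ZGC exposes a single heap pool).

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    public class ListMemoryPools {
        public static void main(String[] args) {
            // Each pool corresponds to a region managed by the current collector,
            // e.g. Eden, Survivor and Old Gen for generational collectors.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                long used = pool.getUsage() != null ? pool.getUsage().getUsed() : -1;
                System.out.printf("%-30s type=%s used=%d bytes%n",
                        pool.getName(), pool.getType(), used);
            }
        }
    }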
Start by reviewing the main garbage collectors in the JVM:
ParNew
CMS
G1
ZGC / Shenandoah / Azul C4
and then try to understand related concepts:
Thread-local allocation buffers (TLAB)
Escape analysis
String constant pools, string interning, string de-duplication (see the short sketch after this list)
Permanent generation vs Metaspace
Object layout e.g. why boolean is not taking 1 bit (word tearing)
Native memory e.g. JNI or off-heap memory access
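To pick one item from that list, string interning is easy to observe directly; a minimal sketch:

    public class InternDemo {
        public static void main(String[] args) {
            String literal = "hello";                  // placed in the string constant pool
            String constructed = new String("hello");  // a distinct object on the heap
            System.out.println(literal == constructed);          // false: different objects
            System.out.println(literal == constructed.intern()); // true: intern() returns the pooled instance
        }
    }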
I don't believe that there is a single website that will explain the full JVM memory management approach.
Java, as defined by the Java Language Specification and the Java Virtual Machine Specification, talks about the stack and the heap (as well as the method area).
Those are the things that are needed to describe, conceptually, what makes a Java Virtual Machine.
If you wanted to implement a JVM you'd need to implement those in some way. They are just as valid in Java 13 as they were back in Java 1. Nothing has fundamentally changed about how those work.
The other terms you mentioned (as well as "old gen", "new gen", ...) are memory areas used in the implementation of specific garbage collection mechanisms, specifically those implemented in the Oracle JDK / OpenJDK.
All of those areas are basically specific parts of the heap. The exact way the heap is split into those areas is up to the garbage collector to decide and knowing about them shouldn't be necessary unless you want to tweak your garbage collector.
Since garbage collectors change between releases and new garbage collector approaches are implemented regularly (as this is one of the primary ways to speed up JVMs), the concrete terms used here will change over the years.
A Java application starts up with one heap for all threads. Each thread has its own stack.
When a Java application is started, we use the JVM options -Xms and -Xmx to control the size of the heap and -Xss to control the stack size.
My understanding is that the heap being created becomes the "managed" memory of the JVM and all the objects being created are placed there.
But how does stack creation work? Does Java create a stack for each thread when it is created? If so, where exactly is the stack in memory? It is certainly not in the "managed" heap.
Does the JVM create the stack from native memory, or does it pre-allocate a section of the managed memory area for stacks? If so, how does the JVM know how many threads will be created?
There are a few things about thread stacks that the Java specification tells us. Among other things:
Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread.
Because the Java Virtual Machine stack is never manipulated directly except to push and pop frames, frames may be heap allocated. The memory for a Java Virtual Machine stack does not need to be contiguous.
The specification permits Java Virtual Machine stacks either to be of a fixed size or to dynamically expand and contract as required by the computation.
Now, if we focus on JVM implementations such as HotSpot, we can get some more information. Here are a few facts I've collected from different sources:
The minimum stack size in HotSpot for a thread seems to be fixed. This is what the aforementioned -Xss option is for.
(Source)
In Java SE 6, the default on Sparc is 512k in the 32-bit VM, and 1024k in the 64-bit VM. ... You can reduce your stack size by running with the -Xss option. ...
64k is the least amount of stack space allowed per thread.
JRockit allocates memory separate from the heap where stacks are located. (Source)
Note that the JVM uses more memory than just the heap. For example Java methods, thread stacks and native handles are allocated in memory separate from the heap, as well as JVM internal data structures.
There is a direct mapping between a Java Thread and a native OS Thread in HotSpot. (Source).
But the Java thread stack in HotSpot is software managed; it is not an OS native thread stack. (Source)
It uses a separate software stack to pass Java arguments, while the native C stack is used by the VM itself. A number of JVM internal variables, such as the program counter or the stack pointer for a Java thread, are stored in C variables, which are not guaranteed to be always kept in the hardware registers. Management of these software interpreter structures consumes a considerable share of total execution time.
JVM also utilizes the same Java thread stack for the native methods and JVM runtime calls (e.g. class loading). (Source).
Interestingly, allocated objects may sometimes be located on the stack instead of the heap as a performance optimization. (Source)
JVMs can use a technique called escape analysis, by which they can tell that certain objects remain confined to a single thread for their entire lifetime, and that lifetime is bounded by the lifetime of a given stack frame. Such objects can be safely allocated on the stack instead of the heap.
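As a hedged sketch of the kind of code this applies to (the class and method names are made up for illustration): the object below never escapes the method, so a JIT that performs escape analysis (controlled in HotSpot by -XX:+DoEscapeAnalysis, enabled by default in modern releases) may allocate it on the stack or eliminate the allocation entirely via scalar replacement.

    public class EscapeDemo {
        // A tiny value holder used only inside one method.
        static final class Point {
            final int x, y;
            Point(int x, int y) { this.x = x; this.y = y; }
        }

        static long distanceSquared(int x, int y) {
            Point p = new Point(x, y);                  // never stored in a field, never returned:
            return (long) p.x * p.x + (long) p.y * p.y; // the JIT may avoid a heap allocation here
        }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += distanceSquared(i, i + 1);       // in steady state, ideally allocation-free
            }
            System.out.println(sum);
        }
    }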
And because an image is worth a thousand words, here is one from James Bloom
Now answering some of your questions:
How does the JVM know how many threads will be created?
It doesn't. This can easily be shown by creating a variable number of threads at run time. It does make some assumptions about the maximum number of threads and the stack size of each thread. That's why you may run out of memory (not heap memory!) if you allocate too many threads.
Does Java create a stack for each thread when it is created?
As mentioned earlier, each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread. (Source).
If so, where exactly is the stack in memory? It is certainly not in the "managed" heap.
As stated above, the Java specification technically allows stack memory to be stored on the heap. But at least the JRockit JVM uses a different part of memory.
Does JVM create stack from native memory or does it pre-allocate a section of managed memory area for stack?
The stack is JVM managed because the Java specification prescribes how it must behave: a Java Virtual Machine stack stores frames (§2.6). A Java Virtual Machine stack is analogous to the stack of a conventional language. One exception is the native method stacks used for native methods. More about this again in the specification.
JVM uses more memory than just the heap. For example Java methods, thread stacks and native handles are allocated in memory separate from the heap, as well as JVM internal data structures.
Further reading.
So to answer your questions:
Does Java create a stack for each thread when it is created?
Yes.
If so, where exactly is the stack in memory?
In the JVM allocated memory, but not on the heap.
If so, how does the JVM know how many threads will be created?
It doesn't.
You can create as many as you'd like until you've maxed out your JVM memory and get
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
EDIT:
All of the above refers to the JRockit JVM, although I find it hard to believe other JVMs would be different on such fundamental issues.
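A deliberately wasteful sketch that reproduces the error quoted above (don't run it on a machine you care about; the thread count reached before failure depends on -Xss and on available native memory):

    import java.util.ArrayList;
    import java.util.List;

    public class ThreadExhaustion {
        public static void main(String[] args) {
            List<Thread> threads = new ArrayList<>();
            while (true) {
                Thread t = new Thread(() -> {
                    try {
                        Thread.sleep(Long.MAX_VALUE);  // keep the thread (and its stack) alive
                    } catch (InterruptedException ignored) { }
                });
                t.start();                             // each started thread gets its own native stack
                threads.add(t);
                if (threads.size() % 100 == 0) {
                    System.out.println("threads so far: " + threads.size());
                }
                // Eventually t.start() fails with:
                // java.lang.OutOfMemoryError: unable to create new native thread
            }
        }
    }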
If an object X exists on the Java heap, and I knew the address of object X on the Java heap, would it be possible for native code to access this object directly from memory without involving JNI? And vice versa: if Java code knows the address of an object Y on the native heap, can Java access it without involving JNI?
To be more precise: do Java objects get stored in memory the same way as native objects, or is it different? If not, won't byte array objects in Java and in native code be stored in the same way?
Please provide your suggestions and references.
EDIT: Maybe this is the right question: why do objects need to be transferred from the Java heap to the native heap through JNI? Why can't a Java heap object be accessed from native code directly?
Can Java code access native objects? No. Java code is managed by the JVM. (More precisely, it's bytecode, not Java code.) The specification of the JVM does not allow bytecode to access arbitrary memory. Bytecode can't even access arbitrary addresses on the JVM heap. For example, private fields can only be accessed by bytecode in the same class.
Can native code access JVM heap objects directly (without JNI)? Yes. Native code is running in the same process and address space as the JVM. As far as I know, on most operating systems and hardware platforms this means that native code can do whatever it wants in that address space.
Should native code access JVM heap objects directly? Definitely not.
First of all, the JVM specification does not specify the layout of objects on the JVM heap, not even of byte arrays. For example, the JVM may split the array into chunks and transparently translate addresses when bytecode uses the array. If you tried to write native code that accesses the array, you would have to re-implement that translation. Such code may work in one JVM implementation, but probably not in another, or maybe not even in a newer version of the same JVM, or in the same JVM when it runs with a different configuration. That's one reason why you have to use JNI: it gives native code a well-defined "view" of objects on the JVM heap.
Secondly, the JVM garbage collector can move around objects on the heap anytime. Native code should access JVM heap objects through handles. The garbage collector knows about it and updates the handles if necessary. Native code that tries to bypass a handle can never be sure if the object is still there.
A third problem is native code that directly modifies pointers between objects on the JVM heap. Depending on the garbage collector algorithm, this may cause all kinds of problems.
In a nutshell: You probably could access JVM heap objects from native code directly, but you almost certainly shouldn't.
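As a sketch of what that well-defined JNI "view" looks like from the Java side (the class, method and library names below are made up for illustration): the native code never dereferences the byte[] on its own; it goes through JNI functions such as GetByteArrayElements / ReleaseByteArrayElements, which handle any copying or pinning the JVM requires.

    public class NativeSum {
        static {
            System.loadLibrary("nativesum");  // hypothetical native library name
        }

        // Implemented in C/C++. The native side receives a jbyteArray and would call
        // GetByteArrayElements(...) to obtain a usable pointer to the data, then
        // ReleaseByteArrayElements(...) when it is done with it.
        public static native long sum(byte[] data);

        public static void main(String[] args) {
            byte[] data = { 1, 2, 3, 4 };
            System.out.println(sum(data));
        }
    }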
Short answer: No.
Beyond being a Java/C++ issue, this contradicts basic OS concepts. Since each process has its own address space, one process cannot reach any object of another.
This limitation can be mitigated only if the process (the one trying to reach another's memory) runs in kernel space and the underlying OS allows such operations, or if some mechanism like "shared memory" is involved. Even then, you would face the virtual address space problem: the same physical portions of memory are addressed with different values in different processes. That's why, even if you think you know the address of an object, that address is virtual and useless in other processes.
EDIT: If they are not in different processes, then the answer is definitely yes. Theoretically, you can implement your own JNI :).
A possible answer is using the APR (Apache Portable Runtime). Yes, I know it's JNI based, but it has the concept of shared memory, so it's possible to bind to a shared memory space created by another program (and vice versa).
https://apr.apache.org/docs/apr/1.5/group__apr__shm.html
Outside of the JNI part, this does not seem possible.
I come from a C/C++ background, where a process's memory is divided into:
Per thread stack
Heap
Instructions
Data
I am trying to understand how the JVM works. I looked at different resources and gathered that JVM memory is divided into a heap and a stack as well, plus a few other things.
I want to wrap my mind around this: when I read about the heap and stack in the JVM, are we talking about the same concepts of stack and heap? And does the actual memory of the entire JVM reside on the heap (and here I mean the C++ concept of a heap)?
I want to wrap my mind around this: when I read about the heap and stack in the JVM, are we talking about the same concepts of stack and heap?
Yes, in general this is the case. Each thread has its own per-thread stack, which is used to store local variables in stack frames (corresponding to method calls). The JVM stack need not be located anywhere related to the per-thread stack at the OS level. If the stack attempts to grow past the size specified by -Xss, or a default set by the implementation, a StackOverflowError will be thrown.
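A minimal sketch: unbounded recursion keeps pushing frames onto the calling thread's stack until that limit is hit.

    public class StackDepth {
        static int depth = 0;

        static void recurse() {
            depth++;      // each call pushes another frame onto this thread's stack
            recurse();
        }

        public static void main(String[] args) {
            try {
                recurse();
            } catch (StackOverflowError e) {
                // run with e.g. -Xss256k and then -Xss2m and compare the depth reached
                System.out.println("StackOverflowError at depth " + depth);
            }
        }
    }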
The stack can exist in C/C++ heap memory, and need not be contiguous (JVM spec v7):
Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread. A Java Virtual Machine stack stores frames (§2.6). A Java Virtual Machine stack is analogous to the stack of a conventional language such as C: it holds local variables and partial results, and plays a part in method invocation and return. Because the Java Virtual Machine stack is never manipulated directly except to push and pop frames, frames may be heap allocated. The memory for a Java Virtual Machine stack does not need to be contiguous.
The Java heap is a means of storing objects, including automatic garbage collection when objects are no longer reachable via strong references. It is shared between all threads running on a JVM.
The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.
The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements. The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous.
By simply calling a constructor (e.g. HashMap foo = new HashMap()) the JVM will allocate the requisite memory on the heap for this object (or throw an OutOfMemoryError if that is not possible). It's also important to note that objects never live on the stack--only references to them do. Additionally, non-primitive fields also always contain references to objects.
It's also possible to allocate memory off-heap through sun.misc.Unsafe on some JVMs, through NIO classes that allocate direct buffers, and through the use of JNI. This memory is not part of the JVM heap and does not undergo automatic garbage collection (meaning that it would need to be released through means such as delete), but it may be part of heap memory as C++ would refer to it.
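A small sketch of the NIO route (direct buffers): the buffer's backing memory lives outside the garbage-collected Java heap, although the ByteBuffer object itself is an ordinary heap object.

    import java.nio.ByteBuffer;

    public class OffHeapBuffer {
        public static void main(String[] args) {
            // Allocates 1 MiB of native (off-heap) memory for the buffer's contents;
            // it is released when the buffer object is reclaimed (or via implementation-specific cleaners).
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
            buf.putInt(0, 42);
            System.out.println("direct=" + buf.isDirect() + " value=" + buf.getInt(0));
        }
    }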
I've been running into a peculiar issue with certain Java applications in the HP-UX environment.
The heap is set to -mx512, yet, looking at the memory regions for this Java process using gpm, it shows it using upwards of 1.6 GB of RSS memory, with 1.1 GB allocated to the DATA region. It grows quite rapidly over a 24-48 hour period and then slows down substantially, still growing 2 MB every few hours. However, the Java heap shows no sign of leakage.
Curious how this was possible, I researched a bit and found this HP write-up on memory leaks in the Java heap and C heap: http://docs.hp.com/en/JAVAPERFTUNE/Memory-Management.pdf
My question is: what determines what ends up in the C heap vs. the Java heap, and for things that do not go through the Java heap, how would you identify those objects allocated on the C heap? Additionally, does the Java heap sit inside the C heap?
Consider what makes up a Java process.
You have:
the JVM (a C program)
JNI Data
Java byte codes
Java data
Notably, they ALL live in the C heap (the JVM Heap is part of the C heap, naturally).
In the Java heap are simply the Java byte codes and the Java data. But what is also in the Java heap is "free space".
The typical (i.e. Sun) JVM only grows its Java heap as necessary, but never shrinks it. Once it reaches its defined maximum (-Xmx512M), it stops growing and deals with whatever is left. When that maximum heap is exhausted, you get the OutOfMemory exception.
What that Xmx512M option DOES NOT do is limit the overall size of the process. It limits only the Java heap part of the process.
For example, you could have a contrived Java program that uses 10MB of Java heap but makes a JNI call that allocates 500MB of C heap. You can see how your process size is large, even though the Java heap is small. Also, with the new NIO libraries, you can attach memory outside of the heap as well.
The other aspect that you must consider is that the Java GC is typically a "copying collector", which means it takes the "live" data from the memory it's collecting and copies it to a different section of memory. This empty space that it copies into IS NOT PART OF THE HEAP, at least not in terms of the Xmx parameter. It's, like, "the new heap", and becomes part of the heap after the copy (the old space is used for the next GC). If you have a 512MB heap, and it's at 510MB, Java is going to copy the live data someplace. The naive thought would be to another large open space (like 500+MB). If all of your data were "live", then it would need a large chunk like that to copy into.
So, you can see that in the most extreme edge case, you need at least double the free memory on your system to handle a specific heap size. At least 1GB for a 512MB heap.
It turns out that's not the case in practice; memory allocation and such is more complicated than that, but you do need a large chunk of free memory to handle the heap copies, and this impacts the overall process size.
Finally, note that the JVM does fun things like mapping the rt.jar classes into the VM to ease startup. They're mapped in a read-only block and can be shared across other Java processes. These shared pages will "count" against all Java processes, even though they really only consume physical memory once (the magic of virtual memory).
Now, as to why your process continues to grow: if you never hit the Java OOM message, that means that your leak is NOT in the Java heap, but that doesn't mean it may not be in something else (the JRE runtime, a 3rd party JNI library, a native JDBC driver, etc.).
In general, only the data in Java objects is stored on the Java heap; all other memory required by the Java VM is allocated from the "native" or "C" heap (in fact, the Java heap itself is just one contiguous chunk allocated from the C heap).
Since the JVM requires the Java heap (or heaps if generational garbage collection is in use) to be a contiguous piece of memory, the whole maximum heap size (-mx value) is usually allocated at JVM start time. In practice, the Java VM will attempt to minimise its use of this space so that the Operating System doesn't need to reserve any real memory to it (the OS is canny enough to know when a piece of storage has never been written to).
The Java heap, therefore, will occupy a certain amount of space in memory.
The rest of the storage will be used by the Java VM and any JNI code in use. For example, the JVM requires memory to store Java bytecode and constant pools from loaded classes, the result of JIT compiled code, work areas for compiling JIT code, native thread stacks and other such sundries.
JNI code is just platform-specific (compiled) C code that can be bound to a Java object in the form of a "native" method. When this method is executed the bound code is executed and can allocate memory using standard C routines (eg malloc) which will consume memory on the C heap.
My only guess, given the figures you have provided, is a memory leak in the Java VM. You might want to try one of the other VMs listed in the paper you referred to. Another (much more difficult) alternative might be to compile the open Java on the HP platform.
Sun's Java isn't 100% open yet (they are working on it), but I believe there is one on SourceForge that is.
Java also thrashes memory, by the way. Sometimes it confuses OS memory management a little (you see it when Windows runs out of memory and asks Java to free some up; Java touches all its objects, causing them to be loaded in from the swap file, and Windows screams in agony and dies), but I don't think that's what you are seeing.