I read somewhere that Java can allocate memory for objects in about 12 machine instructions. That's quite impressive to me. As far as I understand, one of the tricks the JVM uses is preallocating memory in chunks. This helps minimize the number of requests to the operating system, which are quite expensive, I guess. But even CAS operations can cost up to 150 cycles on modern processors.
So, could anyone explain the real cost of memory allocation in Java and which tricks the JVM uses to speed up allocation?
The JVM pre-allocates an area of memory for each thread (TLA or Thread Local Area).
When a thread needs to allocate memory, it uses "bump the pointer" allocation within that area. (If the "free pointer" points to address 10 and the object to be allocated has size 50, we just bump the free pointer to 60 and tell the thread that it can use the memory between 10 and 59 for the object.)
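The bump-the-pointer step can be sketched as a toy model in Java. This is an illustration under assumptions, not HotSpot code: addresses are plain long values, no memory is actually reserved or zeroed, and the names BumpArea and BumpDemo are hypothetical.

```java
// Toy model of bump-the-pointer allocation inside one thread-local area.
// Addresses are plain numbers; nothing is actually reserved or zeroed.
final class BumpArea {
    private final long end; // one past the last usable address
    private long free;      // the "free pointer": next unallocated address

    BumpArea(long start, long size) {
        this.end = start + size;
        this.free = start;
    }

    /** Returns the object's start address, or -1 if this area is exhausted. */
    long allocate(long size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        if (end - free < size) {
            return -1;          // caller must get a fresh area (or trigger a GC)
        }
        long result = free;     // the object occupies [result, result + size)
        free += size;           // bump the pointer past the object
        return result;
    }

    long freePointer() { return free; }
}

class BumpDemo {
    public static void main(String[] args) {
        BumpArea area = new BumpArea(10, 100); // usable addresses 10..109
        long obj = area.allocate(50);
        // Same numbers as the example above: object at 10, free pointer now 60
        System.out.println("object at " + obj + ", free pointer now " + area.freePointer());
    }
}
```

The fast path is one comparison, one addition and one store, which is why the allocation itself can be so cheap; the expensive work is pushed into refilling the area and into garbage collection.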
The best trick is the generational garbage collector. It keeps the heap unfragmented, so allocating memory just means increasing the pointer to the free space and returning its old value. When memory runs out, the garbage collector copies the live objects and in this way creates a new unfragmented heap.
Since different threads would have to synchronize on the pointer to the free memory every time they increase it, they preallocate chunks instead. That way a thread can allocate new memory without taking the lock.
All of this is explained in more detail here: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
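The chunking idea from this answer can be sketched as a toy model (assumptions: SharedEden, ThreadAllocator and the 1024-byte CHUNK_SIZE are hypothetical names and values, not JVM internals). Only claiming a chunk touches the shared pointer, using a compare-and-set loop; every allocation inside a chunk is a plain, unsynchronized bump.

```java
import java.util.concurrent.atomic.AtomicLong;

// Shared space whose top pointer is advanced with CAS, but only when a
// thread needs a whole new chunk.
final class SharedEden {
    private final AtomicLong top;
    private final long end;

    SharedEden(long start, long size) {
        this.top = new AtomicLong(start);
        this.end = start + size;
    }

    /** Atomically carves [chunkStart, chunkStart + size) off the shared space, or returns -1. */
    long claimChunk(long size) {
        while (true) {
            long old = top.get();
            if (end - old < size) return -1;          // exhausted: a young GC would run here
            if (top.compareAndSet(old, old + size)) { // the only synchronized step
                return old;
            }
            // another thread won the race; retry with the new top
        }
    }
}

// One instance per thread: its chunk is never shared, so no locking is needed.
final class ThreadAllocator {
    static final long CHUNK_SIZE = 1024;  // hypothetical chunk size
    private final SharedEden eden;
    private long free = 0, limit = 0;     // current chunk; touched by one thread only

    ThreadAllocator(SharedEden eden) { this.eden = eden; }

    long allocate(long size) {
        if (limit - free < size) {                   // slow path: chunk exhausted
            long chunkSize = Math.max(CHUNK_SIZE, size);
            long chunk = eden.claimChunk(chunkSize);
            if (chunk < 0) return -1;
            free = chunk;                            // leftover of the old chunk is wasted
            limit = chunk + chunkSize;
        }
        long result = free;                          // fast path: no CAS, no lock
        free += size;
        return result;
    }
}
```

With this design the CAS mentioned in the question is paid roughly once per chunk rather than once per object. Whatever is left at the end of an abandoned chunk stays unused until the next collection, the same kind of unused space people observe with TLABs.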
There is no single memory allocator for the JVM. IIRC, Sun's JVM and IBM's managed memory differently. However, generally the JVM will initially allocate one piece of memory, and this segment will be small enough to live in the processor's cache, making all access to it extremely fast.
As the application creates objects, the objects will take memory from within this segment. The object allocation within the segment is simply pointer arithmetic.
Initially the offset address into the freshly minted segment will be zero. The first object allocated will have an 'address' (actually an offset into the segment) of zero. When you allocate an object, the memory manager knows how big the object is, allocates that much space within the segment (16 bytes, say) and then increments its "offset address" by that amount. This means memory allocation is blindingly fast: it's just pointer arithmetic.
Sun has a whitepaper, Memory Management in the Java HotSpot™ Virtual Machine, and IBM used to have a bunch of material on ibm.com/developerworks.
Do not take my word on this. I am just repeating what I have pieced together from different sources. HotSpot JVM uses Thread Local Allocation Buffers (TLABs). TLABs can be synchronized or not. Most of the time the TLABs are not synchronized and hence a thread can allocate very quickly. There are a large number of these TLABs so that the active threads get their own TLABs. The less active threads share a synchronized TLAB. When a thread exhausts its TLAB, then it gets another TLAB from a pool. When the pool runs out of TLABs, then Young GC is triggered or needed.
When the pool runs out of TLABs, there are still going to be TLABs with space left in them. This "unused space" adds up and is significant. One can see this space because GC is triggered before the reserved heap size or the max heap size is reached. Thus, the heap is effectively 10-30% smaller. At least that is my guess from looking at heap usage graphs.
How do I tune the JVM to reduce the unused space?
You can tweak that setting with the command-line option -XX:TLABSize.
However, as with most of these "deep down and dirty" settings, you should be very careful when changing them and monitor the effect of your changes closely.
You are correct that once there are no TLABs, there will be a young generation collection and they will be cleaned.
I can't tell much, but there is ResizeTLAB, which I guess allows the JVM to resize the TLAB based on allocation statistics, eden size, etc. There's also a flag called TLABWasteTargetPercent (by default it is 1%). When the current TLAB cannot fit one more object, the JVM has to decide what to do: allocate directly in the heap, or allocate a new TLAB.
If this object's size is bigger than 1% of the current TLAB size, it is allocated directly; otherwise the current TLAB is retired.
So let's say the current size of the TLAB is 100 bytes (TLABSize is zero by default, meaning the size is adaptive; all numbers here are theoretical). 1% of that is 1 byte; that's the TLABWasteTargetPercent. Currently your TLAB is filled with 98 bytes and the object you want to allocate is 3 bytes. It will not fit in this TLAB, and at the same time it is bigger than the 1-byte threshold, so it is allocated directly on the heap.
The other way around: your TLAB is filled with 99.7 bytes and you try to allocate a 1/2-byte object. It will not fit, but it is smaller than 1 byte, so this TLAB is retired and a new one is given to you.
As far as I understand, there is one more parameter, TLABWasteIncrement. Each time an allocation fails in the TLAB (and goes directly to the heap), TLABWasteTargetPercent is increased by this value (default 4%), so that this does not go on forever: the chance of retiring this TLAB keeps growing.
There are also TLABAllocationWeight and TLABRefillWasteFraction; I will probably update this post a bit later to cover them.
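The decision described in this answer can be written down as a small toy model. It follows the answer's own (admittedly theoretical) description, not HotSpot's source code, which differs in detail. TlabModel and its fields are hypothetical names, the fresh TLAB is assumed to have the same size as the old one, and fractional byte sizes are allowed only to reproduce the 99.7/0.5 example.

```java
// Toy model of the "allocate outside the TLAB or retire it" decision, following
// the description in the answer above (not HotSpot's actual implementation).
final class TlabModel {
    enum Outcome { IN_TLAB, OUTSIDE_TLAB, RETIRED_AND_REFILLED }

    private final double tlabSize;       // assumed constant across refills
    private final double wasteIncrement; // plays the role of TLABWasteIncrement
    private double wastePercent;         // starts at TLABWasteTargetPercent
    private double used;

    TlabModel(double tlabSize, double used, double wasteTargetPercent, double wasteIncrement) {
        this.tlabSize = tlabSize;
        this.used = used;
        this.wastePercent = wasteTargetPercent;
        this.wasteIncrement = wasteIncrement;
    }

    Outcome allocate(double objectSize) {
        if (used + objectSize <= tlabSize) {   // fits: ordinary bump allocation
            used += objectSize;
            return Outcome.IN_TLAB;
        }
        double threshold = tlabSize * wastePercent / 100.0;
        if (objectSize > threshold) {          // bigger than the threshold: go to the heap
            wastePercent += wasteIncrement;    // retiring becomes likelier next time
            return Outcome.OUTSIDE_TLAB;
        }
        used = objectSize;                     // retire; object lands in a fresh TLAB
        return Outcome.RETIRED_AND_REFILLED;
    }
}

class TlabDemo {
    public static void main(String[] args) {
        TlabModel a = new TlabModel(100, 98, 1, 4);
        System.out.println(a.allocate(3));   // 3 > 1% of 100: allocated outside
        System.out.println(a.allocate(3));   // threshold is now 5%: TLAB retired
        TlabModel b = new TlabModel(100, 99.7, 1, 4);
        System.out.println(b.allocate(0.5)); // 0.5 < 1: TLAB retired
    }
}
```

The second call on the first model shows the point of the increment: after one outside allocation the threshold has grown from 1% to 5%, so the same 3-byte object now causes the TLAB to be retired instead.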
The allocation of TLABs when there is not enough space uses a different algorithm, but generally what you say about the free space is right.
The question now is: how can you be sure that the default TLAB configuration is not right for you? Start by collecting some logs with -XX:+PrintTLAB. If you see that too much space goes unused, try increasing or reducing the TLAB size, or changing -XX:TLABWasteTargetPercent or -XX:TLABWasteIncrement, as others have said.
This is an article I find useful when I go through TLABs: https://alidg.me/blog/2019/6/21/tlab-jvm
I want to understand what data structures the heap managers (in Java, or in the OS in the case of C or C++) use to keep track of the memory locations used by threads and processes. One way is to use a map from objects to memory addresses, plus a reverse map from a block's starting address to the size of the object stored there.
But that cannot serve new memory requests in O(1) time. Is there a better data structure for this?
Note that unmanaged languages will be allocating/freeing memory through system calls, generally not managing it themselves. Still, regardless of the level of abstraction (from the OS to the runtime), something has to deal with this:
One method is called buddy block allocation, described well with an example on Wikipedia. It essentially keeps track of the usage of blocks of memory of varying sizes (typically powers of 2). This can be done with a number of arrays with clever indexing or, perhaps more intuitively, with a binary tree in which each node tells whether a certain block is free and all nodes on one level represent blocks of the same size.
This suffers from fragmentation: requests are rounded up to a block size (internal fragmentation), and as things come and go you might end up with your data scattered rather than efficiently consolidated, making it harder to fit in large data. This could be countered by a more complicated, dynamic system, but buddy blocks have the advantage of simplicity.
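A minimal buddy allocator can be sketched in Java as follows. It is a hedged illustration, not any particular OS or runtime implementation: it tracks only offsets into an imaginary arena, keeps one sorted set of free blocks per block size instead of an explicit tree (the tree is implicit in the offsets), and the class name and parameters are hypothetical. Each request walks at most once through the size levels, independent of how many blocks are currently allocated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Minimal binary buddy allocator over an arena of (minBlock << maxOrder) bytes.
// Only offsets are tracked; blocks of order k have size minBlock << k.
final class BuddyAllocator {
    private final int minBlock;
    private final int maxOrder;
    private final List<TreeSet<Integer>> freeLists = new ArrayList<>(); // free offsets per order

    BuddyAllocator(int minBlock, int maxOrder) {
        if (Integer.bitCount(minBlock) != 1) throw new IllegalArgumentException("minBlock must be a power of two");
        this.minBlock = minBlock;
        this.maxOrder = maxOrder;
        for (int i = 0; i <= maxOrder; i++) freeLists.add(new TreeSet<>());
        freeLists.get(maxOrder).add(0); // the whole arena starts as one free block
    }

    /** Smallest order whose block fits the request: sizes are rounded up to a power of two. */
    private int orderFor(int size) {
        int order = 0;
        while ((minBlock << order) < size) order++;
        return order;
    }

    /** Returns the offset of the allocated block, or -1 if nothing large enough is free. */
    int allocate(int size) {
        if (size <= 0 || size > (minBlock << maxOrder)) return -1;
        int want = orderFor(size);
        int order = want;
        while (order <= maxOrder && freeLists.get(order).isEmpty()) order++;
        if (order > maxOrder) return -1;
        int offset = freeLists.get(order).pollFirst();
        while (order > want) {                     // split: keep the lower half,
            order--;                               // put the upper half on a free list
            freeLists.get(order).add(offset + (minBlock << order));
        }
        return offset;
    }

    /** Frees a block returned by allocate(size), merging it with free buddies. */
    void free(int offset, int size) {
        int order = orderFor(size);
        while (order < maxOrder) {
            int buddy = offset ^ (minBlock << order); // buddies differ in exactly one bit
            if (!freeLists.get(order).remove(buddy)) break;
            offset = Math.min(offset, buddy);         // merged block starts at the lower buddy
            order++;
        }
        freeLists.get(order).add(offset);
    }

    int freeBlockCount(int order) { return freeLists.get(order).size(); }
}
```

For example, with new BuddyAllocator(16, 4) (a 256-byte arena), allocate(16) returns 0 after splitting the arena into free blocks at 16, 32, 64 and 128, and a 20-byte request is then served from a whole 32-byte block, which is exactly the internal fragmentation mentioned above.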
The OS keeps track of the process's memory allocation in an overall view - 4KB pages or bigger "lumps" are stored in some form of list.
In the typical Windows implementation (Microsoft's C runtime library), at least in recent versions, all memory allocations are done through the HeapAlloc() system call, so every single heap allocation goes through to the OS. Whether the OS actually tracks every single allocation or just keeps a map of "what is free, what is used" is another matter. It is my understanding that the heap management code has no list of "current allocations", just a list of freed memory lumps.
In Linux/Unix, the C library will typically avoid calling the OS for every little allocation, and instead uses a large lump of memory, and splits that up into smaller pieces per allocation. Again, no tracking of allocated memory inside the heap management.
This is done at a process level. I'm not aware of an operating system that differentiates memory allocations on a per-thread level (other than TLS - thread local storage, but that is typically a very small region, outside of the typical heap code management).
So, in summary: the OS and/or C/C++ runtime doesn't actually keep a list of all the used allocations; it keeps a list of "freed" memory [and when another lump is freed, it will typically join it with the free lumps immediately before and after it to reduce fragmentation]. When the allocator is first started, it's given a large lump, which is then recorded as a single free block. When a request is made, the lump is split, and the free list holds the remainder. When that lump is not sufficient, another big lump is carved off using the underlying OS allocation calls.
There is a small amount of metadata stored with each allocation, which contains things like "how much memory is allocated", and this metadata is used when freeing the memory. In the typical case, this data is stored immediately before the allocated memory. But there is no way to find the allocation metadata without knowing about the allocations in some other way.
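The scheme summarized in this answer (only free blocks are tracked, each allocation carries a small size header just before it, and freed blocks are joined with free neighbours) can be sketched as a toy first-fit allocator. This is a hedged illustration in Java over an int array of "words", not how any particular C runtime is implemented; FreeListHeap and its methods are hypothetical names. Note that the first-fit search here is linear in the number of free blocks; real allocators typically keep several free lists bucketed by size to speed up the common case.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy first-fit free-list allocator over an array of "words". Only free blocks
// are tracked (start -> size, sorted by address). Each allocated block has a
// one-word header, just before the index handed out, that records its size.
final class FreeListHeap {
    private final int[] memory;
    private final TreeMap<Integer, Integer> freeBlocks = new TreeMap<>();

    FreeListHeap(int words) {
        memory = new int[words];
        freeBlocks.put(0, words); // the initial lump is recorded as one big free block
    }

    /** Returns the index of the usable payload, or -1 if no free block is big enough. */
    int malloc(int payloadWords) {
        int needed = payloadWords + 1; // +1 word for the size header
        Integer start = null;
        for (Map.Entry<Integer, Integer> e : freeBlocks.entrySet()) {
            if (e.getValue() >= needed) { start = e.getKey(); break; } // first fit
        }
        if (start == null) return -1;  // a real allocator would ask the OS for another lump
        int size = freeBlocks.remove(start);
        if (size > needed) freeBlocks.put(start + needed, size - needed); // remainder stays free
        memory[start] = needed;        // metadata: total block size, header included
        return start + 1;
    }

    /** Frees a payload index returned by malloc, joining adjacent free blocks. */
    void free(int payload) {
        int start = payload - 1;
        int size = memory[start];                          // read the size back from the header
        Integer next = freeBlocks.remove(start + size);    // free block right after this one?
        if (next != null) size += next;
        Map.Entry<Integer, Integer> prev = freeBlocks.lowerEntry(start);
        if (prev != null && prev.getKey() + prev.getValue() == start) { // free block right before?
            start = prev.getKey();
            size += prev.getValue();
        }
        freeBlocks.put(start, size);
    }

    Map<Integer, Integer> freeList() { return new TreeMap<>(freeBlocks); }
}
```

With new FreeListHeap(100), malloc(10) returns 1 and malloc(20) returns 12; after freeing both, the free list is back to a single block {0=100}, because the second free joins the block with its free neighbours on both sides.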
There is no automatic garbage collection in C++. You need to call free/delete for malloc/new heap allocations. That's where tools like Valgrind (to check for memory leaks) come in handy. There are also concepts like auto_ptr (deprecated since C++11 in favor of std::unique_ptr), which automatically free the heap memory, that you can look into.
Charlie Hunt says in his presentation that large objects are bad for the JVM GC, because:
Large objects are expensive to allocate and initialize.
Large objects of different sizes can cause Java heap fragmentation.
How do you define a large object? How can I know whether an object is a large object? Thanks
The definition depends on the platform, JVM and JVM configuration. For instance, here is an excerpt from the How Garbage Collection differs in the three big JVMs blog post by Michael Kopp:
Large and small objects
The JRockit differentiates between large and small objects during allocation. The limit for when an object is considered large depends on the JVM version, the heap size, the garbage collection strategy and the platform used. (italics mine - DL.) It is usually somewhere between 2 and 128 KB. Large objects are allocated outside the thread local area and, in the case of a generational heap, directly in the old generation. This makes a lot of sense when you start thinking about it. The young generation uses a copying collection. At some point copying an object becomes more expensive than traversing it in every garbage collection.
To your second question, I am not sure how to obtain that threshold, but specifically in HotSpot you can set it:
-XX:PretenureSizeThreshold=2m
Refer to the HotSpot JVM garbage collection options cheat sheet by Alexey Ragozin for details on this and many many other -XX options.
There is no theoretical definition of its size; it depends on your JVM configuration. For example, if the young generation is small, even small objects will cause too many collections (GCs). If your objects are big relative to your JVM heap, the GC will have to do more work to allocate and reclaim them, which leads to "stop the world" pauses more often.
From the GC's point of view, large objects in general are:
Objects which are expensive to allocate
Objects which are expensive to initialize
E.g. an ArrayList of size 10000.
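To get a rough feel for when something like that ArrayList counts as "large" (for example, relative to a -XX:PretenureSizeThreshold=2m setting), you can estimate the size of its backing array. This sketch rests on loudly stated assumptions: a 16-byte array header, 4-byte references (compressed oops), 8-byte alignment, and a backing array whose capacity equals the list size. Real layouts depend on the JVM, bitness and flags, and the ArrayList object itself plus the objects it references are not counted. ArraySizeEstimate is a hypothetical name.

```java
// Back-of-envelope shallow size of a Java array. ASSUMPTIONS (vary by JVM/flags):
// 16-byte array header and 8-byte object alignment. This is an estimate, not a measurement.
final class ArraySizeEstimate {
    static final long HEADER_BYTES = 16;
    static final long ALIGNMENT = 8;

    /** Estimated bytes for an array of `length` elements of `elementBytes` each. */
    static long arrayBytes(long length, long elementBytes) {
        long raw = HEADER_BYTES + length * elementBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT; // round up to the alignment
    }

    public static void main(String[] args) {
        long threshold = 2L * 1024 * 1024; // e.g. -XX:PretenureSizeThreshold=2m
        // Backing Object[] of an ArrayList with capacity 10000, assuming 4-byte references
        long listArray = arrayBytes(10_000, 4);
        // A long[300_000] for comparison (8 bytes per element)
        long longs = arrayBytes(300_000, 8);
        System.out.println(listArray + " bytes, above threshold: " + (listArray > threshold));
        System.out.println(longs + " bytes, above threshold: " + (longs > threshold));
    }
}
```

Under these assumptions the ArrayList's backing array comes to about 40 KB, far below a 2m threshold, while a long[300_000] is about 2.4 MB and would exceed it.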
What would be the purpose of limiting the size of the Permgen space on a Java JVM? Why not always set it equal to the max heap size? Why does Java default to such a small number of 64MB? Are they trying to force people to notice permgen issues in their code by doing this?
If my app uses 85MB of PermGen, it might be safe to set it to 96MB, but why set it so small if it's really just part of the main heap? Wouldn't it be more efficient to allow the JVM to use as much PermGen as the heap allows?
The PermGen is set to disappear in JDK8.
What would be the purpose of limiting the size of the Permgen space on a Java JVM?
Not exhausting resources.
Why not always set it equal to the max heap size?
The PermGen is not part of the Java heap. Besides, even if it were, filling the heap with class metadata and constant Strings wouldn't help the application much, since you'd then get "OutOfMemoryError: Java heap space" errors instead.
Conceptually to the programmer, you could argue that a "Permanent Generation" is largely pointless. If you need to load a class or other "permanent" data and there is memory space left, then in principle you may as well just load it somewhere and not care about calling the aggregate of these items a "generation" at all.
However, the rationale is probably more that:
there is potentially a benefit (e.g. from a processor cache point of view) from having all code/class metadata near together in memory space, and to guarantee this it is easier to allocate fixed sized area(s);
similarly, memory space where code/class metadata is stored potentially has certain "special" properties (notably, you don't want it to get paged out to disk if you can help it) and the system may not be able to set such properties on memory in a very granular way, so that it is more practical to have all "special" objects together in one (or a small number of) contiguous block(s) of memory space;
having permanent objects all together helps avoid fragmenting the remaining memory space and again, the most practical way to do this is to allocate one contiguous block of memory of fixed size from the outset.
So as I see things, most of the time a permanent "generation" is allocated for practical implementation reasons rather than because the programmer really cares terribly much.
On the other hand, the situation isn't usually terrible for the programmer either: the amount of permanent generation needed is usually predictable, so that you should be able to allocate the required amount with decent leeway. So if you find you are unexpectedly exceeding the allocation, this may well be a signal that "something serious is wrong".
N.B. It is probably the case that some of the issues that the PermGen originally was designed to solve are not such big issues on modern 64-bit processors with larger processor caches. If it is removed in future releases of Java, this is likely a sign that the JVM designers feel it has now "served its purpose".
PermGen is where class data and other static stuff (like string literals) are allocated.
You'd rather give the Java heap (-Xms and -Xmx) the memory for your application data: that is where young (short-lived) objects go, along with tenured objects once the JVM realizes they need to stay around longer.
So the historic 64MB PermGen default may be arbitrary, but having you set it explicitly lets you know (and control) how much static data your application causes the JVM to store.
It is not possible to increase the maximum size of Java's heap after the VM has started. What are the technical reasons for this? Do the garbage collection algorithms depend on having a fixed amount of memory to work with? Or is it for security reasons, to prevent a Java application from DOS'ing other applications on the system by consuming all available memory?
In Sun's JVM, last I knew, the entire heap must be allocated in a contiguous address space. I imagine that for large heap values, it's pretty hard to add to your address space after startup while ensuring it stays contiguous. You probably need to get it at startup, or not at all. Thus, it is fixed.
Even if it isn't all used immediately, the address space for the entire heap is reserved at startup. If it cannot reserve a large enough contiguous block of address space for the value of -Xmx that you pass it, it will fail to start. This is why it's tough to allocate >1.4GB heaps on 32-bit Windows - because it's hard to find contiguous address space in that size or larger, since some DLLs like to load in certain places, fragmenting the address space. This isn't really an issue when you go 64-bit, since there is so much more address space.
This is almost certainly for performance reasons. I could not find a terrific link detailing this further, but here is a pretty good quote from Peter Kessler (full link - be sure to read the comments) that I found when searching. I believe he works on the JVM at Sun.
The reason we need a contiguous memory region for the heap is that we have a bunch of side data structures that are indexed by (scaled) offsets from the start of the heap. For example, we track object reference updates with a "card mark array" that has one byte for each 512 bytes of heap. When we store a reference in the heap we have to mark the corresponding byte in the card mark array. We right shift the destination address of the store and use that to index the card mark array. Fun addressing arithmetic games you can't do in Java that you get to (have to :-) play in C++.
This was in 2004. I'm not sure what has changed since then, but I am pretty sure it still holds. If you use a tool like Process Explorer, you can see that the virtual size (add the virtual size and private size memory columns) of the Java application includes the total heap size (plus other required space, no doubt) from the point of startup, even though the memory 'used' by the process will be nowhere near that until the heap starts to fill up...
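The card-mark indexing from Peter Kessler's quote can be sketched as a toy model. The shift of 9 comes from the quote (2^9 = 512 bytes per card); the class name, the heap start address and the CLEAN/DIRTY byte values are hypothetical, and the range check exists only in this toy. It illustrates why the heap needs to be contiguous: a single array indexed by (address - heapStart) >>> 9 only works if every heap address falls inside one known range.

```java
// Toy card mark array: one byte per 512-byte card of a contiguous heap.
final class CardTable {
    static final int CARD_SHIFT = 9;          // 2^9 = 512 bytes per card
    static final byte CLEAN = 0, DIRTY = 1;   // hypothetical encodings; new byte[] starts CLEAN
    private final long heapStart;
    private final byte[] cards;

    CardTable(long heapStart, long heapBytes) {
        this.heapStart = heapStart;
        long cardSize = 1L << CARD_SHIFT;
        this.cards = new byte[(int) ((heapBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: after storing a reference at destAddress, dirty its card. */
    void onReferenceStore(long destAddress) {
        long offset = destAddress - heapStart;
        // Toy-only check: an address outside the one contiguous range maps to no card.
        if (offset < 0 || (offset >>> CARD_SHIFT) >= cards.length) {
            throw new IllegalArgumentException("address outside the heap");
        }
        cards[(int) (offset >>> CARD_SHIFT)] = DIRTY; // right shift gives the card index
    }

    boolean isDirty(int cardIndex) { return cards[cardIndex] == DIRTY; }

    int cardCount() { return cards.length; }
}
```

For a 4096-byte heap starting at 0x10000 there are 8 cards, and a store to 0x10000 + 1030 dirties card 2 (1030 >>> 9 == 2), so a collector looking for updated references would only need to scan the 512 bytes covered by that card.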
Historically there has been a reason for this limitation: to prevent applets in the browser from eating up all of the user's memory. The Microsoft VM, which never had such a limitation, actually allowed this, which could lead to a sort of denial-of-service attack against the user's computer. It was only a year ago that Sun introduced, in the 1.6.0 Update 10 VM, a way to let applets specify how much memory they want (limited to a certain fixed share of the physical memory) instead of always limiting them to 64MB, even on computers that have 8GB or more available.
Now that the JVM has evolved, it should have been possible to get rid of this limitation when the VM is not running inside a browser, but Sun obviously never considered it a high-priority issue, even though numerous bug reports have been filed asking to finally allow the heap to grow.
I think the short, snarky, answer is because Sun hasn't found it worth the time and cost to develop.
The most compelling use case for such a feature is on the desktop, IMO, and Java has always been a disaster on the desktop when it comes to the mechanics of launching the JVM. I suspect that those who think the most about those issues tend to focus on the server side and view any other details best left to native wrappers. It is an unfortunate decision, but it should just be one of the decision points when deciding on the right platform for an application.
My gut feel is that it has to do with memory management with respect to the other applications running on the operating system.
If you set the maximum heap size to, for example, the amount of RAM on the box you effectively let the VM decide how much memory it requires (up to this limit). The problem with this is that the VM could effectively cripple the machine it is running on because it will take over all the memory on the box before it decides that it needs to garbage collect.
When you specify max heap size, what you're saying to the VM is, you are allowed to use this amount of memory before you need to start garbage collecting. You cannot have more because if you take more then the other applications running on the box will slow down and you will start swapping to the disk if you use more than this.
Also be aware that there are two values with respect to memory: the "current heap size" and the "max heap size". The current heap size is how much memory the heap currently occupies; if it requires more, the JVM can resize the heap, but it cannot grow the heap beyond the maximum heap size.
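Both values can be observed from inside a running program with java.lang.Runtime: totalMemory() corresponds roughly to the current heap size and maxMemory() to the maximum heap size (-Xmx). The exact numbers depend on the JVM and the collector, so treat this as a quick way to see the distinction rather than a precise measurement. HeapSizes is a hypothetical class name.

```java
// Prints the running JVM's view of its heap via java.lang.Runtime.
class HeapSizes {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        long max = rt.maxMemory();       // upper limit the heap may grow to (Long.MAX_VALUE if none)
        long total = rt.totalMemory();   // heap currently committed: the "current heap size"
        long used = total - rt.freeMemory();
        System.out.printf("used %d MB of current %d MB (max %d MB)%n", used / mb, total / mb, max / mb);
    }
}
```

Running it with, for example, java -Xms64m -Xmx512m HeapSizes should typically report a current size well below the 512 MB maximum, since the heap only grows toward the maximum when the program needs more.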
From IBM's performance tuning tips (so may not be directly applicable to Sun's VMs)
The Java heap parameters influence the behavior of garbage collection. Increasing the heap size supports more object creation. Because a large heap takes longer to fill, the application runs longer before a garbage collection occurs. However, a larger heap also takes longer to compact and causes garbage collection to take longer.
The JVM has thresholds it uses to manage the JVM's storage. When the thresholds are reached, the garbage collector gets invoked to free up unused storage. Therefore, garbage collection can cause significant degradation of Java performance. Before changing the initial and maximum heap sizes, you should consider the following information:
In the majority of cases you should set the maximum JVM heap size to value higher than the initial JVM heap size. This allows for the JVM to operate efficiently during normal, steady state periods within the confines of the initial heap but also to operate effectively during periods of high transaction volume by expanding the heap up to the maximum JVM heap size. In some rare cases where absolute optimal performance is required you might want to specify the same value for both the initial and maximum heap size. This will eliminate some overhead that occurs when the JVM needs to expand or contract the size of the JVM heap. Make sure the region is large enough to hold the specified JVM heap.
Beware of making the Initial Heap Size too large. While a large heap size initially improves performance by delaying garbage collection, a large heap size ultimately affects response time when garbage collection eventually kicks in because the collection process takes more time.
So, I guess the reason you can't change the value at runtime is that it may not help: either you have enough space in your heap or you don't. Once you run out, a GC cycle will be triggered. If that doesn't free up the space, you're stuffed anyway. You'd need to catch the OutOfMemoryError, increase the heap size, and then retry your calculation, hoping that this time you have enough memory.
In general the VM won't use the maximum heap size unless you need it, so if you think you might need to expand the memory at runtime, you could just specify a large maximum heap size.
I admit that's all a bit unsatisfying, and seems a bit lazy, since I can imagine a reasonable garbage collection strategy which would increase the heap size when GC fails to free enough space. Whether my imagination translates to a high performance GC implementation is another matter though ;)