Do not take my word on this; I am just repeating what I have pieced together from different sources. The HotSpot JVM uses Thread Local Allocation Buffers (TLABs). TLABs can be synchronized or not; most of the time they are not synchronized, so a thread can allocate very quickly. There are a large number of these TLABs so that each active thread gets its own, while the less active threads share a synchronized TLAB. When a thread exhausts its TLAB, it gets another one from a pool. When the pool runs out of TLABs, a young GC is triggered or needed.
When the pool runs out of TLABs, there will still be TLABs with space left in them. This "unused space" adds up and is significant. You can see this space because GC is triggered before the reserved heap size or the max heap size is reached; thus the heap is effectively 10-30% smaller. At least that is my guess from looking at heap usage graphs.
How do I tune the JVM to reduce the unused space?
You can tweak the TLAB size with the command-line option -XX:TLABSize.
However, as with most of these "deep down and dirty" settings, you should be very careful when changing them and monitor the effect of your changes closely.
You are correct that once there are no TLABs left, a young-generation collection takes place and they are cleaned up.
I can't say much, but there is ResizeTLAB, which allows the JVM to resize the TLAB based on allocation statistics (I guess), Eden size, etc. There is also a flag called TLABWasteTargetPercent (by default 1%). When the current TLAB cannot fit one more object, the JVM has to decide what to do: allocate directly in the heap, or allocate a new TLAB.
If the object's size is bigger than 1% of the current TLAB size, it is allocated directly; otherwise the current TLAB is retired.
So let's say the current size of the TLAB (TLABSize, by default zero, meaning it is adaptive) is 100 bytes (all numbers are theoretical); 1% of that is 1 byte, and that is the TLABWasteTargetPercent. Your TLAB is currently filled with 98 bytes and the object you want to allocate is 3 bytes. It will not fit in this TLAB, and at the same time it is bigger than the 1-byte threshold, so it is allocated directly on the heap.
The other way around: your TLAB is filled with 99.7 bytes and you try to allocate a half-byte object. It will not fit, but it is smaller than 1 byte, so this TLAB is retired and a new one is given to you.
As far as I understand, there is one more parameter called TLABWasteIncrement: when you fail to allocate in the TLAB (and allocate directly in the heap), then, so that this does not keep happening forever, the TLABWasteTargetPercent is increased by this value (the default is 4), increasing the chances of retiring this TLAB.
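To make that decision concrete, here is a small, purely illustrative Java model of the rule described above. It is not HotSpot's actual implementation (that lives in the VM's C++ code and is more involved); the class name, fields and return strings are all invented for the explanation.

    // Illustrative model of the TLAB slow-path decision described above.
    // Not real JVM code; the names and the API are invented for explanation.
    class TlabSketch {
        long capacity;                       // corresponds roughly to TLABSize
        long used;                           // bytes already handed out from this TLAB
        double wasteTargetPercent = 1.0;     // models TLABWasteTargetPercent (default 1)
        double wasteIncrement = 4.0;         // models TLABWasteIncrement (default 4)

        TlabSketch(long capacity, long used) {
            this.capacity = capacity;
            this.used = used;
        }

        String allocate(long size) {
            if (size <= capacity - used) {
                used += size;                // fast path: just bump the pointer
                return "allocated inside the current TLAB";
            }
            long wasteLimit = Math.max(1, (long) (capacity * wasteTargetPercent / 100.0));
            if (size > wasteLimit) {
                // Object is bigger than the waste threshold: allocate it outside the TLAB
                // and raise the threshold so this TLAB becomes more likely to be retired.
                wasteTargetPercent += wasteIncrement;
                return "allocated directly in the heap, TLAB kept";
            }
            // Object is small relative to the threshold: retire this TLAB.
            used = size;                     // pretend a fresh TLAB of the same size arrived
            return "TLAB retired, object placed in a fresh TLAB";
        }
    }

With the numbers from the worked example, new TlabSketch(100, 98).allocate(3) takes the direct-to-heap branch, while a tiny object against an almost-full TLAB takes the retire branch.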
There are also TLABAllocationWeight and TLABRefillWasteFraction; I will probably update this post a bit later to cover them.
The allocation of TLABs when there is not enough space follows a different algorithm, but generally what you say about the free space is right.
The question now is: how can you be sure that the default TLAB configuration is not right for you? Start by collecting some logs with -XX:+PrintTLAB, and if you see that too much space is going unused, then try increasing or reducing the TLAB size, or change -XX:TLABWasteTargetPercent or -XX:TLABWasteIncrement, as others have said.
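For example, a run along these lines (values purely illustrative, MyApp stands in for your application) prints per-thread and summary TLAB statistics at each young GC, which you can use to judge how much space is really being wasted before changing anything:

    java -XX:+PrintTLAB MyApp

and, once you have decided to experiment:

    java -XX:TLABSize=256k -XX:-ResizeTLAB -XX:TLABWasteTargetPercent=1 MyApp

If you really want a fixed TLAB size you probably also need -XX:-ResizeTLAB, otherwise the adaptive resizing mentioned above takes over again. On newer JDKs with unified logging, something like -Xlog:gc+tlab replaces -XX:+PrintTLAB.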
This is an article I find useful when I go through TLABs: https://alidg.me/blog/2019/6/21/tlab-jvm
Related
A Java application runs on the Oracle HotSpot JVM; it starts up with default JVM settings, except for an initial heap size of 100 MB and a maximum heap size of 1 GB.
Under which circumstances will the JVM decide to grow the current heap size instead of trying a GC?
The HotSpot JVM continuously monitors allocation rates and object lifetimes. It tries to achieve two key goals:
let short-lived objects die in Eden
promote long-lived objects to the old generation in time, to prevent unnecessary copying between survivor spaces
In a nutshell, HotSpot has a configured threshold for what percentage of the total heap must be free after the garbage collector runs. For example, if this threshold is configured at 70% and heap usage after a full GC is still 80% (only 20% free), then additional memory is allocated to reach the threshold. Of course, a bigger heap means longer pauses, while a smaller heap means more frequent collections.
But you have to remember that the JVM is very complex, and you can change this behaviour, for example with the flags:
AdaptiveSizePausePolicy, which picks the heap size that achieves the shortest pauses
AdaptiveSizeThroughPutPolicy, which picks the heap size that achieves the highest throughput
GCTimeLimit and GCTimeRatio, which set how much of the total time should go to application execution as opposed to garbage collection
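You can also watch this growth from inside the application with nothing more than the standard java.lang.Runtime API; the sketch below is illustrative (the class name and the one-second interval are arbitrary). totalMemory() is the currently committed heap and maxMemory() corresponds to -Xmx, so the committed value creeping upward after collections is exactly the resizing described above.

    // Print committed vs. maximum heap size once a second.
    // Run with e.g. -Xms100m -Xmx1g to mirror the scenario in the question.
    public class HeapWatcher {
        public static void main(String[] args) throws InterruptedException {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long committed = rt.totalMemory();          // current heap size
                long max = rt.maxMemory();                  // corresponds to -Xmx
                long used = committed - rt.freeMemory();    // space actually occupied
                System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                        used >> 20, committed >> 20, max >> 20);
                Thread.sleep(1000);
            }
        }
    }

You should see the committed size start near the initial heap size and grow only when collections stop freeing enough space.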
The number of objects occupying the heap increases while garbage collection is not possible.
When objects cannot be collected as garbage because they are still in use by the current process, the JVM needs to grow its heap towards the maximum to allow new objects to be created.
I'm currently monitoring my running Java application with VisualVM: http://visualvm.java.net/
I'm stressing memory usage by running with -Xmx128m.
While it runs I see the heap size increase to 128 MB (as expected); however, the used heap converges to approximately 105 MB before I run into a Java heap space error.
Why are the remaining ~20 MB not used?
You need to understand a central fact about garbage collector ergonomics:
The costly part of garbage collection is finding and dealing with the objects that are NOT garbage.
This means: as the heap gets close to its maximum capacity, the GC spends more and more time for less and less return in reclaimed space. If the GC were to try to use every last byte of memory, the net result would be that your JVM would spend more and more time garbage collecting, until ... eventually ... almost no useful work was being done.
To avoid this pathological situation, the JVM monitors the ratio of time spent GC'ing to time spent doing useful work. When that ratio exceeds a configurable threshold value, the GC raises an OutOfMemoryError ... even though (technically) there is free memory available. This is probably what you are seeing, though the other explanations are equally plausible.
You can change the GC thresholds, generation sizes, etc. via JVM options, but it is probably better not to. A better idea is to figure out why your application's memory usage is continually creeping upwards. There are most likely memory leaks ... i.e. bugs ... in your code that are causing this. Spend your effort finding and fixing those bugs, rather than worrying about why you are not using all of the memory.
(In fact, you are using it ... but not all of the time.)
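If you want to reproduce the behaviour described above, a deliberately pathological snippet like this one (not from the question; the class name is made up) will typically end in an OutOfMemoryError when run with a small heap such as -Xmx64m. Depending on the collector and settings the message is either "Java heap space" or "GC overhead limit exceeded"; the latter is exactly the "too much time in GC for too little return" case.

    import java.util.ArrayList;
    import java.util.List;

    // Deliberate "leak": every object stays reachable, so the GC can reclaim nothing
    // and spends more and more time for less and less freed space.
    public class GcOverheadDemo {
        public static void main(String[] args) {
            List<long[]> retained = new ArrayList<>();
            while (true) {
                retained.add(new long[1024]);   // ~8 KB per iteration, never released
            }
        }
    }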
The heap is split into the young generation (an Eden space and two survivor spaces of identical size, usually called From and To), the old generation (Tenured), and the permanent space.
The Xmx/Xms options set the overall heap size. So one region (with a default size) is actually the permanent space; and maybe (we don't know the details of your stress test) no objects are actually moved from Eden to the tenured or permanent space, so those regions remain empty while Eden runs out of space.
Java splits its memory into generations. You can get a heap space error if the tenured generation fills up. Normally the generations resize dynamically, but if you have set a fixed size they won't.
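If you want to see which of these regions is actually filling up, rather than just the single heap curve, the standard MemoryPoolMXBean API reports usage per pool. The sketch below is illustrative; pool names differ between JVM versions and collectors (for example "PS Eden Space", "G1 Eden Space" or "Tenured Gen"), and getMax() may return -1 when a pool has no defined maximum.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    // Dump the current usage of every memory pool (Eden, survivor, old gen, etc.).
    public class PoolDump {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                MemoryUsage u = pool.getUsage();
                System.out.printf("%-25s used=%dKB committed=%dKB max=%dKB%n",
                        pool.getName(), u.getUsed() >> 10,
                        u.getCommitted() >> 10, u.getMax() >> 10);
            }
        }
    }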
I have always had a question about heap memory behaviour.
Profiling my app, I get the above graph. It all seems fine. But what I don't understand is why, at GC time, the heap grows a little bit even though there is enough memory (red circle).
Does that mean that a long-running app will run out of heap space at some point?
Not necessarily. The garbage collector is free to use up to the maximum allocated heap in any way it sees fit. Extrapolating future GC behaviour based on current behaviour (but with different memory conditions) is in no way guaranteed to be accurate.
This does have the unfortunate side effect that it's very difficult to determine whether an OutOfMemoryError is going to happen unless it does. A legal (yet probably quite inefficient) garbage collector could simply do nothing until the memory ceiling was hit, then do a stop-the-world mark and sweep of the entire heap. With this implementation, you'd see your memory constantly increasing, and might be tempted to say that an OOME was imminent, but you just can't tell.
With such small heap sizes, the increase here is likely just due to bookkeeping/cache size alignment/etc. You're talking about less than 50 KB or so, looking at the resolution of the scale, so I wouldn't be worried.
If you do think there's a legitimate risk of OutOfMemoryErrors, the only way to show this is to put a stress test together and show that the application really does run out of heap space.
The HotSpot garbage collectors decide to increase the total heap size immediately after a full GC has completed if the ratio of free space to total heap size falls below a certain threshold. This ratio can be tuned using one of the many -XX options for the garbage collector(s).
Looking at the memory graph, you will see that the heap size increases occur at the "saw points"; i.e. the local maxima. Each of these corresponds to running a full GC. If you look really carefully at the "points" where the heap gets expanded, you will see that in each case the amount of free space immediately following the full GC is a bit higher than the previous such "point".
I imagine that what is happening is that your application's memory usage is cyclical. If the GC runs at or near a high point of the cycle, it won't be able to free as much memory as if it runs at or near a low point. This variability may be enough to cause the GC to expand the heap.
(Another possibility is that your application has a slow memory leak.)
Does that mean that a long-running app will run out of heap space at some point?
No. Assuming that your application's memory usage (i.e. the total space occupied by reachable objects) is cyclic, the heap size will approach a fixed upper limit and never exceed it. Certainly OOMEs are not inevitable.
I was reading an article on handling out-of-memory error conditions in Java (and on the JBoss platform), and I saw a suggestion to reduce the size of the thread stack.
How would reducing the size of the thread stack help with a max-memory error condition?
When Java creates a new thread, it pre-allocates a fixed-size block of memory for that thread's stack. By reducing the size of that memory block, you can avoid running out of memory, especially if you have lots of threads - the memory saving is the reduction in stack size times the number of threads.
The downside of doing this is that you increase the chance of a Stack Overflow error.
Note that the thread stacks are created outside of the JVM heap, so even if there's plenty of memory available in the heap, you can still fail to create a thread stack due to running out of memory (or running out of address space, as Tom Hawtin correctly points out).
The problem exists on 32-bit JVMs, where address space can get exhausted. Reducing the maximum stack size will not normally decrease the amount of memory actually allocated. Consider 8k threads with 256 kB reserved for each stack, or 1k threads with 2 MB each: that's 31 bits of address space (2 GB) gone right there.
The problem all but disappears with 64-bit JVMs (although the actual amount of memory will increase a bit because references are twice as big). Alternatively, use of non-blocking APIs can remove the need for quite so many threads.
There are N threads in a process, and M bytes of memory is allocated for each thread stack. Total memory allocated for stack usage is N x M.
You can reduce total memory consumed by the stack by reducing the number of threads (N), or reducing the memory allocated for each thread (M).
Often a thread won't use all of the stack. It's pre-allocated "in case" it will be needed later, but if the thread doesn't use a deep call path, or doesn't use recursion, it may not need all of the stack space allocated on its behalf.
Finding the optimal stack size can be an art.
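As a rough worked example of N x M: 1000 threads at a typical 1 MB default stack (-Xss1m) reserve roughly 1 GB of address space outside the heap, while -Xss256k would cut that to roughly 256 MB. Besides the global -Xss switch, Java also lets you request a stack size for an individual thread via the four-argument Thread constructor; the Javadoc is explicit that the VM is free to ignore this value, so treat the sketch below (class and thread names invented) as illustrative only.

    // Create worker threads with a smaller per-thread stack (here 256 KB).
    // The stackSize argument is only a hint; some JVMs ignore it entirely.
    public class SmallStackThreads {
        public static void main(String[] args) {
            Runnable task = () ->
                    System.out.println(Thread.currentThread().getName() + " running");
            for (int i = 0; i < 10; i++) {
                Thread t = new Thread(null, task, "worker-" + i, 256 * 1024);
                t.start();
            }
        }
    }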
I would try other things (such as changing the survivor ratio or the size of the space allocated for class definitions) before trying to change the thread stack size. It is hard to get right, so it is very easy to end up with a stack overflow error (which is just as fatal as an out-of-memory error).
I've never gotten this right, even after careful examination. But then again, I might never have encountered a web application/container combination that could be fine-tuned by changing its thread stack size. I've had much better (and non-fatal) results modifying the survivor ratio. But that has been my work experience; in different workplaces and applications, YMMV.
I read somewhere that Java can allocate memory for objects in about 12 machine instructions. That's quite impressive to me. As far as I understand, one of the tricks the JVM uses is preallocating memory in chunks. This helps minimize the number of requests to the operating system, which are quite expensive, I guess. But even CAS operations can cost up to 150 cycles on modern processors.
So, could anyone explain the real cost of memory allocation in Java and which tricks the JVM uses to speed up allocation?
The JVM pre-allocates an area of memory for each thread (TLA or Thread Local Area).
When a thread needs to allocate memory, it uses "bump the pointer" allocation within that area. (If the "free pointer" points to address 10 and the object to be allocated has size 50, we just bump the free pointer to 60 and tell the thread that it can use the memory between 10 and 59 for the object.)
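A bump-the-pointer allocator is only a few lines when sketched out. The toy class below is an illustration of the idea just described, not the JVM's real allocator (which works on raw memory inside the VM, not on a Java byte array):

    // Toy bump-the-pointer allocator over a byte array standing in for a thread-local area.
    public class BumpAllocator {
        private final byte[] area;   // the preallocated thread-local area
        private int free;            // the "free pointer": next unused offset

        BumpAllocator(int size) {
            this.area = new byte[size];
        }

        /** Returns the start offset of the new object, or -1 if this area is exhausted. */
        int allocate(int objectSize) {
            if (free + objectSize > area.length) {
                return -1;           // caller must retire this area and fetch a new one
            }
            int start = free;
            free += objectSize;      // the entire "allocation" is this one addition
            return start;
        }

        public static void main(String[] args) {
            BumpAllocator tla = new BumpAllocator(100);
            System.out.println(tla.allocate(50)); // 0  (pointer bumped from 0 to 50)
            System.out.println(tla.allocate(30)); // 50 (pointer bumped from 50 to 80)
            System.out.println(tla.allocate(40)); // -1 (does not fit, area must be retired)
        }
    }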
The best trick is the generational garbage collector. It keeps the heap unfragmented, so allocating memory is just a matter of incrementing the pointer to the free space and returning the old value. If memory runs out, the garbage collector copies objects and in this way creates a new, unfragmented heap.
Because different threads would have to synchronize on the pointer to the free memory when incrementing it, they preallocate chunks. A thread can then allocate new memory from its own chunk without taking the lock.
All of this is explained in more detail here: http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
There is no single memory allocator for the JVM. IIRC, Sun's JVM and IBM's managed memory differently. However, generally the way the JVM operates is that it initially allocates one piece of memory; this segment is small enough to live in the processor's cache, making all access to it extremely fast.
As the application creates objects, the objects will take memory from within this segment. The object allocation within the segment is simply pointer arithmetic.
Initially the offset address into the freshly minted segment will be zero. The first object allocated will have an "address" (actually an offset into the segment) of zero. When you allocate an object, the memory manager knows how big the object is, allocates that much space within the segment (16 bytes, say) and then increments its "offset address" by that amount. This means memory allocation is blindingly fast: it's just pointer arithmetic.
Sun has a whitepaper, "Memory Management in the Java HotSpot™ Virtual Machine", and IBM used to have a bunch of material on ibm.com/developerworks.