I understand that a larger heap means longer GC pauses. I'm okay with that: my code is doing analysis of some data, and all I care about is minimizing the total time spent doing garbage collection; the length of a single pause doesn't make a difference to me.
Can making the heap too large hurt performance? My understanding is that "young" objects get GC'd quickly, but "old" objects can take longer, so my worry is that a large heap will push some short-lived objects into the longer-lived space. I do a lot of allocation of strings that get thrown away quickly (on the order of 60 GB over the course of a single run) and so I don't want to increase GC time spent on those.
I'm testing on a machine with 8 GB of RAM, so I've been running my code with -Xms4g -Xmx4g, and as of my last profiled run, I spent about 20% of my runtime doing garbage collection. I found that increasing the heap to 5 GB helped reduce it. The production server will have 32 GB of RAM, and much higher memory requirements.
Can I safely run it with -Xms31g -Xmx31g, or might that end up hurting performance?
Can making the heap too large hurt performance?
When you go over about 31 GB you can lose CompressedOops (compressed 32-bit object references), which can mean you have to jump to around 48 GB just to get more usable memory than you had at 31 GB. I try to keep the heap under 31 GB if I can.
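If you want to check whether a given heap size still gets compressed oops, you can ask the JVM directly; on HotSpot-based JVMs something along these lines works (the heap sizes are just examples, and the machine must be able to reserve them):

    java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
    java -Xmx33g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

With 31 GB the flag should report true; above the threshold it flips to false and every object reference costs 8 bytes instead of 4.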
My understanding is that "young" objects get GC'd quickly, but "old" objects can take longer, so my worry is that a large heap will push some short-lived objects into the longer-lived space.
For this reason I tend to have large young generations, e.g. up to 24 GB.
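A minimal sketch of what that can look like on the command line (the exact sizes and the MyApp class name are assumptions you would tune for your own workload):

    # 28 GB heap with a 24 GB young generation, so most short-lived garbage
    # dies in a cheap minor collection and never reaches the old generation
    java -Xms28g -Xmx28g -Xmn24g MyApp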
Can I safely run it with -Xms31g -Xmx31g, or might that end up hurting performance?
On a 32 GB machine this would be very bad. By the time you include the off-heap memory the JVM uses, the OS, and the disk cache, you are likely to find that a heap over 24-28 GB will hurt performance. I would start with 24 GB and see how that goes; given that 5 GB runs OK now, you might find you can reduce it with little effect.
You might find that moving your data off heap helps GC times. I have run systems with a 1 GB heap and 800 GB off heap, but it depends on your application's requirements.
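As a rough illustration of what "off heap" means in practice: a direct ByteBuffer keeps its contents outside the Java heap, so the GC only ever scans the small wrapper object, not the data itself (the 1 GB size here is just an example):

    import java.nio.ByteBuffer;

    public class OffHeapSketch {
        public static void main(String[] args) {
            // 1 GB allocated outside the Java heap; full GCs do not scan this memory,
            // only the small ByteBuffer object that points to it.
            ByteBuffer offHeap = ByteBuffer.allocateDirect(1024 * 1024 * 1024);
            offHeap.putLong(0, 42L);               // write at an absolute position
            System.out.println(offHeap.getLong(0));
        }
    }

Memory-mapped files (discussed further below) build on the same idea for data sets larger than RAM.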
I spent about 20% of my runtime doing garbage collection
I suggest you reduce your allocation rate. Using a memory profiler to find where the garbage comes from, you can often get your allocation rate below 300 MB/s, but less than 30 MB/s is better. For an extreme system you might want less than 1 GB/hour, as this would allow you to run all day without a minor collection.
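Since most of your garbage is short-lived strings, one common pattern (a sketch only; the data and the process method are stand-ins for whatever your code does) is to reuse a single StringBuilder rather than building a throw-away String per item:

    public class ReduceAllocation {
        public static void main(String[] args) {
            String[] keys = {"a", "b", "c"};          // stand-in for your real data
            StringBuilder sb = new StringBuilder(64); // reused across iterations
            for (String key : keys) {
                sb.setLength(0);                      // reset without allocating a new buffer
                sb.append(key).append(':').append(key.length());
                process(sb);                          // takes CharSequence, so no toString() copy
            }
        }

        static void process(CharSequence cs) {
            System.out.println(cs);
        }
    }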
Related
What are the limitations on Java heap space? What are the advantages/disadvantages of not increasing the size to a large value?
This question stems from a program I run that sometimes runs out of heap space. I am aware that we can set the heap to a larger value in order to reduce the chance of this. However, I am looking for the pros and cons of keeping the heap size small or large. Does decreasing it increase speed?
Increasing the heap size may, under some circumstances, delay the activities of the garbage collector. If you are lucky, it may be delayed until your program has finished its work and terminated.
Depending on the data structures that you are moving around in your program, a larger heap size may also improve performance because the GC can work more effectively. But tuning a program in that direction is … tricky, at best.
Using in-memory caches in your program will definitely improve performance (if you have a decent cache-hit ratio), and of course, this will need heap space, perhaps a lot.
But if your program reports OutOfMemoryError: heap space because of the data it has to process, you do not have many alternatives other than increasing the heap space; performance is your least problem in this case. Or you change your algorithm so that it does not load all the data into memory and instead processes it on disk; but then again, performance goes out of the window.
If you run a server of some kind, about 80% to 85% heap utilisation is a good value, provided you do not have heavy spikes. If you know for sure that incoming requests do not cause significant additional memory load, you may even go up to 95%. You want value for money, and you paid for the memory one way or the other, so you want to use it!
You can even set Xms and Xmx to different values; then the JVM can increase the heap space when needed, and today it can even release that additional memory when it is no longer needed. But this resizing costs performance; on the other hand, a slow system is always better than one that crashes.
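For example, with placeholder sizes and a hypothetical MyTool class:

    # Start with a 256 MB heap and let the JVM grow it up to 4 GB as needed
    java -Xms256m -Xmx4g MyTool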
A heap that is too small may also affect performance if your system does not have enough cores, because the garbage collector threads then compete with the business threads for CPU. At some point, the CPU spends a significant share of its time on garbage collection. If you run a server, this behaviour is nearly unavoidable, but for a tool-like program, increasing the heap can prevent it (because the program may come to an end before the GC needs to get active), as already said at the beginning.
Recently I read an article stating that 2-4 GB of heap per four cores should be allocated. I understand that a faster CPU and more cores mean more throughput from a single VM, and perhaps faster GC execution, but what is the exact relationship? I am looking for details such as benchmarks.
The larger the heap size is, the longer GC pause times will be. The smaller it is, the more expensive GC will be overall (usually). A good guideline is 100 MB per CPU core.
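A quick sketch of how that rule of thumb plays out (the multiplier is just the guideline quoted above, not a measured value):

    public class HeapGuideline {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            long suggestedHeapMb = 100L * cores;  // e.g. 8 cores -> roughly 800 MB as a starting point
            System.out.println("Starting-point heap: " + suggestedHeapMb + " MB");
        }
    }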
I have an application running with -mx7000m. I can see it has allocated a 5.5 GB heap. Yet for some reason it is GCing constantly, and with CMS that turns out to be quite CPU intensive. So it has 5.5 GB of heap already allocated, but somehow it is spending all the CPU time trying to keep the used heap as small as possible, which is around 2 GB, while the other 3.5 GB are allocated by the JVM but unused.
I have a few servers with the exact same configuration and 2 out of 5 are behaving this way. What could explain it?
It's Java 6, the flags are: -server -XX:+PrintGCDateStamps -Duser.timezone=America/Chicago -Djava.awt.headless=true -Djava.io.tmpdir=/whatever -Xloggc:logs/gc.log -XX:+PrintGCDetails -mx7000m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC
The default threshold for triggering a CMS GC is 70% full (in Java 6). A rule of thumb is that the heap size should be about 2.5x the heap used after a full GC (but your use case is likely to be different).
So in your case, say you have
- 2.5 GB of young generation space
- 3 GB of tenured space.
When your tenured space reaches 70% full, or ~2.1 GB, CMS will start cleaning up the region.
The setting involved is -XX:CMSInitiatingOccupancyFraction=70.
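If you want CMS to start at a different occupancy, and to honour that value rather than its own heuristics, the usual pairing on HotSpot looks like this (the 80% value and MyApp are placeholders; CMS itself was deprecated in Java 9 and removed in Java 14):

    java -XX:+UseConcMarkSweepGC \
         -XX:CMSInitiatingOccupancyFraction=80 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         MyApp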
However, if you want to reduce the impact of GC, the simplest thing to do is to create less garbage, i.e. use a memory profiler and ensure your allocation rate is as low as possible. Your system will run very differently if you are creating as much garbage as the CPUs can handle versus, say, 100 MB/s, or 1 MB/s or less.
The reason some of your servers behave differently might be that the relative sizes of the regions differ; if, say, you had 0.5 GB of young generation and 5.0 GB of tenured space, you shouldn't be seeing this at all. The difference could also come down purely to how busy the machine was when you started the process, or what it has done since then.
I have a machine with 10 GB of RAM. I am running 6 Java processes with the -Xmx option set to 2 GB. The probability of all 6 processes running simultaneously and each consuming the entire 2 GB of memory is very, very low. But I still want to understand this worst-case scenario.
What happens if all 6 processes consume a little less than 2 GB of memory at the same instant, so that the JVMs have not yet started garbage collection, but the processes are holding that much memory and the sum of the memory consumed by the 6 processes exceeds the available RAM?
Will this crash the server, or will it slow down the processing?
You should expect that each JVM can use more than 2 GB, because the heap is just one memory region. You also have:
- shared libraries
- thread stacks
- direct memory
- native memory used by shared libraries
- perm gen
This means that setting a maximum heap of 2 GB doesn't limit your process to 2 GB.
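As a rough way to see that the heap is only part of the picture, you can ask a running JVM for its heap and non-heap usage (this still leaves out thread stacks and direct memory, but it illustrates the split):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class MemoryRegions {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            // Heap: the region capped by -Xmx.
            System.out.println("Heap:     " + mem.getHeapMemoryUsage());
            // Non-heap: perm gen/metaspace, code cache, etc. Not capped by -Xmx.
            System.out.println("Non-heap: " + mem.getNonHeapMemoryUsage());
        }
    }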
Your processes should perform well until they get to the point where some of the heap has been swapped out and a GC is performed. A GC assumes random access to the whole heap, and at that point your system could start swapping like mad. If you have an SSD for swap, your system is likely to stop, or almost stop, for very long periods of time. If you have Windows (which I have found is worse than Linux in this regard) and an HDD, you might not get control of the machine back and have to power cycle it.
I would suggest either reducing the heap to say 1.5 GB at most, or buying more memory. You can get 8 GB for about $100.
Your machine will start swapping. As long as each Java process uses only a small part of the memory it has allocated, you won't notice the effect, but if they all garbage collect at the same time, touching all of their memory, your hard disk will be at 100% utilization and the machine will "feel" very, very slow.
How much data is too much for on-heap cache like ehcache?
I'm getting a 24 GB RAM server. I'll probably start off devoting 2-4 GB for caching but may end up devoting 20 GB or so to cache. At what point should I worry that GC for on-heap cache will take too long?
By the way, is DirectMemory the only open source off-heap cache available? Is it ready for prime time?
It depends on your JVM and especially on the GC used. Older GCs in particular were not really capable of handling very large heaps, but there has been increasing effort to fix that.
Azul Systems, for example, sells hardware that runs hundreds of GB of heap without problems (i.e. GC pauses in the milliseconds, not half minutes) thanks to their special GC, so it's not a limitation of Java per se. I have no idea how good HotSpot/IBM have become over time, though. But then, a 24 GB heap isn't that large anyway; G1 should probably do a good enough job there.
At what point should I worry that GC for on-heap cache will take too long?
How long is too long?
Seriously, if you are running a "throughput" garbage collector and this is giving you pauses that are too long, then you should try switching to a low-pause collector; e.g. CMS or G1.
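For reference, a sketch of what that switch looks like on the command line (the heap size, pause target, and MyCacheApp name are placeholders; note that CMS was deprecated in Java 9 and removed in Java 14):

    # G1 with a soft pause-time goal
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xmx20g MyCacheApp
    # CMS, the older low-pause collector
    java -XX:+UseConcMarkSweepGC -Xmx20g MyCacheApp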
The main problem with a large cache is the full GC time. To give you an idea, it might be around 1 second per GB (this varies from application to application). If you have a 20 GB cache and your application pauses for 20 seconds every so often, is that acceptable?
As a fan of direct and memory-mapped files, I tend to think in terms of when I would not bother to put the data off heap and just use the heap for simplicity. ;) Memory-mapped files have next to no impact on the full GC time regardless of size.
One of the advantages of using a memory mapped file is it can be much larger than your physical memory and still perform reasonably well. This leaves the OS to determine which portions should be in memory and what needs to be flushed to disk.
BTW: Having a faster SSD also helps ;) The larger drives also tend to be faster. Check the IOPS they can perform.
In this example, I create an 8 TB memory-mapped file on a machine with 16 GB of RAM: http://vanillajava.blogspot.com/2011/12/using-memory-mapped-file-for-huge.html
Note that it performs better in the 80 GB file example; 8 TB is likely to be overkill. ;)
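For completeness, a minimal sketch of mapping a file into memory with the standard NIO API (the file name and the 1 GB region size are placeholders; a single MappedByteBuffer can only cover up to 2 GB, so very large files need several mappings):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedCacheSketch {
        public static void main(String[] args) throws Exception {
            long size = 1L << 30; // 1 GB region; larger files need multiple mappings
            try (RandomAccessFile file = new RandomAccessFile("cache.dat", "rw");
                 FileChannel channel = file.getChannel()) {
                // The mapped region lives outside the Java heap; the GC only
                // sees the small MappedByteBuffer object, and the OS decides
                // which pages stay in memory and which go to disk.
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
                map.putLong(0, 42L);
                System.out.println(map.getLong(0));
            }
        }
    }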