Has anyone found Garbage Collection tuning to be useful? - java

I've read plenty of articles about tuning GC in Java and have often wondered how many people really use some of the more advanced features.
I've always avoided tuning where possible and concentrated on writing the simplest possible code I can (Brian Goetz's advice) - this seems to have worked well for me so far.
Are these tuning strategies resilient to change across VM versions or do they require constant re-assessment?
The one tuning that I have made use of is the -server flag.

Part of my current job is the care and feeding of a large Java application that was designed to run with a large amount of memory (currently around 8 GB), mostly due to ongoing calculations with lots of cached data. I did the initial deployment with the standard GC setup, mostly because there wasn't a simple way to simulate the production environment running at full tilt.
In stages over the next few months, I customized the GC settings. Generally, the biggest available knob seems to be adjusting the frequency and magnitude of incremental GC; the biggest improvement has come from trading off large periodic collections for smaller, more frequent ones. And we've definitely been able to see performance improvements.
I'm not going to post my specific settings because a) they're specific to our setup, and b) I don't have them handy :). But in general, what I've found is:
There's been a lot of work done in tuning the default GC settings. Almost always the defaults work better than any tweaks I would make.
At least for me, the situations where GC tuning was actually worthwhile were extreme enough that it was unreasonable to attempt to simulate them, so I had to do it experimentally and incrementally.
Here's a good reference from a previous Stack Overflow discussion.

The vast majority of developers will never have to (or want to) tune GC. I have worked with people who have had to tune it, and here is the advice:
Before you attempt to tune the garbage collector, make 100% sure you have verified, with a profiler, what is going on. Once you start tuning, make sure you verify, with a profiler, that it had a positive effect.
You should also revisit the changes with each version of the VM you run on (different VMs will have different tuning strategies).
I once helped someone with a GC issue that turned out to be them not closing JDBC result sets (or some issue like that), which meant memory was never freed because their code held onto the references. Fixing that issue took the program from 20 minutes down to something like 30 seconds or a couple of minutes, and memory usage went way down as well.

I have to say that I haven't had much need for tuning myself. But I work closely with people who write code where latency is critical: they make heavy use of such tuning, specifying which GC algorithm to use, max pause times, survivor ratios, etc.
I guess the answer therefore is: if latency is critical to an application, you might need to look at tuning your GC

I would say the most common thing to tune is the maximum memory size. Most of the other memory options have sensible defaults and are often over-tuned IMHO, i.e. set when it really doesn't make much difference. I often see people set lots of options when half of them are the defaults in any case. ;)
Using a profiler is the most useful way to improve GC behaviour (by reducing the number of objects created)
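In flag terms, that most common tuning usually just means -Xmx; a minimal example (the size and class name are placeholders, not a recommendation):
java -Xmx4g MyApp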

I have but not recently. The application that I was working on was real-time rendering of a video stream constructed of individual motion JPEG images. At the time (circa JDK 1.2 and 1.3), the -Xincgc setting would switch the client garbage collector from more of a big bang cleanup to a mode where a bit of garbage was cleaned up regularly. As a result, the distribution of frame latencies was much lower, giving the impression of a smoother video (instead of 1-2-3-pause, 1-2-3-pause).
I haven't looked at that code in quite a long time but I strongly suspect that, with the modern garbage collection algorithms, -Xincgc would actually decrease performance.
In today's world, I would say that standard optimization skepticism should always apply: profile profile profile. Are you sure that the bottleneck is really the garbage collector...?

In short, yes, it's very useful for tuning any serious Java application. We've often found that in production scenarios it's the difference between a stable app and a completely unpredictable app. It's certainly not the first thing I do but once you have an application working and can apply real load to it, it's one of the first things to investigate at that point.

What are the useful JVM options for a multithreaded application?

I am working on an application that creates a lot of threads and relies heavily on String manipulation.
The application works for a good 24 hrs at a time and needs to be always very responsive.
I am trying to keep the creation of objects to a minimum. The application is doing well without any configuration at the moment.
But I was wondering, for my own knowledge, if there were any advantages (or disadvantages) to using a specific JVM configuration?
Please bear with me, I am pretty new to the subject of JVM/GC configuration:
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
Should I configure the heap?
Should I also configure the GC?
Should I keep the Garbage Collection to a minimum?
I started reading: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
Any tips on the subject would greatly be appreciated.
Thanks in advance,
Generally, the best initial advice concerning tweaking your JVM is: don't. Unless you are experiencing specific JVM-related problems with the default settings, leave them alone.
If you do need to fiddle around with the settings, I would recommend you set up a representative testcase and use an advanced profiler such as JProfiler.
Furthermore, you should really read the technical documentation regarding the HotSpot VM, specifically the Memory Management Whitepaper, all of which you may find here.
If it is working fine, then you should not do anything.
If your application is CPU bound, you should not create a lot of threads, because a lot of time gets wasted in context switching. If the String manipulation is all in memory, run only as many threads as are actually required. Java Concurrency in Practice gives a sizing formula:
Nthreads = Ncpu * Ucpu * (1 + W/C)
where
Ncpu --> number of CPUs
Ucpu --> target CPU utilization (between 0 and 1)
W --> wait time
C --> compute time
So for purely CPU-bound operations, the optimum is roughly (number of CPUs + 1) threads.
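As a minimal Java sketch of that formula (the utilization and wait/compute numbers are assumptions you would measure for your own workload, not recommendations):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizer {
    public static void main(String[] args) {
        int nCpu = Runtime.getRuntime().availableProcessors();
        double targetUtilization = 0.8; // assumption: fraction of each CPU to use
        double waitTime = 50.0;         // assumption: average ms a task spends blocked
        double computeTime = 5.0;       // assumption: average ms a task spends computing

        // Nthreads = Ncpu * Ucpu * (1 + W/C)
        int nThreads = (int) Math.ceil(nCpu * targetUtilization * (1 + waitTime / computeTime));
        System.out.println("sized pool to " + nThreads + " threads");
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        pool.shutdown();
    }
}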
Also, there are a lot of test cases defined for concurrency applications in Java Concurrency in Practice. You may want to check those.
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
All the best options will be on by default. If you look at HotSpot VM Options you can see quite a few are -XX:+ which means they are on by default.
Should I configure the heap?
Possibly. But I would leave the default setting if you can.
Should I also configure the GC?
Possibly. But I would leave the default setting if you can.
Should I keep the Garbage Collection to a minimum?
Reducing the amount of garbage created takes effort. It provides some benefit up to a point. You have to decide what is the best use of your time and how much time to spend reducing the amount of garbage created.
I would always start with a memory profiler and find where you are creating the most garbage. Start from the top of the list rather than trying to tune everything as this ensures you will get the most benefit for the least amount of effort.
BTW: I am an advocate of low garbage and off heap programs where it makes sense to do so. I have written trading systems which can run for a day without even a minor GC and programs which can load/use 500+ GB of data in off heap memory. However, you have to be able to demonstrate or quantify how much difference it will make to the end users or your business to determine whether it is really worth it.
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
No.
Should I configure the heap?
No, apart from setting the heap size to something reasonable (with -Xmx and -Xms)
Should I also configure the GC?
No, unless you have a particular need for low pause times. The default throughput collector is the best option if you are currently meeting your "responsiveness" goals. If you are not meeting those goals then you should consider CMS or G1 ... but beware that while they reduce pauses, they also reduce throughput.
Should I keep the Garbage Collection to a minimum?
No. That is not a sensible goal. Your aim is to maximize throughput, and minimizing GC won't necessarily achieve that. In a lot of cases, it is more efficient to generate garbage than to have the application do extra work to avoid generating garbage. (And as Peter Lawrey pointed out, you've also got the extra developer effort in writing and maintaining more complex code.)
I would advise you to use a profiler to see if your application is spending a lot of time (CPU time or elapsed time) in garbage collection relative to doing other productive work. If not, or if the application is already running fast enough, then don't fiddle with the JVM options.
If you are worried that your application won't cope with increased load in the future, then tweaking the GC doesn't scale. A better option is to investigate scaling up your hardware and/or figuring out how to do the work on multiple machines. In addition, tuning the GC to improve performance with current load may actually result in worse performance when the load increases. (Consider the problem that arises with CMS when it can't keep up and is forced to do a full stop-the-world collection to recover.)
Finally, it is generally speaking a bad idea to have lots of threads. It is better to use a small number of worker threads (roughly equal to the number of processors/cores) and feed them work via concurrent queues, etcetera.
In the past, I have faced a similar server application: lots of String manipulation and String creation, and it needed to be always very responsive. The app worked fine with the default configuration until it ran into a high-stress situation. You need to enable -XX:+UseConcMarkSweepGC for low pause times, and fine tune other parameters to ensure the app behaves the way that you want. Here is the short list:
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=nn
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=nn
-XX:+DisableExplicitGC
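Purely as an illustration of how such a list ends up on a command line - the heap size, the occupancy fraction and the class name below are placeholders to be tuned for your own application, not recommendations:
java -Xms2g -Xmx2g \
  -XX:+UseConcMarkSweepGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:+CMSScavengeBeforeRemark \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=75 \
  -XX:+DisableExplicitGC \
  MyServer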

Java App with lots of memory problems

I've written a pretty complex Java application that does a lot of calculations on price data from the markets in real time. Looking at the Task Manager in Windows, this sucker is taking close to 1 MB every 30 seconds, and performance is fine until it gets close to the memory limit around 300 MB; then the garbage collector really kicks in, spikes my CPU to around 50%, and the UI performance rapidly degrades. From all I've written so far it sounds like I have some bad code going on, because the nature of my program is CPU intensive but by design it stores very little data in memory.
I need some help on what might be some good next steps to take to figure out what the problem is. I think if I could see what objects are getting stored in memory that would help, as maybe I have some lousy code, but I am heartbroken with Java as I thought these were problems I would not have to worry about. Thanks for any answers. - Duncan
Identify some reasonable performance targets (memory usage, throughput, latency).
Put together some repeatable performance tests, the closer you can get these to real life scenarios the better.
Get a hold of a good profiler. I've used YourKit with a lot of success, the Netbeans and Eclipse profilers are not bad either. Most decent profilers will be able to identify memory usage, GC and performance hotspots.
Identify the biggest culprits and start fixing the issues beginning at the TOP of the list.
Check out VisualVM. It's in the current JDK bin directory as jvisualvm. If you don't have a memory leak, the heap usage should go down when you run the garbage collector, and you can see which objects may be holding memory by calculating the retained sizes of objects in the heap.
http://download.oracle.com/javase/6/docs/technotes/guides/visualvm/intro.html
Like others say, use a profiler to find what is consuming the memory.
If you don't know already, the garbage collector can only release memory for objects that are out of scope, that is, objects with no remaining references to them. Just make sure an object goes out of scope when you're done with it. It sounds like you're holding onto objects in a way where they're still referenced somewhere.
Also, if you want to suggest to the GC that it cleans up, try this:
System.gc();
System.runFinalization();
Again, that is only a suggestion to the GC, but I've found it really helps if you run it after a lot of objects go out of scope.
Lastly, you can tweak your vm arguments.
There are settings for the min/max heap size. If it's a critical application, set them to the same value and set it high (that way the JVM doesn't have to keep allocating/deallocating - it just grabs one big chunk at startup). This isn't a fix, just a workaround.
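For example (the size and class name here are only illustrative):
java -Xms512m -Xmx512m MyPricingApp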

Can Sun JVM handle gigantic heap sizes without problems, and how?

I have heard several people claim that you cannot scale the JVM heap size up. I've heard claims of the practical limit being 4 gigabytes (I heard an IBM consultant say that), 10 gigabytes, 32 gigabytes, and so on... I simply cannot believe any of those numbers and have been wondering about the issue for a while now.
So, I have a three-part question I would hope someone with experience could answer:
Given the following case how would you tune the heap and GC settings?
Would there be noticeable hiccups (pauses of the JVM, etc.) that would be noticed by the end users?
Should this really still work? I think it should.
The case:
64 bit platform
64 cores
64 gigabytes of memory
The application server is client facing (i.e. a JBoss/Tomcat web application server) - complete pauses of the JVM would probably be noticed by end users
Sun JVM, probably 1.5
To prove I am not asking you guys to do my homework this is what I came up with:
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:-EliminateZeroing -Xmn768m -Xmx55000m
CMS should reduce the amount of pauses, although it comes with overhead. The other settings for CMS seem to default automatically to the number of CPUs so they seem sane to me. The rest that I added are extras that might do good or bad generally for performance, and they should probably be tested.
Definitely.
I think it's going to be difficult for anybody to give you anything more than general advice, without having further knowledge of your application.
What I would suggest is that you use VisualGC (or the VisualGC plugin for VisualVM) to actually look at what the garbage collection is doing when your app is running. Once you have a greater understanding of how the GC is working alongside your application, it'll be far easier to tune it.
#1. Given the following case how would you tune the heap and GC settings?
First, having 64 gigabytes of memory doesn't imply that you have to use them all for one JVM. Actually, it rather means you can run many of them. Then, it is impossible to answer your question without any access to your machine and application to measure and analyse things (knowing what your application is doing isn't enough). And no, I'm not asking to get access to your environment :)
#2. Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
The goal of tuning is to find a good compromise between frequency and duration of (major) GCs. With a ~55 GB heap, GC won't be frequent but will take noticeable time, for sure (the bigger the heap, the longer the major GC). Using a parallel or concurrent garbage collector will help on multiprocessor systems but won't entirely solve this issue. Why do you need ~55 GB (this is mega ultra huge for a webapp IMO), that's my question. I'd rather run many clustered JVMs to handle the load if required (at some point, the database will become the bottleneck anyway with a data-oriented application).
#3. Should this really still work? I think it should.
Hmm... not sure I get the question. What is "this"? Instantiating a JVM with a big heap? Yes, it should. Is it equivalent to running several JVMs? No, certainly not.
PS: 4G is the maximum theoretical heap limit for the 32-bit JVM running on a 64-bit operating system (see Why can't I get a larger heap with the 32-bit JVM?)
PPS: On 64-bit VMs, you have 64 bits of addressability to work with resulting in a maximum Java heap size limited only by the amount of physical memory and swap space your system provides. (see How large a heap can I create using a 64-bit VM?)
Obviously heap size is not unlimited, and the larger the heap size, the more time your JVM will eventually spend on GC. Though I think it is possible to set the heap size quite high on a 64-bit JVM, I still don't think it's really practical. The advice here is that it's better to have several JVMs running with the same parameters, i.e. a cluster of JBoss/Tomcat nodes running on the same physical machine; you will get better throughput.
EDIT: Also, your GC behavior depends on the composition of your heap. If you have a lot of short-lived objects and each request to the server creates a lot of those, then your GC will collect a lot of garbage very often, and on a large heap this will result in longer pauses. If you have very many long-lived objects (e.g. caching most of your data in memory) and the number of short-lived objects is not that big, then having a bigger heap is OK.
As Chris Rice already wrote, I wouldn't expect any obvious problems with the GC for heap sizes up to 32-64GB, although there may of course be some point of your application logic, which can cause problems.
Not directly related to GC, but I would still recommend you to perform a realistic load test on your production system. I used to work on a project, where we had a similar setup (relatively large, clustered JBoss/Tomcat setup to serve a public web application) and without exaggeration, JBoss is not behaving very well under high load or with a high number of concurrent calls if you are using EJBs. JBoss is spending a lot of time in synchronized blocks when accessing and managing the EJB instance pools and if you opt for a cluster, it will even wait for intra-cluster network communication within these synchronized blocks. Be especially aware of poorly performing state replication, if you are using SFSBs.
Only to add some more switches I would use by default: -Xms55g can help to reduce ramp-up time because it frees the JVM from having to grow the heap from the initial size, and it also allows better initial internal sizing of memory areas.
Additionally, we have had good experiences with NewSize, which gives you a large young generation to get rid of short-term garbage: -XX:NewSize=1g. Most webapps create a lot of short-lived garbage that never survives the request processing, so you can even make that bigger. With -Xms55g, the VM reserves a large chunk already; maybe downsizing can help.
-Xincgc helps to clean the young generation incrementally and return the cpu often to the user threads.
-XX:CMSInitiatingOccupancyFraction=70 If you really fill all that memory, try to start CMS garbage collection earlier.
-XX:+CMSIncrementalMode puts the CMS into incremental mode to return the cpu to the user threads more often.
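Put together, a starting point might look like the line below - these are only the values suggested above, to be validated with jstat as described next, not a definitive recipe:
java -Xms55g -Xmx55g -XX:NewSize=1g \
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
  -XX:CMSInitiatingOccupancyFraction=70 \
  MyAppServer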
Attach to the process with jstat -gc -h 10 <pid> 1s and watch the GC working.
Will you really fill up the memory? I assume that 64 CPUs for request processing might even be able to work with less memory. What do you store in there?
Depending on your GC pause analysis, you may wish to implement Incremental mode whereby the long pause may be broken out over a period of time.
I have found memory architecture plays a part in large memory sizes. Applications in general don't perform as well if they use more than one memory bank. The JVM appears to suffer as well, esp the GC which has to sweep the whole memory.
If you have an application which doesn't fit into one memory bank, your application has to pull in memory which is not local to a processor and use memory local to another processor.
On Linux you can run numactl --hardware to see the layout of processors and memory banks.
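For example, to pin a JVM to a single node (the node number and heap size are illustrative):
numactl --cpunodebind=0 --membind=0 java -Xmx28g MyApp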

How to monitor Java memory usage?

We have a j2ee application running on Jboss and we want to monitor its memory usage. Currently we use the following code
System.gc();
Runtime rt = Runtime.getRuntime();
long usedMB = (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024;
logger.information(this, "memory usage" + usedMB);
This code works fine, meaning it shows a memory curve which corresponds to reality: when we create a big XML file from the DB the curve goes up, and after the extraction is finished it goes down.
A consultant told us that calling gc() explicitly is wrong; "let the JVM decide when to run GC". Basically his arguments were the same as discussed here.
But I still don't understand:
how can I have my memory usage curve?
what is wrong with the explicit gc()? I don't care about the small performance issues which can happen with explicit gc() and which I would estimate at 1-3%. What I need is a memory and thread monitor which helps me analyse our system on the customer site.
If you want to really look at what is going on in the VM memory you should use a good tool like VisualVM. This is Free Software and it's a great way to see what is going on.
Nothing is really "wrong" with explicit gc() calls. However, remember that when you call gc() you are "suggesting" that the garbage collector run. There is no guarantee that it will run at the exact time you run that command.
There are tools that let you monitor the VM's memory usage. The VM can expose memory statistics using JMX. You can also print GC statistics to see how the memory is performing over time.
Invoking System.gc() can harm the GC's performance because objects will be prematurely moved from the new to old generations, and weak references will be cleared prematurely. This can result in decreased memory efficiency, longer GC times, and decreased cache hits (for caches that use weak refs). I agree with your consultant: System.gc() is bad. I'd go as far as to disable it using the command line switch.
You can take a look at stagemonitor. It is an open source Java (web) application performance monitor. It captures response time metrics, JVM metrics, request details (including a call stack captured by the request profiler) and more. The overhead is very low.
Optionally, you can use the great time series database graphite with it to store a long history of datapoints that you can look at with fancy dashboards.
Take a look at the project website to see screenshots, feature descriptions and documentation.
Note: I am the developer of stagemonitor
I would say that the consultant is right in the theory, and you are right in practice. As the saying goes:
In theory, theory and practice are the same. In practice, they are not.
The Java spec says that System.gc() suggests calling garbage collection. In practice, it just spawns a thread and runs right away on the Sun JVM.
Although in theory you could be messing up some finely tuned JVM implementation of garbage collection, unless you are writing generic code intended to be deployed on any JVM out there, don't worry about it. If it works for you, do it.
Have you tried JMX?
http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
Peek into what is happening inside tomcat through Visual VM.
http://www.skill-guru.com/blog/2010/10/05/increasing-permgen-size-in-your-server/
Take a look at the JVM args: http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp#DebuggingOptions
-XX:-PrintGC Print messages at garbage collection. Manageable.
-XX:-PrintGCDetails Print more details at garbage collection. Manageable. (Introduced in 1.4.0.)
-XX:-PrintGCTimeStamps Print timestamps at garbage collection. Manageable. (Introduced in 1.4.0.)
-XX:-PrintTenuringDistribution Print tenuring age information.
While you're not going to upset the JVM with explicit calls to System.gc(), they may not have the effect you are expecting. To really understand what's going on with memory in a JVM, read anything and everything Brian Goetz writes.
Explicitly running System.gc() on a production system is a terrible idea. If the memory gets to any size at all, the entire system can freeze while a full GC is running. On a multi-gigabyte-sized server, this can easily be very noticeable, depending on how the jvm is configured, and how much headroom it has, etc etc - I've seen pauses of more than 30 seconds.
Another issue is that by explicitly calling GC you're not actually monitoring how the JVM is running the GC, you're actually altering it - depending on how you've configured the JVM, it's going to garbage collect when appropriate, and usually incrementally (it doesn't just run a full GC when it runs out of memory). What you'll be printing out will be nothing like what the JVM will do on its own - for one thing, you'll probably see fewer automatic/incremental GCs as you'll be clearing the memory manually.
As Nick Holt's post points out, options to print GC activity already exist as JVM flags.
You could have a thread that just prints out free and total memory at reasonable intervals; this will show you actual memory usage.
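A minimal sketch of such a reporter thread (the 30-second interval is arbitrary; log however your application normally logs):
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MemoryReporter {
    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                Runtime rt = Runtime.getRuntime();
                long usedMB = (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024;
                long totalMB = rt.totalMemory() / 1024 / 1024;
                System.out.println("memory used=" + usedMB + "MB total=" + totalMB + "MB");
            }
        }, 0, 30, TimeUnit.SECONDS); // print every 30 seconds
    }
}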
If you like a nice way to do this from the command line use jstat:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstat.html
It gives raw information at configurable intervals which is very useful for logging and graphing purposes.
If you use Java 1.5, you can look at ManagementFactory.getMemoryMXBean(), which gives you numbers on all kinds of memory: heap and non-heap, perm gen. A good example can be found here:
http://www.freshblurbs.com/explaining-java-lang-outofmemoryerror-permgen-space
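Roughly, the usage looks like this (a sketch; the output format is up to you):
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryBeanDemo {
    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memoryBean.getHeapMemoryUsage();       // governed by -Xms/-Xmx
        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage(); // perm gen, code cache, ...
        System.out.println("heap used=" + heap.getUsed()
                + " committed=" + heap.getCommitted() + " max=" + heap.getMax());
        System.out.println("non-heap used=" + nonHeap.getUsed());
    }
}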
If you use the JMX-provided history of GC runs, you can use the same before/after numbers; you just don't have to force a GC.
You just need to keep in mind that those GC runs (typically one for the old and one for the new generation) do not happen at regular intervals, so you need to extract the start time as well for plotting (or you plot against a sequence number; for most practical purposes that would be enough).
For example, on the Oracle HotSpot VM with the parallel scavenge collector, there is a JMX MBean called java.lang:type=GarbageCollector,name=PS Scavenge. It has an attribute LastGcInfo which returns a CompositeData describing the last young-generation scavenger run, recorded with its duration, absolute startTime, memoryUsageBeforeGc and memoryUsageAfterGc.
Just use a timer to read that attribute. Whenever a new startTime shows up, you know that it describes a new GC event; you extract the memory information and keep polling for the next update. (Not sure if an AttributeChangeNotification could somehow be used.)
Tip: in your timer you might measure the distance to the last GC run, and if that is too long for the resolution of your plotting, you could invoke System.gc() conditionally. But I would not do that in an OLTP instance.
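A rough sketch of that polling loop, assuming the HotSpot-specific com.sun.management API is available (on other VMs the cast below will simply never succeed):
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import com.sun.management.GarbageCollectorMXBean; // HotSpot-specific
import com.sun.management.GcInfo;

public class GcWatcher {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> lastSeen = new HashMap<String, Long>();
        while (true) {
            for (java.lang.management.GarbageCollectorMXBean std
                    : ManagementFactory.getGarbageCollectorMXBeans()) {
                if (!(std instanceof GarbageCollectorMXBean)) continue; // not HotSpot
                GcInfo info = ((GarbageCollectorMXBean) std).getLastGcInfo();
                if (info == null) continue; // this collector has not run yet
                Long prev = lastSeen.put(std.getName(), info.getStartTime());
                if (prev == null || prev.longValue() != info.getStartTime()) {
                    // a new GC event: print duration and before/after usage
                    System.out.println(std.getName()
                            + " duration=" + info.getDuration() + "ms"
                            + " before=" + info.getMemoryUsageBeforeGc()
                            + " after=" + info.getMemoryUsageAfterGc());
                }
            }
            TimeUnit.SECONDS.sleep(1); // poll for the next update
        }
    }
}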
As has been suggested, try VisualVM to get a basic view.
You can also use Eclipse MAT, to do a more detailed memory analysis.
It's OK to do a System.gc() as long as you don't depend on it for the correctness of your program.
The problem with System.gc() is that the JVM already automatically allocates time to the garbage collector based on memory usage.
However, if you are, for instance, working in a very memory-limited environment like a mobile device, System.gc() allows you to manually allocate more time towards garbage collection, at the cost of CPU time (though, as you said, you aren't that concerned about the performance issues of GC).
Best practice would probably be to only use it where you might be doing large amounts of deallocation (like flushing a large array).
All considered, since you are simply concerned about memory usage, feel free to call gc, or, better yet, see if it makes much of a memory difference in your case, and then decide.
About System.gc()… I just read in Oracle's documentation the following sentence here
The performance effect of explicit garbage collections can be measured by disabling them using the flag -XX:+DisableExplicitGC, which causes the VM to ignore calls to System.gc().
If your VM vendor and version supports that flag, you can run your code with and without it and compare performance.
Also note the previous quoted sentence is preceded by this one:
This can force a major collection to be done when it may not be necessary (for example, when a minor collection would suffice), and so in general should be avoided.
JavaMelody might be a solution for your need.
Developed for Java EE applications, this tool measures and builds reports about the real operation of your applications in any environment. It's free, open source, and easy to integrate into applications, with some history, no database or profiling needed - really lightweight.

Java performance with very large amounts of RAM

I'm exploring the possibility of running a Java app on a machine with very large amounts of RAM (anywhere from 300GB to 15TB, probably on an SGI Altix 4700 machine), and I'm curious as to how Java's GC is likely to perform in this scenario.
I've heard that IBM's or JRockit's JVMs may be better suited to this than Sun's. Does anyone know of any research or data on JVM performance in this situation?
On the Sun JVM, you can use the option -XX:+UseConcMarkSweepGC to turn on the concurrent mark-and-sweep collector, which avoids the "stop the world" phases of the default GC algorithm almost completely, at the cost of a little more overhead.
The advice to use more than one VM on such a machine is IMHO outdated.
In real-world applications you often have enough shared data that the performance with CMS and one JVM is better.
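For illustration, enabling it looks like this (the heap size is a placeholder to be sized against the actual machine):
java -XX:+UseConcMarkSweepGC -Xms300g -Xmx300g MyApp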
The question is: do you want to run within a single process (JVM) or not? If you do, then you're going to have a problem. Refer to Tuning Java Virtual Machines, the Oracle Coherence User Guide and similar documentation. The rule of thumb I've operated by is to try and avoid heaps larger than 1 GB. Whereas a full GC of a 512 MB-1 GB heap might take less than a second, a full GC of a 2-4 GB heap could potentially take 5 seconds or longer. Obviously this depends on many factors, but the moral of the story is that GC overhead does not scale linearly, and once you get into the one-second range performance degrades rapidly.
Sun's JVM allows you to configure and optimize the heck out of garbage collection, but it's a science unto itself:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
You might have to do some reading and research, but for that kind of machine, GC settings optimized for the machine and application probably make a big difference.
Since 5.0, the HotSpot JVM uses a concept known as ergonomics to try to optimise memory usage. This is based on more than just the sheer amount of memory available, and it affects heap sizes, generation sizes and garbage collection algorithms.
Start by having a read of this, which explains Ergonomics and more:
https://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
There's also a guy called Brian Goetz who's written numerous articles about how Java allocates and uses memory, all of which (and more) can be found here:
http://www.briangoetz.com/pubs.html
This is not at all answering your question, but if you plan to deploy a huge Java app you might be interested in looking into Azul Systems appliances. They claim to be able to garbage-collect without pausing the application, up to a single 670 GB heap.
You might want to consider running a virtual Terracotta cluster on this machine.
The only people who can really tell you are SGI. Supercomputers don't behave like regular servers, only bigger.
However, I have found that Java performs best when memory is local to the processors accessing it. Note: the GC needs to be able to walk the whole memory end to end. This means it doesn't scale well if you have a design which is like lots of computers stuck together which may be the case here. The memory module size is 32 GB, so you may get better performance if you limit your JVM to comfortably fit into this size.
The accepted answer to this post is rather old and is now outdated. As of September 2014, if you are using Java 7, you should probably switch to the G1 collector. From the Java 7 update 4 release notes:
http://www.oracle.com/technetwork/java/javase/7u4-relnotes-1575007.html
"The G1 collector is targeted for applications that fully utilize the large amount of memory available in today's multiprocessor servers, while still keeping garbage collection latencies under control. Applications that require a large heap, have a big active data set, have bursty or non-uniform workloads or suffer from long Garbage Collection induced latencies should benefit from switching to G1."
Surely the answer as to how the GC's going to perform is "who cares?" ;-)
