Java performance with very large amounts of RAM

Java performance with very large amounts of RAM - java

I'm exploring the possibility of running a Java app on a machine with very large amounts of RAM (anywhere from 300GB to 15TB, probably on an SGI Altix 4700 machine), and I'm curious as to how Java's GC is likely to perform in this scenario.
I've heard that IBM's or JRockit's JVMs may be better suited to this than Sun's. Does anyone know of any research or data on JVM performance in this situation?

On the Sun JVM, you can use the option -XX:UseConcMarkSweepGC to turn on the Concurrent mark and sweep Collector, which will avoid the "stop the world" phases of the default GC algorithm almost completely, at the cost of a little bit more overhead.
The advise to use more than on VM on such a machine is IMHO outdated.
In real world applications you often have enough shared data so that the performance with the CMS and one JVM is better.

The question is: do you want to run within a single process (JVM) or not? If you do, then you're going to have a problem. Refer to Tuning Java Virtual Machines, Oracle Coherence User Guide and similar documentation. The rule of thumb I've operated by is try and avoid heaps larger than 1GB. Whereas a 512MB-1GB full GC might take less than a second. A 2-4GB full GC could potentially take 5 seconds or longer. Obvioiusly this depends on many factors but the moral of the story is that GC overhead does not scale linearly and once you get into the one second range performance then degrades rapidly.

Sun's JVM allows you to configure and optimize the heck out of garbage collection, but it's a science unto itself:
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
You might have to do some reading and research, but for that kind of machine, GC settings optimized for the machine and application probably make a big difference.

Since 5.0 the Hotspot JVM uses a concept know as Ergonomics to try to optimise the memory usage. This is based on more than just the sheer amount of memory available and effects heap sizes, generation sizes and garbage collection algorithms.
Start by having a read of this, which explains Ergonomics and more:
https://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf
There's also a guy called Brian Goetz that's written numerous articles about how Java allocates and uses memory, all of which and more can be found here:
http://www.briangoetz.com/pubs.html

This is not at all answering your question, but if you plan do deploy a huge Java app you might be interested in looking into Azul Systems appliances. They say to be able to garbage-collect without creating a pause in the application up to a single 670 GB heap.

You might want to consider running a virtual Terracotta cluster on this machine.

The only people who can really tell you are SGI. Super computers don't behave like regular servers only bigger.
However, I have found that Java performs best when memory is local to the processors accessing it. Note: the GC needs to be able to walk the whole memory end to end. This means it doesn't scale well if you have a design which is like lots of computers stuck together which may be the case here. The memory module size is 32 GB, so you may get better performance if you limit your JVM to comfortably fit into this size.

The accepted answer for this post is rather old and is now outdated. As of September 2014, if you are using Java 7, you should probably switch to the GC1 collector. From the Java 7 update 4 release notes:
http://www.oracle.com/technetwork/java/javase/7u4-relnotes-1575007.html
"The G1 collector is targeted for applications that fully utilize the large amount of memory available in today's multiprocessor servers, while still keeping garbage collection latencies under control. Applications that require a large heap, have a big active data set, have bursty or non-uniform workloads or suffer from long Garbage Collection induced latencies should benefit from switching to G1."

Surely the answer as to how the GC's going to perform is "who cares?" ;-)

Related

hard limit on java heap size [duplicate]

This question already has answers here:
Maximum Java heap size of a 32-bit JVM on a 64-bit OS
(17 answers)
Closed 9 years ago.
This goes to all Java heap/GC experts out there.
Simple and straight question: is there a hard limit on the maximum figure one can set the Java heap size to?
One of the servers I'm working on is hosting an in-memory real-time communication system, and the clients using the server have asked me to increase the amount of memory to 192Gb (it's not a typo: 192 GigaBytes!). Technically this is possible because the machine is virtual.
Besides the one above, my other questions are:
how well is the JVM going to handle such size?
is there any showstopper in setting it?
is there something I should be aware of, when dealing with such sizes?
Thanks in advance to anyone willing to help.
Regards.

The JVM spec for java 7 does not indicate that there is a hard limit.
I'd suggest that it is therefore goverened by your OS and the amount of memory available.
However, you will have to be aware that the JVM needs enough memory for other internal structures (as the spec describes), such as PermGen, constant pool, etc. I'd also suggest that you consider profiling before arbitrarily increasing the heap size. It's possible that your old-gen space is hogging memory. I'd use VisualGC (now in VisualVM) to watch the memory usage and YourKit to look for leaks (its generational capabilities are a real help).
I've not answered your other questions but probably couldn't say more than your other respondents.

1) Have them consider the implications of having the virtual machine swapping stuff in and out of memory to disk with their code doing it. Sometimes its just as good and no extra work to let the VM do it. Sometimes it is much faster when they know something about the data characteristics and their optimizations are targeted that way.
2) Consider the possible use of multiple JVMs either cooperating or running in parallel with the scope of operation divided up somehow. Sometimes even running 4 JVMs running the same application on the same VM with each using 1/4 of the memory can be better. (Depends on your application's characteristics.)
3) Consider a memory management framework like TerraCotta's Big Memory. There are competitors in this space. For example, VMWare has Elastic Memory. They use memory outside the JVM's heap to store stuff and avoid the GC issue. Or they allocate very large heap objects and manage it independently of Java.
4) JVMs do ok(TM) if the memory gets taken up by live objects and they end up moving to the older generations for garbage collection. GC can be a problem. This is a black art. Be sure to kill a chicken and spread the feathers over the virtual machine before attempting to optimize GC. :) Oh ... and there are some really interesting comments here: Java very large heap sizes
5) Sometimes you set the JVM size and other factors beyond your control will keep it from growing that large. Be sure to do some testing to make sure it is really able to grow to the size you set.
And number 5 is the real key item. Test. Test. And test some more. You are out on the edges of explored territory.

What are the useful JVM options for a multithreaded application?

I am working on an application that creates a lot of threads and relies heavily on String manipulation.
The application works for a good 24 hrs at a time and needs to be always very responsive.
I am trying to keep the creation of objects to a minimum. The application is doing well without any configuration at the moment.
But I was wondering for my own knowledge if there were any advantages (or disavantages) in using a specific JVM configuration?
Please bear with me, I am pretty new on on the subject of the JVM/GC configuration:
I was wondering if there were any JVM options I should absolutely use while working with multithreads?
Should I configure the heap?
Should I also configure the GC?
Should I keep the Garbage Collection to a minimum?
I started reading: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
Any tips on the subject would greatly be appreciated.
Thanks in advance,

Generally, the best intial advice concerning tweaking your JVM is don't. Unless you are experiencing specific JVM-related problems with the default settings, leave them alone.
If you do need to fiddle around with the settings, I would recommend you set up a representative testcase and use an advanced profiler such as JProfiler.
Furthermore, you should really read the technical documentation regarding the HotSpot VM, specifically the Memory Management Whitepaper, all of which you may find here.

If it is working fine then you should not do anything.
If your application is CPU bound you should not create Lot of threads.
Reason is lot of time is wasted in context switching.
String manipulation if it in memory then there should be only those threads which are required
NCPU = UCPU* (1+W/C)
Where NCPU--> Number of CPU
UCPU--> Target CPU Utilization
W-->Wait time
C--> Compute time
So for CPU bound operations it should be max (Number of CPU +1) threads.
Also there are lot of test cases defined for concurrency applications in Java Concurrency in Practice. You may want to check those.

I was wondering if there were any JVM options I should absolutely use while working with multithreads?
All the best options will be on by default. If you look at HotSpot VM Options you can see quite a few are -XX:+ which means they are on by default.
Should I configure the heap?
Possibly. But I would leave the default setting if you can.
Should I also configure the GC?
Possibly. But I would leave the default setting if you can.
Should I keep the Garbage Collection to a minimum?
Reducing the amount of garbage created takes effort. It provides some benefit up to a point. You have to decide what is the best use of your time and how much time to spend reducing the amount of garbage created.
I would always start with a memory profiler and find where you are creating the most garbage. Start from the top of the list rather than trying to tune everything as this ensures you will get the most benefit for the least amount of effort.
BTW: I am an advocate of low garbage and off heap programs where it makes sense to do so. I have written trading systems which can run for a day without even a minor GC and programs which can load/use 500+ GB of data in off heap memory. However, you have to be able to demonstrate or quantify how much difference it will make to the end users or your business to determine whether it is really worth it.

I was wondering if there were any JVM options I should absolutely use while working with multithreads?
No.
Should I configure the heap?
No, apart from setting the heap size to something reasonable (with -Xmx and -Xms)
Should I also configure the GC?
No, unless you have a particular need for "low-pause". The default throughput compiler is the best option if you are currently meeting your "responsiveness" goals. If you are not meeting those goals then you should consider CMS or G1 ... but beware that they reduce pauses but they also reduce throughput.
Should I keep the Garbage Collection to a minimum?
No. That is not a sensible goal. Your aim is to maximize throughput, and minimizing GC won't necessarily achieve that. In a lot of case, it is more efficient to generate garbage than to to have the application do extra work to avoid generating garbage. (And as Peter Lawrey pointed out, you've also got the extra developer effort in writing and maintaining mode complex code.)
I would advise you to use a profiler to see if your application is spending a lot of time (CPU time or elapsed time) relative to doing other productive work. If not, or if the application is already running fast enough then don't fiddle with the JVM options.
If you are worried that your application won't cope with increased load in the future, then tweaking the GC doesn't scale. A better option is to investigate scaling up your hardware and/or figuring out how to do the work on multiple machines. In addition, tuning the GC to improve performance with current load may actually result in worse performance when the load increases. (Consider the problem that arises with CMS when it can't keep up and is forced to do a full stop-the-world collection to recover.)
Finally, it is generally speak a bad idea to have lots of threads. It is better to use a small number of worker threads (roughly equal to the number of processors / cores) and feed them work via concurrent queues, etcetera.

In the past, I have faced the similar server application: lots of String manipulation, String creation, and needs to be always very responsive. The app worked fine with default configuration, until run into high-stress situation. You need to enable -XX:+UseConcMarkSweepGC for low pause, and fine tune other parameters to ensure the app behavior the way that you want. Here is the short list:
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=nn
-XX:CMSWaitDuration=300000
-XX:GCTimeRatio=nn
-XX:+DisableExplicitGC

Can Sun JVM handle gigantic heap sizes without problems, and how?

I have heard several people claiming that you can not scale the JVM heap size up. I've heard claims of the practical limit being 4 gigabytes (I heard an IBM consultant say that), 10 gigabytes, 32 gigabytes, and so on... I simply can not believe any of those numbers and have been wondering about the issue now for a while.
So, I have three part question I would hope someone with experience could answer:
Given the following case how would you tune the heap and GC settings?
Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
Should this really still work? I think it should.
The case:
64 bit platform
64 cores
64 gigabytes of memory
The application server is client facing (ie. Jboss/tomcat web application server) - complete pauses of JVM would probably be noticed by end users
Sun JVM, probably 1.5
To prove I am not asking you guys to do my homework this is what I came up with:
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:-EliminateZeroing -Xmn768m -Xmx55000m
CMS should reduce the amount of pauses, although it comes with overhead. The other settings for CMS seem to default automatically to the number of CPUs so they seem sane to me. The rest that I added are extras that might do good or bad generally for performance, and they should probably be tested.
Definitely.

I think it's going to be difficult for anybody to give you anything more than general advice, without having further knowledge of your application.
What I would suggest is that you use VisualGC (or the VisualGC plugin for VisualVM) to actually look at what the garbage collection is doing when your app is running. Once you have a greater understanding of how the GC is working alongside your application, it'll be far easier to tune it.

#1. Given the following case how would you tune the heap and GC settings?
First, having 64 gigabytes of memory doesn't imply that you have to use them all for one JVM. Actually, it rather means you can run many of them. Then, it is impossible to answer your question without any access to your machine and application to measure and analyse things (knowing what your application is doing isn't enough). And no, I'm not asking to get access to your environment :)
#2. Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
The goal of tuning is to find a good compromise between frequency and duration of (major) GCs. With a ~55g heap, GC won't be frequent but will take noticeable time, for sure (the bigger the heap, the longer the major GC). Using a Parallel or Concurrent garbage collector will help on multiprocessor systems but won't entirely solve this issue. Why do you need ~55g (this is mega ultra huge for a webapp IMO), that's my question. I'd rather run many clustered JVMs to handle load if required (at some point, the database will become the bottleneck anyway with a data oriented application).
#3. Should this really still work? I think it should.
Hmm... not sure I get the question. What is "this"? Instantiating a JVM with a big heap? Yes, it should. Is it equivalent to running several JVMs? No, certainly not.
PS: 4G is the maximum theoretical heap limit for the 32-bit JVM running on a 64-bit operating system (see Why can't I get a larger heap with the 32-bit JVM?)
PPS: On 64-bit VMs, you have 64 bits of addressability to work with resulting in a maximum Java heap size limited only by the amount of physical memory and swap space your system provides. (see How large a heap can I create using a 64-bit VM?)

Obviously heap size is not unlimited and the larger is the heap size, the more your JVM will eventually spend on GC. Though I think it is possible to set heap size quite high on 64-bit JVM, I still think it's not really practical. The advice here is better to have several JVMs running with the same parameters i.e. cluster of JBoss/Tomcat nodes running on the same physical machine and you will get better throughput.
EDIT: Also your GC behavior depends on the taxonomy of your heap. If you have a lot of short-living objects and each request to the server creates a lot of those, then your GC will collect a lot of garbage very often and thus on large heap size this will result in longer pauses. If you have very many long-living objects (e.g. caching most of your data in memory) and the amount of short-living objects is not that big, then having bigger heap size is OK.

As Chris Rice already wrote, I wouldn't expect any obvious problems with the GC for heap sizes up to 32-64GB, although there may of course be some point of your application logic, which can cause problems.
Not directly related to GC, but I would still recommend you to perform a realistic load test on your production system. I used to work on a project, where we had a similar setup (relatively large, clustered JBoss/Tomcat setup to serve a public web application) and without exaggeration, JBoss is not behaving very well under high load or with a high number of concurrent calls if you are using EJBs. JBoss is spending a lot of time in synchronized blocks when accessing and managing the EJB instance pools and if you opt for a cluster, it will even wait for intra-cluster network communication within these synchronized blocks. Be especially aware of poorly performing state replication, if you are using SFSBs.

Only to add some more switches I would use by default: -Xms55g can help to reduce the rampup time because it frees Java from the need to check if it can fall back to the initial size and allows also better internal initial sizing of memory areas.
Additionally we made good experiences with NewSize to give you a large young size to get rid of short term garbage: -XX:NewSize=1g Additionally most webapps create a lot of short time garbage that will never survive the request processing. You can even make that bigger. With Xms55g, the VM reserves a large chunk already. Maybe downsizing can help.
-Xincgc helps to clean the young generation incrementally and return the cpu often to the user threads.
-XX:CMSInitiatingOccupancyFraction=70 If you really fill all that memory, try to start CMS garbage collection earlier.
-XX:+CMSIncrementalMode puts the CMS into incremental mode to return the cpu to the user threads more often.
Attach to the process with jstat -gc -h 10 <pid> 1s and watch the GC working.
Will you really fill up the memory? I assume that 64cpus for request processing might even be able to work with less memory. What do you store in there?

Depending on your GC pause analysis, you may wish to implement Incremental mode whereby the long pause may be broken out over a period of time.

I have found memory architecture plays a part in large memory sizes. Applications in general don't perform as well if they use more than one memory bank. The JVM appears to suffer as well, esp the GC which has to sweep the whole memory.
If you have an application which doesn't fit into one memory bank, your application has to pull in memory which is not local to a processor and use memory local to another processor.
On linux you can run numactl --hardware to see the layout of processors and memory banks.

Java very large heap sizes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Does anyone have experience with using very large heaps, 12 GB or higher in Java?
Does the GC make the program unusable?
What GC params do you use?
Which JVM, Sun or BEA would be better suited for this?
Which platform, Linux or Windows, performs better under such conditions?
In the case of Windows is there any performance difference to be had between 64 bit Vista and XP under such high memory loads?

If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.
However, when you need to keep GC pauses low, things get really nasty:
Forget the default throughput, stop-the-world GC. It will pause you application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.
The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only for the initial marking phase and remarking phases. For very small heaps like < 4 GB this is usually not a problem, but for an application that creates a lot of garbage and a large heap, the remarking phase can take quite a long time - usually much less then full stop-the-world, but still can be a problem for very large heaps.
When the CMS garbage collector is not fast enough to finish operation before the tenured generation fills up, it falls back to standard stop-the-world GC. Expect ~30 or more second long pauses for heaps of size 16 GB. You can try to avoid this keeping the long-lived garbage production rate of you application as low as possible. Note that the higher the number of the cores running your application is, the bigger is getting this problem, because the CMS utilizes only one core. Obviously, beware there is no guarantee the CMS does not fall back to the STW collector. And when it does, it usually happens at the peak loads, and your application is dead for several seconds. You would probably not want to sign an SLA for such a configuration.
Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it and observed that:
Its throughput is worse than that of CMS.
It theoretically should avoid collecting the popular blocks of memory first, however it soon reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
Finally, the stop-the-world fallback still exists for G1; ask Oracle, when that code is supposed to be run. If they say "never", ask them, why the code is there. So IMHO G1 really doesn't make the huge heap problem of Java go away, it only makes it (arguably) a little smaller.
If you have bucks for a big server with big memory, you have probably also bucks for a good, commercial hardware accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, 0-lines of stop-the-world code in the GC.
Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems by doing this (e.g. heap fragmentation), but it would be definitely easier to keep the pauses low.

I am CEO of Azul Systems so I am obviously biased in my opinion on this topic! :) That being said...
Azul's CTO, Gil Tene, has a nice overview of the problems associated with Garbage Collection and a review of various solutions in his Understanding Java Garbage Collection and What You Can Do about It presentation, and there's additional detail in this article: http://www.infoq.com/articles/azul_gc_in_detail.
Azul's C4 Garbage Collector in our Zing JVM is both parallel and concurrent, and uses the same GC mechanism for both the new and old generations, working concurrently and compacting in both cases. Most importantly, C4 has no stop-the-world fall back. All compaction is performed concurrently with the running application. We have customers running very large (hundreds of GBytes) with worse case GC pause times of <10 msec, and depending on the application often times less than 1-2 msec.
The problem with CMS and G1 is that at some point Java heap memory must be compacted, and both of those garbage collectors stop-the-world/STW (i.e. pause the application) to perform compaction. So while CMS and G1 can push out STW pauses, they don't eliminate them. Azul's C4, however, does completely eliminate STW pauses and that's why Zing has such low GC pauses even for gigantic heap sizes.

We have an application that we allocate 12-16 Gb for but it really only reaches 8-10 during normal operation. We use the Sun JVM (tried IBMs and it was a bit of a disaster but that just might have been ignorance on our part...I have friends that swear by it--that work at IBM). As long as you give your app breathing room, the JVM can handle large heap sizes with not too much GC. Plenty of 'extra' memory is key.
Linux is almost always more stable than Windows and when it is not stable it is a hell of a lot easier to figure out why. Solaris is rock solid as well and you get DTrace too :)
With these kind of loads, why on earth would you be using Vista or XP? You are just asking for trouble.
We don't do anything fancy with the GC params. We do set the minimum allocation to be equal to the maximum so it is not constantly trying to resize but that is it.

I have used over 60 GB heap sizes on two different applications under Linux and Solaris respectively using 64-bit versions (obviously) of the Sun 1.6 JVM.
I never encountered garbage collection problems with the Linux-based application except when pushing up near the heap size limit. To avoid the thrashing problems inherent to that scenario (too much time spent doing garbage collection), I simply optimized memory usage throughout the program so that peak usage was about 5-10% below a 64 GB heap size limit.
With a different application running under Solaris, however, I encountered significant garbage-collection problems which made it necessary to do a lot of tweaking. This consisted primarily of three steps:
Enabling/forcing use of the parallel garbage collector via the -XX:+UseParallelGC -XX:+UseParallelOldGC JVM options, as well as controlling the number of GC threads used via the -XX:ParallelGCThreads option. See "Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning" for more details.
Extensive and seemingly ridiculous setting of local variables to "null" after they are no longer needed. Most of these were variables that should have been eligible for garbage collection after going out of scope, and they were not memory leak situations since the references were not copied. However, this "hand-holding" strategy to aid garbage collection was inexplicably necessary for some reason for this application under the Solaris platform in question.
Selective use of the System.gc() method call in key code sections after extensive periods of temporary object allocation. I'm aware of the standard caveats against using these calls, and the argument that they should normally be unnecessary, but I found them to be critical in taming garbage collection when running this memory-intensive application.
The three above steps made it feasible to keep this application contained and running productively at around 60 GB heap usage instead of growing out of control up into the 128 GB heap size limit that was in place. The parallel garbage collector in particular was very helpful since major garbage-collection cycles are expensive when there are a lot of objects, i.e., the time required for major garbage collection is a function of the number of objects in the heap.
I cannot comment on other platform-specific issues at this scale, nor have I used non-Sun (Oracle) JVMs.

12Gb should be no problem with a decent JVM implementation such as Sun's Hotspot.
I would advice you to use the Concurrent Mark and Sweep colllector ( -XX:+UseConcMarkSweepGC) when using a SUN VM.Otherwies you may face long "stop the world" phases, were all threads are stopped during a GC.
The OS should not make a big difference for the GC performance.
You will need of course a 64 bit OS and a machine with enough physical RAM.

I recommend also considering taking a heap dump and see where memory usage can be improved in your app and analyzing the dump in something such as Eclipse's MAT . There are a few articles on the MAT page on getting started in looking for memory leaks. You can use jmap to obtain the dump with something such as ...
jmap -heap:format=b pid

As mentioned above, if you have a non-interactive program, the default (compacting) garbage collector (GC) should work well. If you have an interactive program, and you (1) don't allocate memory faster than the GC can keep up, and (2) don't create temporary objects (or collections of objects) that are too big (relative to the total maximum JVM memory) for the GC to work around, then CMS is for you.
You run into trouble if you have an interactive program where the GC doesn't have enough breathing room. That's true regardless of how much memory you have, but the more memory you have, the worse it gets. That's because when you get too low on memory, CMS will run out of memory, whereas the compacting GCs (including G1) will pause everything until all the memory has been checked for garbage. This stop-the-world pause gets bigger the more memory you have. Trust me, you don't want your servlets to pause for over a minute. I wrote a detailed StackOverflow answer about these pauses in G1.
Since then, my company has switched to Azul Zing. It still can't handle the case where your app really needs more memory than you've got, but up until that very moment it runs like a dream.
But, of course, Zing isn't free and its special sauce is patented. If you have far more time than money, try rewriting your app to use a cluster of JVMs.
On the horizon, Oracle is working on a high-performance GC for multi-gigabyte heaps. However, as of today that's not an option.

If you switch to 64-bit you will use more memory. Pointers become 8 bytes instead of 4. If you are creating lots of objects this can be noticeable seeing as every object is a reference (pointer).
I have recently allocated 15GB of memory in Java using the Sun 1.6 JVM with no problems. Though it is all only allocated once. Not much more memory is allocated or released after the initial amount. This was on a Linux but I imagine the Sun JVM will work just as well on 64-bit Windows.

You should try running visualgc against your app. It´s a heap visualization tool that´s part of the jvmstat download at http://java.sun.com/performance/jvmstat/
It is a lot easier than reading GC logs.
It quickly helps you understand how the parts (generations) of the heap are working. While your total heap may be 10GB, the various parts of the heap will be much smaller. GCs in the Eden portion of the heap are relatively cheap, while full GCs in the old generation are expensive. Sizing your heap so that that the Eden is large and the old generation is hardly ever touched is a good strategy. This may result in a very large overall heap, but what the heck, if the JVM never touches the page, it´s just a virtual page, and doesn´t have to take up RAM.

A couple of years ago, I compared JRockit and the Sun JVM for a 12G heap. JRockit won, and Linux hugepages support made our test run 20% faster. YMMV as our test was very processor/memory intensive and was primarily single-threaded.

here's an article on gc FROM one of Java Champions --
http://kirk.blog-city.com/is_your_concurrent_collector_failing_you.htm
Kirk, the author writes
"Send me your GC logs
I'm currently interested in studying Sun JVM produced GC logs. Since these logs contain no business relevent information it should be ease concerns about protecting proriatary information. All I ask that with the log you mention the OS, complete version information for the JRE, and any heap/gc related command line switches that you have set. I'd also like to know if you are running Grails/Groovey, JRuby, Scala or something other than or along side Java. The best setting is -Xloggc:. Please be aware that this log does not roll over when it reaches your OS size limit. If I find anything interesting I'll be happy to give you a very quick synopsis in return. "

An article from Sun on Java 6 can help you: https://www.oracle.com/java/technologies/javase/troubleshooting-javase.html

The max memory that XP can address is 4 gig(here). So you may not want to use XP for that(use a 64 bit os).

sun has had an itanium 64-bit jvm for a while although itanium is not a popular destination. The solaris and linux 64-bit JVMs should be what you should be after.
Some questions
1) is your application stable ?
2) have you already tested the app in a 32 bit JVM ?
3) is it OK to run multiple JVMs on the same box ?
I would expect the 64-bit OS from windows to get stable in about a year or so but until then, solaris/linux might be better bet.

Has anyone found Garbage Collection tuning to be useful?

I've read plenty of articles about tuning GC in Java and have often wondered how many people really use some of the more advanced features.
I've always avoided tuning where possible and concentrated on writing the simplest possible code I can (Brian Goetz's advice) - this seems to have worked well for me so far.
Are these tuning strategies resilient to change across VM versions or do they require constant re-assessment?
The one tuning that I have made use of is the -server flag.

Part of my current job is the care and feeding of a large java application that was designed to run with a large amount of memory (currently around 8 Gb), mostly due to ongoing calculations with lots of cached data. I did the initial deployment with the standard GC setup , mostly because there wasn't a simple way to simulate the production environment running at full tilt.
In stages, over the next few months, I've customized the GC settings. Generally, the biggest available knob seems to be adjusting the frequency and magnitude of incremental gc - the biggest improvement has been in trading off large periodic gc for smaller and more frequent ones. But we've definitely been able to see performance improvements.
I'm not going to post my specific settings because a) they're specific to our setup, and b) because I don't have them handy :). But in general, what I've found is
there's been a lot of work done in
tuning the default gc settings.
Almost always the defaults work better than any tweaks I would make.
At least for me, the situations where
gc tuning was actually worthwhile
were extreme enough that it was
unreasonable to attempt to simulate
them, so I had to do it
experimentally and incrementally.
Here's a good reference from a prev. stackoverflow discussion.

The vast majority of developers will never have to (or want to) tune GC. I have worked with people who have had to tune it and here is the advice:
Before you attempt to tune the garbage
collector make 100% sure you have
verified, with a profiler. what is
going on. Once you start tuning make
sure you verify, with a profiler, that
it had a positive effect.
You should also revisit the changes with each version of the VM you run on (different VMs will have different tuning strategies).
I once helped someone with an GC issue that turned out to be them not closing JDBC result sets (or some issue like that). This caused memory to never be freed (his code held onto them for some reason). Fixing that issue made the program go from 20 minutes to something like 30 seconds or a couple of minutes. The memory usage went way down as well.

I have to say that I haven't had the need myself to use tuning very much. But I work closely with people who write code where latency is critical: they make much use of such tuning - specifying which GC algorithm to use, max pause times, survivor ratios etc.
I guess the answer therefore is: if latency is critical to an application, you might need to look at tuning your GC

I would say the most common thing to tune is the maximum memory size. Most of the other memory options have sensible defaults and are often over tuned IMHO. i.e. Set when it really doesn't make much difference. I often see people set lots of options when half of them are the default in any case. ;)
Using a profiler is the most useful way to improve GC behaviour (by reducing the number of objects created)

I have but not recently. The application that I was working on was real-time rendering of a video stream constructed of individual motion JPEG images. At the time (circa JDK 1.2 and 1.3), the -Xincgc setting would switch the client garbage collector from more of a big bang cleanup to a mode where a bit of garbage was cleaned up regularly. As a result, the distribution of frame latencies was much lower, giving the impression of a smoother video (instead of 1-2-3-pause, 1-2-3-pause).
I haven't looked at that code in quite a long time but I strongly suspect that, with the modern garbage collection algorithms, -Xincgc would actually decrease performance.
In today's world, I would say that standard optimization skepticism should always apply: profile profile profile. Are you sure that the bottleneck is really the garbage collector...?

In short, yes, it's very useful for tuning any serious Java application. We've often found that in production scenarios it's the difference between a stable app and a completely unpredictable app. It's certainly not the first thing I do but once you have an application working and can apply real load to it, it's one of the first things to investigate at that point.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.