I am running a build system. We used to use the CMS collector, but we started suffering from very long full GC cycles; throughput (time not spent doing GC) was around 90%. So I decided to switch to G1 with the assumption that even if the overall GC time is longer, the pauses will be shorter, ensuring higher availability. This idea seemed to work even better than I expected: I saw no full GC for almost 3 days, throughput was 97%, and overall GC performance was far better. (All screenshots and data are from GCViewer.)
Until now (day 6). Today the system simply went berserk. Old space utilization is just barely under 100%, and I am seeing a Full GC triggered almost every 2-3 minutes or so:
Old space utilization:
Heap size is 20G (128G RAM total). The flags I am currently using are:
-XX:+UseG1GC
-XX:MaxPermSize=512m
-XX:MaxGCPauseMillis=800
-XX:GCPauseIntervalMillis=8000
-XX:NewRatio=4
-XX:PermSize=256m
-XX:InitiatingHeapOccupancyPercent=35
-XX:+ParallelRefProcEnabled
plus logging flags. What I seem to be missing is -XX:ParallelGCThreads=20 (I have 32 processors); the default should be 8. I have also read from Oracle that -XX:G1NewSizePercent=4 would be suggested for a 20G heap; the default should be 5.
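In case it matters, this is how I would spell those two additions (G1NewSizePercent is an experimental flag, so it may need unlocking, and I am not even sure it is available on 7u76):
-XX:ParallelGCThreads=20
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=4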
I am using Java HotSpot(TM) 64-Bit Server VM 1.7.0_76, Oracle Corporation
What would you suggest? Do I have obvious mistakes? What to change?
Am I being too greedy by giving Java only 20G? The assumption here is that giving it too much heap would mean longer GCs, as there is simply more to clean (peasant logic).
PS: The application is not mine. For me it's a boxed product.
What would you suggest? Do I have obvious mistakes? What to change? Am I being too greedy by giving Java only 20G? The assumption here is that giving it too much heap would mean longer GCs, as there is simply more to clean (peasant logic).
If it triggers full GCs but your occupancy stays near those 20GB, then it's possible that the GC simply does not have enough breathing room, either to meet the demand of huge allocations or to meet some of its goals (throughput, pause times), forcing full GCs as a fallback.
So what you can attempt is increasing the heap limit or relaxing the throughput goals.
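For example, something along these lines (the numbers are purely illustrative and assume the machine has RAM to spare beyond the current 20G):
-Xmx28g
-XX:MaxGCPauseMillis=2000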
As mentioned earlier in my comment, you can also try upgrading to Java 8 for improved G1 heuristics.
For further advice, GC logs covering the "berserk" behavior would be useful.
Related
I've recently switched my Java application from CMS + ParNew to G1GC.
What I observed when I did the switch is the CPU usage went higher and the GC count + pause time went up as well.
My JVM flags before the switch were:
java -Xmx22467m -Xms22467m -XX:NewSize=11233m -XX:+UseConcMarkSweepGC -XX:AutoBoxCacheMax=1048576 -jar my-application.jar
After the switch my flags are:
java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:AutoBoxCacheMax=1048576 -XX:MaxGCPauseMillis=30 -jar my-application.jar
I followed Oracle's Best Practices http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
Do not Set Young Generation Size
And did not set the young generation size.
However I am suspecting that the young generation size is the problem here.
What I see is that the heap usage fluctuates between ~6-8 GB.
Whereas before, with CMS and ParNew, the memory usage grew from 4 to 16 GB, and only then would I see a GC:
I am not sure I understand why with G1GC the GC is so frequent. I am not sure what I'm missing when it comes to GC tuning with G1GC.
I'm using Java 8:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
I appreciate your help.
UPDATE:
A bit more information about those pauses:
As you can see all those pauses are G1New, and seemingly they are as long as my target pause time, which is 30ms.
When I look at the ParNew pauses before the switch to G1GC, this is how they looked:
So they are also all young gen collections (ParNew), but they are less frequent and shorter, because they happen only when the heap usage gets to around 14 GB (according to the graph).
I am still clueless as to why the G1New collections happen so early (in terms of heap usage).
Update 2
I also noticed that NewRatio=2. I don't know if G1GC respects that, but it would mean that my new gen is capped at 7 GB. Could that be the reason?
Update 3
Adding G1GC GC logs:
https://drive.google.com/file/d/1iWqZCbB-nU6k_0-AQdvb6vaBSYbkQcqn/view?usp=sharing
I was able to see that the time spent copying objects is very significant. It looks like G1GC uses a maximum tenuring threshold of 15 by default before an object is promoted to the tenured generation.
I reduced it to 1 (-XX:MaxTenuringThreshold=1)
Also, I don't know how to confirm it in the logs, but visualizing the GC log I saw that the young generation is constantly being resized, from its minimum size to its maximum size. I narrowed down the range and that also improved performance.
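For reference, the way I understand you can narrow that range on JDK 8 is with the experimental bounds (the percentages below are just an illustration, not the values I settled on):
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=40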
Looking here https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#JSGCT-GUID-70E3F150-B68E-4787-BBF1-F91315AC9AB9
I was trying to figure out whether coarsening is indeed an issue, but it simply says to set gc+remset=trace, and I do not understand how to pass that to java on the command line, or whether it's even available in JDK 8.
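From what I can tell, -Xlog:gc+remset=trace is JDK 9+ unified-logging syntax; the closest JDK 8 equivalent I found is the diagnostic remembered-set summary, roughly:
-XX:+UnlockDiagnosticVMOptions
-XX:+G1SummarizeRSetStats
-XX:G1SummarizeRSetStatsPeriod=1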
I increased -XX:G1RSetRegionEntries a bit just in case.
I hope this helps future G1GC tuners; if anyone else has more suggestions, that would be great.
What I still see is that [Processed Buffers] takes a very long time in young evacuations, and [Scan RS] is very long in mixed collections. I'm not sure why.
Your GC log shows an average GC pause interval of 2 seconds, with each pause around 30-40 ms, which amounts to an application throughput of around 95%. That is not "killing performance" territory, at least not due to GC pauses.
G1 does more concurrent work, though, e.g. for remembered set refinement, and your pauses seem to spend some time in update/scan RS, so I assume the concurrent GC threads are busy too, i.e. it may need additional CPU cycles outside GC pauses. That is not covered by the logs by default; you need -XX:+G1SummarizeRSetStats for that. If latency is more important, you might want to allocate more cores to the machine; if throughput is more important, you could tune G1 to perform more of the RS updates during the pauses (at the cost of increased pause times).
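For the throughput route, one knob to experiment with (just a suggestion, your logs don't prove you need it) is how much remembered-set update work G1 is allowed to leave for the pause, e.g.:
-XX:G1RSetUpdatingPauseTimePercent=20
For the latency route you could likewise try giving the concurrent work more threads via -XX:ConcGCThreads.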
I have a Java enterprise application which has been consuming more memory for the past few days. Even though GC is running and we have adequate parameters set (ConcMarkSweepGC), it is not freeing all of the memory.
When I attached JProfiler, I observed that whenever GC runs it only clears a small fraction: if the application was consuming 9 GB, only around 1 to 1.2 GB gets cleared. At the same time, if I click the "Run GC" button in JProfiler, it clears at least 6-7 GB out of the 9 GB occupied.
I was trying to understand what the JProfiler GC does differently compared to the regular GC executed by the application.
Here are a few of the relevant details:
- App server: Wildfly 9
- Java version: Java 8
- OS: Windows 2012 - 64Bit
Any help around this would be helpful. Thanks in advance.
The behaviour varies between GC algorithms, but in principle a GC of the old space is not supposed to clear all unused memory at all times. In the new space a copying, parallel GC is used to combat memory fragmentation, but the old space is significantly larger, and running such a GC there would result in a long stop-the-world pause. You selected ConcMarkSweepGC, which is a concurrent GC that won't attempt the full stop-the-world GC cycle as long as there is enough free memory. With JProfiler you probably initiated a full stop-the-world GC of the old space.
If you want to understand this in detail, read about the different GC algorithms in the JVM. There are quite a few of them, and they are designed with different goals in mind.
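For comparison, the profiler's "Run GC" button presumably boils down to an explicit GC request, much like calling System.gc() or running (assuming jcmd from the same JDK is on the PATH):
jcmd <pid> GC.run
With ConcMarkSweepGC such an explicit request triggers a full stop-the-world collection unless -XX:+ExplicitGCInvokesConcurrent is set, which is why it frees so much more than the regular concurrent cycles.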
We have a fairly big application running on a JBoss 7 application server. In the past we were using ParallelGC, but it was giving us trouble on servers where the heap was large (5 GB or more) and usually nearly filled up: we would frequently get very long GC pauses.
Recently, we made improvements to our application's memory usage and in a few cases added more RAM to some of the servers where the application runs, but we also started switching to G1 in the hopes of making these pauses less frequent and/or shorter. Things seem to have improved but we are seeing a strange behaviour which did not happen before (with ParallelGC): the Perm Gen seems to fill up pretty quickly and once it reaches the max value a Full GC is triggered, which usually causes a long pause in the application threads (in some cases, over 1 minute).
We have been using 512 MB of max perm size for a few months, and during our analysis the perm size would usually stop growing at around 390 MB with ParallelGC. After we switched to G1, however, the behaviour above started happening. I tried increasing the max perm size to 1 GB and even 1.5 GB, but the Full GCs are still happening (they are just less frequent).
In this link you can see some screenshots of the profiling tool we are using (YourKit Java Profiler). Notice how when the Full GC is triggered the Eden and the Old Gen have a lot of free space, but the Perm size is at the maximum. The Perm size and the number of loaded classes decrease drastically after the Full GC, but they start rising again and the cycle is repeated. The code cache is fine, never rises above 38 MB (it's 35 MB in this case).
Here is a segment of the GC log:
2013-11-28T11:15:57.774-0300: 64445.415: [Full GC 2126M->670M(5120M), 23.6325510 secs]
[Eden: 4096.0K(234.0M)->0.0B(256.0M) Survivors: 22.0M->0.0B Heap: 2126.1M(5120.0M)->670.6M(5120.0M)]
[Times: user=10.16 sys=0.59, real=23.64 secs]
You can see the full log here (from the moment we started up the server, up to a few minutes after the full GC).
Here's some environment info:
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Startup options: -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log
So here are my questions:
Is this the expected behaviour with G1? I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
Is there something I can improve/correct in our startup parameters? The server has 8 GB of RAM, but it doesn't seem we are lacking hardware; performance of the application is fine until a Full GC is triggered, which is when users experience big lags and start complaining.
Causes of growing Perm Gen
Lots of classes, especially JSPs.
Lots of static variables.
There is a classloader leak.
For those that don't know, here is a simple way to think about how the PermGen fills up. The Young Gen doesn't get enough time to let things expire, so they get moved up to the Old Gen space. The Perm Gen holds the classes for the objects in the Young and Old Gen. When the objects in the Young or Old Gen get collected and the class is no longer being referenced, it gets 'unloaded' from the Perm Gen. If the Young and Old Gen don't get GC'd, then neither does the Perm Gen, and once it fills up it needs a full stop-the-world GC. For more info see Presenting the Permanent Generation.
Switching to CMS
I know you are using G1 but if you do switch to the Concurrent Mark Sweep (CMS) low pause collector -XX:+UseConcMarkSweepGC, try enabling class unloading and permanent generation collections by adding -XX:+CMSClassUnloadingEnabled.
The Hidden Gotcha'
If you are using JBoss, RMI/DGC has the gcInterval set to 1 min. The RMI subsystem forces a full garbage collection once per minute. This in turn forces promotion instead of letting it get collected in the Young Generation.
You should change this to at least 1 hour, if not 24 hours, in order for the GC to do proper collections.
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
List of every JVM option
To see all the options, run this from the cmd line.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
If you want to see what JBoss is using, then you need to add the following to your standalone.conf.bat (standalone.conf on Linux). You will get a list of every JVM option and what it is set to. NOTE: it must be in the JVM that you want to look at. If you run it externally, you won't see what is happening in the JVM that JBoss is running in.
set "JAVA_OPTS= -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal %JAVA_OPTS%"
There is a shortcut to use when we are only interested in the modified flags.
-XX:+PrintCommandLineFlags
Diagnostics
Use jmap to determine what classes are consuming permanent generation space. Output will show
class loader
# of classes
bytes
parent loader
alive/dead
type
totals
jmap -permstat JBOSS_PID >& permstat.out
JVM Options
These settings worked for me but depending how your system is set up and what your application is doing will determine if they are right for you.
-XX:SurvivorRatio=8 – Sets survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). The SurvivorRatio is the size of the Eden space compared to one survivor space. Larger survivor spaces allow short lived objects a longer time period to die in the young generation.
-XX:TargetSurvivorRatio=90 – Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.
-XX:MaxTenuringThreshold=31 – To prevent premature promotion from the young to the old generation. Allows short-lived objects a longer time period to die in the young generation (and hence avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and the survivor space sizes may need to be adjusted so as to balance the overhead of copying between survivor spaces versus tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0, which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
-XX:NewSize=768m – allows specification of the initial young generation size
-XX:MaxNewSize=768m – allows specification of the maximum young generation size
Here is a more extensive JVM options list.
Is this the expected behaviour with G1?
I don't find it surprising. The base assumption is that stuff put into permgen almost never becomes garbage. So you'd expect that permgen GC would be a "last resort"; i.e. something the JVM would only do if it was forced into a full GC. (OK, this argument is nowhere near a proof ... but it's consistent with the following.)
I've seen lots of evidence that other collectors have the same behaviour; e.g.
permgen garbage collection takes multiple Full GC
What is going on with java GC? PermGen space is filling up?
I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
I think I found the same post. But someone's opinion that it ought to be possible is not really instructive.
Is there something I can improve/correct in our startup parameters?
I doubt it. My understanding is that this is inherent in the permgen GC strategy.
I suggest that you either track down and fix what is using so much permgen in the first place ... or switch to Java 8 in which there isn't a permgen heap anymore: see PermGen elimination in JDK 8
While a permgen leak is one possible explanation, there are others; e.g.
overuse of String.intern(),
application code that is doing a lot of dynamic class generation; e.g. using DynamicProxy,
a huge codebase ... though that wouldn't cause permgen churn as you seem to be observing.
I would first try to find the root cause for the PermGen getting larger before randomly trying JVM options.
You could enable class loading logging (-verbose:class, -XX:+TraceClassLoading -XX:+TraceClassUnloading, ...) and check the output.
In your test environment, you could try monitoring (over JMX) when classes get loaded (java.lang:type=ClassLoading LoadedClassCount). This might help you find out which part of your application is responsible.
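For instance, a minimal in-process sketch of that (just an illustration; the class name and the one-minute interval are made up):

import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadWatcher {
    public static void main(String[] args) throws InterruptedException {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        while (true) {
            // Same counters that are exposed over JMX as java.lang:type=ClassLoading
            System.out.println("loaded=" + bean.getLoadedClassCount()
                    + " totalLoaded=" + bean.getTotalLoadedClassCount()
                    + " unloaded=" + bean.getUnloadedClassCount());
            Thread.sleep(60000);
        }
    }
}

A steadily growing total with little unloading points towards the part of the code that keeps generating classes.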
You could also try listing all the classes using the JVM tools (sorry, but I still mostly use JRockit, and there you would do it with jrcmd; I hope Oracle has migrated those helpful features to HotSpot...).
In summary, find out what generates so many classes and then think how to reduce that / tune the gc.
Cheers,
Dimo
I agree with the answer above that you should really try to find what is actually filling your permgen, and I'd heavily suspect a classloader leak whose root cause you want to find.
There's a thread in the JBoss forums that goes through a couple of such diagnosed cases and how they were fixed. This answer and this article discuss the issue in general as well. In that article there's a mention of possibly the easiest test you can do:
Symptom
This will happen only if you redeploy your application without restarting the application server. The JBoss 4.0.x series suffered from just such a classloader leak. As a result I could not redeploy our application more than twice before the JVM would run out of PermGen memory and crash.
Solution
To identify such a leak, un-deploy your application and then trigger a full heap dump (make sure to trigger a GC before that). Then check if you can find any of your application objects in the dump. If so, follow their references to their root, and you will find the cause of your classloader leak. In the case of JBoss 4.0 the only solution was to restart for every redeploy.
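In practice, triggering that dump can be as simple as (assuming a Sun/Oracle JDK; the live option forces a GC before dumping, as the quote asks for):
jmap -dump:live,format=b,file=after-undeploy.hprof <JBOSS_PID>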
This is what I'd try first, IF you think that redeployment might be related. This blog post is an earlier one doing the same thing, but discussing the details as well. Based on the posting, though, it might be that you're not actually redeploying anything, and the permgen is just filling up by itself. In that case, examining the classes and anything else added to the permgen might be the way to go (as has already been mentioned in the previous answer).
If that doesn't give more insight, my next step would be trying out the Plumbr tool. They have a sort of guarantee on finding the leak for you, as well.
You should start the java command in your server.bat with -verbose:gc.
I am looking for the appropriate settings to configure the JVM for a web application. I have read about the old/young/perm generations, but I have trouble putting those parameters to best use for this configuration.
Out of the 4 GB, around 3 GB are used for a cache (an application-level cache using EhCache), so I'm looking for the best setup considering that. FYI, the cache is static during the lifetime of the application (loaded from disk, never expires), but heavily used.
I have profiled my application already, and I have performed optimization regarding the DB queries, the application's architecture, the cache size, etc... I am just looking for JVM configuration advices here. I have measured 99% throughput for the Garbage Collector, and 6-8s pauses when the Full GC runs (approximately once every 1/2h).
Here are the current JVM parameters:
-XX:+UseParallelGC -XX:+AggressiveHeap -Xms2048m -Xmx4096m
-XX:NewSize=64m -XX:PermSize=64m -XX:MaxPermSize=512m
-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log
Those parameters may be completely off because they were written a long time ago, before the application became this big.
I am using 64-bit Java 1.5.
Do you see any possible improvements?
Edit: the machine has 4 cores.
-XX:+UseParallelOldGC should speed up the full GCs on a multi-core machine.
You could also profile with different NewRatio values. Your cached objects will live in the tenured generation so profile it with -XX:NewRatio=7 and then again with some higher and lower values.
You may not be able to accurately replicate realistic use during profiling, so make sure you monitor GC when it is in real life use and then you can make minor changes (e.g. to survivor space etc) and see what effect they have.
Old advice was not to use AggressiveHeap together with Xms and Xmx; I am not sure if that is still true.
Edit: Please let us know which OS/hardware platform you are deployed on.
Full collections every 30 minutes indicate the old generation is quite full. A high value for NewRatio will give it more space at the expense of the young gen. Can you give the JVM more than 4 GB, or are you limited to that?
It would also be useful to know what your goals / non-functional requirements are. Do you want to avoid these 6-7 second pauses at the risk of lower throughput, or are those pauses an acceptable compromise for the highest possible throughput?
If you want to minimise the pauses, try the CMS collector by removing both
-XX:+UseParallelGC -XX:+UseParallelOldGC
and adding
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Profile with that and various NewRatio values and see how you get on.
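For example, one permutation to profile might look like this (heap sizes copied from your current settings, the NewRatio value is just a starting point):
-Xms2048m -Xmx4096m -XX:NewRatio=7 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log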
One downside of the CMS collector is that unlike the parallel old and serial collectors, it doesn't compact the old generation. If the old generation gets too fragmented and a minor collection needs to promote a lot of objects to the old gen at once, a full serial collection may be invoked which could mean a long pause. (I've seen this once in prod but with the IBM JVM which went out of memory instead of invoking a compacting collection!)
This might not be a problem for you - it depends on the nature of the application - but you can insure against it by restarting nightly or weekly.
I would use Java 6 update 30 or 7 update 2 (64-bit), as they are much more efficient; e.g. they use 32-bit references by default.
I would also configure Ehcache to use direct memory or a memory mapped file if possible. This should minimise the impact on GC.
Using these options it's possible to almost eliminate your heap footprint. For example, I have an app which uses up to 180 GB of memory-mapped files on a machine with 16 GB of memory, and the heap size is 6 MB. A full GC takes up to 11 ms when triggered manually, not that it ever GCs. ;)
If you want a simple example, here is one where I map an 8 TB file into memory and update it: http://vanillajava.blogspot.com/2011/12/using-memory-mapped-file-for-huge.html
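A minimal sketch of the idea (the file name and region size are made up for illustration; each MappedByteBuffer is limited to 2 GB, so a huge file needs several mappings):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileDemo {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
        FileChannel channel = file.getChannel();
        // Map a 1 GB region; the data lives in the page cache, not on the Java heap.
        MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1L << 30);
        buffer.putLong(0, 42L);
        System.out.println(buffer.getLong(0));
        channel.close();
        file.close();
    }
}

Because the mapped data is off-heap, the GC never has to scan or copy it.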
I hope you just removed -server to not inflate the post; otherwise you should enable it right away. Apart from the slightly longer startup time (which really isn't an issue for a web application that should run for days), I don't see any reason to use anything but C2. That could give some nice performance improvements in general. Um, back to topic:
Sadly the best thing I can think of won't work with your ancient JVM. The G1 garbage collector was basically designed to reduce latency. Not only does it try to reduce pauses in general, it also offers some tuning parameters to set pause goals and intervals. See this page.
There is an experimental backport to Java 6, though I doubt it's kept up to date. And nobody is wasting any time on optimizing GC or anything else for Java 1.5 anymore, I fear.
PS: There would also be IBM's JVM and obviously Azul Systems (OK, that wasn't a serious proposition ;) ), but those are obviously out of the question... just wanted to mention them.
Does anyone have experience with using very large heaps, 12 GB or higher in Java?
Does the GC make the program unusable?
What GC params do you use?
Which JVM, Sun's or BEA's, would be better suited for this?
Which platform, Linux or Windows, performs better under such conditions?
In the case of Windows is there any performance difference to be had between 64 bit Vista and XP under such high memory loads?
If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.
However, when you need to keep GC pauses low, things get really nasty:
Forget the default throughput, stop-the-world GC. It will pause your application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.
The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only for the initial marking and remarking phases. For very small heaps (< 4 GB) this is usually not a problem, but for an application that creates a lot of garbage and has a large heap, the remarking phase can take quite a long time - usually much less than a full stop-the-world collection, but it can still be a problem for very large heaps.
When the CMS garbage collector is not fast enough to finish before the tenured generation fills up, it falls back to a standard stop-the-world GC. Expect pauses of ~30 seconds or more for heaps of 16 GB. You can try to avoid this by keeping the long-lived garbage production rate of your application as low as possible. Note that the more cores are running your application, the bigger this problem gets, because CMS utilizes only one core. Obviously, beware: there is no guarantee that CMS does not fall back to the STW collector. And when it does, it usually happens at peak load, and your application is dead for several seconds. You would probably not want to sign an SLA for such a configuration.
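One common mitigation, though not a guarantee, is to make CMS start its concurrent cycle earlier so it has more headroom, e.g.:
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
Starting earlier trades a bit of throughput for a lower chance of hitting the stop-the-world fallback.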
Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it and observed that:
Its throughput is worse than that of CMS.
It theoretically should avoid collecting the popular blocks of memory first; however, it soon reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
Finally, the stop-the-world fallback still exists for G1; ask Oracle when that code is supposed to run. If they say "never", ask them why the code is there. So IMHO G1 doesn't really make the huge-heap problem of Java go away, it only makes it (arguably) a little smaller.
If you have the bucks for a big server with big memory, you probably also have the bucks for a good, commercial, hardware-accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, zero lines of stop-the-world code in the GC.
Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems by doing this (e.g. heap fragmentation), but it would definitely be easier to keep the pauses low.
I am CEO of Azul Systems so I am obviously biased in my opinion on this topic! :) That being said...
Azul's CTO, Gil Tene, has a nice overview of the problems associated with Garbage Collection and a review of various solutions in his Understanding Java Garbage Collection and What You Can Do about It presentation, and there's additional detail in this article: http://www.infoq.com/articles/azul_gc_in_detail.
Azul's C4 Garbage Collector in our Zing JVM is both parallel and concurrent, and uses the same GC mechanism for both the new and old generations, working concurrently and compacting in both cases. Most importantly, C4 has no stop-the-world fallback. All compaction is performed concurrently with the running application. We have customers running very large heaps (hundreds of GB) with worst-case GC pause times of <10 msec, and, depending on the application, often less than 1-2 msec.
The problem with CMS and G1 is that at some point Java heap memory must be compacted, and both of those garbage collectors stop-the-world/STW (i.e. pause the application) to perform compaction. So while CMS and G1 can push out STW pauses, they don't eliminate them. Azul's C4, however, does completely eliminate STW pauses and that's why Zing has such low GC pauses even for gigantic heap sizes.
We have an application that we allocate 12-16 GB for, but it really only reaches 8-10 GB during normal operation. We use the Sun JVM (we tried IBM's and it was a bit of a disaster, but that just might have been ignorance on our part... I have friends that swear by it, who work at IBM). As long as you give your app breathing room, the JVM can handle large heap sizes without too much GC. Plenty of 'extra' memory is key.
Linux is almost always more stable than Windows and when it is not stable it is a hell of a lot easier to figure out why. Solaris is rock solid as well and you get DTrace too :)
With these kinds of loads, why on earth would you be using Vista or XP? You are just asking for trouble.
We don't do anything fancy with the GC params. We do set the minimum allocation to be equal to the maximum so it is not constantly trying to resize but that is it.
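In other words, something along these lines (the 12g value is just the low end of the range mentioned above):
-Xms12g -Xmx12g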
I have used over 60 GB heap sizes on two different applications under Linux and Solaris respectively using 64-bit versions (obviously) of the Sun 1.6 JVM.
I never encountered garbage collection problems with the Linux-based application except when pushing up near the heap size limit. To avoid the thrashing problems inherent to that scenario (too much time spent doing garbage collection), I simply optimized memory usage throughout the program so that peak usage was about 5-10% below a 64 GB heap size limit.
With a different application running under Solaris, however, I encountered significant garbage-collection problems which made it necessary to do a lot of tweaking. This consisted primarily of three steps:
Enabling/forcing use of the parallel garbage collector via the -XX:+UseParallelGC -XX:+UseParallelOldGC JVM options, as well as controlling the number of GC threads used via the -XX:ParallelGCThreads option. See "Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning" for more details.
Extensive and seemingly ridiculous setting of local variables to "null" after they are no longer needed. Most of these were variables that should have been eligible for garbage collection after going out of scope, and they were not memory leak situations since the references were not copied. However, this "hand-holding" strategy to aid garbage collection was inexplicably necessary for some reason for this application under the Solaris platform in question.
Selective use of the System.gc() method call in key code sections after extensive periods of temporary object allocation. I'm aware of the standard caveats against using these calls, and the argument that they should normally be unnecessary, but I found them to be critical in taming garbage collection when running this memory-intensive application.
The three above steps made it feasible to keep this application contained and running productively at around 60 GB heap usage instead of growing out of control up into the 128 GB heap size limit that was in place. The parallel garbage collector in particular was very helpful since major garbage-collection cycles are expensive when there are a lot of objects, i.e., the time required for major garbage collection is a function of the number of objects in the heap.
I cannot comment on other platform-specific issues at this scale, nor have I used non-Sun (Oracle) JVMs.
12 GB should be no problem with a decent JVM implementation such as Sun's HotSpot.
I would advise you to use the Concurrent Mark Sweep collector (-XX:+UseConcMarkSweepGC) when using a Sun VM. Otherwise you may face long "stop the world" phases, where all threads are stopped during a GC.
The OS should not make a big difference for the GC performance.
You will, of course, need a 64-bit OS and a machine with enough physical RAM.
I also recommend taking a heap dump to see where memory usage can be improved in your app, and analyzing the dump in something such as Eclipse's MAT. There are a few articles on the MAT page about getting started with looking for memory leaks. You can use jmap to obtain the dump with something such as ...
jmap -dump:format=b,file=heap.bin <pid>
As mentioned above, if you have a non-interactive program, the default (compacting) garbage collector (GC) should work well. If you have an interactive program, and you (1) don't allocate memory faster than the GC can keep up, and (2) don't create temporary objects (or collections of objects) that are too big (relative to the total maximum JVM memory) for the GC to work around, then CMS is for you.
You run into trouble if you have an interactive program where the GC doesn't have enough breathing room. That's true regardless of how much memory you have, but the more memory you have, the worse it gets. That's because when you get too low on memory, CMS will run out of memory, whereas the compacting GCs (including G1) will pause everything until all the memory has been checked for garbage. This stop-the-world pause gets bigger the more memory you have. Trust me, you don't want your servlets to pause for over a minute. I wrote a detailed StackOverflow answer about these pauses in G1.
Since then, my company has switched to Azul Zing. It still can't handle the case where your app really needs more memory than you've got, but up until that very moment it runs like a dream.
But, of course, Zing isn't free and its special sauce is patented. If you have far more time than money, try rewriting your app to use a cluster of JVMs.
On the horizon, Oracle is working on a high-performance GC for multi-gigabyte heaps. However, as of today that's not an option.
If you switch to 64-bit you will use more memory. Pointers become 8 bytes instead of 4. If you are creating lots of objects this can be noticeable, since every object reference is a pointer.
I have recently allocated 15 GB of memory in Java using the Sun 1.6 JVM with no problems, though it is all allocated only once; not much more memory is allocated or released after the initial amount. This was on Linux, but I imagine the Sun JVM will work just as well on 64-bit Windows.
You should try running visualgc against your app. It's a heap visualization tool that's part of the jvmstat download at http://java.sun.com/performance/jvmstat/
It is a lot easier than reading GC logs.
It quickly helps you understand how the parts (generations) of the heap are working. While your total heap may be 10 GB, the various parts of the heap will be much smaller. GCs in the Eden portion of the heap are relatively cheap, while full GCs in the old generation are expensive. Sizing your heap so that the Eden is large and the old generation is hardly ever touched is a good strategy. This may result in a very large overall heap, but what the heck: if the JVM never touches a page, it's just a virtual page and doesn't have to take up RAM.
A couple of years ago, I compared JRockit and the Sun JVM for a 12G heap. JRockit won, and Linux hugepages support made our test run 20% faster. YMMV as our test was very processor/memory intensive and was primarily single-threaded.
Here's an article on GC from one of the Java Champions:
http://kirk.blog-city.com/is_your_concurrent_collector_failing_you.htm
Kirk, the author, writes:
"Send me your GC logs
I'm currently interested in studying Sun JVM produced GC logs. Since these logs contain no business relevent information it should be ease concerns about protecting proriatary information. All I ask that with the log you mention the OS, complete version information for the JRE, and any heap/gc related command line switches that you have set. I'd also like to know if you are running Grails/Groovey, JRuby, Scala or something other than or along side Java. The best setting is -Xloggc:. Please be aware that this log does not roll over when it reaches your OS size limit. If I find anything interesting I'll be happy to give you a very quick synopsis in return. "
An article from Sun on Java 6 can help you: https://www.oracle.com/java/technologies/javase/troubleshooting-javase.html
The max memory that XP can address is 4 GB (here). So you may not want to use XP for that (use a 64-bit OS).
Sun has had an Itanium 64-bit JVM for a while, although Itanium is not a popular destination. The Solaris and Linux 64-bit JVMs are what you should be after.
Some questions
1) Is your application stable?
2) Have you already tested the app in a 32-bit JVM?
3) Is it OK to run multiple JVMs on the same box?
I would expect the 64-bit OS from Windows to get stable in about a year or so, but until then, Solaris/Linux might be a better bet.