Selecting a JVM for an embedded streaming device

Selecting a JVM for an embedded streaming device - java

We're developing a device that's basically a raspberry pi that reads file data, processes it, and streams data out of a USB device at a given frame rate.
Due to the nature of the features we're using, we can't totally eliminate garbage allocation, and our GC pauses for even minor, young generation GC are causing frame skips.
Right now we're using HotSpot JVM, but my understanding is that it's better suited to large heap sizes, our memory needs rarely go over 256mb, so I'm wondering if there's a better VM with garbage collection that can give us pauses less than 15ms on a Raspberry Pi?

I think you will really struggle with this. You don't provide the flags you're using to start your JVM so it's not possible to recommend alternatives.
A well configured G1 collector with an application that does not generate constantly increasing long-life objects will avoid stop-the-world full GCs. However, your problem is that even minor GCs (which are typically very fast) are causing unacceptable latency. Part of the problem is the speed of the memory bus on the Pi, which is not really that great.
We (Azul, who I work for) produce a pauseless collector (C4) but that is designed for machines with more resources. It needs a minimum of 1Gb RAM and uses multiple cores to handle GC concurrently with application threads.

Ultimately, we've decided that we're fighting uphill making the application do something it really wasn't built for, or at least, it's not worth the development resources to continue along that path.
Our current solution is to leave the application as it is, and live with the reality of garbage collection pauses so we don't hamstring future development of our application with a crazy amount of optimization. Let Java do what it was designed to do.
To stop pauses that are causing frame skips, we have instead opted to create a second, tiny buffer application, managed by our primary application via IPC.

Related

Detecting memory-leak programmatically

If, on purpose, I create an application that crunches data while suffering from memory-leaks, I can notice that the memory as reported by, say:
Runtime.getRuntime().freeMemory()
starts oscillating between 1 and 2 MB of free memory.
The application then enters a loop that goes like this: GC, processing some data, GC, etc. but because the GC happens so often, the application basically isn't doing much else anymore. Even the GUI takes age to respond (and, no, I'm not talking about EDT issues here, it's really the VM basically stuck in some endless GC'ing mode).
And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
Note that I'm not talking about ouf-of-memory errors nor about detecting the memory leak itself.
I'm talking about detecting that an application is running so low on memory that it is basically calling the GC all the time, leaving hardly any time to do something else (in my hypothetical example: crunching data).
Would it work, for example, to repeatedly read how much memory is available during, say, one minute, and see that if the number has been "oscillating" between different values all below, say, 4 MB, conclude that there's been some leak and that the application has become unusable?

And I was wondering: is there a way to programmatically detect that the JVM doesn't have enough memory anymore?
I don't think so. You can find out roughly how much heap memory is free at any given instant, but AFAIK you cannot reliably determine when you are running out of memory. (Sure, you can do things like scraping the GC log files, or trying to pick patterns in the free memory oscillations. But these are likely to be unreliable and fragile in the face of JVM changes.)
However, there is another (and IMO better) approach.
In recent versions of Hotspot (version 1.6 and later, I believe), you can tune the JVM / GC so that it will give up and throw an OOME sooner. Specifically, the JVM can be configured to check that:
the ratio of free heap to total heap is greater than a given threshold after a full GC, and/or
the time spent running the GC is less than a certain percentage of the total.
The relevant JVM parameters are "UseGCOverheadLimit", "GCTimeLimit" and "GCHeapFreeLimit". Unfortunately, Hotspot's tuning parameters are not well documented on the public web, but these ones are all listed here.
Assuming that you want your application to do the sensible thing ... give up when it doesn't have enough memory to run properly anymore ... then just launch the JVM with a smaller "GCTimeLimitor" or "GCHeapFreeLimit" than the defaults.
EDIT
I've discovered that the MemoryPoolMXBean API allows you to look at the peak usage of individual memory pools (heaps), and set thresholds. However, I've never tried this, and the APIs have lots of hints that suggest that not all JVMs implement the full API. So, I would still recommend the HotSpot tuning option approach (see above) over this one.

You can use getHeapMemoryUsage.

I see two attack vectors.
Either monitor your memory consumption.
When you more or less constantly use lots of the available memory it is very likely that you have a memory leak (or are just using too much memory). The vm will constantly try to free some memory without much success => constant high memory usage.
You need to distinguish that from a large zigzag pattern which happens often without being an indicator of memory problem. Basically you use more an more memory, but when gc finds time to do its job it finds lots of garbage to bring out, so everything is fine.
The other attack vector is to monitor how often and what kind of success the gc runs. If it runs often with only small gains in memory, it is likely you have a problem.
I don't know if you can access this kind of information directly from your program. But if nothing else I think you can specify parameters on startup which makes the gc log information into a file which in turn could get parsed.

What you could do is spawn a thread that wakes up periodically and calculates the amount of used memory and records the result. Then you can do regression analysis on the result to estimate the rate of memory growth in your application. If you know the rate of growth, and the maximum amount of memory, you can predict (with some confidence) when your application will run out of memory.

You can pass arguments to your java virtual machine that gives you GC diagnostics such as
-verbose:gc This flag turns on the logging of GC information. Available
in all JVMs.
-XX:+PrintGCTimeStamps Prints the times at which the GCs happen
relative to the start of the
application.
If you capture that output in a file, in your application you can periodcly read that file and parse it to know when the GC has happened. So you can work out the average time between every GC

I think the JVM does exactly this for you and throws java.lang.OutOfMemoryError: GC overhead limit exceeded. So if you catch OutOfMemoryError and check for that message then you have what you want, don't you?
See this question for more details

i've been using plumbr for memory leak detection and it's been a great experience, though the licence is very expensive: http://plumbr.eu/

Is a garbage collector (.net/java) an issue for real-time systems?

When building a system which needs to respond very consistently and fast, is having a garbage collector a potential problem?
I remember horror stories from years ago where the typical example always was an action game where your character would stop for a few seconds in mid-jump, when the garbage collector would do its cleanup.
We are some years further, but I'm wondering if this is still an issue. I read about the new garbage collector in .Net 4, but it still seems a lot like a big black box, and you just have to trust everything will be fine.
If you have a system which always has to be quick to respond, is having a garbage collector too big of a problem and is it better to chose for a more hardcore, control it yourself language like c++? I would hate it that if it turns out to be a problem, that there is basically almost nothing you can do about it, other than waiting for a new version of the runtime or doing very weird things to try and influence the collector.
EDIT
thanks for all the great resources. However, it seems that most articles/custom gc's/solutions pertain to the Java environment. Does .Net also have tuning capabilities or options for a custom GC?

To be precise, garbage collectors are a problem for real-time systems. To be even more precise, it is possible to write real-time software in languages that have automatic memory management.
More details can be found in the Real Time Specification for Java on one of the approaches for achieving real-time behavior using Java. The idea behind RTSJ is very simple - do not use a heap. RTSJ provides for new varieties of Runnable objects that ensure threads do not access heap memory of any kind. Threads can either access scoped memory (nothing unusual here; values are destroyed when the scope is closed) or immortal memory (that exists throughout the application lifetime). Variables in the immortal memory are written over, time and again with new values.
Through the use of immortal memory, RTSJ ensures that threads do not access the heap, and more importantly, the system does not have a garbage collector that preempts execution of the program by the threads.
More details are available in the paper "Project Golden Gate: Towards Real-Time Java in Space Missions" published by JPL and Sun.

I've written games in Java and .NET and never found this to be a big problem. I expect your "horror stories" are based on the garbage collectors of many years ago - the technology really has moved a long way since then.
The only thing I would hesitate to use Java/.NET for on the the basis of garbage collection would be something like embedded programming with hard real time constraints (e.g. motion controllers).
However you do need to be aware of GC pauses and all of the following can be helpful in minimising the risk of GC pauses:
Minimise new object allocations - while object allocations are extremely fast in modern GC systems, they do contribute to future pauses so should be minimised. You can use techniques like pre-allocating arrays of objects, keeping object pools or using unboxed primitives.
Use specialized low-latency libraries such as Javalution for heavily used functions and data types. These are designed specifically for real-time / low latency application
Make sure you are using the best GC algorithm when there are multiple versions available. I've heard good things about the Sun G1 Collector for low latency applications. The best GC systems do most of their collections concurrently so that garbage collections do not have to "stop the world" for very long if at all.
Tune the GC parameters appropriately. Usually there is a trade-off between overall performance and pause times, you may want to improve the latter at the expense of the former.
If you're very rich, you can of course buy machines with hardware GC support. :-)

Yes, garbage must be handled in a deterministic manner in real-time systems.
One approach is to schedule a certain amount of garbage collection time during each memory allocation. This is called "work-based garbage collection." The idea is that in the absence of leaks, allocation and collection should be proportional.
Another simple approach ("time-based garbage collection") is to schedule a certain proportion of time for periodic garbage collection, whether it is needed or not.
In either case, it is possible that a program will run out of usable memory because it is not allowed to spend enough time to do a full garbage collection. This is in contrast to a non-realtime system, which is permitted to pause as long as it needs to in order to collect garbage.

On a theoretical point of view, garbage collectors are not a problem but a solution. Real-time systems are hard, when there is dynamic memory allocation. In particular, the usual C functions malloc() and free() do not offer real-time guarantees (they are normally fast but have, at least theoretically, "worst cases" where they use inordinate amounts of time).
It so happens that it is possible to build a dynamic memory allocator which offers real-time guarantees, but this requires the allocator to do some heavy stuff, in particular moving some objects in RAM. Object moving implies adjusting pointers (transparently, from the application code point of view), and at that point the allocator is just one small step away from being a garbage collector.
Usual Java or .NET implementations do not offer real-time garbage collection, in the sense of guaranteed response times, but their GC are still heavily optimized and have very short response times most of the time. Under normal conditions, very short average response times are better than guaranteed response times ("guaranteed" does not mean "fast").
Also, note that usual Java or .NET implementations run on operating systems which are not real-time either (the OS can decide to schedule other threads, or may aggressively send some data to a swap file, and so on), and neither is the underlying hardware (e.g. a typical hard disk may make "recalibration pauses" on time to time). If you are ready to tolerate the occasional timing glitch due to the hardware, then you should be fine with a (carefully tuned) JVM garbage collector. Even for games.

It is a potential problem, BUT...
Your character might also freeze in the middle of your C++ program while the OS retrieves a page of memory from an overtaxed hard disk. If you are not using a real-time OS on hardware designed to provide concrete performance guarantees, you are never guaranteed performance.
To get a more specific answer, you'd have to ask about a specific implementation of a specific virtual machine. You can use a garbage-collected virtual machine for real-time systems if it provides suitable performance guarantees about garbage collection.

You bet it is a problem. If you are writing low-latency applications you cannot afford the stop-the-world pauses that most garbage collectors impose. Since Java does not allow you to turn off the GC, your only option is to produce no garbage. That can be done and has been done through object pooling and bootstrapping. I wrote a blog article where I talk about this in detail.

Our company is employing a large .Net-based software application that amongst other things monitors binary sensors over fieldbus networks. In some situations, the sensors activate only for a short amount of time (300 ms) but our software still needs to capture those events as the controlled system will immediately fail when an event is missed. We recently observed increased problems at our customer sites due to the garbage collector running for long timespans (up to 1 second). We are still trying to figure out how to enforce a time limit on the garbage collector. In conclusion of this short story, i would say the garbage collector is a handicap in time critical applications.

Can Sun JVM handle gigantic heap sizes without problems, and how?

I have heard several people claiming that you can not scale the JVM heap size up. I've heard claims of the practical limit being 4 gigabytes (I heard an IBM consultant say that), 10 gigabytes, 32 gigabytes, and so on... I simply can not believe any of those numbers and have been wondering about the issue now for a while.
So, I have three part question I would hope someone with experience could answer:
Given the following case how would you tune the heap and GC settings?
Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
Should this really still work? I think it should.
The case:
64 bit platform
64 cores
64 gigabytes of memory
The application server is client facing (ie. Jboss/tomcat web application server) - complete pauses of JVM would probably be noticed by end users
Sun JVM, probably 1.5
To prove I am not asking you guys to do my homework this is what I came up with:
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions -XX:-EliminateZeroing -Xmn768m -Xmx55000m
CMS should reduce the amount of pauses, although it comes with overhead. The other settings for CMS seem to default automatically to the number of CPUs so they seem sane to me. The rest that I added are extras that might do good or bad generally for performance, and they should probably be tested.
Definitely.

I think it's going to be difficult for anybody to give you anything more than general advice, without having further knowledge of your application.
What I would suggest is that you use VisualGC (or the VisualGC plugin for VisualVM) to actually look at what the garbage collection is doing when your app is running. Once you have a greater understanding of how the GC is working alongside your application, it'll be far easier to tune it.

#1. Given the following case how would you tune the heap and GC settings?
First, having 64 gigabytes of memory doesn't imply that you have to use them all for one JVM. Actually, it rather means you can run many of them. Then, it is impossible to answer your question without any access to your machine and application to measure and analyse things (knowing what your application is doing isn't enough). And no, I'm not asking to get access to your environment :)
#2. Would there be noticeable hickups (pauses of JVM etc) that would be noticed by the end users?
The goal of tuning is to find a good compromise between frequency and duration of (major) GCs. With a ~55g heap, GC won't be frequent but will take noticeable time, for sure (the bigger the heap, the longer the major GC). Using a Parallel or Concurrent garbage collector will help on multiprocessor systems but won't entirely solve this issue. Why do you need ~55g (this is mega ultra huge for a webapp IMO), that's my question. I'd rather run many clustered JVMs to handle load if required (at some point, the database will become the bottleneck anyway with a data oriented application).
#3. Should this really still work? I think it should.
Hmm... not sure I get the question. What is "this"? Instantiating a JVM with a big heap? Yes, it should. Is it equivalent to running several JVMs? No, certainly not.
PS: 4G is the maximum theoretical heap limit for the 32-bit JVM running on a 64-bit operating system (see Why can't I get a larger heap with the 32-bit JVM?)
PPS: On 64-bit VMs, you have 64 bits of addressability to work with resulting in a maximum Java heap size limited only by the amount of physical memory and swap space your system provides. (see How large a heap can I create using a 64-bit VM?)

Obviously heap size is not unlimited and the larger is the heap size, the more your JVM will eventually spend on GC. Though I think it is possible to set heap size quite high on 64-bit JVM, I still think it's not really practical. The advice here is better to have several JVMs running with the same parameters i.e. cluster of JBoss/Tomcat nodes running on the same physical machine and you will get better throughput.
EDIT: Also your GC behavior depends on the taxonomy of your heap. If you have a lot of short-living objects and each request to the server creates a lot of those, then your GC will collect a lot of garbage very often and thus on large heap size this will result in longer pauses. If you have very many long-living objects (e.g. caching most of your data in memory) and the amount of short-living objects is not that big, then having bigger heap size is OK.

As Chris Rice already wrote, I wouldn't expect any obvious problems with the GC for heap sizes up to 32-64GB, although there may of course be some point of your application logic, which can cause problems.
Not directly related to GC, but I would still recommend you to perform a realistic load test on your production system. I used to work on a project, where we had a similar setup (relatively large, clustered JBoss/Tomcat setup to serve a public web application) and without exaggeration, JBoss is not behaving very well under high load or with a high number of concurrent calls if you are using EJBs. JBoss is spending a lot of time in synchronized blocks when accessing and managing the EJB instance pools and if you opt for a cluster, it will even wait for intra-cluster network communication within these synchronized blocks. Be especially aware of poorly performing state replication, if you are using SFSBs.

Only to add some more switches I would use by default: -Xms55g can help to reduce the rampup time because it frees Java from the need to check if it can fall back to the initial size and allows also better internal initial sizing of memory areas.
Additionally we made good experiences with NewSize to give you a large young size to get rid of short term garbage: -XX:NewSize=1g Additionally most webapps create a lot of short time garbage that will never survive the request processing. You can even make that bigger. With Xms55g, the VM reserves a large chunk already. Maybe downsizing can help.
-Xincgc helps to clean the young generation incrementally and return the cpu often to the user threads.
-XX:CMSInitiatingOccupancyFraction=70 If you really fill all that memory, try to start CMS garbage collection earlier.
-XX:+CMSIncrementalMode puts the CMS into incremental mode to return the cpu to the user threads more often.
Attach to the process with jstat -gc -h 10 <pid> 1s and watch the GC working.
Will you really fill up the memory? I assume that 64cpus for request processing might even be able to work with less memory. What do you store in there?

Depending on your GC pause analysis, you may wish to implement Incremental mode whereby the long pause may be broken out over a period of time.

I have found memory architecture plays a part in large memory sizes. Applications in general don't perform as well if they use more than one memory bank. The JVM appears to suffer as well, esp the GC which has to sweep the whole memory.
If you have an application which doesn't fit into one memory bank, your application has to pull in memory which is not local to a processor and use memory local to another processor.
On linux you can run numactl --hardware to see the layout of processors and memory banks.

What does fragmented memory look like?

I have a mobile application that is suffering from slow-down over time. My hunch, (In part fed by this article,) is that this is due to fragmentation of memory slowing the app down, but I'm not sure. Here's a pretty graph of the app's memory use over time:
fraggle rock http://kupio.com/image-dump/fragmented.png
The 4 peaks on the graph are 4 executions of the exact same task on the app. I start the task, it allocates a bunch of memory, it sits for a bit (The flat line on top) and then I stop the task. At that point it calls System.gc(); and the memory gets cleaned up.
As can be seen, each of the 4 runs of the exact same task take longer to execute. The low-points in the graph all return to the same level so there do not seem to be any memory leaks between task runs.
What I want to know is, is memory fragmentation a feasible explanation or should I look elsewhere first, bearing in mind that I've already done a lot of looking? The low-points on the graph are relatively low so my assumption is that in this state the memory would not be very fragmented since there can't be a lot of small memory holes to be causing problems.
I don't know how the j2me memory allocator works though, so I really don't know. Can anyone advise? Has anyone else had problems with this and recognises the memory profile of the app?

If you've got a little bit of time, you could test your theory by re-using the memory by using Memory Pool techniques: each run of the task uses the 'same' chunks of memory by getting them from the pool and returning them at release time.
If you're still seeing the degrading performance after doing this investigation, it's not memory fragmentation causing the problem. Let us all know your results and we can help troubleshoot further.

Memory fragmentation would account for it... what is not clear is whether the Apps use of memory is causing paging? this would also slow things up.... and could cause the same issues.

It the problem really is memory fragmentation, there is not much you can do about it.
But before you give up in despair, try running your app with a execution profiler to see if it is spending a lot of time executing in an unexpected place. It is possible that the slow down is actually due to a problem in your algorithms, and nothing to do with memory fragmentation. As people have already said, J2ME garbage collectors should not suffer from fragmentation issues.

Consider looking at garbage collection statistics. You should have a lot more on the last run than the first, if your theory is to hold. Another thought might be that something else eats your memory so your application has less.
In other words, profiler time :)

What OS are you running this on? I have some experience with Windows CE5 (or Windows Mobile) devices. CE5's operating system level memory architecture is quite broken and will fail soon for memory intensive applications. Your graph does not have any scales, but every process only gets 32MB of address space on CE5. The VM and shared libraries will take their fair share of that as well, leaving you with quite little left.
The only way around this is to re-use the memory you allocated instead of giving it back to the collector and re-allocating later. This is, of course, much more low-level programming than you would usually want to do in Java, but on this platform you might be out of luck.

Java very large heap sizes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Does anyone have experience with using very large heaps, 12 GB or higher in Java?
Does the GC make the program unusable?
What GC params do you use?
Which JVM, Sun or BEA would be better suited for this?
Which platform, Linux or Windows, performs better under such conditions?
In the case of Windows is there any performance difference to be had between 64 bit Vista and XP under such high memory loads?

If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.
However, when you need to keep GC pauses low, things get really nasty:
Forget the default throughput, stop-the-world GC. It will pause you application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.
The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only for the initial marking phase and remarking phases. For very small heaps like < 4 GB this is usually not a problem, but for an application that creates a lot of garbage and a large heap, the remarking phase can take quite a long time - usually much less then full stop-the-world, but still can be a problem for very large heaps.
When the CMS garbage collector is not fast enough to finish operation before the tenured generation fills up, it falls back to standard stop-the-world GC. Expect ~30 or more second long pauses for heaps of size 16 GB. You can try to avoid this keeping the long-lived garbage production rate of you application as low as possible. Note that the higher the number of the cores running your application is, the bigger is getting this problem, because the CMS utilizes only one core. Obviously, beware there is no guarantee the CMS does not fall back to the STW collector. And when it does, it usually happens at the peak loads, and your application is dead for several seconds. You would probably not want to sign an SLA for such a configuration.
Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it and observed that:
Its throughput is worse than that of CMS.
It theoretically should avoid collecting the popular blocks of memory first, however it soon reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
Finally, the stop-the-world fallback still exists for G1; ask Oracle, when that code is supposed to be run. If they say "never", ask them, why the code is there. So IMHO G1 really doesn't make the huge heap problem of Java go away, it only makes it (arguably) a little smaller.
If you have bucks for a big server with big memory, you have probably also bucks for a good, commercial hardware accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, 0-lines of stop-the-world code in the GC.
Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems by doing this (e.g. heap fragmentation), but it would be definitely easier to keep the pauses low.

I am CEO of Azul Systems so I am obviously biased in my opinion on this topic! :) That being said...
Azul's CTO, Gil Tene, has a nice overview of the problems associated with Garbage Collection and a review of various solutions in his Understanding Java Garbage Collection and What You Can Do about It presentation, and there's additional detail in this article: http://www.infoq.com/articles/azul_gc_in_detail.
Azul's C4 Garbage Collector in our Zing JVM is both parallel and concurrent, and uses the same GC mechanism for both the new and old generations, working concurrently and compacting in both cases. Most importantly, C4 has no stop-the-world fall back. All compaction is performed concurrently with the running application. We have customers running very large (hundreds of GBytes) with worse case GC pause times of <10 msec, and depending on the application often times less than 1-2 msec.
The problem with CMS and G1 is that at some point Java heap memory must be compacted, and both of those garbage collectors stop-the-world/STW (i.e. pause the application) to perform compaction. So while CMS and G1 can push out STW pauses, they don't eliminate them. Azul's C4, however, does completely eliminate STW pauses and that's why Zing has such low GC pauses even for gigantic heap sizes.

We have an application that we allocate 12-16 Gb for but it really only reaches 8-10 during normal operation. We use the Sun JVM (tried IBMs and it was a bit of a disaster but that just might have been ignorance on our part...I have friends that swear by it--that work at IBM). As long as you give your app breathing room, the JVM can handle large heap sizes with not too much GC. Plenty of 'extra' memory is key.
Linux is almost always more stable than Windows and when it is not stable it is a hell of a lot easier to figure out why. Solaris is rock solid as well and you get DTrace too :)
With these kind of loads, why on earth would you be using Vista or XP? You are just asking for trouble.
We don't do anything fancy with the GC params. We do set the minimum allocation to be equal to the maximum so it is not constantly trying to resize but that is it.

I have used over 60 GB heap sizes on two different applications under Linux and Solaris respectively using 64-bit versions (obviously) of the Sun 1.6 JVM.
I never encountered garbage collection problems with the Linux-based application except when pushing up near the heap size limit. To avoid the thrashing problems inherent to that scenario (too much time spent doing garbage collection), I simply optimized memory usage throughout the program so that peak usage was about 5-10% below a 64 GB heap size limit.
With a different application running under Solaris, however, I encountered significant garbage-collection problems which made it necessary to do a lot of tweaking. This consisted primarily of three steps:
Enabling/forcing use of the parallel garbage collector via the -XX:+UseParallelGC -XX:+UseParallelOldGC JVM options, as well as controlling the number of GC threads used via the -XX:ParallelGCThreads option. See "Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning" for more details.
Extensive and seemingly ridiculous setting of local variables to "null" after they are no longer needed. Most of these were variables that should have been eligible for garbage collection after going out of scope, and they were not memory leak situations since the references were not copied. However, this "hand-holding" strategy to aid garbage collection was inexplicably necessary for some reason for this application under the Solaris platform in question.
Selective use of the System.gc() method call in key code sections after extensive periods of temporary object allocation. I'm aware of the standard caveats against using these calls, and the argument that they should normally be unnecessary, but I found them to be critical in taming garbage collection when running this memory-intensive application.
The three above steps made it feasible to keep this application contained and running productively at around 60 GB heap usage instead of growing out of control up into the 128 GB heap size limit that was in place. The parallel garbage collector in particular was very helpful since major garbage-collection cycles are expensive when there are a lot of objects, i.e., the time required for major garbage collection is a function of the number of objects in the heap.
I cannot comment on other platform-specific issues at this scale, nor have I used non-Sun (Oracle) JVMs.

12Gb should be no problem with a decent JVM implementation such as Sun's Hotspot.
I would advice you to use the Concurrent Mark and Sweep colllector ( -XX:+UseConcMarkSweepGC) when using a SUN VM.Otherwies you may face long "stop the world" phases, were all threads are stopped during a GC.
The OS should not make a big difference for the GC performance.
You will need of course a 64 bit OS and a machine with enough physical RAM.

I recommend also considering taking a heap dump and see where memory usage can be improved in your app and analyzing the dump in something such as Eclipse's MAT . There are a few articles on the MAT page on getting started in looking for memory leaks. You can use jmap to obtain the dump with something such as ...
jmap -heap:format=b pid

As mentioned above, if you have a non-interactive program, the default (compacting) garbage collector (GC) should work well. If you have an interactive program, and you (1) don't allocate memory faster than the GC can keep up, and (2) don't create temporary objects (or collections of objects) that are too big (relative to the total maximum JVM memory) for the GC to work around, then CMS is for you.
You run into trouble if you have an interactive program where the GC doesn't have enough breathing room. That's true regardless of how much memory you have, but the more memory you have, the worse it gets. That's because when you get too low on memory, CMS will run out of memory, whereas the compacting GCs (including G1) will pause everything until all the memory has been checked for garbage. This stop-the-world pause gets bigger the more memory you have. Trust me, you don't want your servlets to pause for over a minute. I wrote a detailed StackOverflow answer about these pauses in G1.
Since then, my company has switched to Azul Zing. It still can't handle the case where your app really needs more memory than you've got, but up until that very moment it runs like a dream.
But, of course, Zing isn't free and its special sauce is patented. If you have far more time than money, try rewriting your app to use a cluster of JVMs.
On the horizon, Oracle is working on a high-performance GC for multi-gigabyte heaps. However, as of today that's not an option.

If you switch to 64-bit you will use more memory. Pointers become 8 bytes instead of 4. If you are creating lots of objects this can be noticeable seeing as every object is a reference (pointer).
I have recently allocated 15GB of memory in Java using the Sun 1.6 JVM with no problems. Though it is all only allocated once. Not much more memory is allocated or released after the initial amount. This was on a Linux but I imagine the Sun JVM will work just as well on 64-bit Windows.

You should try running visualgc against your app. It´s a heap visualization tool that´s part of the jvmstat download at http://java.sun.com/performance/jvmstat/
It is a lot easier than reading GC logs.
It quickly helps you understand how the parts (generations) of the heap are working. While your total heap may be 10GB, the various parts of the heap will be much smaller. GCs in the Eden portion of the heap are relatively cheap, while full GCs in the old generation are expensive. Sizing your heap so that that the Eden is large and the old generation is hardly ever touched is a good strategy. This may result in a very large overall heap, but what the heck, if the JVM never touches the page, it´s just a virtual page, and doesn´t have to take up RAM.

A couple of years ago, I compared JRockit and the Sun JVM for a 12G heap. JRockit won, and Linux hugepages support made our test run 20% faster. YMMV as our test was very processor/memory intensive and was primarily single-threaded.

here's an article on gc FROM one of Java Champions --
http://kirk.blog-city.com/is_your_concurrent_collector_failing_you.htm
Kirk, the author writes
"Send me your GC logs
I'm currently interested in studying Sun JVM produced GC logs. Since these logs contain no business relevent information it should be ease concerns about protecting proriatary information. All I ask that with the log you mention the OS, complete version information for the JRE, and any heap/gc related command line switches that you have set. I'd also like to know if you are running Grails/Groovey, JRuby, Scala or something other than or along side Java. The best setting is -Xloggc:. Please be aware that this log does not roll over when it reaches your OS size limit. If I find anything interesting I'll be happy to give you a very quick synopsis in return. "

An article from Sun on Java 6 can help you: https://www.oracle.com/java/technologies/javase/troubleshooting-javase.html

The max memory that XP can address is 4 gig(here). So you may not want to use XP for that(use a 64 bit os).

sun has had an itanium 64-bit jvm for a while although itanium is not a popular destination. The solaris and linux 64-bit JVMs should be what you should be after.
Some questions
1) is your application stable ?
2) have you already tested the app in a 32 bit JVM ?
3) is it OK to run multiple JVMs on the same box ?
I would expect the 64-bit OS from windows to get stable in about a year or so but until then, solaris/linux might be better bet.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.