OK, so I'm trying a little experiment in Java: I want to fill up a queue with integers and see how long it takes. Here goes:
import java.io.*;
import java.util.*;
class javaQueueTest {
    public static void main(String args[]) {
        System.out.println("Hello World!");
        long startTime = System.currentTimeMillis();
        int i;
        int N = 50000000;
        ArrayDeque<Integer> Q = new ArrayDeque<Integer>(N);
        for (i = 0; i < N; i = i + 1) {
            Q.add(i);
        }
        long endTime = System.currentTimeMillis();
        long totalTime = endTime - startTime;
        System.out.println(totalTime);
    }
}
OK, so I run this and get a
Hello World!
12396
About 12 secs, not bad for 50 million integers. But if I try to run it for 70 million integers I get:
Hello World!
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.valueOf(Integer.java:642)
at javaQueueTest.main(javaQueueTest.java:14)
I also notice that it takes about 10 minutes to come up with this message. Hmm, so what if I give almost all my memory (8 GB) to the heap? So I run it with a 7 GB heap, but I still get the same error:
javac javaQueueTest.java
java -cp . javaQueueTest -Xmx7g
Hello World!
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.valueOf(Integer.java:642)
at javaQueueTest.main(javaQueueTest.java:14)
I want to ask two things. First, why does it take so long to come up with the error? Second, why is all this memory not enough? If I run the same experiment for 300 million integers in C (with the GLib GQueue) it runs (and in 10 seconds, no less! although it slows the computer down a lot), so the number of integers must not be at fault here. For the record, here is the C code:
#include<stdlib.h>
#include<stdio.h>
#include<math.h>
#include<glib.h>
#include<time.h>
int main() {
    clock_t begin, end;
    double time_spent;
    GQueue *Q;
    begin = clock();
    Q = g_queue_new();
    g_queue_init(Q);
    int N = 300000000;
    int i;
    for (i = 0; i < N; i = i + 1) {
        g_queue_push_tail(Q, GINT_TO_POINTER(i));
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("elapsed time: %f \n", time_spent);
}
I compile and get the result:
gcc cQueueTest.c `pkg-config --cflags --libs glib-2.0 gsl ` -o cQueueTest
~/Desktop/Software Development/Tests $ ./cQueueTest
elapsed time: 13.340000
My rough thoughts about your questions:
First, why does it take so long to come up with the error?
As gimpycpu stated in his comment, Java does not start by grabbing all of your RAM. If you want that (and you have a 64-bit VM, for larger heaps), you can add the options -Xmx8g and -Xms8g at VM startup to ensure that the VM gets 8 gigabytes of RAM; -Xms means the memory is committed up front instead of merely being available for use. This can reduce the runtime significantly. Also, as already mentioned, Java's Integer boxing adds quite a bit of overhead.
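Note that JVM flags must come before the class name; anything after the class name is passed to your program as an ordinary argument rather than to the VM. So the invocation would look like:
java -Xms7g -Xmx7g javaQueueTest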
Why is all this memory not enough?
Java adds a little memory overhead for every object, and because of boxing the ArrayDeque holds references to Integer objects rather than plain 4-byte ints. So you have to reckon with roughly 20 bytes for every integer.
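As a rough back-of-the-envelope check (using the ~20 bytes per value estimate above, which varies with JVM settings such as compressed pointers):
70,000,000 values × ~20 bytes ≈ 1.4 GB of boxed Integers plus the references held in the ArrayDeque's backing array,
versus 70,000,000 × 4 bytes ≈ 280 MB for a plain int[],
so it is easy to run out of heap unless the limit really is raised.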
You can try to use an int[] instead of the ArrayDeque:
import java.io.*;
import java.util.*;
class javaQueueTest {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        long startTime = System.currentTimeMillis();
        int i;
        int N = 50000000;
        int[] a = new int[N];
        for (i = 0; i < N; i = i + 1) {
            a[i] = 0;
        }
        long endTime = System.currentTimeMillis();
        long totalTime = endTime - startTime;
        System.out.println(totalTime);
    }
}
This will be ultra fast due to the use of plain arrays.
On my system it runs in under one second every time!
In your case the GC struggles because it assumes that at least some objects will be short-lived. Here all the objects are long-lived, which adds significant overhead to managing the data.
If you use -Xmx7g -Xms7g -verbose:gc and N = 150000000 you get an output like
Hello World!
[GC (Allocation Failure) 1835008K->1615280K(7034368K), 3.8370127 secs]
5327
int is a primitive in Java (4 bytes), while Integer is the wrapper. The wrapper needs a reference to it, plus an object header and padding, and the result is that an Integer and its reference use about 20 bytes per value.
The solution is not to queue up so many values at once. You can use a Supplier to provide new values on demand, avoiding the need to create the queue in the first place.
Even so, with a 7 GB heap you should be able to create an ArrayDeque of 200 M entries or more.
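A minimal sketch of that on-demand idea (illustrative only, and using an IntSupplier so no boxing happens at all): instead of materializing 50 million boxed Integers, hand the consumer something that produces the next value when asked:

import java.util.function.IntSupplier;

class OnDemandValues {
    public static void main(String[] args) {
        // Supplies 0, 1, 2, ... on demand; nothing is queued and nothing is boxed.
        IntSupplier next = new IntSupplier() {
            private int i = 0;
            public int getAsInt() { return i++; }
        };

        long sum = 0;
        for (int n = 0; n < 50000000; n++) {
            sum += next.getAsInt();   // consume each value as it is produced
        }
        System.out.println(sum);
    }
}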
First, why does it take so long to come up with the error?
This looks like a classic example of a GC "death spiral". Basically what happens is that the JVM does full GCs repeatedly, reclaiming less and less space each time. Towards the end, the JVM spends more time running the GC than doing "useful" work. Finally it gives up.
If you are experiencing this, the solution is to configure a GC Overhead Limit as described here:
GC overhead limit exceeded
(Java 8 configures a GC overhead limit by default. But you are apparently using an older version of Java ... judging from the exception message.)
Second, Why is all this memory not enough?
See @Peter Lawrey's explanation.
The workaround is to find or implement a queue class that stores primitive ints instead of boxed Integers (i.e. one that doesn't use generics). Unfortunately, that class will not be compatible with the standard Deque API.
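As an illustration of the idea (a minimal sketch, not a drop-in Deque replacement): a fixed-capacity ring buffer over an int[] avoids boxing entirely:

class IntRingQueue {
    private final int[] data;
    private int head = 0, tail = 0, size = 0;

    IntRingQueue(int capacity) { data = new int[capacity]; }

    boolean add(int value) {
        if (size == data.length) throw new IllegalStateException("queue full");
        data[tail] = value;                 // store the primitive directly
        tail = (tail + 1) % data.length;
        size++;
        return true;
    }

    int poll() {
        if (size == 0) throw new java.util.NoSuchElementException();
        int value = data[head];
        head = (head + 1) % data.length;
        size--;
        return value;
    }

    int size() { return size; }
}

Storing 50 million ints this way costs about 200 MB (50,000,000 × 4 bytes) instead of roughly a gigabyte of boxed Integers and references.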
You can catch OutOfMemoryError with:
// Q and i are declared outside the try block so the catch block can still use them
ArrayDeque<Integer> Q = null;
int i = 0;
try {
    Q = new ArrayDeque<Integer>(N);
    for (i = 0; i < N; i = i + 1) {
        Q.add(i);
    }
}
catch (OutOfMemoryError e) {
    Q = null;
    System.gc();
    System.err.println("OutOfMemoryError: " + i);
}
in order to show when the OutOfMemoryError is thrown.
And launch your code with:
java -Xmx4G javaQueueTest
in order to increase the heap size of the JVM.
As mentioned earlier, Java is much slower with objects than C is with primitive types ...
Related
I was confused by the code below:
public static void test(){
long currentTime1 = System.currentTimeMillis();
final int iBound = 10000000;
final int jBound = 100;
for(int i = 1;i<=iBound;i++){
int a = 1;
int tot = 10;
for(int j = 1;j<=jBound;j++){
tot *= a;
}
}
long updateTime1 = System.currentTimeMillis();
System.out.println("i:"+iBound+" j:"+jBound+"\nIt costs "+(updateTime1-currentTime1)+" ms");
}
That's the first version; it costs 443 ms on my computer.
public static void test(){
long currentTime1 = System.currentTimeMillis();
final int iBound = 100;
final int jBound = 10000000;
for(int i = 1;i<=iBound;i++){
int a = 1;
int tot = 10;
for(int j = 1;j<=jBound;j++){
tot *= a;
}
}
long updateTime1 = System.currentTimeMillis();
System.out.println("i:"+iBound+" j:"+jBound+"\nIt costs "+(updateTime1-currentTime1)+" ms");
}
The second version costs 832 ms.
The only difference is that I simply swapped i and j.
This result is incredible; I tested the same code in C and the difference in C is not that huge.
Why do these two similar pieces of code perform so differently in Java?
My JDK version is openjdk-14.0.2.
TL;DR - This is just a bad benchmark.
I did the following:
Create a Main class with a main method.
Copy in the two versions of the test as test1() and test2().
In the main method do this:
while (true) {
    test1();
    test2();
}
Here is the output I got (Java 8).
i:10000000 j:100
It costs 35 ms
i:100 j:10000000
It costs 33 ms
i:10000000 j:100
It costs 33 ms
i:100 j:10000000
It costs 25 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
i:100 j:10000000
It costs 0 ms
i:10000000 j:100
It costs 0 ms
....
So as you can see, when I run two versions of the same method alternately in the same JVM, the times for each method are roughly the same.
But more importantly, after a small number of iterations the time drops to ... zero! What has happened is that the JIT compiler has compiled the two methods and (probably) deduced that their loops can be optimized away.
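To illustrate (this variation is mine, not the original code): if the loop actually produces a value that gets used, the JIT can no longer discard the whole thing, and the reported times stop collapsing to zero:

public static long test1Consumed() {
    final int iBound = 10000000;
    final int jBound = 100;
    long sink = 0;                      // consuming tot keeps the loops from being dead code
    for (int i = 1; i <= iBound; i++) {
        int a = 1;
        int tot = 10;
        for (int j = 1; j <= jBound; j++) {
            tot *= a;
        }
        sink += tot;
    }
    return sink;                        // print or otherwise use this value in the caller
}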
It is not entirely clear why people are getting different times when the two versions are run separately. One possible explanation is that the first time run, the JVM executable is being read from disk, and the second time is already cached in RAM. Or something like that.
Another possible explanation is that JIT compilation kicks in earlier1 with one version of test(), so the proportion of time spent in the slower interpreted (pre-JIT) phase is different between the two versions. (It may be possible to tease this out using the JIT logging options.)
But it is immaterial really ... because the performance of a Java application while the JVM is warming up (loading code, JIT compiling, growing the heap to its working size, loading caches, etc) is generally speaking not important. And for the cases where it is important, look for a JVM that can do AOT compilation; e.g. GraalVM.
1 - This could be because of the way that the interpreter gathers stats. The general idea is that the bytecode interpreter accumulates statistics on things like branches until it has "enough". Then the JVM triggers the JIT compiler to compile the bytecodes to native code. When that is done, the code typically runs 10 or more times faster. The different looping patterns might make one version reach "enough" earlier than the other. NB: I am speculating here. I offer zero evidence ...
The bottom line is that you have to be careful when writing Java benchmarks because the timings can be distorted by various JVM warmup effects.
For more information read: How do I write a correct micro-benchmark in Java?
I tested it myself and I get the same difference (around 16 ms and 4 ms).
After testing, I found that:
Declaring a variable 1M times takes less time than multiplying by 1 1M times.
How?
I wrote a loop of 100,000,000 iterations doing repeated multiplications:
final int nb = 100000000;
for(int i = 1;i<=nb;i++){
i *= 1;
i *= 1;
[... written 20 times]
i *= 1;
i *= 1;
}
And one doing repeated declarations instead:
final int nb = 100000000;
for(int i = 1;i<=nb;i++){
int a = 0;
int aa = 0;
[... written 20 times]
int aaaaaaaaaaaaaaaaaaaaaa = 0;
int aaaaaaaaaaaaaaaaaaaaaaa = 0;
}
And I get 8 ms and 3 ms respectively, which seems to correspond to what you get.
You may get different results on a different processor.
You can find the answer in the first chapter of algorithms books:
the cost of producing and assigning a value is 1. So in the first version you perform the two declarations and assignments 10,000,000 times, while in the second version you do them only 100 times, which should reduce the time ...
In the first version:
5 operations in the main loop and 3 in the inner loop -> the inner loop is 3 * 100 = 300, then 300 + 5 -> 305 * 10000000 = 3050000000
In the second version:
3 * 10000000 = 30000000 -> (30000000 + 5) * 100 = 3000000500
So the second one should be faster in theory, but I think it comes back to multi-core CPUs, which can do 10,000,000 parallel jobs in the first case but only 100 parallel jobs in the second, so the first one turns out faster.
The short code below isolates the problem. Basically I'm timing the method addToStorage. I start by executing it one million times and I'm able to get its time down to around 723 nanoseconds. Then I do a short pause (using a busy-spinning method so as not to release the CPU core) and time the method again N times, at a different code location. To my surprise I find that the smaller N is, the bigger the addToStorage latency.
For example:
If N = 1 then I get 3.6 micros
If N = 2 then I get 3.1 and 2.5 micros
if N = 5 then I get 3.7, 1.8, 1.7, 1.5 and 1.5 micros
Does anyone know why this is happening and how to fix it? I would like my method to consistently perform at the fastest time possible, no matter where I call it.
Note: I would not think it is thread related since I'm not using Thread.sleep. I've also tested using taskset to pin my thread to a cpu core with the same results.
import java.util.ArrayList;
import java.util.List;
public class JvmOdd {
private final StringBuilder sBuilder = new StringBuilder(1024);
private final List<String> storage = new ArrayList<String>(1024 * 1024);
public void addToStorage() {
sBuilder.setLength(0);
sBuilder.append("Blah1: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah2: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah3: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah4: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah5: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah6: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah7: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah8: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah9: ").append(System.nanoTime()).append('\n');
sBuilder.append("Blah10: ").append(System.nanoTime()).append('\n');
storage.add(sBuilder.toString());
}
public static long mySleep(long t) {
long x = 0;
for(int i = 0; i < t * 10000; i++) {
x += System.currentTimeMillis() / System.nanoTime();
}
return x;
}
public static void main(String[] args) throws Exception {
int warmup = Integer.parseInt(args[0]);
int mod = Integer.parseInt(args[1]);
int passes = Integer.parseInt(args[2]);
int sleep = Integer.parseInt(args[3]);
JvmOdd jo = new JvmOdd();
// first warm up
for(int i = 0; i < warmup; i++) {
long time = System.nanoTime();
jo.addToStorage();
time = System.nanoTime() - time;
if (i % mod == 0) System.out.println(time);
}
// now see how fast the method is:
while(true) {
System.out.println();
// Thread.sleep(sleep);
mySleep(sleep);
long minTime = Long.MAX_VALUE;
for(int i = 0; i < passes; i++) {
long time = System.nanoTime();
jo.addToStorage();
time = System.nanoTime() - time;
if (i > 0) System.out.print(',');
System.out.print(time);
minTime = Math.min(time, minTime);
}
System.out.println("\nMinTime: " + minTime);
}
}
}
Executing:
$ java -server -cp . JvmOdd 1000000 100000 1 5000
59103
820
727
772
734
767
730
726
840
736
3404
MinTime: 3404
There is so much going on in here that I don't know where to start. But let's start here....
long time = System.nanoTime();
jo.addToStorage();
time = System.nanoTime() - time;
The latency of addToStorage() cannot be measured using this technique. It simply runs too quickly, meaning you're likely below the resolution of the clock. Without running this, my guess is that your measurements are dominated by clock-edge counts. You'll need to bulk up the unit of work to get a measurement with less noise in it.
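For instance, a crude way to get above the clock resolution is to time a whole batch of calls and divide by the batch size (a sketch only; JMH, mentioned below, is the proper tool):

int batch = 100000;
long start = System.nanoTime();
for (int i = 0; i < batch; i++) {
    jo.addToStorage();
}
long avgPerCall = (System.nanoTime() - start) / batch;   // average nanoseconds per call over the batch
System.out.println("avg ns/call: " + avgPerCall);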
As for what is happening: there are a number of call-site optimizations, the most important being inlining. Inlining totally eliminates the call site, but it's a path-specific optimization. If you call the method from a different place, that call follows the slow path of performing a virtual method lookup followed by a jump to that code. So to see the benefits of inlining from a different path, that path also has to be "warmed up".
I would strongly recommend that you look at JMH (an OpenJDK project). It has facilities such as Blackhole which help counter effects like CPU clocks winding down. You might also want to evaluate the quality of the benchmark with the help of tools like JITWatch (an Adopt OpenJDK project), which takes the logs produced by the JIT and helps you interpret them.
There is so much to this subject, but the bottom line is that you can't write a simplistic benchmark like this and expect it to tell you anything useful. You will need to use JMH.
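For reference, a minimal JMH version of this benchmark might look roughly like the following (JMH is obtained separately, e.g. via the org.openjdk.jmh Maven artifacts; treat the exact setup as a sketch):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class AddToStorageBench {
    private final JvmOdd jo = new JvmOdd();   // the class from the question

    @Benchmark
    public void addToStorage() {
        // JMH takes care of warmup iterations, forks and timing.
        // Note: the storage list inside JvmOdd grows for the whole run; for long
        // runs you would clear it in a setup/teardown method.
        jo.addToStorage();
    }
}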
I suggest watching this: https://www.infoq.com/presentations/jmh about microbenchmarking and JMH
There's also a chapter on microbenchmarking & JMH in my book: http://shop.oreilly.com/product/0636920042983.do
Java internally uses a JIT (Just-In-Time) compiler. Based on the number of times the same method executes, it optimizes the instructions so the method performs better. When a method is called only a few times, it runs the ordinary interpreted way and may never be optimized, which shows up as a higher execution time. When the same method is called more often, the JIT kicks in and it executes in less time because of the optimized code generated for that method.
I'm creating and inserting fairly lightweight Person objects, which have one field (age), into Drools working memory. But even after removing the facts, the heap size is not reducing. Sample code (using Drools 6.0.0.CR5 from Maven):
long numOfFacts=1000000;
long heapSize = Runtime.getRuntime().totalMemory();
System.out.println("Heapsize before insertion: "+heapSize);
System.out.println("Inserting objects");
ArrayList<FactHandle> factHandles = new ArrayList<FactHandle>(100);
for (int i = 0; i < numOfFacts; i++) {
Person person = new Person();
person.setAge(randomGenerator.nextInt(100));
FactHandle factHandle = wkmem.insert(person);
factHandles.add(factHandle);
}
long heapSizeAfter = Runtime.getRuntime().totalMemory();
System.out.println("Heapsize after insertion: "+heapSizeAfter);
long endTimeInsert = System.currentTimeMillis();
long elTime= endTimeInsert-startTimeInsert;
System.out.println("Time it took to insert " +numOfFacts+" objects :"+elTime+" milliseconds");
long startTime = System.currentTimeMillis();
System.out.println("Number of facts: " + wkmem.getFactCount());
wkmem.fireAllRules();
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("Time it took for evaluation: " + elapsedTime);
for(int i=0;i<numOfFacts;i++){
wkmem.retract(factHandles.get(i));
}
long heapSizeAfterRemoval = Runtime.getRuntime().totalMemory();
System.out.println("Heapsize after removal of facts: "+heapSizeAfterRemoval);
The output of the code is:
Heapsize before insertion: 158138368
Inserting objects
Heapsize after insertion: 746717184
Time it took to insert 1000000 objects :5372 milliseconds
Number of facts: 1000000
Time it took for evaluation: 839
Heapsize after removal of facts: 792002560
Why is it that the heap size has in fact increased?
As mentioned in Peter Lawrey's answer, you're not going to see the heap size reduced in the middle of a method, unless perhaps GC just happens to kick in at that very moment. To test for that, you need a long-running application and to connect to it with something such as JConsole, or use a profiler of some sort.
However, it is worth noting that the way you are retracting is not reliable and will result in memory leaks in some cases. The truth is that in some cases Drools will generate FactHandles internally, so that after retracting all facts associated with your own fact handle references, there may well be more sitting in working memory. If I remember right, these keep hold of references to your facts, which prevents those objects from being garbage collected. Therefore it's a lot safer to just retract all fact handles:
public void retractAll() {
    for (FactHandle handle : ksession.getFactHandles()) {
        ksession.retract(handle);
    }
}
... or retract all FactHandles for a filter:
public void retractAll(ObjectFilter filter) {
    for (FactHandle handle : ksession.getFactHandles(filter)) {
        ksession.retract(handle);
    }
}
I discovered this the hard way ... my retraction code made the same assumption as yours originally. :)
The heap size always stays the same or increases, until the GC needs to run or decides to run (for concurrent collectors)
Collecting memory is expensive so it only does it when it has to, not when it might.
Why is it that the heap size has in fact increased?
Removing objects can do some work which can end up creating temporary objects.
Basically you should only look at memory consumption after a Full GC, anything else is on a least effort basis.
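If you want a rough number in that spirit, measure used memory (total minus free) after requesting a collection; System.gc() is only a hint, so treat the result as approximate:

static long approxUsedMemory() {
    Runtime rt = Runtime.getRuntime();
    System.gc();                                  // a hint only; the JVM may or may not run a full GC
    return rt.totalMemory() - rt.freeMemory();    // bytes actually in use, not just reserved from the OS
}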
My mini benchmark:
import java.math.*;
import java.util.*;
import java.io.*;
public class c
{
static Random rnd = new Random();
public static String addDigits(String a, int n)
{
if(a==null) return null;
if(n<=0) return a;
for(int i=0; i<n; i++)
a+=rnd.nextInt(10);
return a;
}
public static void main(String[] args) throws IOException
{
int n = 10000; // number of iterations
int k = 10; // number of digits added at each iteration
BigInteger a;
BigInteger b;
String as = "";
String bs = "";
as += rnd.nextInt(9)+1;
bs += rnd.nextInt(9)+1;
a = new BigInteger(as);
b = new BigInteger(bs);
FileWriter fw = new FileWriter("c.txt");
long t1 = System.nanoTime();
a.multiply(b);
long t2 = System.nanoTime();
//fw.write("1,"+(t2-t1)+"\n");
if(k>0) {
as = addDigits(as, k-1);
bs = addDigits(as, k-1);
}
for(int i=0; i<n; i++)
{
a = new BigInteger(as);
b = new BigInteger(bs);
t1 = System.nanoTime();
a.multiply(b);
t2 = System.nanoTime();
fw.write(((i+1)*k)+","+(t2-t1)+"\n");
if(i < n-1)
{
as = addDigits(as, k);
bs = addDigits(as, k);
}
System.out.println((i+1)*k);
}
fw.close();
}
}
It measures the multiplication time of n-digit BigIntegers.
Result:
You can easily see the trend, but why is there so much noise above 50000 digits?
Is it because of the garbage collector, or is there something else that affects my results?
When performing the test, there were no other applications running.
Result from a test with only odd digits. The test was shorter (n=1000, k=100).
Odd digits (n=10000, k=10)
As you can see there is huge noise between 65000 and 70000. I wonder why...
Odd digits (n=10000, k=10), System.gc() every 1000 iterations
Results in noise between 50000 and 70000
I also suspect this is a JVM warmup effect. Not warmup involving classloading or the JIT compiler, but warmup of the heap.
Put a (java) loop around the whole benchmark, and run it a number of times. (If this gives you the same graphs as before ... you will have evidence that this is not a warmup effect. Currently you don't have any empirical evidence one way or the other.)
Another possibility is that the noise is caused by your benchmark's interactions with the OS and/or other stuff running on the machine.
You are writing your timing data to an unbuffered stream. That means LOTS of syscalls, and (potentially) lots of fine-grained disc writes.
You are making LOTS of calls to nanoTime(), and that might introduce noise.
If something else is running on your machine (e.g. you are web browsing) that will slow down your benchmark for a bit and introduce noise.
There could be competition over physical memory ... if you've got too much running on your machine for the amount of RAM.
Finally, a certain amount of noise is inevitable, because each of those multiply calls generates garbage, and the garbage collector is going to need to work to deal with it.
And one final point: if you manually run the garbage collector (or increase the heap size) to "smooth out" the data points, what you are actually doing is concealing one of the costs of the multiply calls. The resulting graphs look nice, but they are misleading:
The noisiness reflects what will happen in real life.
The true cost of the multiply actually includes the amortized cost of running the GC to deal with the garbage generated by the call.
To get measurements that reflect the way BigInteger behaves in real life, you need to run the test a large number of times, calculate average times, and fit a curve to the averaged data points.
Remember, the real aim of the game is to get scientifically valid results ... not a smooth curve.
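On the unbuffered-stream point above: wrapping the FileWriter in a BufferedWriter batches the writes and cuts the syscalls down dramatically, removing one source of noise (a small self-contained sketch, not the original benchmark):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedTimingLog {
    public static void main(String[] args) throws IOException {
        // The BufferedWriter accumulates output in memory and only occasionally
        // hands it to the OS, instead of issuing one write syscall per line.
        BufferedWriter fw = new BufferedWriter(new FileWriter("timings.txt"));
        for (int i = 0; i < 10000; i++) {
            fw.write(i + "," + System.nanoTime() + "\n");
        }
        fw.close();   // close() flushes whatever is still buffered
    }
}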
If you do a microbenchmark, you must "warm up" the JVM first to let the JIT optimize the code, and then you can measure the performance. Otherwise you are measuring the work done by the JIT and that can change the result on each run.
The "noise" happens probably because the cache of the CPU is exceeded and the performance starts degrading.
I have a problem with my Java program suddenly exiting, without any exception being thrown or the program finishing normally.
I'm writing a program to solve Project Euler's 14th problem. This is what I got:
private static final long TARGET = 1000000;   // search limit; the text below refers to a TARGET of 1 000 000
private static final int INITIAL_CACHE_SIZE = 30000;
private static Map<Long, Integer> cache = new HashMap<Long, Integer>(INITIAL_CACHE_SIZE);
public static void main(String... args) {
long number = 0;
int maxSize = 0;
for (long i = 1; i <= TARGET; i++) {
int size = size(i);
if (size > maxSize) {
maxSize = size;
number = i;
}
}
}
private static int size(long i) {
if (i == 1L) {
return 1;
}
final int size = size(process(i)) + 1;
return size;
}
private static long process(long n) {
return n % 2 == 0 ? n/2 : 3*n + 1;
}
This runs fine, and finishes correctly in about 5 seconds when using a TARGET of 1 000 000.
I wanted to optimize by adding a cache, so I changed the size method to this:
private static int size(long i) {
if (i == 1L) {
return 1;
}
if (cache.containsKey(i)) {
return cache.get(i);
}
final int size = size(process(i)) + 1;
cache.put(i, size);
return size;
}
Now when I run it, it simply stops (the process exits) when I get to 555144. It's the same number every time. No exception, error, or Java VM crash is reported.
Changing the cache size doesn't seem to have any effect either, so how could introducing the cache cause this error?
If I enforce the cache size to be not just initial, but permanent like so:
if (i < CACHE_SIZE) {
cache.put(i, size);
}
the bug no longer occurs.
Edit: When I set the cache size to like 2M, the bug starts showing again.
Can anyone reproduce this, and maybe even provide a suggestion as to why it happens?
This is simply an OutOfMemoryError that is not being printed. The program runs fine if I set a high heap size, otherwise it exits with an unlogged OutOfMemoryError (easy to see in a Debugger, though).
You can verify this and get a heap dump (as well as a printout that an OutOfMemoryError occurred) by passing this JVM argument and re-running your program:
-XX:+HeapDumpOnOutOfMemoryError
With this it will then print out something to this effect:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid4192.hprof ...
Heap dump file created [91901809 bytes in 4.464 secs]
Bump up your heap size with, say, -Xmx200m and you won't have an issue - At least for TARGET=1000000.
It sounds like the JVM itself crashes (that is the first thought when your program dies without a hint of an exception, anyway). The first step with such a problem is to upgrade to the latest JVM revision for your platform. When the JVM crashes it should write an error log (an hs_err_pid*.log file) in the directory where you started the JVM, assuming your user has access rights to that directory.
That being said, some OutOfMemory errors aren't reported in the main thread, so unless you do a try/catch (Throwable t) and see whether you get one, it is hard to be sure you aren't actually just running out of memory. The fact that it only uses 100 MB could simply mean that the JVM isn't configured to use more. That can be changed by adding the startup option -Xmx1024m to give it a gigabyte of memory, and seeing if the problem goes away.
The code for doing the try catch should be something like this:
public static void main(String[] args) {
try {
MyObject o = new MyObject();
o.process();
} catch (Throwable t) {
t.printStackTrace();
}
}
And do everything in the process method and do not store your cache in statics; that way, if the error happens at the catch statement, the object is out of scope and can be garbage collected, freeing enough memory to allow the stack trace to be printed. No guarantees that that works, but it gives it a better shot.
One significant difference between the two implementations of size(long i) is in the number of objects you are creating.
In the first implementation, there are no objects being created. In the second you are doing an awful lot of autoboxing, creating a new Long for each access of your cache, and putting new Longs and new Integers in on each modification.
This would explain the increase in memory usage, but not the absence of an OutOfMemoryError. Increasing the heap does allow it to complete for me.
From this Sun article:
The performance ... is likely to be poor, as it boxes or unboxes on every get or set operation. It is plenty fast enough for occasional use, but it would be folly to use it in a performance critical inner loop.
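One way to avoid the boxing entirely is to cache only values below a fixed bound in a plain int[] (a sketch of the idea, reusing the question's process() method; the bound and the "0 means not yet computed" convention are my own choices):

private static final int CACHE_SIZE = 2000000;
private static final int[] cache = new int[CACHE_SIZE];   // 0 means "not computed yet"

private static int size(long i) {
    if (i == 1L) return 1;
    if (i < CACHE_SIZE && cache[(int) i] != 0) {
        return cache[(int) i];
    }
    int size = size(process(i)) + 1;
    if (i < CACHE_SIZE) {
        cache[(int) i] = size;          // primitive store: no Long or Integer objects are created
    }
    return size;
}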
If your Java process suddenly crashes, it could be that some resource got maxed out, like memory. You could try setting a higher max heap.
Do you see a Heap Dump being generated after the crash? This file should be in the current directory for your JVM, that's where I would look for more info.
I am getting an OutOfMemoryError on cache.put(i, size);
To see the error, run your program in Eclipse in debug mode; it will appear in the debug window. It does not produce a stack trace in the console.
The recursive size() method is probably not a good place to do the caching. I put a call to cache.put(i, size); inside main()'s for-loop instead, and it works much more quickly. Otherwise, I also get an OOM error (no more heap space).
Edit: Here's the source - the cache retrieval is in size(), but the storing is done in main().
public static void main(String[] args) {
long num = 0;
int maxSize = 0;
long start = new Date().getTime();
for (long i = 1; i <= TARGET; i++) {
int size = size(i);
if (size >= maxSize) {
maxSize = size;
num = i;
}
cache.put(i, size);
}
long computeTime = new Date().getTime() - start;
System.out.println(String.format("maxSize: %4d on initial starting number %6d", maxSize, num));
System.out.println("compute time in milliseconds: " + computeTime);
}
private static int size(long i) {
if (i == 1l) {
return 1;
}
if (cache.containsKey(i)) {
return cache.get(i);
}
return size(process(i)) + 1;
}
Note that by removing the cache.put() call from size(), it does not cache every computed size, but it also avoids re-caching a previously computed size. This does not affect the hashmap operations, but as akf points out, it avoids the autoboxing/unboxing operations, which is where your heap killer is coming from. I also tried an "if (!containsKey(i)) { cache.put() etc" in size(), but that unfortunately also runs out of memory.