Why does a big unreferenced HashMap increase performance in Java?

I have a performance problem that I can't get my head around. I am writing a Java application that parses huge (> 20 million lines) text files and stores certain information in a Set.
I measure the performance in seconds per million lines. Since I need a lot of memory, I usually run the program with -Xmx6000m and -Xms4000m.
If I just run the program, it parses 1 million lines in about 6 seconds. However, I realized after some performance investigation that if I add this code before the actual parsing routine, performance improves to under 3 seconds per million lines:
BufferedReader br = new BufferedReader(new FileReader("graphs.nt"));
HashMap<String, String> foo = new HashMap<String, String>();
String line;
while ((line = br.readLine()) != null) {
    foo.put(line, "foo");
}
foo = null;
br.close();
br = null;
The graphs.nt file is about 9 million lines long. The performance increase persists even if I do not set foo to null; setting it to null is mainly to demonstrate that the map is in fact never used by the program.
The rest of the code is completely unrelated. I use a parser from OpenRDF Sesame to read a different file (not graphs.nt) and store extracted information in a new HashSet, created by another object.
In the rest of the code, I create a Parser object, to which I pass a Handler object.
This really confuses me. My guess is that this somehow drives the JVM to allocate more memory for my program, and I can see hints of that when I run top: without the HashMap, it allocates about 1 GB of memory; if I initialize the HashMap, it allocates more than 2 GB.
My question is whether this sounds at all reasonable. Is it possible that creating such a big object causes more memory to be allocated for the program to use afterwards? Shouldn't -Xmx and -Xms control the memory allocation, or are there further arguments that may play a role here?
I am aware that this may seem like an odd question and that information is scarce, but this is all the information that I found related to the issue. If there is any more information that may be helpful, I am more than happy to provide it.

Memory and GC can definitely impact performance. If possible, you should run with -Xms equal to -Xmx to disable heap resizing and give the JVM plenty of room at start. Your app could then exit before any major GC is needed.
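For example, a run that checks whether GC is the culprit might look roughly like this (the main class name is a placeholder; the GC-logging flags are standard HotSpot options on JDK 8 and earlier, replaced by -Xlog:gc on later JDKs):

java -Xms6g -Xmx6g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyParser

If the log shows far fewer (or shorter) collections with the larger fixed heap, the speed-up is most likely explained by reduced GC overhead rather than by the HashMap itself.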

Unless you go out of your way to make it otherwise, "foo" will eventually pass out of scope and be collected, even if you don't null the reference, and even if the method containing the above code is never exited. But it will have forced the heap to grow larger, and that reduces the relative overhead of GC.
(It would be an interesting experiment to reference "foo" at the end of your program, to keep it in scope.)

This sounds like file caching. Your file "graphs.nt" is probably cached in RAM, either by the OS or by the JVM. The GC will allow memory consumption to go up for performance reasons; if you add a forced collection right after your preload, System.gc(), you'll be able to tell whether the caching happens in the JVM or in the OS.
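A minimal sketch of that check, placed right after the preload loop (illustrative only; it reuses the foo variable from the snippet above):

foo = null;   // drop the only reference to the preloaded map
System.gc();  // request a full collection (a hint, not a guarantee)
long used = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
System.out.println("Heap in use after requested GC: " + used / (1024 * 1024) + " MB");
// If the heap shrinks back but the speed-up remains, the effect is more likely
// OS file caching or heap sizing than data kept alive inside the JVM.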

Related

Java high memory usage - GC Strings

I'm trying to write code that will have minimal impact on resources, and I have come across GC behavior I don't understand.
Apparently Strings are not cleared from memory immediately, even though they are no longer in use.
for(int i = 0; i < 999999999; i++)
    System.out.println("Test");
Memory usage graph
According to the graph, I assume that a new String object is created on every run of the loop but is not cleared automatically on the next run. If that is the case, I would like to know why it happens; and in case I'm misreading the situation, I would like to know what is really happening behind the curtains.
When I add a sleep to the code presented above, the graph becomes stable. What is the reason for that?
for(int i = 0; i < 999999999; i++) {
    System.out.println("Test");
    try {
        Thread.sleep(1);
    } catch (Exception e) {
    }
}
Stable graph
Also, I have a few questions about the given case:
Can the GC be forced to be more aggressive? I mean shortening object lifetimes, not reducing the memory allocated to the JVM.
If I assign null to the variable, will it affect the time until it's cleared by the GC?
What is the correct way to work with Strings when I need to run a large number of regex matches on them?
What is the best way to declare a String object "obsolete" so the GC will clear it?
Does the above situation occur because Java automatically interns Strings, and if so, is there a way to cancel it?
Thank you very much!
I assume that a new String object is created on every run of the loop
No, if it was creating a new String on each iteration you would get far more garbage.
At this garbage rate it could be the profiler which is allocating some objects.
A String literal is only ever created once (per JVM).
but it is not cleared automatically on the next run of the loop
Correct; even if it were created on each iteration, the GC only runs when it needs to. Collecting on each iteration would be insanely expensive.
When I add Sleep to the code I presented above the graph becomes stable, what is the reason for that?
You have dramatically slowed down your application.
Can GC be forced to be more aggressive?
You can make the Eden space much smaller, but this would slow down your application.
If I plug in a null value to the variable will it affect the time until it's cleared by the GC?
No, this rarely does anything.
What is the correct way to work with Strings when I need to run a large number of regex matches on them
Regexes create a lot of garbage. If you want to reduce allocations and speed up your application, avoid using regexes.
I recently sped up an application by 3x by replacing some commonly used regexes with direct String handling.
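For illustration (not the actual code from that application), a hedged sketch of replacing a regex split on a single-character delimiter with plain indexOf scanning:

import java.util.ArrayList;
import java.util.List;

// Splits a line on '|' without touching java.util.regex.
// Roughly equivalent to line.split("\\|"), except that split() also
// removes trailing empty strings.
static List<String> splitOnPipe(String line) {
    List<String> tokens = new ArrayList<>();
    int start = 0;
    int idx;
    while ((idx = line.indexOf('|', start)) >= 0) {
        tokens.add(line.substring(start, idx));
        start = idx + 1;
    }
    tokens.add(line.substring(start)); // trailing token
    return tokens;
}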
What is the best way to declare a String object "obsolete" so the GC will clear it?
Use it in a limited scope. When the scope ends so does the reference to it and it can be GCed.
Does the above situation occur because Java does an automatic intern
Once a String is interned it is not recreated.
for Strings and if so is there a way to cancel it?
Sure, force it to create a new String each time. This of course creates more garbage and is much slower (and the code is longer), but you can do it if you want.
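For example (illustrative only):

String literal = "Test";          // the interned literal, the same instance every time
String copy = new String("Test"); // explicitly allocates a fresh String object on each execution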
The Garbage Collector collects when it's time to collect, more or less.
Yes, depending on what collector you are using. There's literally dozens of vm properties you can set, some of them influencing each other.
I don't think it does in 'newer' JDK's
Normally you do not care. When it comes to GC, it's more about not loading tons of data into your memory. One specialty about Strings is interning, but Strings will be GC'd like other objects, too.
When there's no reference to the string/intern anymore (when you exit the braces)
No, the situation occurs because Java's GCs work this way...
I can explain the GC effects based on CMS/ParNew (since I know this combo best); it works like this:
The heap is split into two regions (I exclude PermGen for now):
Young and Old
Young is split into 'eden' and 'copy' (or survivor)
When you create a new object, it goes into Young->Eden. At some point Eden reaches its maximum size; unreferenced objects are then discarded, and objects that still have references are copied to Young->Copy.
As the program keeps running, Young->Copy will reach its maximum size, and its contents will be copied again into the other Young->Copy (survivor) space.
At some point this is no longer possible, so some objects are promoted from Young->Copy to Old, depending on a copy counter (I think). A similar story applies to the old heap.
So what can you tune? First of all, you normally distinguish throughput (batch jobs) and low latency (web pages); the ParNew/CMS combo was used for low latency.
Since I know ParNew/CMS best, I'll explain what you can consider tuning first:
You can tune max memory (more memory means more managing, the less memory an application needs to run, the better... in general)
You can tune the heap ratio between young and old
You can tune the ratios between eden and copy within young
You can tune the point at which CMS starts its collection cycle
And then there's a lot more. From my personal experience, for large applications, we used in general the following settings:
Fix min and max memory to the same size (no change of max heap)
A young-to-old ratio of about 1:4 to 1:7
Disable System.gc()
Log a lot of gc stuff
put an alert on OutOfMemory
do weekly analysis on the log and decide on tuning parameters. (Only one parameter at a time ;)
If you really want to know what's behind everything, I'd recommend reading a book, because there's really, really, really a lot going on.
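For illustration only, a hedged sketch of what such a configuration might look like as HotSpot flags (the values are placeholders, not recommendations; the CMS/ParNew flags apply to JDK 8 and earlier):

-Xms4g -Xmx4g                           (fix min and max heap to the same size)
-XX:NewRatio=4                          (old:young ratio, here 4:1)
-XX:SurvivorRatio=8                     (eden:survivor ratio within young)
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70   (when CMS starts its collection cycle)
-XX:+DisableExplicitGC                  (ignore System.gc())
-verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log
-XX:+HeapDumpOnOutOfMemoryError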

String.split() temporary objects and Garbage Collect

In my project, we have a requirement to read a very large file where each line has identifiers separated by a special character ("|"). Unfortunately I can't use parallelism, since a validation between the last character of a line and the first character of the next line is needed to decide whether it will be extracted or not. Anyway, the requirement is very simple: break the line into tokens, analyze them, and store only some of them in memory. The code is very simple, something like below:
final LineIterator iterator = FileUtils.lineIterator(file);
while (iterator.hasNext()) {
    final String[] tokens = iterator.nextLine().split("\\|");
    // process
}
But this little piece of code is very, very inefficient. The split() method generates too many temporary objects that are not being collected (as best explained here: http://chrononsystems.com/blog/hidden-evils-of-javas-stringsplit-and-stringr).
For comparison purposes: a 5 MB file was using around 35 MB of memory at the end of processing.
I tested some alternatives like:
Using a pre compiled pattern (Performance of StringTokenizer class vs. split method in Java)
Use Guava's Splitter (Java split String performances)
Optimize String storage (http://java-performance.info/string-packing-converting-characters-to-bytes/)
Use of optimized collections (http://blog.takipi.com/5-coding-hacks-to-reduce-gc-overhead)
But none of them appears to be efficient enough. Using JProfiler, I could see that the amount of memory used by temporary objects is too high (35 MB used, but only 15 MB actually being used by valid objects).
Then I decided to make a simple test: after every 50,000 lines read, an explicit call to System.gc(). At the end of processing, memory usage had decreased from 35 MB to 16 MB. I tested many, many times and always got the same result.
I know that invoking System.gc() is bad practice (as indicated in Why is it bad practice to call System.gc()?). But is there any other alternative in a scenario where the split() method could be invoked millions of times?
[UPDATE]
I use a 5 MB file only for test purposes, but the system should process much larger files (500 MB ~ 1 GB).
The first and most important thing to say here is: don't worry about it. The JVM is consuming 35 MB of RAM because its configuration says that's a low enough amount. When its highly efficient GC algorithm decides it's time, it will sweep all those objects away, no problem.
If you really want to, you can invoke Java with memory management options (e.g. java -Xmx...) -- I suggest it's not worth doing unless you're running on very limited hardware.
However, if you really want to avoid allocating an array of String each time you process a line, there are many ways to do so.
One way is to use a StringTokenizer:
StringTokenizer st = new StringTokenizer(line, "|");
while (st.hasMoreTokens()) {
    process(st.nextToken());
}
You could also avoid consuming a line at a time. Get your file as a stream, use a StreamTokenizer, and consume one token at a time in this way.
Read the API docs for Scanner, BufferedInputStream, Reader -- there are lots of choices in this area, because you're doing something fundamental.
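A hedged sketch of the token-at-a-time idea using Scanner (the delimiter pattern and the process() call are placeholders, not code from the question):

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Reads tokens separated by '|' or line breaks one at a time,
// so no String[] per line is ever allocated.
static void readTokens(File file) throws IOException {
    try (Scanner sc = new Scanner(file).useDelimiter("[|\\r\\n]+")) {
        while (sc.hasNext()) {
            String token = sc.next();
            process(token); // placeholder for the real per-token handling
        }
    }
}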
However, none of these will cause Java to GC sooner or more aggressively. If the JRE doesn't consider itself short of memory, it won't collect any garbage.
Try writing something like this:
public static void main(String[] args) {
    Random r = new Random(); // java.util.Random
    Integer x;
    while (true) {
        x = Integer.valueOf(r.nextInt());
    }
}
Run it and watch your JVM's heap size as it runs (put a sleep in if the usage shoots up too quickly to see). Each time around the loop, Java creates what you call a 'temporary object' of type Integer. All of these stay in the heap until the GC decides it needs to clear them away. You'll see that it won't do this until it reaches a certain level. But when it reaches that level, it will do a good job of ensuring that its limits are never exceeded.
You should adjust your way of analyzing situations. While the article about the regex compilation under the hood is correct in general, it doesn’t apply here. When you look at the source code of String.split(String), you’ll see that it just delegates to String.split(String,int) which has a special code path for patterns consisting of just one literal character, including escaped ones like your \|.
The only temporary object created within that code path is an ArrayList. The regex package is not involved at all; this fact might help you understand why precompiling a regex pattern did not improve the performance here.
When you use a Profiler to come to the conclusion that there are too many objects, you should use it also to find out what kinds of objects there are and where they originate, instead of doing wild guessing.
But it’s not clear why you are complaining at all. You can configure the JVM to use a certain maximum amount of memory. As long as that maximum has not been reached, the JVM just does what you told it, using that memory rather than wasting CPU cycles just to avoid using the available memory. Where’s the sense in not using the memory that is available?

Huge LinkedList is causing GC overhead limit, is there another solution?

here is my code:
public void mapTrace(String Path) throws FileNotFoundException, IOException {
    FileReader arq = new FileReader(new File(Path));
    BufferedReader leitor = new BufferedReader(arq, 41943040);
    Integer page;
    String std;
    Integer position = 0;
    while ((std = leitor.readLine()) != null) {
        position++;
        page = Integer.parseInt(std, 16);
        LinkedList<Integer> values = map.get(page);
        if (values == null) {
            values = new LinkedList<>();
            map.put(page, values);
        }
        values.add(position);
    }
    for (LinkedList<Integer> referenceList : map.values()) {
        Collections.reverse(referenceList);
    }
}
This is the HashMap structure
Map<Integer, LinkedList<Integer>> map = new HashMap<>();
For 50 MB - 100 MB trace files I don't have any problem, but for bigger files I get:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
I don't know whether the reverse method is increasing the memory use, whether the LinkedList is using more space than another List structure would, or whether the way I'm adding the list to the map is taking more space than it should. Can anyone tell me what's using so much space?
Can anyone tell me what's using so much space?
The short answer is that it is probably the space overheads of the data structure you have chosen that is using the space.
By my reckoning, a LinkedList<Integer> on a 64-bit JVM uses about 48 bytes of storage per integer in the list, including the integers themselves.
By my reckoning, a Map<?, ?> on a 64-bit machine will use in the region of 48 bytes of storage per entry, excluding the space needed to represent the key and the value objects.
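A rough back-of-the-envelope behind the LinkedList figure (assuming a 64-bit HotSpot JVM with compressed oops; exact numbers vary with JVM version and flags):
- each LinkedList.Node: ~12-byte object header + three references (item, next, prev) at 4 bytes each, padded to an 8-byte boundary, so about 24 bytes
- each boxed Integer: ~12-byte header + 4-byte int, padded, so about 16 bytes
- plus a few bytes per element for the list object itself and alignment slack
That comes to roughly 40-48 bytes per stored position, versus 4 bytes in a plain int[].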
Now, your trace size estimates are rather too vague for me to plug the numbers in, but I'd expect a 1.5 GB trace file to need a LOT more than 2 GB of heap.
Given the numbers you've provided, a reasonable rule of thumb is that a trace file will occupy roughly 10 times its file size in heap memory ... using the data structure that you are currently using.
You don't want to configure a JVM to try to use more memory than the physical RAM available. Otherwise, you are liable to push the machine into thrashing ... and the operating system is liable to start killing processes. So for an 8 GB machine, I wouldn't advise going over -Xmx8g.
Putting that together, with an 8 GB machine you should be able to cope with a 600 MB trace file (assuming my estimates are correct), but a 1.5 GB trace file is not feasible. If you really need to handle trace files that big, my advice would be to either:
design and implement custom collection types for your specific use-case that use memory more efficiently,
rethink your algorithms so that you don't need to hold the entire trace files in memory, or
get a bigger machine.
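To illustrate the first option, a hedged sketch of a denser structure for the same data (a growable int buffer per page instead of LinkedList<Integer>; the class and method names are made up for the example):

import java.util.Arrays;

// A minimal growable int buffer: roughly 4 bytes per stored position plus
// array overhead, instead of ~40-48 bytes per LinkedList<Integer> element.
class IntList {
    private int[] values = new int[8];
    private int size;

    void add(int value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow geometrically
        }
        values[size++] = value;
    }

    int[] toArray() {
        return Arrays.copyOf(values, size); // trimmed copy for reading
    }
}

// Usage, mirroring the question's structure:
// Map<Integer, IntList> map = new HashMap<>();
// map.computeIfAbsent(page, k -> new IntList()).add(position);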
I did some tests before reading your comment: I put -Xmx14g and processed the 600 MB file; it took some minutes (about 10), but it did fine.
The -Xmx14g option sets the maximum heap size. Based on the observed behaviour, I expect that the JVM didn't need anywhere near that much memory ... and didn't request it from the OS. And if you'd looked at memory usage in the task manager, I expect you'd have seen numbers consistent with that.
Then I put -Xmx18g and tried to process the 1.5 GB file, and it's been running for about 20 minutes. My memory in the task manager is going from 7.80 to 7.90. I wonder if this will finish. How could I use MORE memory than I have? Does it use the HD as virtual memory?
Yes, that is what it does.
Yes, each page of your process's virtual address space corresponds to a page on the hard disc.
If you've got more virtual pages than physical memory pages, at any given time some of those virtual memory pages will live on disk only. When your application tries to use one of those non-resident pages, the VM hardware generates an interrupt, and the operating system finds an unused physical page, populates it from the disc copy, and then hands control back to your program. But if your application is busy, the OS will have had to free that physical page by evicting another one, and that may have involved writing the evicted page's contents to disc.
The net result is that when you try to use significantly more virtual address pages than you have physical memory, the application generates lots of interrupts that result in lots of disc reads and writes. This is known as thrashing. If your system thrashes too badly, it will spend most of its time waiting for disc reads and writes to finish, and performance will drop dramatically. And on some operating systems, the OS will attempt to "fix" the problem by killing processes.
Further to Stephen's quite reasonable answer, everything has its limit and your code simply isn't scalable.
In cases where the input is "large" (as in your case), the only reasonable approach is a stream-based one, which, while (usually) more complicated to write, uses very little memory/resources. Essentially, you hold in memory only what you need to process the current task, then release it as soon as possible.
You may find that unix command line tools are your best weapon, perhaps using a combination of awk, sed, grep etc to massage your raw data into hopefully a usable "end format".
I once stopped a colleague from writing a Java program to read in and parse XML and issue insert statements to a database: I showed him how to use a series of piped commands to produce executable SQL, which was then piped directly into the database command-line tool. It took about 30 minutes to get right, but the job was done. And the file was massive, so in Java it would have required a SAX parser and JDBC, which aren't fun.
To build this structure, I would put the data in a key/value datastore like Berkeley DB for Java.
Pseudo-code:
putData(db, page, value)
{
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry data = new DatabaseEntry();
    List<Integer> L = new LinkedList<Integer>();
    IntegerBinding.intToEntry(page, key);
    if (db.get(key, data) == OperationStatus.SUCCESS)
    {
        TupleInput t = new TupleInput(data.getData());
        int n = t.readInt();
        for (int i = 0; i < n; ++i) L.add(t.readInt());
    }
    L.add(value);
    TupleOutput out = new TupleOutput();
    out.writeInt(L.size());
    for (int v : L) out.writeInt(v);
    data = new DatabaseEntry(out.toByteArray());
    db.put(key, data);
}

Question about java garbage collection

I have this class and I'm testing insertions with different data distributions. I'm doing this in my code:
...
AVLTree tree = new AVLTree();
//insert the data from the first distribution
//get results
...
tree = new AVLTree();
//insert the data from the next distribution
//get results
...
I'm doing this for 3 distributions. Each one should be tested 14 times, with the 2 lowest and 2 highest values removed before computing the average. This should be done 2000 times, with the size increasing by 1000 elements each time; in other words, it goes 1000, 2000, 3000, ..., 2000000.
The problem is, I can only get as far as 100000. When I tried 200000, I ran out of heap space. I increased the available heap space with -Xmx in the command line to 1024m and it didn't even complete the tests with 200000. I tried 2048m and again, it wouldn't work.
What I'm thinking is that the garbage collector isn't getting rid of the old trees once I do tree = new AVLTree(). But why? I thought that the elements from the old trees would no longer be accessible and their memory would be cleaned up.
The garbage collector should have no trouble cleaning up your old tree objects, so I can only assume there's some other allocation that you're doing that's not being cleaned up.
Java has a good tool to watch the GC in progress (or not in your case), JVisualVM, which comes with the JDK.
Just run that and it will show you which objects are taking up the heap, and you can both trigger and see the progress of GC's. Then you can target those for pools so they can be re-used by you, saving the GC the work.
Also look into this option, which will probably stop the error you're getting that stops the program, and your program will finish, but it may take a long time because your app will fill up the heap and then run very slowly.
-XX:-UseGCOverheadLimit
Which JVM are you using, and what JVM parameters have you used to configure GC?
Your explanation suggests there is a memory leak in your code. If you have a tool like JProfiler, use it to find out where the memory leak is.
There's no reason those trees shouldn't be collected, although I'd expect that before you ran out of memory you would see long pauses as the system ran a full GC. Since, as noted here, that's not what you're seeing, you could try running with flags like -XX:+PrintGC, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps to give you some more information on exactly what's going on, along with perhaps some sort of running count of roughly where you are. You could also explicitly tell the garbage collector to use a different garbage-collection algorithm.
However, it still seems unlikely to me. What other code is running? Is it possible there's something in the AVLTree class itself that's keeping its instances from being GC'd? What about adding logging to finalize() on that class to ensure that (some of them, at least) are collectible (e.g. make a few and manually call System.gc())?
GC params are documented here; there's also a nice reference on garbage collection from Sun here that's well worth reading.
The Java garbage collector isn't guaranteed to run as soon as an object becomes unreachable (Java tracks reachability, not reference counts). So if your code is just creating and discarding a lot of objects, it's possible to exhaust the heap space before the GC has a chance to run. Alternatively, Pax's suggestion that there is a memory leak in your code is also a strong possibility.
If you are only doing benchmarking, then you may want to call the Java GC function (System.gc()) between tests, or even re-run your program for each distribution.
We noticed this in a server product. When making a lot of tiny objects that quickly get thrown away, the garbage collector can't keep up. The problem is more pronounced when the tiny objects have pointers to larger objects (e.g. an object that points to a large char[]). The GC doesn't seem to realize that if it frees up the tiny object, it can then free the larger object. Even when calling System.gc() directly, this was still a huge problem (both in 1.5 and 1.6 VMs)!
What we ended up doing and what I recommend to you is to maintain a pool of objects. When your object is no longer needed, throw it into the pool. When you need a new object, grab one from the pool or allocate a new one if the pool is empty. This will also save a small amount of time over pure allocation because Java doesn't have to clear (bzero) the object.
If you're worried about the pool getting too large (and thus wasting memory), you can either remove an arbitrary number of objects from the pool on a regular basis, or use weak references (for example, using java.util.WeakHashMap). One of the advantages of using a pool is that you can track the allocation frequency and totals, and you can adjust things accordingly.
We're using pools of char[] and byte[], and we maintain separate "bins" of sizes in the pool (for example, we always allocate arrays whose sizes are powers of two). Our product does a lot of string building, and using pools showed significant performance improvements.
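A hedged sketch of the idea (a deliberately simplified, single-threaded pool; this is not the product code described above):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// A toy byte[] pool with power-of-two size bins. Not thread-safe, and a real
// implementation would also cap how many arrays each bin may hold.
class ByteArrayPool {
    private final Map<Integer, Deque<byte[]>> bins = new HashMap<>();

    byte[] acquire(int minSize) {
        int size = Integer.highestOneBit(Math.max(minSize, 1));
        if (size < minSize) {
            size <<= 1; // round up to the next power of two
        }
        Deque<byte[]> bin = bins.get(size);
        byte[] buf = (bin == null) ? null : bin.pollFirst();
        return (buf != null) ? buf : new byte[size];
    }

    void release(byte[] buf) {
        bins.computeIfAbsent(buf.length, k -> new ArrayDeque<>()).addFirst(buf);
    }
}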
Note: In general, the GC does a fine job. We just noticed that with small objects that point to larger structures, the GC doesn't seem to clean up the objects fast enough especially when the VM is under CPU load. Also, System.gc() is just a hint to help schedule the finalizer thread to do more work. Calling it too frequently causes a significant performance hit.
Given that you're just doing this for testing purposes, it might just be good housekeeping to invoke the garbage collector directly using System.gc() (thus forcing it to make a pass). It won't help you if there is a memory leak, but if there isn't, it might buy you back enough memory to get through your test.

Java: enough free heap to create an object?

I recently came across this in some code - basically someone trying to create a large object, coping when there's not enough heap to create it:
try {
    // try to perform an operation using a huge in-memory array
    byte[] massiveArray = new byte[BIG_NUMBER];
} catch (OutOfMemoryError oome) {
    // perform the operation in some slower but less
    // memory-intensive way...
}
This doesn't seem right, since Sun themselves recommend that you shouldn't try to catch Error or its subclasses. We discussed it, and another idea that came up was explicitly checking for free heap:
if (Runtime.getRuntime().freeMemory() > SOME_MEMORY) {
    // quick memory-intensive approach
} else {
    // slower, less demanding approach
}
Again, this seems unsatisfactory - particularly in that picking a value for SOME_MEMORY is difficult to easily relate to the job in question: for some arbitrary large object, how can I estimate how much memory its instantiation might need?
Is there a better way of doing this? Is it even possible in Java, or is any idea of managing memory below the abstraction level of the language itself?
Edit 1: in the first example, it might actually be feasible to estimate the amount of memory a byte[] of a given length might occupy, but is there a more generic way that extends to arbitrary large objects?
Edit 2: as #erickson points out, there are ways to estimate the size of an object once it's created, but (ignoring a statistical approach based on previous object sizes) is there a way of doing so for yet-uncreated objects?
There also seems to be some debate as to whether it's reasonable to catch OutOfMemoryError - anyone know anything conclusive?
freeMemory() isn't quite right. You'd also have to add maxMemory() - totalMemory(). E.g., assuming you start the VM with a maximum memory of 100M, the JVM may at the time of your method call only have claimed 50M from the OS. Of that, let's say 30M is actually in use. That means freeMemory() will show roughly 20M free (we're only talking about the heap here), but if you try to allocate your larger object, the JVM will attempt to grab the other 50M its contract allows it to take from the OS before giving up and throwing the error. So you'd actually (theoretically) have 70M available.
To make this more complicated, the 30M reported as in use in the above example includes stuff that may be eligible for garbage collection. So you may actually have even more memory available; if the JVM hits the ceiling, it will try to run a GC to free more memory first.
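A minimal sketch of that arithmetic (only an estimate, for the reasons given above):

Runtime rt = Runtime.getRuntime();
long freeInHeap    = rt.freeMemory();                    // free space inside the currently allocated heap
long notYetClaimed = rt.maxMemory() - rt.totalMemory();  // heap the JVM may still claim from the OS
long presumablyFree = freeInHeap + notYetClaimed;
// Still only approximate: it ignores garbage that a collection would reclaim,
// and another thread may allocate from it before you do.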
You can try to get around this by manually triggering a System.gc(), except that that's not such a terribly good thing to do, because:
- it's not guaranteed to run immediately
- it will stop everything in its tracks while it runs
Your best bet (assuming you can't easily rewrite your algorithm to deal with smaller memory chunks, or write to a memory-mapped file, or something less memory-intensive) might be to do a safe rough estimate of the memory needed and ensure that it's available before you run your function.
There are some kludges that you can use to estimate the size of an existing object; you could adapt some of these to predict the size of a yet-to-be created object.
However, in this case, I think it might be best to catch the Error. First of all, asking for the free memory doesn't account for what's available after garbage collection, which will be performed before raising an OOME. And, requesting a garbage collection with System.gc() isn't reliable. It's often explicitly disabled because it can wreck performance, and if it's not disabled… well, it can wreck performance when used unnecessarily.
It is impossible to recover from most errors. However, recoverability is up to the caller, not the callee. In this case, if you have a strategy to recover from an OutOfMemoryError, it is valid to catch it and fall back.
I guess that, in practice, it really comes down to the difference between the "slow" and "fast" way. If the "slow" method is fast enough, I'd stick with that, as it's safer and simpler. And, it seems to me, allowing it to be used as a fall back means that it is "fast enough." Don't let small optimizations derail the reliability of your application.
The "try to allocate and handle the error" approach is very dangerous.
What if you barely get your memory? A later OOM exception might occur because you brought things too close to the limits. Almost any library call will allocate memory at least briefly.
During your allocation, a different thread may receive an OOM exception while trying to allocate a relatively small object, even if your allocation is destined to fail.
The only viable approach is your second one, with the corrections noted in other answers. But you have to be sure to leave extra "slop space" in the heap when you decide to use your memory-intensive approach.
I don't believe that there's a reasonable, generic approach to this that could safely be assumed to be 100% reliable. Even the Runtime.freeMemory approach is vulnerable to the fact that you may actually have enough memory after a garbage collection, but you wouldn't know that unless you force a gc. But then there's no foolproof way to force a GC either. :)
Having said that, I suspect that if you really did know approximately how much you needed, and did run a System.gc() beforehand, and you're running in a simple single-threaded app, you'd have a reasonably decent shot at getting it right with the .freeMemory call.
If any of those constraints fail, though, and you get the OOM error, you're back at square one, and therefore are probably no better off than just catching the Error subclass. While there are some risks associated with this (Sun's VM does not make a lot of guarantees about what happens after an OOM... there's some risk of internal state corruption), there are many apps for which just catching it and moving on with life will leave you with no serious harm.
A more interesting question in my mind, however, is why are there cases where you do have enough memory to do this and others where you don't? Perhaps some more analysis of the performance tradeoffs involved is the real answer?
Definitely, catching Error is the worst approach. An Error happens when there is NOTHING you can do about it. Not even create a log entry: puff, "... Houston, we lost the VM".
I didn't quite get the second reason. Was it bad because it is hard to relate SOME_MEMORY to the operation in question? Could you rephrase it for me?
The only alternative I see is to use the hard disk as memory (RAM/ROM as in the old days). I guess that is what you're pointing at with your "slower, less demanding approach".
Every platform has its limits; Java supports as much RAM as your hardware is willing to give (well, actually as much as you allow by configuring the VM). In Sun's JVM implementation that can be done with the -Xmx option, for instance:
java -Xmx8g some.name.YourMemConsumingApp
Of course, you may end up trying to perform an operation that takes 10 GB of RAM.
If that's your case, then you should definitely swap to disk.
Additionally, using the strategy pattern could make for nicer code, although here it looks like overkill:
if (isEnoughMemory(SOME_MEMORY)) {
    strategy = new InMemoryStrategy();
} else {
    strategy = new DiskStrategy();
}
strategy.performTheAction();
But it may help if the "else" branch involves a lot of code and looks bad. Furthermore, if somehow you can use a third approach (like using a cloud for processing), you can add a third Strategy:
...
strategy = new ImaginaryCloudComputingStrategy();
...
:P
EDIT
Now that I understand the problem with the second approach: if there are times when you don't know how much RAM is going to be consumed but you do know how much you have left, you could use a mixed approach (RAM when you have enough, disk when you don't).
Suppose this theoretical problem: you receive a file from a stream and don't know how big it is.
Then you perform some operation on that stream (encrypt it, for instance).
If you used RAM only it would be very fast, but if the file is large enough to consume all your app's memory, then you have to perform part of the operation in memory, then swap to a file and save temporary data there.
The VM will GC when running out of memory; you get more memory back and then you process the next chunk. This repeats until the whole stream has been processed.
while (!isDone()) {
    if (isMemoryLow()) {
        // Runtime.getRuntime().freeMemory() < SOME_MEMORY + some other validations
        swapToDisk(); // and make sure resources are GC'able
    }
    byte[] array = new byte[PREDEFINED_BUFFER_SIZE];
    process(array);
}
cleanUp();
