Fast JVM start / JVM persistency - starting JVM with data from heap dump - java

I am developing an in-memory data structure, and would like to add persistency.
I am looking for a way to do this fast. I thought about dumping a heap dump once in a while.
Is there a way to load this Java heap dump, as is, into my memory? Or is it impossible?
Otherwise, any other suggestions for fast write and fast read of the entire information?
(Serialization might take a lot of time.)
-----------------edited explanation:--------
Since my memory might be full of small pieces of information referencing each other, serialization may require me to inefficiently scan all my memory. Reloading is also possibly problematic.
On the other hand, I can define a gigantic array, and put each object I create into the array. Links will be a long number representing the place in the array. Now I can just dump this array as is - and also reload it as is.
There are even some JVMs, like JRockit, that utilize the disk space, and so maybe it is possible to dump as is very quickly and to reload very quickly.
To prove my point: a Java dump contains all the information of the JVM, and it is produced quickly.
Sorry, but serialization of 4 GB isn't even close to the few seconds a dump takes.
Also, memory is memory, and there are operating systems that allow you a RAM memory dump quickly.
https://superuser.com/questions/164960/how-do-i-dump-physical-memory-in-linux
When you think about it... this is quite a good strategy for persistent data structures. There has been quite a hype about in-memory databases in the last decade. But why settle for that? What if I want a Fibonacci heap to be "almost persistent"? That is, every 5 minutes I will dump the information (quickly), and in case of an electrical outage, I have a backup from 5 minutes ago.
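A rough sketch of that giant-array idea (the FlatHeap name, the record layout, and the file format are all invented here for illustration, not a real JVM facility): links are plain array indices rather than object references, so the whole structure can be dumped and reloaded as one bulk transfer instead of a per-object traversal.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: nodes live in a flat long[] with array indices as
// "pointers", so the structure dumps and reloads verbatim.
public class FlatHeap {
    // each record is two cells: [value, nextIndex]; -1 means a null link
    private long[] cells = new long[1024];
    private int top = 0;

    public int alloc(long value, int next) {
        if (top + 2 > cells.length) {
            cells = java.util.Arrays.copyOf(cells, cells.length * 2);
        }
        int idx = top;
        cells[idx] = value;
        cells[idx + 1] = next;
        top += 2;
        return idx;
    }

    public long value(int idx) { return cells[idx]; }
    public int next(int idx)   { return (int) cells[idx + 1]; }

    // dump the raw array in one bulk write - no per-object scan
    public void dump(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + top * 8);
        buf.putInt(top);
        for (int i = 0; i < top; i++) buf.putLong(cells[i]);
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(buf);
        }
    }

    public static FlatHeap load(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
        FlatHeap h = new FlatHeap();
        h.top = buf.getInt();
        h.cells = new long[Math.max(1024, h.top)];
        for (int i = 0; i < h.top; i++) h.cells[i] = buf.getLong();
        return h;
    }
}
```

Of course this only works for data that fits a fixed record shape; it trades the convenience of real object references for a dump format that is just a memcpy.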
-----------------end of edited explanation:--------
Thank you.

In general, there is no way to do this on HotSpot.
Objects in the heap have 2 words of header, the second of which points into permgen for the class metadata (known as a klassOop). You would have to dump all of permgen as well, which includes all the pointers to compiled code - so basically the entire process.
There would be no sane way to recover the heap state correctly.
It may be better to explain precisely what you want to build & why already-existing products don't do what you need.

Use serialization. Implement java.io.Serializable, add a serialVersionUID to all of your classes, and you can persist them to any OutputStream (file, network, whatever). Just create a starting object from which all your objects are reachable (even indirectly).
I don't think that serialization would take a long time; it's optimized code in the JVM.
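A minimal sketch of this answer's suggestion (the Node class and its fields are made up for the example): one root object from which everything is reachable gets written in a single writeObject call, and the whole graph comes back on readObject.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative serializable node; referenced objects are serialized too.
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
    Node left, right;
    Node(int value) { this.value = value; }
}

public class Persist {
    static void save(Node root, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(root);   // walks the whole object graph from the root
        }
    }

    static Node load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (Node) in.readObject();
        }
    }
}
```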

You can use jhat or jvisualvm to load your dump to analyze it. I don't know whether the dump file can be loaded and restarted again.

Related

Memory management with Java (how to use the data segment?)

I have very large data structures that I define as static fields in a class. I think they get pushed into the heap because my code fails with that error message (heap memory exceeded). Now, I think I recall there to be a memory segment besides heap and stack that is much larger, called data. Is it possible for me to push the variables in that segment? If so, how is this accomplished? I can't afford to increase the heap size because my program will be used by others.
The only thing you could really possibly mean is the disk -- actually writing things to files.

Java String objects not getting garbage collected on time

I have an interesting problem with Java memory consumption. I have a native C++ application which invokes my Java application.
The application basically does some language translations, parses a few XMLs, and responds to network requests. Most of the state of the application doesn't have to be retained, so it is full of methods which take in String arguments and return String results.
This application continues to take more and more memory with time, and there comes a time where it starts to take close to 2 GB of memory, which made us suspect that there is a leak somewhere in some Hashtable or static variables. On closer inspection we did not find any leaks. Comparing heap dumps over a period of time shows that the char[] and String objects take huge memory.
However, when we inspect these char[] and Strings, we find that they do not have GC roots, which means that they shouldn't be the cause of the leak. Since they are a part of the heap, it means they are waiting to get garbage collected. After using various tools (MAT/VisualVM/JHat) and scrolling through a lot of such objects, I used the trial version of YourKit. YourKit gives the data straight away, saying that 96% of the char[] and String objects are unreachable. Which means that at the time of taking the dump, 96% of the Strings in the heap were waiting to get garbage collected.
I understand that the GC runs sparingly, but when you check via VisualVM you can actually see it running :-( So how come there are so many unused objects on the heap all the time?
IMO this application should never take more than 400-500 MB of memory, which is where it stays for the first 24 hours, but then it continues to increase the heap :-(
I am running Java 1.6.0-25.
thanks for any help.
Java doesn't GC when you think it does/should :-) GC is too complex a topic to understand what is going on without spending a couple of weeks really digging into the details. So if you see behavior that you can't explain, that doesn't mean it's broken.
What you see can have several reasons:
You are loading a huge String into memory and keep a reference to a substring. That can keep the whole string in memory (Java doesn't always allocate a new char array for substrings - since Strings are immutable, it simply reuses the original char array and remembers the offset and length).
Nothing triggered the GC so far. Some C++ developers believe GC is "evil" (anything that you don't understand must be evil, right?) so they configure Java not to run it unless absolutely necessary. This means the VM will eat memory until it hits the maximum and then, it will do one huge GC run.
build 25 is already pretty old. Try to update to the latest Java build (33, I think). The GC is one of the best tested parts of the VM but it does have bugs. Maybe you hit one.
Unless you see OutOfMemoryException, you don't have a leak. We have an application which eats all the heap you give it. If it gets 16GB of RAM ("just to be safe"), it will use the whole 16GB because we cache what we can. You never see out of memory, because the cache will shrink as needed but system admins routinely freak out "oh god! oh god! It's running out of memory" PANIK No, it's not. Unless Java tells you so, it's not running out of memory. It's just using it efficiently.
Tuning the GC with command line options is one of the best ways to break it. Hundreds of people who know a lot more about the topic than you ever will have spent years making the GC efficient. You think you can do better? Good luck. -> Get rid of any "magic" command line options and calls to System.gc() and your problem might go away.
Try decreasing the heap size to 500 megabytes and see if the software will start garbage collecting or die. Java isn't too fussy about using memory given to it. You might also research GC tuning options which will make the GC more prudent about cleaning stuff up.
String reallyLongString = "this is a really long String";
String tinyString = reallyLongString.substring(2, 3);
reallyLongString = null;
The JVM can't collect the memory allocated for the long string in the above case, since there's a reference to part of it.
If you're doing stuff with Strings and you're suffering from memory issues, this might be the cause of your grief.
use tinyString = new String(reallyLongString.substring(2, 3)); instead.
There might not be a leak at all - a leak would be if the Strings were reachable. If you've allocated as much as 2GB to the application, there is no reason for the garbage collector to start freeing up memory until you are approaching that limit. If you don't want it taking any more than 500MB, then pass -Xmx512m when starting the JVM.
You could also try tuning the garbage collector to start cleaning up much earlier.
First of all, stop worrying about those Strings and char[]. In almost every Java application I have profiled, they are at the top of the memory consumer list. And in almost none of those Java applications were they the real problem.
If you have not received an OutOfMemoryError yet, but do worry that 2GB is too much for your Java process, then try to decrease the Xmx value you pass to it. If it runs well with 512m or 1g, then the problem is solved, isn't it?
If you get OOM, then one more option you can try is to use Plumbr with your Java process. It is a memory-leak discovery tool, so it can help you if there really is a memory leak.

How do I predict when I'm going to run out of memory

We have a Swing-based application that does complex processing on data. One of the prerequisites for our software is that any given column cannot have too many unique values. If the column is numeric, the user would need to discretize the data before they could use it in our tool.
Unfortunately, the algorithms we are using are combinatorially expensive in memory depending on the number of unique values per column. Right now, with the wrong dataset, the app would run out of memory very quickly. Before doing one of these operations that would run out of memory, we should be able to calculate roughly how much memory the operation will need. It would be nice if we could check how much memory the app is currently using, estimate whether the app is going to run out of memory, and show an error message accordingly rather than running out of memory. Using java.lang.Runtime, we can find the free memory, total memory, and max memory, but is this really helpful? Even if it appears we won't have enough heap space, it could be that if we wait 30 milliseconds the garbage collector will run, and suddenly we have more than enough heap space to run our operation. Is there any way to really predict if we are going to run out of memory?
I have done something similar for a database application where the number of rows that were loaded could not be estimated. So in the loop that processes the result set, I'm calling a "MemoryWatcher" method that checks how much memory is free.
If the available memory goes under a certain threshold the watcher would force a garbage collection and re-check. If there still wasn't enough memory the watcher method signals this to the caller with an exception. The caller can gracefully recover from that exception - as opposed to the OutOfMemoryException which sometimes leaves Swing totally unstable.
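A hedged sketch of that MemoryWatcher idea, assuming the threshold value and the exception type are our own inventions rather than anything from the original application:

```java
// Sketch: check free headroom, force one GC if low, and signal the caller
// with a recoverable exception instead of letting OutOfMemoryError fly.
public class MemoryWatcher {
    public static class LowMemoryException extends Exception {
        public LowMemoryException(String msg) { super(msg); }
    }

    private final long thresholdBytes;

    public MemoryWatcher(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    private static long available() {
        Runtime rt = Runtime.getRuntime();
        // still-allocatable memory = max heap minus what is currently in use
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    public void check() throws LowMemoryException {
        if (available() < thresholdBytes) {
            System.gc();                       // request a collection, then re-check
            if (available() < thresholdBytes) {
                throw new LowMemoryException(
                        "less than " + thresholdBytes + " bytes of headroom left");
            }
        }
    }
}
```

The caller invokes check() inside its processing loop and catches LowMemoryException to abort gracefully, which is far kinder to Swing than an OutOfMemoryError mid-paint.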
I don't have expertise on this, but I feel you could take an extra step of bytecode analysis using ASM to preempt bugs like null pointer exceptions, out-of-memory exceptions, etc.
Unless you run your application with the maximum amount of memory you need from the outset (using -Xms) I don't think you can achieve anything useful, since other applications will be able to consume memory before your app needs it.
Have you considered using Soft/WeakReferences, and letting garbage collection reap objects that you could possibly recalculate/regenerate on the fly?
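For instance, a recomputable value can be held through a SoftReference so the collector may reclaim it under memory pressure and the application simply rebuilds it on the next access (the SoftCache name and Supplier-based API here are invented for the sketch):

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Sketch: cache a value softly; if the GC clears it, recompute on demand.
public class SoftCache<T> {
    private final Supplier<T> recompute;
    private SoftReference<T> ref = new SoftReference<>(null);

    public SoftCache(Supplier<T> recompute) {
        this.recompute = recompute;
    }

    public T get() {
        T value = ref.get();
        if (value == null) {                  // cleared by GC, or never computed
            value = recompute.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```

Soft references are cleared only when the VM is close to running out of memory, which makes them a reasonable fit for "nice to keep, cheap to rebuild" data.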

Question about java garbage collection

I have this class and I'm testing insertions with different data distributions. I'm doing this in my code:
...
AVLTree tree = new AVLTree();
//insert the data from the first distribution
//get results
...
tree = new AVLTree();
//insert the data from the next distribution
//get results
...
I'm doing this for 3 distributions. Each one should be tested an average of 14 times, with the 2 lowest/highest values removed to compute the average. This should be done 2000 times, each time with 1000 more elements. In other words, it goes 1000, 2000, 3000, ..., 2000000.
The problem is, I can only get as far as 100000. When I tried 200000, I ran out of heap space. I increased the available heap space with -Xmx in the command line to 1024m and it didn't even complete the tests with 200000. I tried 2048m and again, it wouldn't work.
What I'm thinking is that the garbage collector isn't getting rid of the old trees once I do tree = new AVLTree(). But why? I thought that the elements from the old trees would no longer be accessible and their memory would be cleaned up.
The garbage collector should have no trouble cleaning up your old tree objects, so I can only assume there's some other allocation that you're doing that's not being cleaned up.
Java has a good tool to watch the GC in progress (or not in your case), JVisualVM, which comes with the JDK.
Just run that and it will show you which objects are taking up the heap, and you can both trigger and see the progress of GC's. Then you can target those for pools so they can be re-used by you, saving the GC the work.
Also look into this option, which will probably stop the error you're getting that stops the program, and your program will finish, but it may take a long time because your app will fill up the heap and then run very slowly.
-XX:-UseGCOverheadLimit
Which JVM are you using, and what JVM parameters have you used to configure GC?
Your explanation suggests there is a memory leak in your code. If you have a tool like JProfiler, use it to find out where the memory leak is.
There's no reason those trees shouldn't be collected, although I'd expect that before you ran out of memory you would see long pauses as the system ran a full GC. As it's been noted here that that's not what you're seeing, you could try running with flags like -XX:+PrintGC, -XX:+PrintGCDetails, and -XX:+PrintGCTimeStamps to give you some more information on exactly what's going on, along with perhaps some sort of running count of roughly where you are. You could also explicitly tell the garbage collector to use a different garbage-collection algorithm.
However, it still seems unlikely to me. What other code is running? Is it possible there's something in the AVLTree class itself that's keeping its instances from being GC'd? What about manually logging finalize() on that class to ensure that (some of them, at least) are collectible (e.g. make a few and manually call System.gc())?
GC params here, a nice ref on garbage collection from sun here that's well worth reading.
The Java garbage collector isn't guaranteed to run as soon as an object becomes unreachable. So if you're writing code that is only creating and deleting a lot of objects, it's possible to expend all of the heap space before the GC has a chance to run. Alternatively, Pax's suggestion that there is a memory leak in your code is also a strong possibility.
If you are only doing benchmarking, then you may want to call System.gc() between tests, or even re-run your program for each distribution.
We noticed this in a server product. When making a lot of tiny objects that quickly get thrown away, the garbage collector can't keep up. The problem is more pronounced when the tiny objects have pointers to larger objects (e.g. an object that points to a large char[]). The GC doesn't seem to realize that if it frees up the tiny object, it can then free the larger object. Even when calling System.gc() directly, this was still a huge problem (both in 1.5 and 1.6 VMs)!
What we ended up doing and what I recommend to you is to maintain a pool of objects. When your object is no longer needed, throw it into the pool. When you need a new object, grab one from the pool or allocate a new one if the pool is empty. This will also save a small amount of time over pure allocation because Java doesn't have to clear (bzero) the object.
If you're worried about the pool getting too large (and thus wasting memory), you can either remove an arbitrary number of objects from the pool on a regular basis, or use weak references (for example, using java.util.WeakHashMap). One of the advantages of using a pool is that you can track the allocation frequency and totals, and you can adjust things accordingly.
We're using pools of char[] and byte[], and we maintain separate "bins" of sizes in the pool (for example, we always allocate arrays of size that are powers of two). Our product does a lot of string building, and using pools showed significant performance improvements.
Note: In general, the GC does a fine job. We just noticed that with small objects that point to larger structures, the GC doesn't seem to clean up the objects fast enough especially when the VM is under CPU load. Also, System.gc() is just a hint to help schedule the finalizer thread to do more work. Calling it too frequently causes a significant performance hit.
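The pooling scheme described above can be sketched roughly like this (the BufferPool name, the fixed buffer size, and the cap are illustrative choices, not the product's actual code):

```java
import java.util.ArrayDeque;

// Sketch: reuse char[] buffers instead of allocating fresh ones each time,
// with a cap so the pool itself can't grow without bound.
public class BufferPool {
    private final ArrayDeque<char[]> pool = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    public BufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    public char[] acquire() {
        char[] buf = pool.pollFirst();
        return (buf != null) ? buf : new char[bufferSize];
    }

    public void release(char[] buf) {
        // only pool buffers of the expected size, and respect the cap
        if (buf.length == bufferSize && pool.size() < maxPooled) {
            pool.addFirst(buf);
        }
    }
}
```

A real implementation along the lines above would keep separate "bins" per power-of-two size, and this single-threaded sketch would need synchronization or a ConcurrentLinkedDeque before use from multiple threads.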
Given that you're just doing this for testing purposes, it might just be good housekeeping to invoke the garbage collector directly using System.gc() (thus forcing it to make a pass). It won't help you if there is a memory leak, but if there isn't, it might buy you back enough memory to get through your test.

How to handle OutOfMemoryError in Java? [duplicate]

This question already has answers here:
How to deal with "java.lang.OutOfMemoryError: Java heap space" error?
(31 answers)
Closed 9 years ago.
I have to serialize around a million items and I get the following exception when I run my code:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at org.girs.TopicParser.dump(TopicParser.java:23)
at org.girs.TopicParser.main(TopicParser.java:59)
How do I handle this?
I know that the official Java answer is "Oh noes! Out of memories! I give in!". This is all rather frustrating for anyone who has programmed in environments where running out of memory is not allowed to be a fatal error (for example, writing an OS, or writing apps for non-protected OSes).
The willingness to surrender is necessary - you can't control every aspect of Java memory allocation, so you can't guarantee that your program will succeed in low-memory conditions. But that doesn't mean you must go down without a fight.
Before fighting, though, you could look for ways to avoid the need. Perhaps you can avoid Java serialization, and instead define your own data format which does not require significant memory allocation to create. Serialization allocates a lot of memory because it keeps a record of objects it has seen before, so that if they occur again it can reference them by number instead of outputting them again (which could lead to an infinite loop). But that's because it needs to be general-purpose: depending on your data structure, you might be able to define some text/binary/XML/whatever representation which can just be written to a stream with very little need to store extra state. Or you might be able to arrange that any extra state you need is stored in the objects all along, not created at serialization time.
If your application does one operation which uses a lot of memory, but mostly uses much less, and especially if that operation is user-initiated, and if you can't find a way to use less memory or make more memory available, then it might be worth catching OutOfMemory. You could recover by reporting to the user that the problem is too big, and inviting them to trim it down and try again. If they've just spent an hour setting up their problem, you do not want to just bail out of the program and lose everything - you want to give them a chance to do something about it. As long as the Error is caught way up the stack, the excess memory will be unreferenced by the time the Error is caught, giving the VM at least a chance to recover. Make sure you catch the error below your regular event-handling code (catching OutOfMemory in regular event handling can result in busy loops, because you try to display a dialog to the user, you're still out of memory, and you catch another Error). Catch it only around the operation which you've identified as the memory hog, so that OutOfMemoryErrors you can't handle, coming from code other than the memory hog, are not caught.
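That "catch only around the identified memory hog" idea might look something like this (the class and method names are invented for the sketch; the point is the narrow catch scope):

```java
// Sketch: run one known memory-hungry operation inside a narrow
// OutOfMemoryError catch. By the time the catch block runs, the operation's
// temporaries are unreferenced, so the VM has at least a chance to recover.
public class BigOperationGuard {
    /**
     * Returns false if the operation blew the heap, so the caller can tell
     * the user "the problem is too big - trim it down and try again".
     */
    static boolean tryBigOperation(Runnable op) {
        try {
            op.run();
            return true;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }
}
```

Everything outside tryBigOperation still lets OutOfMemoryError propagate, which preserves the usual fail-fast behavior for errors you genuinely cannot handle.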
Even in a non-interactive app, it might make sense to abandon the failed operation, but for the program itself to carry on running, processing further data. This is why web servers manage multiple processes such that if one page request fails for lack of memory, the server itself doesn't fall over. As I said at the top, single-process Java apps can't make any such guarantees, but they can at least be made a bit more robust than the default.
That said, your particular example (serialization) may not be a good candidate for this approach. In particular, the first thing the user might want to do on being told there's a problem is save their work: but if it's serialization which is failing, it may be impossible to save. That's not what you want, so you might have to do some experiments and/or calculations, and manually restrict how many million items your program permits (based on how much memory it is running with), before the point where it tries to serialize.
This is more robust than trying to catch the Error and continue, but unfortunately it's difficult to work out the exact bound, so you would probably have to err on the side of caution.
If the error is occurring during deserialization then you're on much firmer ground: failing to load a file should not be a fatal error in an application if you can possibly avoid it. Catching the Error is more likely to be appropriate.
Whatever you do to handle lack of resources (including letting the Error take down the app), if you care about the consequences then it's really important to test it thoroughly. The difficulty is that you never know exactly what point in your code the problem will occur, so there is usually a very large number of program states which need to be tested.
Ideally, restructure your code to use less memory. For example, perhaps you could stream the output instead of holding the whole thing in memory.
Alternatively, just give the JVM more memory with the -Xmx option.
You should not handle it in code. OutOfMemoryError should not be caught and handled. Instead, start your JVM with a bigger heap space:
java -Xmx512M
should do the trick.
See here for more details
Everyone else has already covered how to give Java more memory, but because "handle" could conceivably mean catch, I'm going to quote what Sun has to say about Errors:
An Error is a subclass of Throwable that indicates serious problems that a reasonable application *should not try to catch*. Most such errors are abnormal conditions.
(emphasis mine)
You get an OutOfMemoryError because your program requires more memory than the JVM has available. There is nothing you can specifically do at runtime to help this.
As noted by krosenvold, your application may be making sensible demands for memory but it just so happens that the JVM is not being started with enough (e.g. your app will have a 280MB peak memory footprint but the JVM only starts with 256MB). In this case, increasing the size allocated will solve this.
If you feel that you are supplying adequate memory at start up, then it is possible that your application is either using too much memory transiently, or has a memory leak. In the situation you have posted, it sounds like you are holding references to all of the million items in memory at once, even though potentially you are dealing with them sequentially.
Check what your references are like for items that are "done" - you should deference these as soon as possible so that they can be garbage collected. If you're adding a million items to a collection and then iterating over that collection, for example, you'll need enough memory to store all of those object instances. See if you can instead take one object at a time, serialise it and then discard the reference.
If you're having trouble working this out, posting a pseudo-code snippet would help.
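As a pseudo-code-level sketch of the "one object at a time" approach (the StreamingWriter name and the Iterator-based API are invented for the example): serialize each item as it is produced and drop the reference immediately, instead of first collecting a million items.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Iterator;

// Sketch: stream items to the output one at a time so only the current
// item needs to be live in the heap.
public class StreamingWriter {
    static <T extends Serializable> int writeAll(Iterator<T> items, OutputStream dest)
            throws IOException {
        int count = 0;
        try (ObjectOutputStream out = new ObjectOutputStream(dest)) {
            while (items.hasNext()) {
                out.writeObject(items.next());  // the reference is dropped right after
                out.reset();  // clear the stream's back-reference table so it can't grow
                count++;
            }
        }
        return count;
    }
}
```

The out.reset() call matters here: ObjectOutputStream normally remembers every object it has written (so repeated references become back-references), and that table itself can hold a million items alive.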
In addition to some of the tips that have been given to you: review the memory leaks, and
also start the JVM with more memory (-Xmx512M).
It looks like you have an OutOfMemoryError because your TopicParser is reading a line that is probably pretty big (and that is what you should avoid). You can use a FileReader (or, if the encoding is an issue, an InputStreamReader wrapping a FileInputStream). Use its read(char[]) method with a reasonably sized char[] array as a buffer.
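A minimal sketch of that buffered approach (the ChunkedReader name and the character-counting body are placeholders for whatever per-chunk processing the parser actually needs):

```java
import java.io.IOException;
import java.io.Reader;

// Sketch: read in fixed-size chunks with read(char[]) instead of readLine(),
// so a file with no line breaks cannot blow up the heap.
public class ChunkedReader {
    static long countChars(Reader source) throws IOException {
        char[] buf = new char[8192];          // bounded working memory
        long total = 0;
        int n;
        while ((n = source.read(buf)) != -1) {
            // process buf[0..n) here instead of accumulating a giant String
            total += n;
        }
        return total;
    }
}
```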
Also, finally, to investigate a little why you get the OutOfMemoryError, you can use the
-XX:+HeapDumpOnOutOfMemoryError
flag in the JVM to get heap dump information written to disk.
Good luck!
Interesting - you are getting an out of memory error on a readLine. At a guess, you are reading in a big file without line breaks.
Instead of using readLine to get the stuff out of the file as one single big long String, write something that understands the input a bit better and handles it in chunks.
If you simply must have the whole file in a single big long String... well, get better at coding. In general, trying to handle multi-megabyte data by stuffing it all into a single array of bytes (or whatever) is a good way to lose.
Go have a look at CharSequence.
Use the transient keyword to mark fields in the serialized classes which can be generated from existing data.
Implement writeObject and readObject to help with reconstructing transient data.
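The two points above can be sketched together like this (the Document class, its cached word count, and the recomputation are invented for illustration): a derivable field is marked transient so it is never written, and readObject rebuilds it after deserialization.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: transient field regenerated from persisted data on read.
class Document implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String text;
    private transient int wordCount;          // derivable, so not serialized

    Document(String text) {
        this.text = text;
        this.wordCount = computeWordCount();
    }

    private int computeWordCount() {
        return text.isEmpty() ? 0 : text.trim().split("\\s+").length;
    }

    int wordCount() { return wordCount; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();             // writes only non-transient fields
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        wordCount = computeWordCount();       // reconstruct the transient data
    }
}
```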
After you follow the suggestion of increasing heap space (via -Xmx), be sure to use either JConsole or JVisualVM to profile your application's memory usage. Make sure that memory usage does not continuously grow; if it does, you'll still get the OutOfMemoryError, it'll just take longer.
You can increase the size of the memory java uses with the -Xmx-option, for instance:
java -Xmx512M -jar myapp.jar
Better is to reduce the memory footprint of your app. You serialize millions of items? Do you need to keep all of them in memory? Or can you release some of them after using them? Try to reduce the number of live objects.
Start java with a larger value for option -Xmx, for instance -Xmx512m
There's no real way of handling it nicely. Once it happens, you are in unknown territory. You can tell by the name - OutOfMemoryError. And it is described as:
Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector.
Usually an OutOfMemoryError indicates that there is something seriously wrong with the system/approach (and it's hard to point to a particular operation that triggered it).
Quite often it has to do with ordinarily running out of heap space. Using -verbose:gc and the earlier-mentioned -XX:+HeapDumpOnOutOfMemoryError should help.
You can find a nice and concise summary of the problem at javaperformancetuning
Before taking any dangerous, time-consuming or strategic actions, you should establish exactly what in your program is using up so much of the memory. You may think you know the answer, but until you have evidence in front of you, you don't. There's the possibility that memory is being used by something you weren't expecting.
Use a profiler. It doesn't matter which one, there are plenty of them. First find out how much memory is being used up by each object. Second, step though iterations of your serializer, compare memory snapshots and see what objects or data are created.
The answer will most likely be to stream the output rather than building it in memory. But get evidence first.
I have discovered an alternative. Respecting all other views that we should not try to catch OutOfMemoryError, this is what I've learned recently:
catch (Throwable ex) {
    if (!(ex instanceof ThreadDeath)) {
        ex.printStackTrace(System.err);
    }
}
For your reference: OutOfMemoryError
any feedback is welcome.
