I'm writing an app that generates a big xlsx file using Apache POI.
At some point I get an OutOfMemoryError (Java heap space).
I want to solve it by writing the content to the xlsx file when I determine that I'll soon be out of space, and thus freeing the memory, and then reading the file from the disk and continuing the writing.
A better solution might be to predefine the number of cells I'll write in each "block", and then write the block to disk. In any case, this made me wonder: is there a way to determine at runtime how much heap space is left?
Freeing unused memory at runtime is probably not very reliable since the JVM has considerable freedom in deciding when to garbage collect.
POI recently introduced the SXSSF API, which uses streaming and thereby significantly reduces the memory footprint when writing spreadsheets. This should help even with very big xlsx files. There are a couple of downsides, which are shown here. But if you can live with them, this should relieve you of heap-related problems.
The MemoryMXBean can give you a fair amount of information about the current memory usage.
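import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class PrintMemory {
    public static void main(String[] args) {
        MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
        long[] array;
        for (int i = 0; i < 100; i++) {
            array = new long[100000]; // allocate about 800 KB per iteration so heap usage changes
            // MemoryUsage is an object, so format it with %s (prints init, used, committed, max)
            System.out.printf("%s%n", memoryMXBean.getHeapMemoryUsage());
        }
    }
}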
You can get the available memory, but it only tells you the most you can allocate without triggering a GC.
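As a rough illustration of what "available memory" means here, the sketch below derives a headroom figure from the standard Runtime methods. The class and method names are made up for this example, and the result is conservative: garbage that has not been collected yet still counts as used.

```java
final class HeapHeadroom {
    /**
     * Bytes that can still be allocated before the heap reaches its maximum size,
     * counting not-yet-collected garbage as used (so the real figure may be higher).
     */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // live objects plus uncollected garbage
        return rt.maxMemory() - used;                   // maxMemory() reflects -Xmx
    }

    public static void main(String[] args) {
        System.out.printf("available: %d MiB%n", availableBytes() / (1024 * 1024));
    }
}
```

Because the number moves with every allocation and every collection, it is better suited to a coarse "am I getting close?" check than to exact accounting.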
Instead, you can trigger a GC and see how much memory is free afterwards. The problem with this is that it has a performance overhead.
Another option is to monitor the GC cleanups via JMX and see how much is free after it naturally triggers a GC (if it doesn't run, you don't have a problem).
It is worth considering one of the JMX MBeans that come with the JVM, such as MemoryPoolMXBean.
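A minimal sketch of reading "what was still in use after the last GC" through MemoryPoolMXBean. It uses only java.lang.management; the class name is invented for the example, and pools that do not support collection usage report null and are skipped.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class PoolUsageAfterGc {
    /** Sums the bytes still in use in each heap pool after that pool's most recent collection. */
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getName() + " (" + pool.getType() + "): " + pool.getUsage());
        }
        System.out.println("in use after last GC: " + usedAfterLastGc() + " bytes");
    }
}
```

Until a pool has been collected at least once, its collection usage is reported as zero, so early readings say little.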
I'm trying to write code that has a minimal impact on resources, and I have come across GC behavior I don't understand.
Apparently Strings are not cleared from the memory immediately even though they are not in use anymore.
for(int i = 0; i < 999999999; i++)
System.out.println("Test");
Memory usage graph
According to the graph, I assume that a new String object is created on every run of the loop but it is not cleared automatically on the next run of the loop. If that is the case, I would like to know why it is happening; if I'm misreading the situation, I would like to know what is really happening "behind the curtains".
When I add Sleep to the code I presented above the graph becomes stable, what is the reason for that?
for(int i = 0; i < 999999999; i++){
System.out.println("Test");
try{
Thread.sleep(1);
}
catch(Exception e){}
}
Stable graph
Also, I have a few questions about the given case:
Can GC be forced to be more aggressive? I mean shortening the object lifetime, not reducing the memory allocated by the JVM.
If I plug in a null value to the variable will it affect the time until it's cleared by the GC?
What is the correct way to work with Strings when I need to run a large number of regex matches on them?
What is the best way to declare a String object "obsolete" so the GC will clear it?
Does the above situation occur because Java does an automatic intern for Strings and if so is there a way to cancel it?
Thank you very much!
I assume that a new String object is created on every run of the loop
No, if it were creating a new String on each iteration you would get far more garbage.
At this garbage rate it could be the profiler which is allocating some objects.
A String literal is created only once (per JVM).
but it is not cleared automatically on the next run of the loop
Correct. Even if it were created on each iteration, the GC only runs when it needs to; running it on each iteration would be insanely expensive.
When I add Sleep to the code I presented above the graph becomes stable, what is the reason for that?
You have dramatically slowed down your application.
Can GC be forced to be more aggressive?
You can make the Eden space much smaller, but this would slow down your application.
If I plug in a null value to the variable will it affect the time until it's cleared by the GC?
No, this rarely does anything.
What is the correct way to work with Strings when I need to run a large number of regex matches on them
Regexes create a lot of garbage. If you want to reduce allocations and speed up your application, avoid using regexes.
I recently sped up an application by 3x by replacing some commonly used regexes with direct String handling.
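To make "direct String handling" concrete, here is a small sketch that checks whether a string consists only of ASCII digits, once with a precompiled Pattern and once with a plain loop. The check itself and the class name are assumptions chosen for illustration; they are not taken from the application mentioned above.

```java
import java.util.regex.Pattern;

final class DigitsCheck {
    // Compiled once and reused; String.matches() would recompile the pattern on every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Regex version: allocates a Matcher per call. */
    static boolean isDigitsRegex(String s) {
        return DIGITS.matcher(s).matches();
    }

    /** Direct version: no allocation, just a scan over the characters. */
    static boolean isDigitsDirect(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isDigitsRegex("12345") + " " + isDigitsDirect("12345"));
        System.out.println(isDigitsRegex("12a45") + " " + isDigitsDirect("12a45"));
    }
}
```

If the regex has to stay, compiling the Pattern once is usually the cheapest first step.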
What is the best way to declare a String object "obsolete" so the GC will clear it?
Use it in a limited scope. When the scope ends so does the reference to it and it can be GCed.
Does the above situation occur because Java does an automatic intern
Once a String is interned it is not recreated.
for Strings and if so is there a way to cancel it?
Sure, force it to create a new String each time. This of course creates more garbage and is much slower (and the code is longer), but you can do it if you want.
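The difference between a literal, a forced copy and an interned copy can be seen by comparing references. This is a small sketch with an invented class name; the == comparisons are deliberate, since the point is object identity rather than equality.

```java
final class InternDemo {
    /** Two occurrences of the same literal refer to the same interned object. */
    static boolean sameLiteral() {
        String a = "Test";
        String b = "Test";
        return a == b;
    }

    /** new String(...) always creates a distinct object on the heap. */
    static boolean freshCopyIsSame() {
        return new String("Test") == "Test";
    }

    /** intern() maps the copy back to the pooled literal. */
    static boolean internedCopyIsSame() {
        return new String("Test").intern() == "Test";
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());        // true
        System.out.println(freshCopyIsSame());    // false
        System.out.println(internedCopyIsSame()); // true
    }
}
```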
The garbage collector collects when it's time to collect, more or less.
Yes, depending on which collector you are using. There are literally dozens of VM properties you can set, some of them influencing each other.
I don't think it does in 'newer' JDKs.
Normally you do not care. When it comes to GC, it's more about not loading gigabytes of data into memory. One special thing about Strings is interning, but Strings will be GC'd like other objects, too.
When there's no reference to the string/intern anymore (when you exit the braces).
No, the situation occurs because Java's GCs work this way...
I can explain the GC effects based on CMS/ParNew (since I know this combo best). It works like this:
The heap is split into two regions (I exclude PermGen for now):
Young and Old
Young is split into 'eden' and 'copy' (or survivor).
When you create a new object, it goes to Young->Eden. At some point Eden reaches its maximum size; objects that are no longer used are then removed, and objects that are still referenced are copied to Young->Copy.
As the program keeps running, Young->Copy reaches its maximum size as well. Its live objects are then copied again, into the other Young->Copy space.
At some point it can't do that anymore, so some objects are moved from Young->Copy to Old, depending on a copy counter (I think). The same story applies to the old heap.
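To see which collectors and pools a given JVM actually uses for the young and old regions described above, the standard GarbageCollectorMXBean can be queried. This is only a sketch with an invented class name; the collector and pool names it prints depend on the JVM and the GC selected, so none are hard-coded here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;

final class CollectorReport {
    /** One line per collector: its name, the pools it manages, and how often and how long it ran. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" pools=").append(Arrays.toString(gc.getMemoryPoolNames()))
              .append(" collections=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Typically the young-generation collector lists only the eden and survivor pools, while the old-generation collector lists the old pool as well.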
So what can you tune? First of all, you normally optimize for either throughput (batching) or low latency (web pages); the ParNew/CMS combo was used for low latency.
Since I know ParNew/CMS best, I'll explain what you can consider tuning first:
You can tune max memory (more memory means more managing, the less memory an application needs to run, the better... in general)
You can tune the heap ratio between young and old
You can tune the ratios between eden and copy within young
You can tune the time, when CMS starts its collection cycle
And then there's a lot more. From my personal experience, for large applications, we used in general the following settings:
Fix min and max memory to the same size (no change of max heap)
Ratio of New to Old of about 1:4 to 1:7
Disable System.gc()
Log a lot of GC information
Put an alert on OutOfMemory
Do a weekly analysis of the logs and decide on tuning parameters (only one parameter at a time ;)
If you really want to know what's behind everything, I'd recommend reading a book, because there's really, really, really a lot going on.
I have a Java application that uses memory extensively. It keeps a data structure that grows very fast and is responsible for most of the memory used.
In order to avoid an OutOfMemoryError, I decided to flush the data structure to a repository (file or DB) and post-process it.
The problem I face is choosing the moment (when the used memory is "close" to the maximum allowed) to flush the data structure to the repository. One way would be to keep track of the data structure's memory usage on every update.
dataStructure.onUpdate(new CheckMemoryIfReachedMax() {
    public void onUpdate(long usedMemory) {
        // >= rather than ==, so that overshooting the limit still triggers the flush
        if (usedMemory >= MaxMemory) {
            dataStructure.flushInRepository();
        }
    }
});
The main problem in this case is that it isn't easy to change the data structure to keep track of its memory.
Another possible solution would be to get the used memory from the JVM and compare it to the maximum memory.
Runtime runtime = Runtime.getRuntime();
long freeMemory = runtime.freeMemory();
if (freeMemory < MaxUsedMemory) {
    dataStructure.flushInRepository();
}
In this case the problem is that the memory usage only gives a hint of how much memory is really used, since we cannot predict when the garbage collector removes the objects. This solution would make me flush the data structure to the repository more often, so the application's performance might suffer.
Is there any general pattern used in those cases? Do you have any suggestion about which of the solutions would be better suited to the problem?
There is no good, universal definition of "memory used by JVM". The best choice is to track the size of the structure yourself, sorry.
The problem is that the JVM uses memory both for garbage and for actual data, and there is no way to tell one from the other until an actual garbage collection occurs. This is by design and actually is a major optimization.
A dirty workaround would be to use JMX to track the amount of memory freed during the last collection (this will be the size of the short-lived garbage). This has two drawbacks:
you will get false positives once the long-lived garbage accumulates;
you will depend on a specific garbage collector and garbage collector settings (for example I have no idea if this scheme is achievable with G1).
I think you should use the first approach, because comparing against runtime memory depends on many things. To get the memory used by an object for the first approach, you can use
long getObjectSize(Object objectToSize)
from the Instrumentation interface to get an approximate size of your object.
Refer to http://docs.oracle.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html#getObjectSize(java.lang.Object)
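The "track the size yourself" approach can be sketched without touching the JVM's memory counters at all. The class below is hypothetical: it assumes the caller can supply a rough per-item cost in bytes (measured once, for example with getObjectSize, or simply estimated) and it hands full batches to a callback that stands in for flushInRepository().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class FlushingBuffer<T> {
    private final long budgetBytes;
    private final long estimatedBytesPerItem; // assumption: a rough, caller-supplied estimate
    private final Consumer<List<T>> sink;     // stands in for writing to the repository
    private List<T> items = new ArrayList<>();

    FlushingBuffer(long budgetBytes, long estimatedBytesPerItem, Consumer<List<T>> sink) {
        this.budgetBytes = budgetBytes;
        this.estimatedBytesPerItem = estimatedBytesPerItem;
        this.sink = sink;
    }

    /** Adds an item and flushes as soon as the estimated footprint reaches the budget. */
    void add(T item) {
        items.add(item);
        if ((long) items.size() * estimatedBytesPerItem >= budgetBytes) {
            flush();
        }
    }

    /** Hands the current batch to the sink and starts a fresh, empty batch. */
    void flush() {
        if (items.isEmpty()) {
            return;
        }
        List<T> batch = items;
        items = new ArrayList<>();
        sink.accept(batch);
    }
}
```

The estimate does not need to be exact; it only has to be pessimistic enough that the budget is reached before the heap is.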
Hello, I have an application that uses javax.swing.Timer in a loop. The problem is that the process's memory in Windows keeps growing and doesn't stop. I tried clearing my variables, calling System.gc(), etc., and it doesn't work. I made a sample to test this with a thread, a java.util.Timer with a TimerTask, and a Swing timer; I'm adding items to a JComboBox and the memory keeps rising.
Here comes the code:
//My Timers
@Action
public void botao_click1() {
jLabel1.setText("START");
timer1 = new java.util.Timer();
timer1.schedule(new TimerTask() {
@Override
public void run() {
adicionarItens();
limpar();
}
}, 100, 100);
}
@Action
public void botao_click2() {
thread = new Thread(new Runnable() {
public void run() {
while (true) {
adicionarItens();
try {
Thread.sleep(100);
limpar();
} catch (InterruptedException ex) {
Logger.getLogger(MemoriaTesteView.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
});
thread.start();
}
private void limpar() { // CleanUp array and jcombobox
texto = null;
jComboBox1.removeAllItems();
jComboBox1.setVisible(false);
//jComboBox1 = null;
System.gc();
}
private void adicionarItens() { //AddItens
texto = new String[6];
texto[0] = "HA";
texto[1] = "HA";
texto[2] = "HA";
texto[3] = "HA";
texto[4] = "HA";
texto[5] = "HA";
//jComboBox1 = new javax.swing.JComboBox();
jComboBox1.setVisible(true);
for (int i = 0; i < texto.length; i++) {
jComboBox1.addItem(texto[i].toString());
}
System.out.println("System Memory: "
+ Runtime.getRuntime().freeMemory() + " bytes free!");
}
well help please !!! =(
It isn't clear that you actually have a problem from the small snippet of code you posted.
Either way, you can't control what you want to control
-Xmx only controls the Java Heap, it doesn't control consumption of native memory by the JVM, which is consumed completely differently based on implementation.
From the article "Thanks for the Memory" (Understanding How the JVM Uses Native Memory on Windows and Linux):
Maintaining the heap and the garbage collector uses native memory you can't control.
More native memory is required to maintain the state of the
memory-management system maintaining the Java heap. Data structures
must be allocated to track free storage and record progress when
collecting garbage. The exact size and nature of these data structures
varies with implementation, but many are proportional to the size of
the heap.
and the JIT compiler uses native memory just like javac would
Bytecode compilation uses native memory (in the same way that a static
compiler such as gcc requires memory to run), but both the input (the
bytecode) and the output (the executable code) from the JIT must also
be stored in native memory. Java applications that contain many
JIT-compiled methods use more native memory than smaller applications.
and then you have the classloader(s) which use native memory
Java applications are composed of classes that define object structure
and method logic. They also use classes from the Java runtime class
libraries (such as java.lang.String) and may use third-party
libraries. These classes need to be stored in memory for as long as
they are being used. How classes are stored varies by implementation.
I won't even start quoting the section on threads. I think you get the idea that -Xmx doesn't control what you think it controls: it controls the JVM heap, not everything goes in the JVM heap, and the heap takes up far more native memory than what you specify, for management and bookkeeping.
I don't see any mention of OutOfMemoryError anywhere.
What you are concerned about you can't control, not directly anyway
What you should focus on is what is in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely in Java memory isn't consumed by one single uber-object that is leaking ( dangling reference in other environments ).
Intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates for collection, but because there is still plenty of memory available, it considers flushing them out at that point too expensive time-wise and waits until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic; if you are doing degenerate things, it will not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These unreferenced objects may simply not have been around long enough for the garbage collector to decide to expunge them from memory, or there could be references to them held by some other object (a List, for example) that you don't realize still points to them. This is what is most commonly referred to as a leak in Java; more specifically, it is a reference leak.
EXAMPLE: If you know you need to build a 4K String using a StringBuilder, create it with new StringBuilder(4096), not with the default capacity, which is 16 characters and will immediately start creating garbage that can add up to many times what you think the object's size should be.
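The capacity difference is easy to check directly. In this small sketch (class name invented for the example) the no-argument constructor gives the documented initial capacity of 16 characters, while the presized builder starts at the requested size and never needs to grow while building a 4K string.

```java
final class BuilderCapacity {
    /** Capacity of a StringBuilder created with the no-argument constructor. */
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    /** Capacity of a StringBuilder presized for a 4K string. */
    static int presizedCapacity() {
        return new StringBuilder(4096).capacity();
    }

    public static void main(String[] args) {
        System.out.println("default:  " + defaultCapacity());
        System.out.println("presized: " + presizedCapacity());
    }
}
```

Every time the default builder outgrows its backing array it allocates a larger one and discards the old one, which is where the extra garbage comes from.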
You can discover how many objects of which types are instantiated with VisualVM; this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class and says, "This is the big memory consumer!", unless there is only one instance of some char[] that you are reading some massive file into. Even that is not really possible, because lots of other classes use char[] internally, and in that case you pretty much knew it already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code, the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manual memory management language / runtime. You don't get to decide what is in memory or not at the detail level you are expecting you should be able to.
Java, in theory, is immune to "leaks" of the sort that C-based languages can have. But it's still quite easy to design a data structure that grows in a more or less unbounded fashion, whether or not you intended that.
And, of course, if you schedule timer-based tasks and the like, they will exist until the time has expired and the task has completed (or been cancelled), even if you don't retain a reference to them.
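A minimal sketch of cleaning up a java.util.Timer once it is no longer needed, so that neither the task nor the timer thread lingers. The class name, the 10 ms period and the "stop after N runs" rule are assumptions made for the example.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CancellableTimerDemo {
    /** Runs a repeating task until it has executed 'runs' times, then cancels task and timer. */
    static int runThenCancel(int runs) {
        Timer timer = new Timer("demo-timer", true); // daemon thread, will not block JVM exit
        AtomicInteger count = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                if (count.incrementAndGet() >= runs) {
                    cancel();         // this task will never run again
                    done.countDown();
                }
            }
        }, 0, 10);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            timer.cancel(); // stops the timer thread and drops any tasks still scheduled
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println("executions: " + runThenCancel(3));
    }
}
```

Creating a new Timer on every button click without cancelling the previous one leaves each old timer thread and its task alive.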
Also, some Java environments (Android is notorious for this) allocate images and the like in a way that is not subject to ordinary GC action and can cause heap to grow in an unbounded fashion.
here is my code:
public void mapTrace(String Path) throws FileNotFoundException, IOException {
FileReader arq = new FileReader(new File(Path));
BufferedReader leitor = new BufferedReader(arq, 41943040);
Integer page;
String std;
Integer position = 0;
while ((std = leitor.readLine()) != null) {
position++;
page = Integer.parseInt(std, 16);
LinkedList<Integer> values = map.get(page);
if (values == null) {
values = new LinkedList<>();
map.put(page, values);
}
values.add(position);
}
for (LinkedList<Integer> referenceList : map.values()) {
Collections.reverse(referenceList);
}
}
This is the HashMap structure
Map<Integer, LinkedList<Integer>> map = new HashMap<>();
For 50 MB to 100 MB trace files I don't have any problem, but for bigger files I get:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
I don't know if the reverse method is increasing the memory use, if the LinkedList is using more space than another List structure, or if the way I'm adding the list to the map is taking more space than it should. Can anyone tell me what's using so much space?
Can anyone tell me what's using so much space?
The short answer is that it is probably the space overheads of the data structure you have chosen that is using the space.
By my reckoning, a LinkedList<Integer> on a 64 bit JVM uses about 48 bytes of storage per integer in the list including the integers themselves.
By my reckoning, a Map<?, ?> on a 64 bit machine will use in the region of 48 bytes of storage per entry, excluding the space needed to represent the key and the value objects.
Now, your trace size estimates are rather too vague for me to plug the numbers in, but I'd expect a 1.5Gb trace file to need a LOT more than 2Gb of heap.
Given the numbers you've provided, a reasonable rule-of-thumb is that a trace file will occupy roughly 10 times its file size in heap memory ... using the data structure that you are currently using.
You don't want to configure a JVM to try to use more memory than the physical RAM available. Otherwise, you are liable to push the machine into thrashing ... and the operating system is liable to start killing processes. So for an 8Gb machine, I wouldn't advise going over -Xmx8g.
Putting that together, with an 8Gb machine you should be able to cope with a 600Mb trace file (assuming my estimates are correct), but a 1.5Gb trace file is not feasible. If you really need to handle trace files that big, my advice would be to either:
design and implement custom collection types for your specific use-case that use memory more efficiently,
rethink your algorithms so that you don't need to hold the entire trace files in memory, or
get a bigger machine.
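As one possible shape for the "custom collection types" option, the sketch below stores positions in a growable int[] instead of a LinkedList<Integer>. It is a hypothetical replacement, not part of the original code: each value costs 4 bytes plus unused slack in the array, rather than a list node plus a boxed Integer per value.

```java
import java.util.Arrays;

final class IntList {
    private int[] data = new int[8];
    private int size;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Reverses the stored values in place, like Collections.reverse but without boxing. */
    void reverse() {
        for (int i = 0, j = size - 1; i < j; i++, j--) {
            int tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

The map would then become Map<Integer, IntList>; the per-entry overhead of the HashMap itself stays the same.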
I did some tests before reading your comment: I set -Xmx14g and processed the 600 MB file. It took some minutes (about 10), but it did fine.
The -Xmx14g option sets the maximum heap size. Based on the observed behaviour, I expect that the JVM didn't need anywhere like that much memory ... and didn't request it from the OS. And if you'd looked at memory usage in the task manager, I expect you'd have seen numbers consistent with that.
Then I set -Xmx18g and tried to process the 1.5 GB file, and it's been running for about 20 minutes. My memory in the task manager is going from 7.80 to 7.90. I wonder if this will finish. How could I use MORE memory than I have? Does it use the HD as virtual memory?
Yes, that is what it does.
Yes, each page of your process's virtual address space corresponds to a page on the hard disc.
If you've got more virtual pages than physical memory pages, at any given time some of those virtual memory pages will live on disc only. When your application tries to use one of those non-resident pages, the VM hardware generates an interrupt, and the operating system finds an unused page, populates it from the disc copy and then hands control back to your program. But if your application is busy, then it will have had to make that physical memory page available by evicting another page. And that may have involved writing the contents of the evicted page to disc.
The net result is that when you try to use significantly more virtual address pages than you have physical memory, the application generates lots of interrupts that result in lots of disc reads and writes. This is known as thrashing. If your system thrashes too badly, the system will spend most of its time waiting for disc reads and writes to finish, and performance will drop dramatically. And on some operating systems, the OS will attempt to "fix" the problem by killing processes.
Further to Stephen's quite reasonable answer, everything has its limit and your code simply isn't scalable.
In cases where the input is "large" (as in your case), the only reasonable approach is a stream-based approach, which, while (usually) more complicated to write, uses very little memory/resources. Essentially you hold in memory only what you need to process the current task, then release it ASAP.
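A stream-based pass over a trace might look like the sketch below. The aggregate computed here (the largest page number) is only a placeholder chosen for illustration; the point is that a single line is held in memory at a time, regardless of file size. The class and method names are invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class StreamingTrace {
    /** Reads hexadecimal page numbers line by line and returns the largest one seen. */
    static int maxPage(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            int max = Integer.MIN_VALUE;
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines
                }
                max = Math.max(max, Integer.parseInt(trimmed, 16));
            }
            return max;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader on the real trace file.
        System.out.println(maxPage(new StringReader("1f\n0a\nff\n")));
    }
}
```

Whether the real computation can be restructured this way depends on whether it truly needs all positions for every page at once.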
You may find that unix command line tools are your best weapon, perhaps using a combination of awk, sed, grep etc to massage your raw data into hopefully a usable "end format".
I once stopped a colleague from writing a Java program to read in and parse XML and issue insert statements to a database: I showed him how to use a series of piped commands to produce executable SQL, which was then piped directly into the database command line tool. It took about 30 minutes to get it right, but job done. And the file was massive, so in Java it would have required a SAX parser and JDBC, which aren't fun.
To build this structure, I would put the data in a key/value datastore like Berkeley DB for Java.
Pseudo-code:
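// Pseudo-code: Entry stands in for the datastore's entry type.
putData(db, page, value)
{
    Entry key = new Entry();
    Entry data = new Entry();
    List<Integer> L = new LinkedList<Integer>();
    IntegerBinding.intToEntry(page, key);
    if (db.get(key, data) == OperationStatus.SUCCESS)
    {
        // Existing record: a count followed by that many stored values.
        TupleInput t = new TupleInput(data);
        int n = t.readInt();
        for (int i = 0; i < n; ++i) L.add(t.readInt());
    }
    L.add(value);
    // Write the list back as: count, then each value.
    TupleOutput out = new TupleOutput();
    out.writeInt(L.size());
    for (int v : L) out.writeInt(v);
    data = new Entry(out.toByteArray());
    db.put(key, data);
}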
I've a very simple class which has one integer variable. I just print the value of variable 'i' to the screen and increment it, and make the thread sleep for 1 second. When I run a profiler against this method, the memory usage increases slowly even though I'm not creating any new variables. After executing this code for around 16 hours, I see that the memory usage had increased to 4 MB (initially 1 MB when I started the program). I'm a novice in Java. Could any one please help explain where am I going wrong, or why the memory usage is gradually increasing even when there are no new variables created? Thanks in advance.
I'm using netbeans 7.1 and its profiler to view the memory usage.
public static void main(String[] args)
{
try
{
int i = 1;
while(true)
{
System.out.println(i);
i++;
Thread.sleep(1000);
}
}
catch(InterruptedException ex)
{
System.out.print(ex.toString());
}
}
Initial memory usage when the program started : 1569852 Bytes.
Memory usage after executing the loop for 16 hours : 4095829 Bytes
It is not necessarily a memory leak. When the GC runs, the objects that are allocated (I presume) in the System.out.println(i); statement will be collected. A memory leak in Java is when memory fills up with useless objects that can't be reclaimed by the GC.
The println(i) is using Integer.toString(int) to convert the int to a String, and that is allocating a new String each time. That is not a leak, because the String will become unreachable and a candidate for GC'ing once it has been copied to the output buffer.
Other possible sources of memory allocation:
Thread.sleep could be allocating objects under the covers.
Some private JVM thread could be causing this.
The "java agent" code that the profiler is using to monitor the JVM state could be causing this. It has to assemble and send data over a socket to the profiler application, and that could well involve allocating Java objects. It may also be accumulating stuff in the JVM's heap or non-heap memory.
But it doesn't really matter so long as the space can be reclaimed if / when the GC runs. If it can't, then you may have found a JVM bug or a bug in the profiler that you are using. (Try replacing the loop with one very long sleep and see if the "leak" is still there.) And it probably doesn't matter if this is a slow leak caused by profiling ... because you don't normally run production code with profiling enabled for that long.
Note: calling System.gc() is not guaranteed to cause the GC to run. Read the javadoc.
I don't see any memory leak in this code. You should look at how the garbage collector in Java works and at its strategies. Very basically speaking, the GC won't clean up until it is needed, as determined by the particular strategy.
You can also try to call System.gc().
The objects are probably created in the two Java core functions.
It's due to the text displayed in the console, and the size of the integer (a little bit).
Java print functions use 8-bit ASCII, therefore 56000 prints of a number, at 8 bytes per char, will soon rack up memory.
Follow this tutorial to find your memory leak: Analyzing Memory Leak in Java Applications using VisualVM. You have to make a snapshot of your application at the start and another one after some time. With VisualVM you can do this and compare these two snapshots.
Try setting the JVM upper memory limit so low that the possible leak will cause it to run out of memory.
If the used memory hits that limit and continues to work away happily then garbage collection is doing its job.
If instead it bombs, then you have a real problem...
This does not seem to be a leak, as the profiler graphs also show. The graph drops sharply at certain intervals, i.e. when a GC is performed. It would have been a leak had the graph kept climbing steadily. The heap space that remains in use after that must be used by Thread.sleep() and also (as mentioned in one of the answers above) by some of the profiler's code.
You can try running VisualVM located at %JAVA_HOME%/bin and analyzing your application therein. It also gives you the option of performing GC at will and many more options.
I noted that the more features of VisualVM I used, the more memory was being consumed (up to 10 MB). So this increase has to come from your profiler as well, but it still is not a leak, as the space is reclaimed on GC.
Does this occur without the printlns? In other words, perhaps keeping the printlns displayed on the console is what is consuming the memory.