Lambda capturing closure with Integer in endless stream throws OutOfMemoryError - java

The code below reliably throws java.lang.OutOfMemoryError:
Set<Integer> set = new HashSet<>();
new Random().ints(10_000_000, Integer.MIN_VALUE, Integer.MAX_VALUE)
    .forEach(v -> {
        if (set.contains(v)) {
            System.out.println(v);
        } else {
            set.add(v);
        }
    });
AFAIK it's because the lambda captures the Integer values together with the surrounding context? Could anybody explain what exactly happens here?

In your code you have a Set which is reachable from the main method. The local variable stack is a GC root, so the set cannot be collected by the GC.
So you keep adding elements to a set that can never be collected, and the set needs extra memory for every element it stores (a boxed Integer plus the internal HashMap node).
On my PC it takes about 600 MB of heap to run this program without an OOM.
Here is a heap dump of this program running on my PC.
I've tried the same code with a vanilla for loop and got the same result (see the sketch below), so the lambda is not the cause.
You just need to give your application more memory. For example, -Xmx1g will set your maximum heap size to 1 gigabyte.
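For comparison, here is a minimal sketch of the equivalent plain for loop (my own rewrite for illustration, not the original poster's code); it fails with the same OutOfMemoryError on a small heap, which shows that the Set, not the lambda capture, is what holds the memory:

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DuplicateFinder {
    public static void main(String[] args) {
        // Same logic as the stream version, but written as a plain loop:
        // the Set still grows towards 10 million boxed Integers.
        Set<Integer> set = new HashSet<>();
        Random random = new Random();
        for (int i = 0; i < 10_000_000; i++) {
            int v = random.nextInt();        // roughly the same range as ints(MIN_VALUE, MAX_VALUE)
            if (set.contains(v)) {
                System.out.println(v);       // duplicate found
            } else {
                set.add(v);                  // strong reference kept until main() returns
            }
        }
    }
}

Running it with, say, -Xmx128m reproduces the error quickly, while something like -Xmx1g should let it finish (the answer above measured roughly 600 MB).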

Related

Surviving generations keep increasing while running Solr query

I am testing a query with jSolr (7.4) because I believe it is causing a memory leak in my program. But I am not sure it really is a memory leak, so I'm asking for advice!
This method is called several times during the running time of my indexing program (it should be able to run for weeks or months without any problems). That's why I am testing it in a loop that I profile with the NetBeans Profiler.
If I simply retrieve the id from all documents (there are 33k) in a given index:
public class MyIndex {

    // Cache variable to avoid querying the index every time the list of documents is needed
    private List<MyDocument> listOfMyDocumentsAlreadyIndexed = null;

    public final List<MyDocument> getListOfMyDocumentsAlreadyIndexed()
            throws SolrServerException, HttpSolrClient.RemoteSolrException, IOException {
        SolrQuery query = new SolrQuery("*:*");
        query.addField("id");
        query.setRows(Integer.MAX_VALUE); // we want ALL documents in the index, not only the first ones
        SolrDocumentList results = this.getSolrClient().query(query).getResults();
        /*
         * The following was commented out for the test,
         * so that it can be told where the leak comes from.
         */
        // listOfMyDocumentsAlreadyIndexed = results.parallelStream()
        //         .map((doc) -> { // different stuff ...
        //             return myDocument;
        //         })
        //         .collect(Collectors.toList());
        return listOfMyDocumentsAlreadyIndexed;
        /*
         * The number of surviving generations keeps increasing, whereas if null is
         * returned then the number of surviving generations does not increase anymore.
         */
    }
}
I get this from the profiler (after nearly 200 runs, which could simulate a year of runtime for my program):
The object type that survives the most is String:
Is the growing number of surviving generations the expected behaviour when querying for all documents in the index?
If so, is it the root cause of the "OOM Java heap space" error that I get after some time on the production server, as it seems to be from the stack trace:
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
at org.noggit.CharArr.resize(CharArr.java:110)
at org.noggit.CharArr.reserve(CharArr.java:116)
at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:68)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:868)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:541)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:305)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:555)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:307)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:200)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:274)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:614)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
Would increasing the heap space ("-Xmx") from 8 GB to anything greater solve the problem definitively, or would it just postpone it? What can be done to work around this?
Edit, some hours later
If null is returned from the method under test (getListOfMyDocumentsAlreadyIndexed), then the number of surviving generations remains stable throughout the test:
So even though I was NOT using the result of the query for this test (because I wanted to focus only on where the leak happened), it looks like returning an instance variable (even though it was null) is not a good idea. I will try to remove it.
Edit, even later
I noticed that the surviving generations were still increasing in the telemetry tab when I was profiling "defined classes" ("focused (instrumented)"), whereas it was stable when profiling "All classes" ("General (sampled)"). So I am not sure this solved the problem:
Any hint greatly appreciated :-)
The problem stems from the following line:
query.setRows(Integer.MAX_VALUE);
This should not be done, according to this article:
The rows parameter for Solr can be used to return more than the default of 10 rows. I have seen users successfully set the rows parameter to 100-200 and not see any issues. However, setting the rows parameter higher has a big memory consequence and should be avoided at all costs.
So the problem has been solved by retrieving the documents in chunks of 200 docs, following this Solr article on pagination:
SolrQuery q = (new SolrQuery(some_query)).setRows(r).setSort(SortClause.asc("id"));
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrServer.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    doCustomProcessingOfResults(rsp);
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}
Please note: you should not exceed 200 documents in setRows, otherwise the memory leak still happens (e.g. it does happen for 500).
Now the profiler gives much better results regarding surviving generations, as they no longer increase over time.
However, the method is much slower.
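To tie this back to the original method, here is a minimal sketch (my own adaptation, not the author's code, assuming the usual SolrJ imports and the existing getSolrClient() helper) of fetching all ids with the cursor in chunks of 200:

public final List<String> getAllIndexedIds() throws SolrServerException, IOException {
    List<String> ids = new ArrayList<>();
    SolrQuery query = new SolrQuery("*:*");
    query.addField("id");
    query.setRows(200);                            // small chunks instead of Integer.MAX_VALUE
    query.setSort(SolrQuery.SortClause.asc("id")); // cursorMark requires a stable sort on the unique key
    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = this.getSolrClient().query(query);
        for (SolrDocument doc : rsp.getResults()) {
            ids.add((String) doc.getFieldValue("id"));
        }
        String nextCursorMark = rsp.getNextCursorMark();
        done = cursorMark.equals(nextCursorMark);  // cursor did not advance: no more results
        cursorMark = nextCursorMark;
    }
    return ids;
}

The id field is assumed to be a String here; adapt the cast to whatever the schema actually uses.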

HashMap<Long,Long> needs more memory

I wrote this code:
public static void main(String[] args) {
    HashMap<Long, Long> mappp = new HashMap<Long, Long>();
    Long a = (long) 55;
    Long c = (long) 12;
    for (int u = 1; u <= 1303564 / 2 + 1303564 / 3; u++) {
        mappp.put(a, c);
        a = a + 1;
        c = c + 1;
    }
    System.out.println(" " + mappp.size());
}
And it does not finish, because the program stops with this message in the console:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I calculated how much memory I need to hold such a HashMap, and in my opinion my computer's memory should be enough. I have 1024 MB of RAM on my computer.
I use Eclipse and have also set the parameters:
I start Eclipse from the command line with: eclipse -vmargs -Xms512m -Xmx730m
And second, in Run Configurations I have set the Arguments tab with: -Xmx730m
And this still gives java.lang.OutOfMemoryError.
What is the reason for this?
PS. Just to add a strange fact: in the bottom right corner of Eclipse the heap memory usage is shown, and it reads 130M of 495M.
Well, when the HashMap mappp grows, shouldn't this '130M of 495M' figure change, for example to '357M of 495M', and a second later to '412M of 495M', and so on until it reaches 495M? In my case the 130M stays almost the same, changing only slightly, from 130M to 131M or 132M.
Strange.
Java does not allow maps of primitive types, so if you use a HashMap you have to pay for boxing/unboxing and the overhead of the object references.
To avoid that overhead you can write your own primitive hash map or use an existing implementation from one of those libs (see the sketch below for one example).
boxing and unboxing in java
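As one example of such a library (named here purely for illustration; it is not necessarily the one linked above), fastutil offers primitive-keyed maps that avoid boxing entirely. A minimal sketch of the same loop with it:

// Hypothetical example using fastutil's Long2LongOpenHashMap (it.unimi.dsi.fastutil dependency);
// keys and values are stored in primitive long[] arrays, so there are no Long objects
// and no per-entry node overhead.
import it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap;

public class PrimitiveMapExample {
    public static void main(String[] args) {
        Long2LongOpenHashMap map = new Long2LongOpenHashMap();
        long a = 55;
        long c = 12;
        for (int u = 1; u <= 1303564 / 2 + 1303564 / 3; u++) {
            map.put(a, c); // put(long, long): no autoboxing
            a++;
            c++;
        }
        System.out.println(map.size());
    }
}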
You should not put millions of items in a map like this. A Long is an object containing an 8-byte long field plus some object overhead, and you use two instances per map entry.
Since the key is numeric, you could (if the maximum key value is low enough) use an array as the 'map'.
long[] mappp = new long[4000000]; // takes 4M * 8 bytes = 32 MB of memory
If you need to know whether a value is 'not in the map', use 0 for that marker, as in the sketch below. If 0 needs to be a valid value in your map, you can use a trick like increasing all stored values by 1 (if the values are always positive).
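A minimal sketch of that array-as-map idea, using 0 as the 'absent' marker and shifting stored values by +1 (an illustration only; adjust the size and the shift to your actual key and value ranges):

public class ArrayAsMap {
    // 0 means "no entry"; stored values are shifted by +1 so a real value of 0 stays representable.
    private final long[] values = new long[4_000_000]; // 4M * 8 bytes = 32 MB

    void put(int key, long value) {
        values[key] = value + 1;
    }

    boolean containsKey(int key) {
        return values[key] != 0;
    }

    long get(int key) {
        return values[key] - 1; // caller should check containsKey first
    }
}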

How to find memory used by a function in java

I am calling a function from my code (written in Java) and I want to know how much memory that function uses. Keep in mind that I cannot add any code to the function I am calling.
For example:
//my code starts
.
.
.
.
myfunc();
//print memory used by myfunc() here
.
.
// my code ends
How can I do this?
What you're trying to do is basically pointless. There is no such thing as the memory used by a function. The idea of comparing total memory usage before and after the call does not really work: the function may change global state (which may decrease or increase total memory usage, and in some cases, e.g. filling a cache, you probably don't want to count that increase as "memory used by the function"), or the garbage collector may run while you're inside myfunc and decrease the total used memory.
A good question often contains a large part of the answer. What you should do is ask the question precisely.
I was successful with this code (it's not guaranteed to work, but try it and there's a good chance it'll give you what you need):
final Runtime rt = Runtime.getRuntime();
for (int i = 0; i < 3; i++) rt.gc();   // ask the GC to settle the heap before measuring
final long startSize = rt.totalMemory() - rt.freeMemory();
myFunc();
for (int i = 0; i < 3; i++) rt.gc();   // collect garbage produced by myFunc() before measuring again
// parentheses matter here: without them the '-' would be applied to the concatenated String
System.out.println("Used memory increased by "
        + (rt.totalMemory() - rt.freeMemory() - startSize));
But it's wrong, because ideally the result should always be zero after the subtraction.
Actually it's right, because each thread normally has a TLAB (Thread Local Allocation Buffer), which means it allocates heap in blocks (so that each thread can allocate concurrently).
Turn this off with -XX:-UseTLAB and you will see every byte allocated.
You should also run this multiple times, because other things could be running while you use this function.
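Putting the pieces above together, here is a minimal sketch of a reusable helper (my own wrapping of the snippet above, not an exact measurement tool); run the JVM with -XX:-UseTLAB and repeat the measurement a few times to reduce noise:

public final class MemoryMeter {
    static int[] retained; // kept reachable so the workload's allocation shows up in the measurement

    // Rough, best-effort estimate of the heap growth caused by running 'task'.
    // System GC is only a hint and other threads may allocate concurrently,
    // so treat the result as an approximation rather than an exact figure.
    public static long measure(Runnable task) {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 3; i++) rt.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        task.run();
        for (int i = 0; i < 3; i++) rt.gc();
        return rt.totalMemory() - rt.freeMemory() - before;
    }

    public static void main(String[] args) {
        long delta = measure(() -> retained = new int[1_000_000]); // hypothetical workload, about 4 MB
        System.out.println("Used memory increased by roughly " + delta + " bytes");
    }
}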

How to make OutOfMemoryError occur on Linux JVM 64bit

In my unit test I deliberately try to raise an OutOfMemoryError. I use a simple statement like the following:
byte[] block = new byte[128 * 1024 * 1024 * 1024];
The code works on Win7 64-bit with JDK 6u21 64-bit. But when I run it on CentOS 5 64-bit with JDK 6u21, no OutOfMemoryError is thrown, even when I make the size of the array bigger.
Any idea?
Linux doesn't always allocate all the memory you ask for immediately, since many real applications ask for more than they actually need. This is called overcommit (it also means the kernel sometimes guesses wrong and the dreaded OOM killer strikes).
For your unit test, I would just throw OutOfMemoryError manually.
If you just want to consume all the heap, you can do something like the following:
try {
    List<Object> tempList = new ArrayList<Object>();
    while (true) {
        // 128 MB per chunk (note: 128 * 1024 * 1024 * 1024 would overflow int to 0);
        // keeping the reference in the list prevents the chunks from being collected.
        tempList.add(new byte[128 * 1024 * 1024]);
    }
} catch (OutOfMemoryError OME) {
    // OK, Garbage Collector will have run now...
}
128 * 1024 * 1024 * 1024 overflows to 0 because the expression is evaluated with 32-bit int arithmetic, so you are actually allocating a zero-length array. Besides, Java array indices are int, so a single array cannot have more than Integer.MAX_VALUE elements (about 2 GB for a byte[]).
ulimit -v 102400
ulimit -d 102400
unitTest.sh
The above should limit your unit test to 100 MB of virtual memory and a 100 MB data segment (ulimit values are in kilobytes). When you reach either of those limits, your process should get ENOMEM. Careful: these restrictions stay in effect until the shell where you set them exits, so you might want to run them in a subshell.
See man 2 setrlimit for details on how that works under the hood, and help ulimit for the ulimit shell builtin.
You could deliberately set the maximum heap size of your JVM to a small amount by using the -Xmx flag.
Launch the following program:
public final class Test {
    public static void main(final String[] args) {
        final byte[] block = new byte[Integer.MAX_VALUE];
    }
}
with the following JVM argument: -Xmx8m
That will do the trick:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Test.main(Test.java:4)
A minor point, but allocating new long[Integer.MAX_VALUE] will use up memory 8x faster (about 16 GB per array).
The reason no OutOfMemoryError is thrown is that the memory is allocated in an uncommitted state, with no physical pages backing it yet.
If you write a non-zero byte into each 4 KB of the array, that will cause the memory to actually be allocated.
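A minimal sketch of that page-touching idea (my own illustration, assuming a 4 KB page size): it allocates a large array and then writes into every page, which forces the operating system to back it with real memory, visible in the process's resident set size.

public final class TouchPages {
    public static void main(String[] args) {
        // Allocate a large array, then write a non-zero byte into every 4 KB page
        // so the operating system has to commit real memory for it.
        byte[] block = new byte[1024 * 1024 * 1024]; // 1 GB
        for (int i = 0; i < block.length; i += 4096) {
            block[i] = 1;
        }
        System.out.println("Touched " + (block.length / 4096) + " pages");
    }
}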

Copying a java text file into a String

I run into the following error when I try to store a large file into a String.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at rdr2str.ReaderToString.main(ReaderToString.java:52)
As is evident, I am running out of heap space. Basically my program looks something like this:
FileReader fr = new FileReader(<filepath>);
StringBuffer sb = new StringBuffer();
char[] b = new char[BLKSIZ];
int n;
while ((n = fr.read(b)) > 0)
    sb.append(b, 0, n);
fileString = sb.toString();
Can someone suggest why I am running into this heap space error? Thanks.
You are running out of memory because, the way you've written your program, it requires storing the entire, arbitrarily large file in memory. You have two options:
You can increase the available memory by passing command line switches to the JVM:
java -Xms<initial heap size> -Xmx<maximum heap size>
You can rewrite your logic so that it deals with the file data as it streams in, thereby keeping your program's memory footprint low.
I recommend the second option. It's more work, but it's the right way to go.
EDIT: To determine your system's defaults for initial and max heap size, you can use this code snippet (which I stole from a JavaRanch thread):
public class HeapSize {
    public static void main(String[] args) {
        long kb = 1024;
        long heapSize = Runtime.getRuntime().totalMemory();
        long maxHeapSize = Runtime.getRuntime().maxMemory();
        System.out.println("Heap Size (KB): " + heapSize / kb);
        System.out.println("Max Heap Size (KB): " + maxHeapSize / kb);
    }
}
You allocate a small StringBuffer that gets longer and longer. Preallocate it according to the file size, and you will also be a LOT faster.
Note that Java strings are Unicode (two bytes per char in memory) while the file likely is not, so you use roughly twice the file size in memory.
Depending on the VM (32-bit? 64-bit?) and the limits set (http://www.devx.com/tips/Tip/14688) you may simply not have enough memory available. How large is the file actually?
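A minimal sketch of that preallocation idea (the path and block size are placeholders chosen for illustration, and the int cast assumes the file is smaller than 2 GB):

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class PreallocatedRead {
    public static void main(String[] args) throws IOException {
        File file = new File("input.txt"); // placeholder path
        // Size the buffer up front: the file length in bytes is an upper bound
        // on the number of chars for single-byte encodings, so no resizing copies are needed.
        StringBuilder sb = new StringBuilder((int) file.length());
        try (FileReader fr = new FileReader(file)) {
            char[] buf = new char[8192];
            int n;
            while ((n = fr.read(buf)) > 0) {
                sb.append(buf, 0, n);
            }
        }
        String fileString = sb.toString();
        System.out.println("Read " + fileString.length() + " chars");
    }
}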
In the OP, your program is aborting while the StringBuffer is being expanded. You should preallocate it to the size you need, or at least close to it. Whenever the StringBuffer must expand, it needs RAM for both the original capacity and the new capacity. As TomTom said too, your file likely holds 8-bit characters, so it will be converted to 16-bit Unicode in memory and double in size.
The program has not even reached the next doubling yet: StringBuffer.toString() in Java 6 allocates a new String, and the internal char[] is copied again (in some earlier versions of Java this was not the case). At the time of this copy you need double the heap space, so at that moment you need at least four times your actual file size (30 MB * 2 for byte->Unicode, then 60 MB * 2 for the toString() call = 120 MB). Once this method is finished, GC will clean up the temporary objects.
If you cannot increase the heap space for your program you will have some difficulty. You cannot take the "easy" route and just return a String. You can try to do this incrementally so that you never need to hold the whole file at once (one of the best solutions).
Look at your web service code in the client. It may provide a way to use a class other than String: perhaps a java.io.Reader, a java.lang.CharSequence, or a special interface like the SAX-related org.xml.sax.InputSource. Each of these can be used to build an implementation class that reads from your file in chunks as the caller needs it, instead of loading the whole file at once.
For instance, if your web service handling code can take a CharSequence, then (if it is written well) you can create a special handler that returns just one character at a time from the file but buffers the input. See this similar question: How to deal with big strings and limited memory.
Kris has the answer to your problem.
You could also look at Apache Commons IO's FileUtils.readFileToString, which may be a bit more efficient.
Although this might not solve your problem, there are some small things you can do to make your code a bit better:
create your StringBuffer with an initial capacity the size of the file you are reading
close your FileReader at the end: fr.close();
By default, Java starts with a very small maximum heap (64M on Windows at least). Is it possible you are trying to read a file that is too large?
If so you can increase the heap with the JVM parameter -Xmx256M (to set maximum heap to 256 MB)
I tried running a slightly modified version of your code:
public static void main(String[] args) throws Exception {
    FileReader fr = new FileReader("<filepath>");
    StringBuffer sb = new StringBuffer();
    char[] b = new char[1000];
    int n = 0;
    while ((n = fr.read(b)) > 0)
        sb.append(b, 0, n);
    String fileString = sb.toString();
    System.out.println(fileString);
}
on a small file (2 KB) and it worked as expected. For larger files you will need to set the JVM heap parameter.
Trying to read an arbitrarily large file into main memory in an application is bad design. Period. No amount of JVM settings adjustments is going to fix the core issue here. I recommend that you take a break and do some googling and reading about how to process streams in Java; here's a good tutorial, and here's another good tutorial to get you started.
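For instance, a minimal sketch of processing the file as a stream instead of building one giant String (processLine is a placeholder for whatever the program actually needs to do with the text):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamingRead {
    public static void main(String[] args) throws IOException {
        // Read and handle one line at a time; memory use stays bounded by the
        // longest line rather than by the whole file.
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                processLine(line);
            }
        }
    }

    private static void processLine(String line) {
        // Placeholder: do whatever the application needs with this chunk of text.
        System.out.println(line.length());
    }
}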
