Surviving generations keep increasing while running Solr query - java

I am testing a query with SolrJ (7.4) because I believe it is causing a memory leak in my program. I am not sure it really is a leak, though, so any advice is welcome!
This method is called many times over the lifetime of my indexing program (which should be able to run for weeks or months without any problem). That is why I am testing it in a loop that I profile with the NetBeans Profiler.
If I simply retrieve the id of every document (there are 33k) in a given index:
public class MyIndex {

    // Cache variable, to avoid querying the index every time the list of documents is needed
    private List<MyDocument> listOfMyDocumentsAlreadyIndexed = null;

    public final List<MyDocument> getListOfMyDocumentsAlreadyIndexed()
            throws SolrServerException, HttpSolrClient.RemoteSolrException, IOException {
        SolrQuery query = new SolrQuery("*:*");
        query.addField("id");
        query.setRows(Integer.MAX_VALUE); // we want ALL documents in the index, not only the first ones
        SolrDocumentList results = this.getSolrClient().query(query).getResults();
        /*
         * The following was commented out for the test,
         * so that it can be told where the leak comes from.
         */
        // listOfMyDocumentsAlreadyIndexed = results.parallelStream()
        //         .map((doc) -> { // different stuff ...
        //             return myDocument;
        //         })
        //         .collect(Collectors.toList());
        return listOfMyDocumentsAlreadyIndexed;
        /*
         * The number of surviving generations keeps increasing, whereas if
         * null is returned instead, the number of surviving generations
         * stops increasing.
         */
    }
}
I get this from the profiler (after nearly 200 runs, which could simulate a year of runtime for my program):
The object that survives the most is String:
Is a growing number of surviving generations the expected behaviour when querying for all documents in the index?
If so, is it the root cause of the "OOM Java heap space" error that I get after some time on the production server, as the stack trace seems to indicate:
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
at org.noggit.CharArr.resize(CharArr.java:110)
at org.noggit.CharArr.reserve(CharArr.java:116)
at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:68)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:868)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:541)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:305)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:555)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:307)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:200)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:274)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:614)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
Would increasing the heap space ("-Xmx") from 8 GB to anything greater solve the problem for good, or would it just postpone it? What can be done to work around this?
Edit some hours later
If null is returned from the method under test (getListOfMyDocumentsAlreadyIndexed), then the number of surviving generations remains stable throughout the test:
So even though I was NOT using the result of the query for this test (because I wanted to focus only on where the leak happened), it looks like returning an instance variable (even when it is null) is not a good idea. I will try to remove it.
Edit even later
I noticed that the surviving generations were still increasing in the telemetry tab when I was profiling "defined classes" ("focused (instrumented)"), whereas the count was stable when profiling "All classes" ("General (sampled)"). So I am not sure this solved the problem:
Any hint greatly appreciated :-)

The problem stems from the following line:
query.setRows(Integer.MAX_VALUE);
This should not be done, according to this article:
The rows parameter for Solr can be used to return more than the default of 10 rows. I have seen users successfully set the rows parameter to 100-200 and not see any issues. However, setting the rows parameter higher has a big memory consequence and should be avoided at all costs.
So the problem has been solved by retrieving the documents in chunks of 200 docs, following this Solr article on pagination:
SolrQuery q = (new SolrQuery(some_query)).setRows(r).setSort(SortClause.asc("id"));
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrServer.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    doCustomProcessingOfResults(rsp);
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}
Please note: you should not exceed about 200 documents in setRows, otherwise the memory leak still happens (e.g. with 500 it does happen).
Now the profiler gives much better results regarding surviving generations, as they no longer increase over time.
However, the method is much slower.
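The cursor loop above bounds peak memory because each response holds at most rows documents. Here is a stdlib-only sketch of the same pattern, with a stubbed fetchPage standing in for solrServer.query(q); the stub, its names, and the integer cursor are illustrative, not SolrJ API (Solr's real cursorMark is an opaque string):

```java
import java.util.ArrayList;
import java.util.List;

public class CursorPaging {
    // Simulated index: ids 0..999, sorted ascending, as Solr would return with sort=id asc.
    static final int TOTAL_DOCS = 1000;
    static final int ROWS = 200; // keep the page size small, as noted above

    // Stub for one page fetch: returns up to ROWS ids starting at the cursor.
    static List<Integer> fetchPage(int cursor) {
        List<Integer> page = new ArrayList<>();
        for (int id = cursor; id < Math.min(cursor + ROWS, TOTAL_DOCS); id++) {
            page.add(id);
        }
        return page;
    }

    public static List<Integer> fetchAll() {
        List<Integer> all = new ArrayList<>();
        int cursor = 0;
        boolean done = false;
        while (!done) {
            List<Integer> page = fetchPage(cursor);
            all.addAll(page);
            int nextCursor = cursor + page.size();
            if (nextCursor == cursor) { // no progress: like cursorMark equals nextCursorMark
                done = true;
            }
            cursor = nextCursor;
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll().size());
    }
}
```

Only one page of ROWS items is materialized per request, so peak allocation per call stays bounded no matter how large the index grows.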

Related

Store large DefaultFeatureCollection

I have to update an org.geotools.feature.DefaultFeatureCollection every second for as long as the app is running (more than an hour).
I have created DefaultFeatureCollection lineCollection = new DefaultFeatureCollection(); as a class member, and I add points to it every second with lineCollection.add(feature);
public void addLines(Coordinate[] coords) {
    try {
        line = geometryFactory.createLineString(coords);
        featureBuilder.add(line);
        feature = featureBuilder.buildFeature(null);
        lineCollection.add(feature);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
However, the collection gets huge and heap usage increases gradually, resulting in high CPU usage and the app lagging.
Is there a way to free the memory once a line has been displayed on the map?
You've tagged your question [memory-leaks]: is there any evidence that you are leaking memory? You can use jmap to check. If so, the developers would love to hear about it, with the evidence.
It seems more likely that you are just drawing 3600 lines (60*60) after an hour without any indexing. If your dataset were constant I'd recommend a SpatialIndexFeatureCollection, but as yours changes I would suggest you use a GeoPackage or another database-backed store (if you already have one), which will manage a spatial index for your lines.

Java , add half a million objects to ArrayList from sql Query

I have a query with a result set of half a million records. For each record I create an object and add it to an ArrayList.
How can I optimize this operation to avoid memory issues? I'm getting an out-of-heap-space error.
This is a fragment of the code:
while (rs.next()) {
    lista.add(sd.loadSabanaDatos_ResumenLlamadaIntervalo(rs));
}

public SabanaDatos loadSabanaDatos_ResumenLlamadaIntervalo(ResultSet rs) {
    SabanaDatos sabanaDatos = new SabanaDatos();
    try {
        sabanaDatos.setId(rs.getInt("id"));
        sabanaDatos.setHora(rs.getString("hora"));
        sabanaDatos.setDuracion(rs.getInt("duracion"));
        sabanaDatos.setNavegautenticado(rs.getInt("navegautenticado"));
        sabanaDatos.setIndicadorasesor(rs.getInt("indicadorasesor"));
        sabanaDatos.setLlamadaexitosa(rs.getInt("llamadaexitosa"));
        sabanaDatos.setLlamadanoexitosa(rs.getInt("llamadanoexitosa"));
        sabanaDatos.setTipocliente(rs.getString("tipocliente"));
    } catch (SQLException e) {
        logger.info("dip.sabana.SabanaDatos SQLException : " + e);
        e.printStackTrace();
    }
    return sabanaDatos;
}
NOTE: The reason for using a list is that this is a critical system, and I can only make one call to the DB every 2 hours. I don't have permission to query the DB more often, but I need to show data every 10 minutes. Example: the first query returns 10 rows, and I show 1 row each minute after the SQL query.
I don't have permission to create a local database, write files, or anything else... just access to memory.
First of all, it is not good practice to read half a million objects at once.
You can think about breaking the records to be read down into small chunks.
As a solution to this, you can consider the following options:
1 - Use a CachedRowSetImpl. It holds the same data as a ResultSet, but it is bad practice to keep a ResultSet open (since it holds on to the database connection); and if you then copy everything into an ArrayList, you are again performing extra work and using extra memory.
For more info on CachedRowSet you can go to
https://docs.oracle.com/javase/tutorial/jdbc/basics/cachedrowset.html
2 - You can think of using an in-memory database, such as HSQLDB or H2. They are very lightweight and fast, provide a JDBC interface, and can run SQL queries as well.
For an HSQLDB implementation you can check
https://www.tutorialspoint.com/hsqldb/
It might help to intern the Strings, so that two occurrences of the same string share one single object.
public class StringCache {
    private Map<String, String> identityMap = new HashMap<>();

    public String cached(String s) {
        if (s == null) {
            return null;
        }
        String t = identityMap.get(s);
        if (t == null) {
            t = s;
            identityMap.put(t, t);
        }
        return t;
    }
}
StringCache horaMap = new StringCache();
StringCache tipoclienteMap = new StringCache();

sabanaDatos.setHora(horaMap.cached(rs.getString("hora")));
sabanaDatos.setTipocliente(tipoclienteMap.cached(rs.getString("tipocliente")));
Increasing memory has already been mentioned.
A speed-up is possible by using column numbers instead of names; if needed, they can be looked up from the column names once, before the loop (rs.getMetaData()).
Option 1:
If you need all the items in the list at the same time, you need to increase the heap space of the JVM by adding the argument -Xmx2G, for example, when you launch the app (java -Xmx2G -jar yourApp.jar).
Option 2:
Divide the SQL into more than one call.
Some of your options:
Use a local database, such as SQLite. That's a very lightweight database management system which is easy to install (you don't need any special privileges to do so); its data is held in a single file in a directory of your choice (such as the directory that holds your Java application), and it can be used as an alternative to a large Java data structure such as a List.
If you really must use an ArrayList, make sure you take up as little space as possible. Try the following:
a. If you know the approximate number of rows, then construct your ArrayList with an appropriate initialCapacity to avoid reallocations. Estimate the maximum number of rows your database will grow to, and add another few hundred to your initialCapacity just in case.
b. Make sure your SabanaDatos objects are as small as they can be. For example, make sure the id field is an int and not an Integer. If the hora field is just a time of day, it can be more efficiently held in a short than a String. Similarly for other fields, e.g. duracion - perhaps it can even fit into a byte, if its range allows it to? If you have several flag/Boolean fields, they can be packed into a single byte or short as bits. If you have String fields that have a lot of repetitions, you can intern them as per Joop's suggestion.
c. If you still get out-of-memory errors, increase your heap space using the JVM flags -Xms and -Xmx.
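As an illustration of point (b), a time-of-day string such as "13:45" can be packed into a short (minutes since midnight) and reconstructed only for display. A minimal sketch, with made-up class and method names:

```java
public class CompactTime {
    // "HH:mm" -> minutes since midnight; the maximum (23:59 = 1439) fits easily in a short.
    static short toMinutes(String hora) {
        int h = Integer.parseInt(hora.substring(0, 2));
        int m = Integer.parseInt(hora.substring(3, 5));
        return (short) (h * 60 + m);
    }

    // minutes since midnight -> "HH:mm", for display only.
    static String toHora(short minutes) {
        return String.format("%02d:%02d", minutes / 60, minutes % 60);
    }

    public static void main(String[] args) {
        short packed = toMinutes("13:45");
        System.out.println(packed + " -> " + toHora(packed));
    }
}
```

A short costs 2 bytes per row versus roughly 40+ bytes for a small String object (header, fields, backing array), so over half a million rows the saving is on the order of tens of megabytes.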

HashMap<Long,Long> needs more memory

I wrote this code:
public static void main(String[] args) {
    HashMap<Long, Long> mappp = new HashMap<Long, Long>();
    Long a = (long) 55;
    Long c = (long) 12;
    for (int u = 1; u <= 1303564 / 2 + 1303564 / 3; u++) {
        mappp.put(a, c);
        a = a + 1;
        c = c + 1;
    }
    System.out.println(" " + mappp.size());
}
And it does not finish, because the program stops with this message on the console:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I calculated how much memory I need to hold such a HashMap, and in my opinion my computer's memory should be enough. I have 1024 MB of RAM on my computer.
I use Eclipse, and I have also set the parameters:
I start Eclipse from the command line with 'eclipse -vmargs -Xms512m -Xmx730m',
and, second, in Run Configurations I have set the Arguments tab with '-Xmx730m'.
And this still gives java.lang.OutOfMemoryError.
What is the reason for this?
P.S. Just to add a strange fact: in the bottom right corner of Eclipse the heap memory usage is shown, and it reads 130M of 495M.
Well, when the HashMap mappp grows, shouldn't this '130M of 495M' change, for example to '357M of 495M', and 1 second later to '412M of 495M', and so on, until it reaches 495M? In my case the 130M stays almost the same, changing only a little, from 130M to 131M or 132M.
Strange.
Java does not allow maps of primitive types, so if you use a HashMap you will have to pay for boxing/unboxing and the overhead of the object references.
To avoid that overhead you can write your own primitive-keyed hash map, or use an existing implementation from one of the primitive-collections libraries (e.g. Trove or fastutil).
boxing and unboxing in java
You should not put millions of items in a map. A Long is an object containing an 8-byte long field, plus some object overhead, and you use two instances per map entry.
Since the key is numeric, you could (if the maximum key value is low enough) use an array as the 'map'.
long[] mappp = new long[4000000]; // takes 4M * 8 = 32M memory
If you need to know whether a key is 'not in the map', use 0 for that case. If 0 also needs to be a valid value in your map, you can apply a trick such as increasing all values by 1 (if the values are always positive).
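A minimal sketch of that array-as-map idea, including the "increase all values by 1" trick so that a stored value of 0 remains distinguishable from "absent" (class and method names are made up for the example):

```java
public class ArrayAsMap {
    private final long[] slots;

    ArrayAsMap(int maxKey) {
        slots = new long[maxKey]; // all zero = "absent"; e.g. 4M longs = 32 MB, allocated up front
    }

    void put(int key, long value) {
        slots[key] = value + 1; // shift by 1 so a stored value of 0 is not confused with "absent"
    }

    // Returns null if the key was never put, mimicking Map.get.
    Long get(int key) {
        long v = slots[key];
        return v == 0 ? null : v - 1;
    }

    public static void main(String[] args) {
        ArrayAsMap m = new ArrayAsMap(100);
        m.put(55, 12);
        m.put(56, 0); // value 0 still works thanks to the offset
        System.out.println(m.get(55) + " " + m.get(56) + " " + m.get(57));
    }
}
```

This only works when the keys are dense, non-negative ints with a known upper bound, but it removes all boxing and per-entry object overhead.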

Hibernate - java.lang.OutOfMemoryError: Java heap space

I get this exception:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at java.util.AbstractMap.toString(AbstractMap.java:493)
at org.hibernate.pretty.Printer.toString(Printer.java:59)
at org.hibernate.pretty.Printer.toString(Printer.java:90)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:97)
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:35)
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:969)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1114)
at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79)
At this code:
Query query = null;
Transaction tx = session.beginTransaction();
if (allRadio.isSelected()) {
    query = session.createQuery("select d from Document as d, d.msg as m, m.senderreceivers as s where m.isDraft=0 and d.isMain=1 and s.organization.shortName like '" + search + "' and s.role=0");
} else if (periodRadio.isSelected()) {
    query = session.createQuery("select d from Document as d, d.msg as m, m.senderreceivers as s where m.isDraft=0 and d.isMain=1 and s.organization.shortName like '" + search + "' and s.role=0 and m.receivingDate between :start and :end");
    query.setParameter("start", start);
    query.setParameter("end", end);
}
final List<Document> documents = query.list();

query = session.createQuery("select o from Organization as o");
List<Organization> organizations = query.list(); // <--- AT THIS LINE
tx.commit();
I'm making 2 consecutive queries. If I comment out one of them, the other works fine.
If I remove the transaction, the exception disappears. What's going on? Is this a memory leak or something? Thanks in advance.
A tip I picked up from many years of pain with this sort of thing: the answer is usually carefully hidden somewhere in the first 10 lines of the stack trace. Always read the stack trace several times, and if that doesn't give enough help, read the source code of the methods where the failure happens.
In this case the problem comes from somewhere in Hibernate's pretty printer. This is a logging feature, so the problem is that Hibernate is trying to log some enormous string. Notice how it fails while trying to increase the size of a StringBuilder.
Why is it trying to log an enormous string? I can't say from the information you've given, but I'm guessing you have something very big in your Organization entity (maybe a BLOB?) and Hibernate is trying to log the objects that the query has pulled out of the database. It may also be a mistake in the mapping, whereby eager fetching pulls in many dependent objects - e.g. a child collection that loads the entire table due to a wrong foreign-key definition.
If it's a mistake in the mapping, fixing the mapping will solve the problem. Otherwise, your best bet is probably to turn off this particular logging feature. There's an existing question on a very similar problem, with some useful suggestions in the answers.
While such an error might be an indicator of a memory leak, it could also just result from high memory usage in your program.
You could try to remedy it by adding the following parameter to your command line (which increases the maximum heap size; adapt the 512m to your needs):
java -Xmx512m yourprog
If the error goes away, your program probably just needed more than the default heap size (which depends on the platform); if it comes back (probably a little later in time), you have a memory leak somewhere.
You need to increase the JVM heap size. Start it with the -Xmx256m command-line parameter.

OutOfMemoryErrors even after using WeakReference's for keys and values

Below is a small test I coded to teach myself about the references API. I thought it would never throw an OOME, but it does, and I am unable to figure out why. I'd appreciate any help with this.
public class Referencestest {

    public static void main(String[] args) {
        Map<WeakReference<Long>, WeakReference<Double>> weak =
                new HashMap<WeakReference<Long>, WeakReference<Double>>(500000, 1.0f);
        ReferenceQueue<Long> keyRefQ = new ReferenceQueue<Long>();
        ReferenceQueue<Double> valueRefQ = new ReferenceQueue<Double>();
        int totalClearedKeys = 0;
        int totalClearedValues = 0;
        for (long putCount = 0; putCount <= Long.MAX_VALUE; putCount += 100000) {
            weak(weak, keyRefQ, valueRefQ, 100000);
            totalClearedKeys += poll(keyRefQ);
            totalClearedValues += poll(valueRefQ);
            System.out.println("Total PUTs so far = " + putCount);
            System.out.println("Total KEYs CLEARED so far = " + totalClearedKeys);
            System.out.println("Total VALUESs CLEARED so far = " + totalClearedValues);
        }
    }

    public static void weak(Map<WeakReference<Long>, WeakReference<Double>> m,
            ReferenceQueue<Long> keyRefQ, ReferenceQueue<Double> valueRefQ, long limit) {
        for (long i = 1; i <= limit; i++) {
            m.put(new WeakReference<Long>(new Long(i), keyRefQ),
                    new WeakReference<Double>(new Double(i), valueRefQ));
            long heapFreeSize = Runtime.getRuntime().freeMemory();
            if (i % 100000 == 0) {
                System.out.println(i);
                System.out.println(heapFreeSize / 131072 + "MB");
                System.out.println();
            }
        }
    }

    private static int poll(ReferenceQueue<?> keyRefQ) {
        Reference<?> poll = keyRefQ.poll();
        int i = 0;
        while (poll != null) {
            poll.clear();
            poll = keyRefQ.poll();
            i++;
        }
        return i;
    }
}
And below is the log when run with 64MB of heap:
Total PUTs so far = 0
Total KEYs CLEARED so far = 77982
Total VALUESs CLEARED so far = 77980
100000
24MB
Total PUTs so far = 100000
Total KEYs CLEARED so far = 134616
Total VALUESs CLEARED so far = 134614
100000
53MB
Total PUTs so far = 200000
Total KEYs CLEARED so far = 221489
Total VALUESs CLEARED so far = 221488
100000
157MB
Total PUTs so far = 300000
Total KEYs CLEARED so far = 366966
Total VALUESs CLEARED so far = 366966
100000
77MB
Total PUTs so far = 400000
Total KEYs CLEARED so far = 366968
Total VALUESs CLEARED so far = 366967
100000
129MB
Total PUTs so far = 500000
Total KEYs CLEARED so far = 533883
Total VALUESs CLEARED so far = 533881
100000
50MB
Total PUTs so far = 600000
Total KEYs CLEARED so far = 533886
Total VALUESs CLEARED so far = 533883
100000
6MB
Total PUTs so far = 700000
Total KEYs CLEARED so far = 775763
Total VALUESs CLEARED so far = 775762
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Referencestest.weak(Referencestest.java:38)
at Referencestest.main(Referencestest.java:21)
from http://weblogs.java.net/blog/2006/05/04/understanding-weak-references
I think your use of HashMap is likely to be the issue. You might want to use WeakHashMap
To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
The heart of the problem is probably that you're filling your heap with WeakReference objects: the weak references are cleared when you're getting low on memory, but the reference objects themselves are not, so your hash map fills up with a boatload of WeakReference objects (not to mention the entry array the HashMap uses, which will grow indefinitely), all pointing to null.
The solution, as already pointed out, is a WeakHashMap, which will clear out those objects once they're no longer in use (this is done during put).
EDIT:
As Kevin pointed out, you already have your reference-queue logic worked out (I didn't pay close enough attention); the missing step, using your code, is simply to remove the reference from the map at the point where its key has been collected. This is exactly how WeakHashMap works (where the poll is simply triggered on insert).
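For completeness, a runnable stdlib-only sketch of that approach (names are illustrative): since the original map's keys are the WeakReference objects themselves, the Reference popped from the queue can be removed from the map directly.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class ExpungeDemo {

    static int run() throws InterruptedException {
        Map<WeakReference<Object>, String> map = new HashMap<>();
        ReferenceQueue<Object> q = new ReferenceQueue<>();

        // A key that is only weakly reachable: eligible for collection.
        map.put(new WeakReference<>(new Object(), q), "collectable");

        // A key we keep strongly reachable: its entry must survive.
        Object pinned = new Object();
        map.put(new WeakReference<>(pinned, q), "pinned");

        // Poll the queue and remove cleared references from the map.
        // The Reference popped from the queue IS the map key, and because
        // WeakReference does not override equals(), identity-based removal works.
        for (int attempts = 0; attempts < 50 && map.size() == 2; attempts++) {
            System.gc();
            Thread.sleep(20);
            Reference<?> ref = q.poll();
            while (ref != null) {
                map.remove(ref);
                ref = q.poll();
            }
        }

        // Using 'pinned' here keeps it strongly reachable through the loop above.
        System.out.println("pinned key still referenced: " + (pinned != null));
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("entries left: " + run());
    }
}
```

This mirrors what WeakHashMap does internally; the retry loop is needed because GC timing is not deterministic, and the pinned entry survives because a strong reference to its key is still held.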
Even when your weak references let go of the objects they refer to, the reference objects themselves do not get reclaimed.
So eventually your map will fill up with references to nothing, and crash.
What you would need (if you wanted to do it this way) is an action, triggered when a referent is collected, that removes the corresponding reference from the map (which would also introduce threading issues you'd need to be aware of).
I'm not a Java expert at all, but I know that in .NET, when doing a lot of large-object memory allocation, you can get heap fragmentation to the point where only small pieces of contiguous memory are available for allocation, even though much more memory appears as "free".
A quick google search on "java heap fragmentation" brings up some seemingly relevant result although I haven't taken a good look at them.
Others have correctly pointed out what the problem is, e.g. @roe and @Bill K.
But another way to solve this kind of problem (apart from scratching your head, asking on SO, etc.) is to look at how the Sun-recommended approach works. In this case, you can find it in the source code of the WeakHashMap class.
There are a few ways to find Java source code:
If you have a decent Java IDE running, it should be able to show you the source code of any class in the class library.
Most J2SE JDK downloads include source JAR files for (at least) the public API classes.
You can specifically download full source distributions for the OpenJDK-based releases of Java.
But the ZERO EFFORT approach is to do a Google search, using the fully qualified name of the class with ".java.html" tacked on the end. For example, searching for "java.util.WeakHashMap.java.html" gives this link in the first page of search results.
And the source will tell you that the standard WeakHashMap implementation explicitly polls its reference queue to expunge stale (i.e. broken) weak references from the map's key set. In fact, it does this every time you access or update the map, or even just ask for its size.
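That behavior is easy to observe with a small stdlib-only program (names are illustrative): an entry whose key has become unreachable disappears from a WeakHashMap after a collection, with no cleanup code beyond an ordinary size() call.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapDemo {

    static int sizeAfterGc() throws InterruptedException {
        Map<Object, String> map = new WeakHashMap<>();

        Object pinned = new Object();
        map.put(pinned, "kept");            // strongly-referenced key: entry survives
        map.put(new Object(), "discarded"); // unreachable key: entry is expunged after GC

        // size() (like get and put) polls the map's internal reference queue
        // and expunges stale entries, so no manual cleanup is needed.
        for (int attempts = 0; attempts < 50 && map.size() == 2; attempts++) {
            System.gc();
            Thread.sleep(20);
        }
        int size = map.size();

        // Using 'pinned' here keeps its entry alive through the loop above.
        System.out.println("pinned key alive: " + (pinned != null));
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("entries left: " + sizeAfterGc());
    }
}
```

The retry loop is there because garbage collection timing is not guaranteed; on typical JVMs the stale entry disappears after the first System.gc() call.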
Another problem might be that Java, for some reason, doesn't always run its garbage collector when running out of memory, so you might need to insert explicit calls to trigger the collector. Try something like

if ((putCount % 1000) == 0) {
    Runtime.getRuntime().gc();
}

in your loop.
Edit: It seems that the newer Java implementations from Sun do call the garbage collector before throwing an OutOfMemoryError, but I am pretty sure that the following program would throw an OutOfMemoryError with JRE 1.3 or 1.4:
public class Test {
    public static void main(String args[]) {
        while (true) {
            byte[] data = new byte[1000000];
        }
    }
}
