Hibernate - java.lang.OutOfMemoryError: Java heap space - java

I get this exception:
Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
at java.lang.StringBuilder.append(StringBuilder.java:119)
at java.util.AbstractMap.toString(AbstractMap.java:493)
at org.hibernate.pretty.Printer.toString(Printer.java:59)
at org.hibernate.pretty.Printer.toString(Printer.java:90)
at org.hibernate.event.def.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:97)
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:35)
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:969)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1114)
at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79)
At this code:
Query query = null;
Transaction tx = session.beginTransaction();
if (allRadio.isSelected()) {
    query = session.createQuery("select d from Document as d, d.msg as m, m.senderreceivers as s where m.isDraft=0 and d.isMain=1 and s.organization.shortName like '" + search + "' and s.role=0");
} else if (periodRadio.isSelected()) {
    query = session.createQuery("select d from Document as d, d.msg as m, m.senderreceivers as s where m.isDraft=0 and d.isMain=1 and s.organization.shortName like '" + search + "' and s.role=0 and m.receivingDate between :start and :end");
    query.setParameter("start", start);
    query.setParameter("end", end);
}
final List<Document> documents = query.list();
query = session.createQuery("select o from Organization as o");
List<Organization> organizations = query.list(); // <--- AT THIS LINE
tx.commit();
I'm making two consecutive queries. If I comment out one of them, the other works fine, and if I remove the transaction part, the exception disappears. What's going on? Is this a memory leak or something? Thanks in advance.

A tip I picked up from many years of pain with this sort of thing: the answer is usually carefully hidden somewhere in the first 10 lines of the stack trace. Always read the stack trace several times, and if that doesn't give enough help, read the source code of the methods where the failure happens.
In this case the problem comes from somewhere in Hibernate's pretty printer. This is a logging feature, so the problem is that Hibernate is trying to log some enormous string. Notice how it fails while trying to increase the size of a StringBuilder.
Why is it trying to log an enormous string? I can't say from the information you've given, but I'm guessing you have something very big in your Organization entity (maybe a BLOB?) and Hibernate is trying to log the objects that the query has pulled out of the database. It may also be a mistake in the mapping, whereby eager fetching pulls in many dependent objects - e.g. a child collection that loads the entire table due to a wrong foreign-key definition.
If it's a mistake in the mapping, fixing the mapping will solve the problem. Otherwise, your best bet is probably to turn off this particular logging feature, for instance by raising the log level of the org.hibernate.pretty category above DEBUG (see the snippet below). There's an existing question on a very similar problem, with some useful suggestions in the answers.
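For example, if log4j is the logging backend (an assumption; adapt this to whatever logging setup you actually use), raising these categories above DEBUG in log4j.properties prevents the pretty printer from being invoked at all:

# Hibernate's pretty printer only runs when DEBUG logging is enabled for these categories
log4j.logger.org.hibernate=info
log4j.logger.org.hibernate.pretty=info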

While such an error might be an indicator of a memory leak, it could also simply result from high memory usage in your program.
You could try to amend it by adding the following parameter to your command line (which will increase the maximum heap size; adapt the 512m according to your needs):
java -Xmx512m yourprog
If it goes away that way, your program probably just needed more than the default size (which depends on the platform); if it comes again (probably a little later in time), you have a memory leak somewhere.

You need to increase the JVM heap size. Start it with the -Xmx256m command-line parameter.

Related

Surviving generations keep increasing while running Solr query

I am testing a query with SolrJ (Solr 7.4) because I believe it is causing a memory leak in my program. But I am not sure it really is a memory leak, so I'm asking for advice!
This method is called several times during the lifetime of my indexing program (which should be able to run for weeks or months without any problems). That's why I am testing it in a loop that I profile with the NetBeans Profiler.
If I simply retrieve the id of all documents (there are 33k) in a given index:
public class MyIndex {

    // Cache variable, to avoid querying the index every time the list of documents is needed
    private List<MyDocument> listOfMyDocumentsAlreadyIndexed = null;

    public final List<MyDocument> getListOfMyDocumentsAlreadyIndexed()
            throws SolrServerException, HttpSolrClient.RemoteSolrException, IOException {
        SolrQuery query = new SolrQuery("*:*");
        query.addField("id");
        query.setRows(Integer.MAX_VALUE); // we want ALL documents in the index, not only the first ones
        SolrDocumentList results = this.getSolrClient().query(query).getResults();
        /*
         * The following was commented out for the test,
         * so that it can be told where the leak comes from.
         */
        // listOfMyDocumentsAlreadyIndexed = results.parallelStream()
        //         .map((doc) -> { // different stuff ...
        //             return myDocument;
        //         })
        //         .collect(Collectors.toList());
        return listOfMyDocumentsAlreadyIndexed;
        /*
         * The number of surviving generations keeps increasing, whereas if null is
         * returned then the number of surviving generations does not increase anymore.
         */
    }
}
I get this from the profiler (after nearly 200 runs, which could simulate a year of runtime for my program):
The object type that survives the most is String:
Is the growing number of surviving generations the expected behaviour when querying for all documents in the index?
If so, is it the root cause of the "OOM Java heap space" error that I get after some time on the production server, as the stack trace seems to suggest:
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
at org.noggit.CharArr.resize(CharArr.java:110)
at org.noggit.CharArr.reserve(CharArr.java:116)
at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:68)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:868)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:541)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:305)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:555)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:307)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:200)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:274)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:614)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
Would increasing the heap space ("-Xmx") from 8 GB to something greater solve the problem definitively, or would it just postpone it? What can be done to work around this?
Edit, some hours later
If null is returned from the method under test (getListOfMyDocumentsAlreadyIndexed), then the number of surviving generations remains stable throughout the test:
So even though I was NOT using the result of the query for this test (because I wanted to focus only on where the leak happened), it looks like returning an instance variable (even if it was null) is not a good idea. I will try to remove it.
Edit, even later
I noticed that the surviving generations were still increasing in the telemetry tab when I was profiling "defined classes" ("focused (instrumented)"), whereas they were stable when profiling "All classes" ("General (sampled)"). So I am not sure this solved the problem:
Any hint greatly appreciated :-)
The problem stems from the following line:
query.setRows(Integer.MAX_VALUE);
This should not be done, according to this article:
The rows parameter for Solr can be used to return more than the default of 10 rows. I have seen users successfully set the rows parameter to 100-200 and not see any issues. However, setting the rows parameter higher has a big memory consequence and should be avoided at all costs.
So the problem has been solved by retrieving the documents in chunks of 200 docs, following this Solr article on pagination:
SolrQuery q = (new SolrQuery(some_query)).setRows(r).setSort(SortClause.asc("id"));
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrServer.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    doCustomProcessingOfResults(rsp);
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}
Please note: you should not exceed 200 documents in setRows, otherwise the memory leak still happens (e.g. for 500 it does happen).
Now the profiler gives much better results regarding surviving generations, as they do not increase over time anymore.
However, the method is much slower.
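For reference, here is a rough sketch of how the original getListOfMyDocumentsAlreadyIndexed method body could be adapted to cursor-based paging; the mapToMyDocument helper is hypothetical and stands in for the mapping that was commented out in the question, and imports for CursorMarkParams, SolrDocument, QueryResponse and ArrayList are assumed:

SolrQuery query = new SolrQuery("*:*");
query.addField("id");
query.setRows(200); // small pages instead of Integer.MAX_VALUE
query.setSort(SolrQuery.SortClause.asc("id")); // a stable sort is required for cursors

List<MyDocument> docs = new ArrayList<>();
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = this.getSolrClient().query(query);
    for (SolrDocument doc : rsp.getResults()) {
        docs.add(mapToMyDocument(doc)); // hypothetical mapping of a SolrDocument to MyDocument
    }
    String nextCursorMark = rsp.getNextCursorMark();
    done = cursorMark.equals(nextCursorMark); // no more pages when the cursor stops moving
    cursorMark = nextCursorMark;
}
listOfMyDocumentsAlreadyIndexed = docs;
return listOfMyDocumentsAlreadyIndexed;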

Java , add half a million objects to ArrayList from sql Query

I have a query with a result set of half a million records; for each record I'm creating an object and trying to add it to an ArrayList.
How can I optimize this operation to avoid memory issues? I'm getting an out-of-heap-space error.
This is a fragment of the code:
while (rs.next()) {
    lista.add(sd.loadSabanaDatos_ResumenLlamadaIntervalo(rs));
}

public SabanaDatos loadSabanaDatos_ResumenLlamadaIntervalo(ResultSet rs) {
    SabanaDatos sabanaDatos = new SabanaDatos();
    try {
        sabanaDatos.setId(rs.getInt("id"));
        sabanaDatos.setHora(rs.getString("hora"));
        sabanaDatos.setDuracion(rs.getInt("duracion"));
        sabanaDatos.setNavegautenticado(rs.getInt("navegautenticado"));
        sabanaDatos.setIndicadorasesor(rs.getInt("indicadorasesor"));
        sabanaDatos.setLlamadaexitosa(rs.getInt("llamadaexitosa"));
        sabanaDatos.setLlamadanoexitosa(rs.getInt("llamadanoexitosa"));
        sabanaDatos.setTipocliente(rs.getString("tipocliente"));
    } catch (SQLException e) {
        logger.info("dip.sabana.SabanaDatos SQLException : " + e);
        e.printStackTrace();
    }
    return sabanaDatos;
}
NOTE: The reason for using a list is that this is a critical system, and I can only make one call to the DB every 2 hours. I don't have permission to make more calls to the DB at shorter intervals, but I need to show data every 10 minutes. Example: if the first query returns 10 rows, I show 1 row each minute after the SQL query.
I don't have permission to create a local database, write files or anything else... just access to memory.
First of all, it is not good practice to read half a million objects at once.
You can think of breaking the number of records to be read down into small chunks.
As a solution to this you can think of the following options:
1 - Use a CachedRowSet (see the sketch after this list). It holds the same data as the ResultSet but is disconnected, and it is bad practice to keep a ResultSet open for long, since it holds on to a database connection. If you copy everything into an ArrayList instead, you are again performing extra operations and using up memory.
For more info on CachedRowSet you can go to
https://docs.oracle.com/javase/tutorial/jdbc/basics/cachedrowset.html
2 - You can think of using an in-memory database, such as HSQLDB or H2. They are very lightweight and fast, they provide a JDBC interface, and you can run SQL queries against them as well.
For an HSQLDB introduction you can check
https://www.tutorialspoint.com/hsqldb/
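A minimal sketch of option 1, assuming the ResultSet rs from the question (CachedRowSet ships with the JDK in javax.sql.rowset; the class and method names below are just for illustration):

import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

public class CachedRowSetExample {

    // copies the rows into a disconnected, in-memory row set so the
    // original ResultSet (and its database connection) can be released right away
    static CachedRowSet cacheRows(ResultSet rs) throws SQLException {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.populate(rs);
        rs.close();
        return crs;
    }

    // later, iterate over the cached rows exactly as over a normal ResultSet
    static void readRows(CachedRowSet crs) throws SQLException {
        while (crs.next()) {
            int id = crs.getInt("id");
            // ... read the remaining columns as needed
        }
    }
}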
It might help to have the Strings interned, so that two occurrences of the same string value share one single object:
import java.util.HashMap;
import java.util.Map;

public class StringCache {
    private Map<String, String> identityMap = new HashMap<>();

    public String cached(String s) {
        if (s == null) {
            return null;
        }
        String t = identityMap.get(s);
        if (t == null) {
            t = s;
            identityMap.put(t, t);
        }
        return t;
    }
}

StringCache horaMap = new StringCache();
StringCache tipoclienteMap = new StringCache();

sabanaDatos.setHora(horaMap.cached(rs.getString("hora")));
sabanaDatos.setTipocliente(tipoclienteMap.cached(rs.getString("tipocliente")));
Increasing the memory has already been mentioned.
A speed-up is possible by using column indexes instead of column names; if needed, the indexes can be looked up from the column names once, before the loop (via rs.getMetaData() or rs.findColumn), as sketched below.
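A rough sketch of that idea, reusing the StringCache above and the column names from the question (rs, lista and horaMap are the same variables as in the earlier snippets):

// resolve the column indexes once, before the loop
int idIdx = rs.findColumn("id");
int horaIdx = rs.findColumn("hora");
int duracionIdx = rs.findColumn("duracion");
// ... same for the remaining columns

while (rs.next()) {
    SabanaDatos sabanaDatos = new SabanaDatos();
    sabanaDatos.setId(rs.getInt(idIdx));                        // access by index, not by name
    sabanaDatos.setHora(horaMap.cached(rs.getString(horaIdx))); // interned via StringCache
    sabanaDatos.setDuracion(rs.getInt(duracionIdx));
    // ... remaining columns in the same way
    lista.add(sabanaDatos);
}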
Option 1:
If you need all the items in the list at the same time, you need to increase the heap space of the JVM, for example by adding the argument -Xmx2G when you launch the app (java -Xmx2G -jar yourApp.jar).
Option 2:
Divide the SQL into more than one call.
Some of your options:
Use a local database, such as SQLite. That's a very lightweight database management system which is easy to install – you don't need any special privileges to do so – its data is held in a single file in a directory of your choice (such as the directory that holds your Java application) and can be used as an alternative to a large Java data structure such as a List.
If you really must use an ArrayList, make sure you take up as little space as possible. Try the following:
a. If you know the approximate number of rows, then construct your ArrayList with an appropriate initialCapacity to avoid reallocations. Estimate the maximum number of rows your database will grow to, and add another few hundred to your initialCapacity just in case.
b. Make sure your SabanaDatos objects are as small as they can be. For example, make sure the id field is an int and not an Integer. If the hora field is just a time of day, it can be held more efficiently in a short than in a String. Similarly for other fields, e.g. duracion - perhaps it can even fit into a byte, if its range allows it to? If you have several flag/Boolean fields, they can be packed into a single byte or short as bits. If you have String fields with a lot of repeated values, you can intern them as per Joop's suggestion. A rough sketch of such a slimmed-down object follows after this list.
c. If you still get out-of-memory errors, increase your heap space using the JVM flags -Xms and -Xmx.
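A rough sketch of points (a) and (b); whether each field really fits into these types is an assumption about the actual data:

import java.util.ArrayList;
import java.util.List;

// Hypothetical compact variant of SabanaDatos.
public class SabanaDatosCompact {
    int id;              // primitive int instead of Integer
    short horaMinutes;   // time of day stored as minutes since midnight instead of a String
    short duracion;      // assumes the duration fits into a short
    byte flags;          // navegautenticado, indicadorasesor, llamadaexitosa, llamadanoexitosa packed as bits
    String tipocliente;  // interned via StringCache if the values repeat

    public static void main(String[] args) {
        // point (a): pre-size the list if the row count is roughly known
        List<SabanaDatosCompact> lista = new ArrayList<>(550_000);
    }
}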

Load Neo4J in memory on demand for heavy computations

How could I load Neo4j into memory on demand?
At different stages of my long-running jobs I persist nodes and relationships to Neo4j. So Neo4j should stay on disk, since it may consume too much memory and I don't know when I am going to run read queries against it.
But at some point (only once) I will want to run a pretty heavy read query against my Neo4j server, and it has very poor performance (hours). As a solution I want to load all of Neo4j into RAM for better performance.
What is the best option for this? Should I use a RAM disk, or are there better solutions?
P.S.
A query with [r:LINK_REL_1*2] works pretty fast, [r:LINK_REL_1*3] takes 17 seconds, and [r:LINK_REL_1*4] takes more than 5 minutes (I don't even know how long exactly, since I have a 5-minute timeout). But I need the [r:LINK_REL_1*2..4] query to perform in reasonable time.
My heavy query explanation
PROFILE
MATCH path = (start:COLUMN)-[r:LINK_REL_1*2]->(col:COLUMN)
WHERE start.ENTITY_ID = '385'
WITH path UNWIND NODES(path) AS col
WITH path,
COLLECT(DISTINCT col.DATABASE_ID) as distinctDBs
WHERE LENGTH(path) + 1 = SIZE(distinctDBs)
RETURN path
Updated query with explanation (got the same performance in tests)
PROFILE
MATCH (start:COLUMN)
WHERE start.ENTITY_ID = '385'
MATCH path = (start)-[r:LINK_REL_1*2]->(col:COLUMN)
WITH path, REDUCE(dbs = [], col IN NODES(path) |
CASE WHEN col.DATABASE_ID in dbs
THEN dbs
ELSE dbs + col.DATABASE_ID END) as distinctDbs
WHERE LENGTH(path) + 1 = SIZE(distinctDbs)
RETURN path
APOC Procedures has apoc.warmup.run(), which may get much of Neo4j into cached memory. See if that makes a difference.
It looks like you're trying to create a query in which the path contains only :Persons from distinct countries. Is this right?
If so, I think we can find a better query that can do this without hanging.
First, let's go for low-hanging fruit and see if avoiding the UNWIND can make a difference.
PROFILE or EXPLAIN the query and see if any numbers look significantly different compared to the original query.
MATCH (start:PERSON)
WHERE start.ID = '385'
MATCH path = (start)-[r:FRIENDSHIP_REL*2..5]->(person:PERSON)
WITH path, REDUCE(countries = [], person IN NODES(path) |
CASE WHEN person.country in countries
THEN countries
ELSE countries + person.COUNTRY_ID END) as distinctCountries
WHERE LENGTH(path) + 1 = SIZE(distinctCountries)
RETURN path

"Exceeded maximum allocated IDs"-Exception when allocating KeyRange (AppEngine Objectify)

I am migrating some entities from an old App Engine application to a new one (one reason among others is to upgrade to Objectify 5).
There are also some entities that have an automatically generated long id. Now I have to re-allocate the ids (see also the javadoc and this discussion) on the new datastore.
These are the important lines:
Long id = anEntity.getId();
com.google.appengine.api.datastore.KeyRange keyRange =
        new com.google.appengine.api.datastore.KeyRange(null, AnEntity.class.getName(), id - 1L, id + 1L);
KeyRange<AnEntity> keys = new KeyRange<>(keyRange);
OfyService.ofy().factory().allocateIdRange(keys);
However, this does not work as the exception below is thrown:
java.lang.IllegalArgumentException: Exceeded maximum allocated IDs
at com.google.appengine.api.datastore.DatastoreApiHelper.translateError(DatastoreApiHelper.java:54)
at com.google.appengine.api.datastore.DatastoreApiHelper$1.convertException(DatastoreApiHelper.java:127)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:96)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:88)
at com.google.appengine.api.datastore.FutureHelper.getInternal(FutureHelper.java:76)
at com.google.appengine.api.datastore.FutureHelper.quietGet(FutureHelper.java:36)
at com.google.appengine.api.datastore.DatastoreServiceImpl.allocateIdRange(DatastoreServiceImpl.java:121)
at com.googlecode.objectify.ObjectifyFactory.allocateIdRange(ObjectifyFactory.java:335)
I tested it further and found that the line
KeyRange(null, AnEntity.class.getName(), 1, 1000000000L);
would work. However, my already generated ids are in the range of 4503908059709440 to 6754883239673856, which is obviously too high.
Did I make some mistake, or does the allocateIdRange method not support such big ids (as hinted here a long time ago)?
If the latter, how could I get around this problem?
(I thought of generating new ids in a certain range, but here they say that the legacy way will soon be removed... Besides, I don't like the idea of generating them myself.)
System:
- Java 7
- AppengineSDK 1.9.21
- objectify version: 4.1.3 (as said, I am migrating)
Thanks for any help.
I've changed the code so that it assigns a String id and generates this String with a UUID.
This solves the problem, but it is still strange behaviour of allocateIdRange...
See also this issue.
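Roughly what that change looks like (a sketch; the field layout and the createForMigration helper are made up for illustration):

import java.util.UUID;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;

@Entity
public class AnEntity {
    @Id String id;   // String key instead of the auto-generated Long id

    // hypothetical factory used during the migration: the key is assigned up front,
    // so no id range ever has to be allocated on the new datastore
    public static AnEntity createForMigration() {
        AnEntity entity = new AnEntity();
        entity.id = UUID.randomUUID().toString();
        return entity;
    }
}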

JPQL createQuery vs Entity object loop

I am working on some inherited code and I am not used to the entity framework. I'm trying to figure out why a previous programmer coded things the way they did, sometimes mixing and matching different ways of querying data.
Deal d = _em.find(Deal.class, dealid);
List<DealOptions> dos = d.getDealOptions();
for (DealOptions o : dos) {
    if (o.price.equals("100")) {
        // found the 1 item I wanted
    }
}
And then sometimes I see this:
Query q = _em.createQuery("select count(o.id) from DealOptions o where o.price = 100 and o.deal.dealid = :dealid");
// set parameters, get results, then check the result and do whatever
I understand what both pieces of code do, and I understand that given a large dataset, the second way is more efficient. However, given only a few records, is there any reason not to do a query versus just letting the entity do the join and looping over your result set?
Some reasons never to use the first approach regardless of the number of records:
It is more verbose
The intention is less clear, since there is more clutter
The performance is worse, probably starting to degrade with the very first entities
The performance of the first approach will degrade much more with each added entity than with the second approach
It is unexpected - most experienced developers would not do it - so it needs more cognitive effort for other developers to understand. They would assume you were doing it for a compelling reason and would look for that reason without finding one.
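For completeness, a sketch of the second approach written out in full, with the price bound as a parameter as well (the helper name and the parameter types are assumptions):

import javax.persistence.EntityManager;

public class DealOptionsQueries {

    // hypothetical helper wrapping the count query from the question
    static boolean hasOptionAtPrice(EntityManager em, long dealid, int price) {
        Long count = em.createQuery(
                "select count(o.id) from DealOptions o"
                        + " where o.price = :price and o.deal.dealid = :dealid",
                Long.class)
            .setParameter("price", price)
            .setParameter("dealid", dealid)
            .getSingleResult();
        return count > 0;
    }
}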
